Editor's note: Alan MacCormack is a member of the Technological Innovation and Entrepreneurship Group at the MIT Sloan School of Management. He teaches Sloan's core class in innovation and has served as thesis advisor for several students in MIT's System Design and Management Program (SDM).
Ask a systems designer at any major commercial software company to describe the architecture of their product on a whiteboard. They'll typically draw a diagram showing a number of boxes (modules) that perform highly specific functions, with a few neat connections between them. My research shows, however, that if you actually measure the interactions between boxes at the code level, you'll find the architecture is much more tightly coupled than anyone would think. Coupling has its virtues: tight interactions between different pieces of code can lead to increased performance in areas such as speed or memory footprint. But coupling also has major drawbacks with respect to the ease with which software can be corrected and adapted to meet future needs.
Virtual systems are fundamentally different from other kinds of systems. As an information-based product, software appears to be easy and quick to change, which can be both an advantage and a disadvantage. There are no physical changes to be made, yet the complexity of modern software is such that even small modifications can ripple through a system with unintended consequences. Software appears to be malleable, but in practice, the architecture of many systems is opaque. Developers dare not change such systems too much, for fear of setting off a tangled web of dependencies and forced changes to upstream files.
Furthermore, unlike industries such as automobiles and airplanes, which create new platforms from the ground up every few years, modern software development efforts rarely start with a clean slate. Most systems have a significant legacy, on top of which new features and functionality are built. Unfortunately, it's not obvious from looking at the older code which pieces are connected to which others. It's not like working with a mechanical system, where you can see connections simply by inspecting the product or reverse-engineering its design. Worse, this hard-to-understand legacy code often embeds assumptions and design decisions that are no longer optimal for the system.
Why are initial design decisions often so out of whack with the current requirements for a software system? One reason is that the original design may have been built quickly, by a small company or startup more focused on releasing its first product rapidly than on building a framework to last for many years and multiple product evolutions. Software engineers design programs to meet their immediate needs, and in a startup, there is no guarantee that you will be around in 12 months. Speed is of the essence, and any performance edge is pivotal, no matter how you achieve it. Ten years later, however, when the war for market share is over, the needs of a user might be better served by a much more modular, maintainable, and adaptable system. In essence, early design decisions create a "technical debt" that must be paid by all those who follow.
Let me provide a micro-level example of these dynamics. Alice might decide to use a piece of functionality that Robert has already designed in his module, so she writes some code to "call" his function from her module. This saves time, but creates a dependency between Alice's module and Robert's that may not be transparent to the system architect. Five years down the line, when Robert and Alice have both retired to Tenerife, that dependency may be a complete surprise to a programmer needing to make a change. Changing code in Robert's module may well cause Alice's module to cease functioning.
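To make the Alice-and-Robert scenario concrete, here is a minimal sketch in Python. Everything in it is hypothetical and chosen only for illustration: the names robert_module, parse_price, parse_price_v2, and alice_total, and the idea that Robert's function parses price strings. The point is simply that Alice's code quietly assumes something about what Robert's function returns.

```python
from types import SimpleNamespace


def parse_price(text):
    """Robert's original function: turns '$1.50' into the float 1.5."""
    return float(text.lstrip("$"))


# Hypothetical stand-in for Robert's module (illustrative only).
robert_module = SimpleNamespace(parse_price=parse_price)


def alice_total(prices):
    """Alice's code: reuses Robert's function instead of writing her own.

    Hidden assumption: parse_price returns a plain number that can be summed.
    Nothing in Robert's module records that this caller exists.
    """
    return sum(robert_module.parse_price(p) for p in prices)


def parse_price_v2(text):
    """A later maintainer's 'improvement': also return the currency code."""
    return float(text.lstrip("$")), "USD"
```

With the original function in place, alice_total(["$1.50", "$2.50"]) returns 4.0. If a later maintainer swaps in the new version (robert_module.parse_price = parse_price_v2), the same call raises a TypeError, because Alice's code tries to add tuples to a number. The change looked reasonable from inside Robert's module; the breakage shows up somewhere else.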
The work that Andrei Akaikine, SDM '09, did in the thesis I supervised provides a great example of the costs that arise from an architecture that is overly complex. In his thesis, he examined a software system with a long history, which generated significant maintenance costs each year. Every change could create unexpected problems and require additional fixes to other parts of the system. The owner of this system—a large commercial software firm—decided to redesign the software with the goal of adding new features to the system, while simultaneously reducing its complexity (by reducing the coupling between elements). Akaikine showed that the result of this redesign was a significant reduction in maintenance effort, as captured by the time it takes to fix defects.
Of course, any major redesign involves significant costs of its own, and management has to decide if these costs are warranted. Unfortunately, many businesses make these decisions based on gut feel and intuition, rather than a rigorous analysis of the likely payoffs. We need much better data to make informed decisions, and the software industry is woefully lacking in such data. Ultimately, this is why the work I have done with Akaikine and other ESD students, including Daniel Sturtevant, SDM '07, who is working on his PhD, is important. We are among the first research teams to visualize and measure the extent of technical debt in legacy software systems.
To achieve this goal, we have developed pioneering methods for visualizing and measuring attributes of a software architecture that can help us assess its underlying structure. Consider a well-known example from a recent paper, in which we look at the Mozilla web browser. After its release as open-source software in 1998, a major redesign effort was undertaken on the system, with the aim of making the codebase more modular, and hence easier to contribute to. The design structure matrices (DSMs) from before and after this redesign illustrate what happened. The modular architecture that resulted facilitated contributions to the code by creating fewer unintended interactions between components. Before the redesign, each component was, on average, connected to 18 percent of other components. Afterward, this figure dropped to below 3 percent.
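For readers unfamiliar with DSMs, the following is a rough sketch, in Python, of how a connectivity percentage can be read off such a matrix. It rests on assumptions that are mine for this illustration and not taken from the paper: the matrix is a square grid of 0s and 1s in which dsm[i][j] = 1 means component i depends directly on component j; "connected" can be counted either as direct links only or as direct plus indirect links (followed through chains of dependencies); and the function names are invented. It requires at least two components.

```python
def _off_diagonal_share(matrix):
    """Fraction of off-diagonal cells that are set (self-links ignored)."""
    n = len(matrix)
    links = sum(1 for i in range(n) for j in range(n)
                if i != j and matrix[i][j])
    return links / (n * (n - 1))


def direct_density(dsm):
    """Average share of other components each component depends on directly."""
    return _off_diagonal_share(dsm)


def transitive_closure(dsm):
    """Warshall's algorithm: reach[i][j] is True if i depends on j
    directly or through any chain of intermediate components."""
    n = len(dsm)
    reach = [[bool(cell) for cell in row] for row in dsm]
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach


def indirect_density(dsm):
    """Same share, but counting indirect dependencies as well."""
    return _off_diagonal_share(transitive_closure(dsm))


# Toy example: four components in a chain, A -> B -> C -> D.
chain = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
```

For this toy chain, direct_density(chain) is 0.25 (3 of 12 possible links), while indirect_density(chain) is 0.5, because A also reaches C and D, and B also reaches D. The gap between the two numbers is one way to see how a diagram with "a few neat connections" can hide much wider exposure to change.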
Ultimately, different designs will have different performance characteristics along a variety of important dimensions, making techniques like ours valuable for exploring design trade-offs. A highly integrated design is likely to be faster, while a highly modular design may be more reliable. A designer must consider carefully what the product needs to do to arrive at the optimal design for her objectives. For example, if a system has to last 10 years, and you have no idea what it will need to do at the end of that time, the software must be designed to be extremely flexible and evolvable. Unfortunately, very few software companies practice such forward-looking "systems thinking."
How should a firm begin? Nobody should rush headlong into full-blown refactoring of a major system, given that we are still in the infancy of understanding how these efforts work. Indeed, our research reveals that a manager's intuition about where to start such an effort is frequently wrong, because the perceptions of an architecture and the realities embedded in its source code are often in conflict. Software companies first need to generate data on measures of architecture, and begin to link these measures to performance outcomes that they care about. Most firms tinker with and redesign their software all the time; in effect, they run hundreds of small experiments every year. Armed with a careful assessment of this data, they will be better placed to assess what works and what doesn't. Ultimately, we know complexity hurts. But reducing it is also a complex endeavor.