Abstract: When refactoring we try to improve the design of our code by making it easier to understand and, at the same time, easier to maintain. In this article we take a closer look at the interaction between these distinct goals and how they form distinct phases within the refactoring cycle.
When refactoring we improve the design of an existing code base by incrementally applying small, behavior-preserving changes to the code. Our objective in improving the design is twofold, however: we want the code to become easier to understand, and at the same time we want it to become easier to extend and maintain.
These two goals lead us to use different tactics when refactoring. We make code easier to understand by having its hidden concepts reflected explicitly throughout the code. Doing so involves extracting methods, classes, interfaces and the like, all activities that make a code base expand. When we try to make code easier to maintain, on the other hand, we focus on removing duplication. This involves pulling up methods and fields, removing duplicate methods, inlining classes and the like, all activities that reduce the amount of code.
As a result, refactorings often happen in two complementary phases: an expansion phase and a contraction phase.
The interaction between the expansion and the contraction phases becomes especially apparent in large refactorings. The diagram below sketches this interaction.

[Figure: the refactoring cycle]
Structural improvement of a design often requires new software artifacts. Such artifacts can be as small as variables, fields or methods, or as large as classes, interfaces or even packages. Because such structural improvements tend to expand the overall size of the code base, we call this phase the expansion phase.
For instance, bloated classes that host far too many responsibilities can be structurally improved by repeatedly applying Extract Class [F]. Doing so, however, requires the introduction of new classes. In the diagram below three such freshly extracted classes are shown in gray.

[Figure: the expansion phase]
During the expansion phase we try to improve our understanding of the code by making hidden concepts more explicitly expressed within the code.
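As a small illustration of how Extract Class [F] makes a hidden concept explicit, consider the following Python sketch. The `Customer` and `Address` classes and their fields are hypothetical and serve only as an example; they are not taken from any particular code base. Note that the version after the refactoring contains more code than the version before it, which is typical for the expansion phase.

```python
from dataclasses import dataclass


# Before: the "address" concept is hidden inside a class with several responsibilities.
class CustomerBefore:
    def __init__(self, name, street, city):
        self.name = name
        self.street = street
        self.city = city

    def mailing_label(self):
        return f"{self.name}\n{self.street}\n{self.city}"


# After Extract Class: the address is an explicit concept with its own class.
@dataclass
class Address:
    street: str
    city: str

    def lines(self):
        return f"{self.street}\n{self.city}"


class Customer:
    def __init__(self, name, address):
        self.name = name
        self.address = address

    def mailing_label(self):
        # Behavior is preserved; address formatting now lives with Address.
        return f"{self.name}\n{self.address.lines()}"
```

Both versions produce the same mailing label for the same input, so the change is behavior preserving even though the code base has grown by one class.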
To expose unrecognized similarities in the code base, we start lining up variable names and their types, method names and their signatures and, at times, even entire classes and interfaces. With the similarities lined up, we can focus more clearly on the differences between the similar pieces of code. Once the differences are identified, we start separating the specific from the generic behavior, thereby raising the level of abstraction. There are plenty of mechanisms for doing so, ranging from the creation of helper methods to the introduction of entirely new classes or interfaces. Whichever mechanism we choose, extracting the specifics out of the similarities makes the duplication in the code base even more explicit. At this point we shift to the contraction phase.
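The lining-up steps described above can be sketched as follows. All function names and tax rates here are invented for illustration. The sketch deliberately stops at the end of the expansion phase: the two final functions are not merged yet, but their remaining difference has been isolated in a single named constant, so the duplication is now plain to see.

```python
# Step 0: two similar functions whose similarity is obscured by
# different names, parameter orders and local variables.
def price_with_vat(amount, qty):
    total = amount * qty
    return total + total * 0.21


def shipping_cost_incl_tax(n, unit_cost):
    t = unit_cost * n
    return t + t * 0.06


# Step 1 (line up) and step 2 (separate the specific from the generic):
# same parameter names, same order, same locals; the one real difference,
# the rate, is extracted into a named constant.
VAT_RATE = 0.21            # hypothetical value
SHIPPING_TAX_RATE = 0.06   # hypothetical value


def price_with_vat_aligned(unit_price, quantity):
    rate = VAT_RATE
    subtotal = unit_price * quantity
    return subtotal + subtotal * rate


def shipping_with_tax_aligned(unit_price, quantity):
    rate = SHIPPING_TAX_RATE
    subtotal = unit_price * quantity
    return subtotal + subtotal * rate
```

After these steps the two function bodies differ in exactly one line, which is the signal that a contraction step, such as replacing both with one function that takes the rate as a parameter, has become possible.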
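In conclusion, at the end of the expansion phase the hidden concepts are expressed explicitly in the code, and the duplication that remains is clearly visible.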
The difficulty with code duplication is that every copy has to be updated when a change is required. Maintaining duplicated code is not only tedious and error-prone, it is also difficult because the copies tend to diverge as the software evolves. The second phase in the refactoring cycle concentrates on the elimination of code duplication. Because doing so reduces the overall size of the code base, we call this the contraction phase.
For instance, when classes share similar components we can reduce the amount of code by unifying the components.

[Figure: the contraction phase]
During the contraction phase we simplify the design by removing redundant code. Such an aggressive ban on dead code and explicit duplication can reduce the amount of code significantly. More important, however, is that the contraction phase consolidates previously unrecognized abstractions. Because contraction reduces the number of copies needed to express a certain concept, it improves the overall cohesiveness of the code. In general this makes changes related to the abstraction easier and changes foreign to the abstraction harder, thus preventing further divergence of the code base.
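In conclusion, at the end of the contraction phase the redundant code has been removed and each concept is expressed in fewer copies, which makes the code more cohesive.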
So far we've considered the expansion and contraction phases individually. In the diagram below we can see both phases at work. The example is pretty straightforward: we've found two nearly identical classes and want to get rid of the duplication using Extract Superclass [F]. The little gray bars in the diagram show the duplicated features.

[Figure: extracting a superclass]
A first step in preparing for Extract Superclass [F] is the introduction of a parent class that will host the common behavior of both classes. We think up a good name and add the parent class. Now we'd like to pull up the duplicated features. Our first attempt fails because of subtle little differences between the two clones: a different variable name here, a different signature there. No big deal. We start lining up. After a few tweaks here and there, the source for the duplicated features is identical. Time to start contracting and get rid of the duplication. With a combination of Pull Up Field [F] and Pull Up Method [F] we can pull the duplicated features up to the superclass. When the duplication among the sibling classes has disappeared, we're done with the contraction phase.
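The walkthrough above might look roughly like this in Python. The report classes are hypothetical and chosen only to show the mechanics: the two clones differ in a field name and a method name, so they are lined up first, and only then are the now-identical field and method pulled up into the new parent class.

```python
# Before: two nearly identical classes with subtle differences.
class CsvReportBefore:
    def __init__(self, title):
        self.title = title

    def header(self):
        return self.title.upper()

    def body(self, rows):
        return "\n".join(",".join(row) for row in rows)


class TsvReportBefore:
    def __init__(self, name):      # different field name
        self.name = name

    def get_header(self):          # different method name
        return self.name.upper()

    def body(self, rows):
        return "\n".join("\t".join(row) for row in rows)


# After lining up (name -> title, get_header -> header) the shared field
# and method are identical in both classes and can be pulled up.
class Report:
    def __init__(self, title):     # Pull Up Field
        self.title = title

    def header(self):              # Pull Up Method
        return self.title.upper()


class CsvReport(Report):
    def body(self, rows):
        return "\n".join(",".join(row) for row in rows)


class TsvReport(Report):
    def body(self, rows):
        return "\n".join("\t".join(row) for row in rows)
```

In this sketch the `body` methods stay in the subclasses because they still differ in their separator. Removing that last bit of duplication would take another round of lining up and pulling up.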
The interaction between expansion and contraction becomes especially important for large refactorings that cannot be completed in a short period of time.
In contraction-only refactoring we consider only the removal of duplicated code. To a certain extent this works, but its most important disadvantage is that abstractions missing from the code base remain missing. Eventually we get stuck with certain forms of duplication that cannot be removed.
The other extreme is expansion-only refactoring, which focuses solely on introducing new abstractions to the code base. The problem with expansion-only refactoring is that it leaves many loose ends. As the code base evolves further, those loose ends tend to diverge again.
We've described the refactoring cycle as an ongoing interaction between two collaborating phases, the expansion phase and the contraction phase. A refactoring in the expansion phase makes the code easier to understand by means of structural improvements to the code. A refactoring in the contraction phase improves the maintainability of the code by means of duplication removal.