Abstract: Working on large refactorings is quite different from working on smaller ones. In this article we take a closer look at some of the issues involved in large refactorings.
Most large refactorings can easily be composed out of smaller refactorings. And yet, performing large refactorings is different from performing smaller ones. Why? A first reason is the number of lines of code involved in the refactoring. Because of this increase in lines, we can't pay as much attention to each individual step as we would in a smaller refactoring.
Inevitably, some issues (we'll call these design defects) are overlooked at the outset of a large refactoring. During the refactoring these defects manifest themselves as tricky complications that slow down the ongoing refactoring effort.
Naive attempts to resolve such defects on an ad hoc basis often fail. Trying to approach the problems issue by issue, we find ourselves trapped in a deep recursion of refactorings. Each step seems to reveal more design defects. It's easy to get lost in such a situation, so rather than digging all the way down we need a more strategic approach.
Large refactorings are characterized by the number of features involved. For large refactorings the number of classes, methods and variables involved is several times larger than for similar small-scale refactorings.
It is exactly this increase in the number of features that makes large refactorings challenging. The more features a piece of code depends upon, the bigger the chance that complications will arise while refactoring that code. Given that the very purpose of refactoring is to make structural improvements to existing code by means of safe refactoring mechanics, we may wonder where these difficulties come from.
Actually, these complications are unaddressed code smells: little design defects that were barely noticeable previously but that now suddenly stand in the way of further evolution of the design. Each design defect causes a subtle deviation from the clear mechanics of the refactoring catalog. As such, the presence of design defects requires some extra creativity from the refactoring practitioner.
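Before we continue, let's try to define what we mean by a design defect: a design defect is a feature implementation that provides functional behavior but that itself has become an obstacle for the further evolution of the code.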
Let's have a closer look at the definition. "A design defect is a feature implementation" means we don't really care how the feature is implemented. It could be a class, a method, perhaps even just a field. Further on, "that provides functional behavior" indicates that any modification to the implementation should be behavior preserving (hey, this is a refactoring article). The last part, "but that itself has become an obstacle for the further evolution of the code", is a bit strange. Defined this way, whether a particular feature implementation is considered a design defect depends on the kind of change we would like to make to the code.
As the number of features rises, even the simplest refactorings start to feel slow because of the many lines of code that need to be adjusted. Even with modern tools that do the boring work for us, we want to reduce the number of intermediate changes that ripple through the source code. The number of features involved in a refactoring is what we call the refactoring footprint.
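As a rough illustration, the footprint of a change can be estimated by walking a dependency map. The sketch below assumes that a feature counts as involved when it uses the changed feature directly or indirectly; that reading, the feature names and the `USED_BY` map are all hypothetical and only serve to make the idea concrete.

```python
from collections import deque

# Hypothetical dependency map: each feature maps to the features that use it directly.
USED_BY = {
    "Order.total": ["Invoice.render", "Report.summary"],
    "Invoice.render": ["Mailer.send_invoice"],
    "Report.summary": [],
    "Mailer.send_invoice": [],
    "Customer.name": ["Invoice.render"],
}


def refactoring_footprint(changed, used_by):
    """Return the changed feature plus every feature that depends on it, directly or not."""
    seen = {changed}
    queue = deque([changed])
    while queue:
        feature = queue.popleft()
        for dependent in used_by.get(feature, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


footprint = refactoring_footprint("Order.total", USED_BY)
print(sorted(footprint))
# ['Invoice.render', 'Mailer.send_invoice', 'Order.total', 'Report.summary']
```

In this toy map, changing `Order.total` touches four features, while changing `Report.summary` touches only itself.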
In order to keep the number of code updates low, many large refactorings include refactoring-enabling design improvements. The purpose of such preparatory refactorings is to reduce the lines of code that need to be updated while unraveling another part of the code base. Reducing the number of updates not only allows us to work faster, it also helps us stay focused on the design smell at hand.
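A minimal sketch of such a preparatory step, under the assumption that callers used to sum a price list themselves. The classes and functions here are hypothetical. Moving the computation behind `total()` first means that a later change of representation stays inside the class instead of rippling into every caller.

```python
class Order:
    """Hypothetical class after the preparatory step."""

    def __init__(self, prices):
        self._prices = list(prices)

    def total(self):
        # The computation now lives in one place instead of in every caller.
        return sum(self._prices)


class QuantityOrder:
    """The later, larger change: a different internal representation."""

    def __init__(self, lines):
        # lines is a list of (unit_price, quantity) pairs
        self._lines = list(lines)

    def total(self):
        return sum(price * quantity for price, quantity in self._lines)


# Callers depend only on total(), so they need no update for the new representation.
def invoice_line(order):
    return f"Total: {order.total()}"


def needs_approval(order, limit=100):
    return order.total() > limit


print(invoice_line(Order([10, 20])))           # Total: 30
print(invoice_line(QuantityOrder([(10, 3)])))  # Total: 30
```

Without the preparatory step, both `invoice_line` and `needs_approval` would have to be rewritten when the representation changes.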
Working on a large refactoring, we get to know both the smooth and the bumpy areas of a design. Those bumpy areas are often characterized by a high density of design defects, often coupling-related issues. When the coupling among the features is too high, we may find ourselves spending far more time on side activities than on the design improvement we're targeting. It almost feels like the design is resisting the refactoring.
When we sense such 'design resistance' to a certain refactoring, it's important that we identify its cause. Design resistance often indicates a hidden or neglected problem that makes the refactoring difficult. When confronted with design resistance, it's better to defer the large refactoring and deal with the root cause(s) of the problem first.
The challenge with large refactorings is not getting lost in the complexity of the design while we attempt to improve it. This is far more difficult than it sounds. Working on a large refactoring often reveals smelly code areas the team has forgotten about. One response is to refactor mercilessly and start with those smelly areas first.
Getting all of the source clean by refactoring mercilessly is a good attitude. However, for large refactorings it's often not the best strategy to follow: the number of unaddressed code smells is just too big. In heroic attempts to clean up the code, tackling smell after smell, we soon find ourselves overwhelmed. For sure, the code is getting better. But in an attempt to deal with all those code smells at once we get so distracted that we barely make any progress on the large refactoring we set out to do, thereby deferring its completion.
In order to make steady progress on large refactorings we use a more strategic approach. Concentrating on the target design, we only invest time in refactoring areas that immediately contribute to the completion of the large refactoring. We need to keep focused and keep in mind what the ongoing refactoring is about. Code smells causing difficulties for the ongoing refactoring need to be dealt with immediately. Other, less urgent code smells are left alone and deferred until the ongoing large refactoring has been completed. While doing so it may be helpful to keep a list of smells we deferred.
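The triage described above could be sketched as follows. This is only one possible reading: it assumes a smell is "in the way" when it lives in a feature that belongs to the refactoring footprint. The `Smell` record and all feature names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Smell:
    location: str      # feature in which the smell was spotted
    description: str


def triage(smells, footprint):
    """Split smells into those blocking the ongoing refactoring and those to defer."""
    fix_now, deferred = [], []
    for smell in smells:
        if smell.location in footprint:
            fix_now.append(smell)
        else:
            deferred.append(smell)
    return fix_now, deferred


footprint = {"Order.total", "Invoice.render"}
smells = [
    Smell("Invoice.render", "long method"),
    Smell("Customer.name", "primitive obsession"),
]
fix_now, deferred = triage(smells, footprint)
print([s.location for s in fix_now])   # ['Invoice.render']
print([s.location for s in deferred])  # ['Customer.name']
```

The `deferred` list doubles as the list of smells to revisit once the large refactoring has been completed.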
Another constraint large refactorings put on the team is the time they take to complete. Not only are large refactorings slower to perform, they are also more difficult to estimate due to the various design defects that may pop up during the course of a large refactoring.
Being slow and difficult to estimate certainly doesn't make large refactorings popular. Given realistic project constraints, the effort large refactorings take is usually more than a project manager can allow the team to invest in the design. Even when the payoff is big, release dates need to be met too. So when working on large refactorings we need to be careful not to threaten the release date. If we do, it's not uncommon for management to impose a code freeze or forbid refactoring altogether.
When a large refactoring cannot be completed within the available time frame, it needs to be split over different design stages. A design stage is a rest point at which we can safely interrupt the ongoing refactoring without leaving the design in a degenerate state. If we're unable to finish a large refactoring today and we can't tell when it will be completed, it's of the utmost importance to leave the design in a reasonably stable state, without too many loose ends. For instance, in smaller refactorings temporarily duplicating code is acceptable because the duplication will be removed long before the refactoring is completed. For refactorings that can be interrupted we need to be more careful with duplicating code, because doing so could leave the design in a worse state than before.
Good candidate design stages make improvements that are of immediate benefit to the code even if the larger refactoring they are part of is never completed. Indeed, given the dynamic nature of today's software projects, it might never be. Changing priorities, unexpected feature requests and new insights all influence the big picture of the design we're heading toward. Given the effort invested in a large refactoring, it's only reasonable to expect some intermediate payoff.
If we want to evolve the higher layers of abstraction of a design, it becomes important that each developer on the team understands what is going on at those layers. It's essential that developers understand which parts of the design are intermediate stages and where the ongoing evolution is headed.
When moving towards new abstractions, the team needs to ensure that each of its members understands the benefits of these new abstractions. The reason is that we want freshly written code to be written in terms of the new abstractions rather than in terms of the older abstractions we are trying to get rid of. When the ongoing evolution is poorly understood among the team members, some developers may continue using the old abstractions, thereby increasing the overall refactoring effort.
Here's a simple example:

Figure: Accessing subclass features through an interface.
By itself, accessing subclass features by means of an interface, rather than calling those features directly from the superclass, doesn't make much sense. Indeed, without prior knowledge of the ongoing refactoring many developers will argue that the interface isn't needed and plead for calling the subclass features directly. But if we consider the given class diagram as an intermediate stage in the evolution from a Template Method to a Strategy-based solution, then the diagram makes perfect sense.
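In code, such an intermediate stage might look like the sketch below. The class names (`Report`, `Formatter`, `CsvReport`) are hypothetical and not taken from the article's diagram. The sketch only assumes the general shape described above: a superclass that reaches its own subclass feature through an interface, on its way to a design in which that feature lives in a separate strategy object.

```python
from abc import ABC, abstractmethod


class Formatter(ABC):
    """Hypothetical interface extracted from the template method's hook."""

    @abstractmethod
    def format_body(self, data):
        ...


class Report:
    """Intermediate stage: still a template method, but the hook is reached
    through the Formatter interface instead of being called on self directly."""

    HEADER = "== report ==\n"

    def __init__(self):
        if not isinstance(self, Formatter):
            raise TypeError("subclasses must implement Formatter at this stage")
        self._formatter = self  # the subclass plays the strategy role for now

    def render(self, data):
        return self.HEADER + self._formatter.format_body(data)


class CsvReport(Report, Formatter):
    def format_body(self, data):
        return ",".join(str(item) for item in data)


# Target design: the formatting has moved out of the hierarchy into a strategy.
class CsvFormatter(Formatter):
    def format_body(self, data):
        return ",".join(str(item) for item in data)


class StrategyReport:
    HEADER = "== report ==\n"

    def __init__(self, formatter):
        self._formatter = formatter

    def render(self, data):
        return self.HEADER + self._formatter.format_body(data)


print(CsvReport().render([1, 2, 3]) == StrategyReport(CsvFormatter()).render([1, 2, 3]))  # True
```

Seen in isolation, `self._formatter = self` looks pointless. Seen as a design stage, it is what lets the final step shrink to passing the formatter in from outside.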

Figure: Sketching the missing context.
Because both class diagrams and source code present a static view of the software, neither does a good job of communicating the ongoing evolution. Using a visual notation that sketches the design at each of its intermediate stages helps communicate the dynamics of the ongoing refactoring effort.
We have explained why large refactorings are different from smaller refactorings and argued for a more strategic approach to performing them. Highlighting technical difficulties such as the refactoring footprint, timing constraints and design defects, we have identified some of the issues that make large refactorings challenging. In addition to working in small increments, we've proposed a strategy that performs large refactorings in stages, where each stage leaves the design in a reasonable state. Finally, we've stressed the importance of being able to understand and communicate the ongoing evolution of the design over the different design stages.