Recovering productivity using modularization
As a software project grows, most teams reach a point where compilation, build and test execution times, together with sheer complexity, become barriers to productivity. Changing the build setup, the development processes, and the architecture by adopting stricter modularization can recover some of this size-related productivity loss.¹
In this blog post, we discuss some tried and tested ideas and methods to improve productivity and overcome these complexity barriers. We use Java and Gradle for examples and terminology, but the core points also translate to other technologies and tools.
The starting point
Consider a software project where all the code is contained in one module (e.g., a Gradle project). Typical development and continuous integration tasks then recompile all code and run all tests for every single code change.
Keeping output folders between builds and relying on incremental compilation does not help in setups where stateless build nodes are spun up fresh for each pre- or post-integration build around pull requests, because no previous build output exists to build on.
Build and test execution times thus scale linearly as the number of changes and contributors increases.
Divide and conquer
To address this, we first break up the project into multiple modules, reflected in Gradle as a multi-project build. Gradle requires cycle-free dependencies between projects, so, in an example application, we might arrive at a structure with the following Gradle dependencies, with all source code still in the same repository:
Now, with Gradle’s incremental build support, changes in the persistence module will not require re-compilation or re-run of unit-tests for the configuration module. This already cuts down on build times, improving developer productivity.
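To illustrate why the rebuild scope shrinks, here is a minimal sketch of a module graph that computes which modules are affected by a change. It is a simplified model, not how Gradle implements its up-to-date checks; the class `ModuleGraph`, the module names, and the dependency directions in `example()` are assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model of a module graph; names and edges are illustrative. */
final class ModuleGraph {
    // Maps each module to the modules it depends on directly.
    private final Map<String, List<String>> dependencies;

    ModuleGraph(Map<String, List<String>> dependencies) {
        this.dependencies = dependencies;
    }

    /** Returns the changed module plus all modules that depend on it, directly or transitively. */
    Set<String> affectedBy(String changedModule) {
        Set<String> affected = new TreeSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(changedModule);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            if (!affected.add(current)) {
                continue; // already visited
            }
            // Every module that lists `current` as a dependency must be rebuilt too.
            for (Map.Entry<String, List<String>> entry : dependencies.entrySet()) {
                if (entry.getValue().contains(current)) {
                    queue.add(entry.getKey());
                }
            }
        }
        return affected;
    }

    /** Assumed example: app -> persistence -> configuration, app -> configuration. */
    static ModuleGraph example() {
        return new ModuleGraph(Map.of(
                "app", List.of("persistence", "configuration"),
                "persistence", List.of("configuration"),
                "configuration", List.of()));
    }
}
```

In this assumed graph, `ModuleGraph.example().affectedBy("persistence")` contains `app` and `persistence` but not `configuration`, which matches the behavior described above.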
In real-world systems, however, cyclic dependencies are commonplace, even at compile time. A common way to allow them while keeping a modular structure is to separate each module into an interface-only (API) module and an implementation module:
Here, changes in the implementation of one module do not cause other implementation modules to be rebuilt. Only if the API changes are all its consumers rebuilt (compiled, tested). To compose the final application, specific implementation modules (there might be more than one for each API) need to be specified. That is done in the root module, which acts as a dependency injector.
This structure now scales well because, at module level, only code that is affected by a change is rebuilt. If any module becomes excessively large, it can be split into multiple ones. In the best-case scenario, compile, unit-test and module-test times then grow logarithmically with the size of the codebase instead of linearly.
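The API/implementation split and the root module acting as a dependency injector could look roughly like the following sketch. All type names (`ConfigurationService`, `PersistenceService`, `MapConfigurationService`, `FilePersistenceService`, `RootModule`) and the `storage.dir` key are hypothetical; the comments mark which module each type would live in, while the sketch itself keeps everything in one file so it can be compiled as is.

```java
import java.util.Map;

// --- configuration-api module (hypothetical): interfaces only ---
interface ConfigurationService {
    String get(String key);
}

// --- persistence-api module (hypothetical): interfaces only ---
interface PersistenceService {
    String describe();
}

// --- configuration-impl module: compiled against API modules only ---
final class MapConfigurationService implements ConfigurationService {
    private final Map<String, String> values;

    MapConfigurationService(Map<String, String> values) {
        this.values = values;
    }

    @Override
    public String get(String key) {
        return values.get(key);
    }
}

// --- persistence-impl module: sees ConfigurationService, never its implementation ---
final class FilePersistenceService implements PersistenceService {
    private final ConfigurationService configuration;

    FilePersistenceService(ConfigurationService configuration) {
        this.configuration = configuration;
    }

    @Override
    public String describe() {
        return "files under " + configuration.get("storage.dir");
    }
}

// --- root module: the only place that names concrete implementations ---
final class RootModule {
    static PersistenceService wire() {
        ConfigurationService configuration =
                new MapConfigurationService(Map.of("storage.dir", "/tmp/data"));
        return new FilePersistenceService(configuration);
    }
}
```

Swapping `MapConfigurationService` for another implementation only touches `RootModule`; `FilePersistenceService` is unaffected because it is written against the interface alone.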
If unit test code is part of its implementation module (for instance, via a Gradle test configuration) and the test code dependencies follow the same pattern as shown here, clean unit tests come naturally. When writing, e.g., a persistence unit test, one cannot instantiate and use configuration implementation classes. Instead, the dependency structure enforces the desired use of mocking/stubbing frameworks.
In a continuous integration environment, the Gradle build cache benefits from such a structure: unchanged build artifacts are fetched from a central cache, and if the most frequent changes affect only implementation modules, only two modules require rebuilding in our example.
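As a minimal sketch of such a test, the following persistence-side class is exercised with a hand-written stub of the configuration API instead of a real implementation. The names `ConfigurationApi`, `PersistencePathResolver` and `PersistencePathResolverCheck` are hypothetical, and a plain static method stands in for a test framework to keep the example free of external dependencies.

```java
// Assumed to live in the configuration API module.
interface ConfigurationApi {
    String get(String key);
}

// Assumed to live in the persistence implementation module.
final class PersistencePathResolver {
    private final ConfigurationApi configuration;

    PersistencePathResolver(ConfigurationApi configuration) {
        this.configuration = configuration;
    }

    /** Builds the storage path for a file from the configured storage directory. */
    String resolve(String fileName) {
        return configuration.get("storage.dir") + "/" + fileName;
    }
}

// Assumed to live in the test sources of the persistence implementation module.
final class PersistencePathResolverCheck {
    static String resolveWithStub() {
        // Only the API is on the classpath, so the test supplies its own stub.
        ConfigurationApi stub = key -> "storage.dir".equals(key) ? "/data" : null;
        return new PersistencePathResolver(stub).resolve("users.bin");
    }
}
```

Because the configuration implementation classes are not visible from this module, the stub is the only way to satisfy the dependency, which is the point of the structure.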
Modules encourage abstraction
In addition to build-time improvements, breaking up an application into modules also reduces another large-application problem: complexity. APIs based on good abstractions reduce cognitive load on API consumers, allowing them to focus on the task at hand without having to grasp all details of the whole application.² For instance, when storing new configuration objects from within the configuration implementation module, work becomes easier and faster with a simple read/write-byte-array interface, without having to look at file- or web-based storage implementation code.
Such a reduction in complexity speeds up onboarding new developers. If the modules mirror the organizational structure,³ it can also reduce communication overhead and merge conflicts between teams.
Of course, a multitude of other dependency patterns appears in the real world. The following are a select few that we have encountered:
When a single domain module (here: configuration) is split by aspects such as frontend (UI) and backend (and more, as seen in the previous figure), you can take various approaches:
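A read/write-byte-array interface of the kind mentioned above might look like the following sketch. The names `BlobStore`, `InMemoryBlobStore` and `ConfigurationStore` are assumptions for illustration; the in-memory implementation stands in for file- or web-based storage, which the configuration code never needs to see.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical storage API: all a consumer needs to know.
interface BlobStore {
    void write(String key, byte[] data);

    /** Returns the stored bytes, or null if nothing is stored under the key. */
    byte[] read(String key);
}

// One possible implementation; a file- or web-based one would implement the same interface.
final class InMemoryBlobStore implements BlobStore {
    private final Map<String, byte[]> blobs = new HashMap<>();

    @Override
    public void write(String key, byte[] data) {
        blobs.put(key, data.clone()); // defensive copy
    }

    @Override
    public byte[] read(String key) {
        byte[] data = blobs.get(key);
        return data == null ? null : data.clone();
    }
}

// Configuration-side code written purely against the BlobStore interface.
final class ConfigurationStore {
    private final BlobStore store;

    ConfigurationStore(BlobStore store) {
        this.store = store;
    }

    void save(String name, String value) {
        store.write(name, value.getBytes(StandardCharsets.UTF_8));
    }

    String load(String name) {
        byte[] data = store.read(name);
        return data == null ? null : new String(data, StandardCharsets.UTF_8);
    }
}
```

A developer working on `ConfigurationStore` only has to understand the two methods of `BlobStore`, regardless of how many storage backends exist.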
1. One implementation module containing all aspects
2. Separating frontend and backend
3. Splitting into multiple aspects, as shown in the figure
The advantage of the third option, splitting into multiple aspects, is the strict avoidance of unwanted references, e.g., from the backend code to UI libraries.
Note that the UI and the REST API depend here on the backend API and not the implementation, thus being implementation-agnostic. However, the debug utilities depend on the implementation, assuming a tight coupling.
Such aspect separations are possible independent of where a separation of API and implementation modules takes place.
So far, we have assumed that all modules reside in the same code repository. Once such a clear architecture has been established in the code and at module level, the modules can also be moved to separate repositories, forming independent components from which the final application is composed.
We have presented a method of modularizing large codebases that contain subsystems which depend on each other at compile time.
This method lets you increase the productivity of your development organization by combining an incremental build system with a suitable architecture, reducing both build times and developer wait times.
¹ Cf. also Martin Fowler’s Design Stamina Hypothesis
² As proposed by John Ousterhout in A Philosophy of Software Design
³ Cf. Conway’s law