Right-sizing Microservices

Categories: Architecture

When building microservices, what is the right amount of code in each service?

Well, I actually have an issue with the common understanding of “microservices” as something new. In my opinion, every large company already has a “microservice” architecture and has had one for decades. Look at any bank or insurance company, going back even to the 1980s - they don’t have a single codebase compiled into a single artifact. Instead they have multiple systems, each organised around a set of business capabilities, with its own development team, its own codebase, its own database, its own processes for (loosely coupled) data exchange with other systems, and its own independent release cycle. And that’s pretty much the definition of microservices. Of course these things are typically huge - the teams are 100+ developers, the codebases are millions of lines long, data-exchange latencies are often large (e.g. nightly batches of files), and releases often happen only a couple of times per year. But it’s still the same pattern.

So for me, microservices is simply a matter of taking this traditional system design and turning up the granularity dial. How far the dial should be turned depends upon the business circumstances.

In particular, there are increasing costs associated with increasingly fine granularity, so (IMO) the dial should be turned only when the cost of doing so is less than the cost of a coarser-grained solution. If you have an existing coarse-grained (or even monolithic) system then this means staying with the current state unless there is a reason to change.

There are two distinct reasons to consider microservices:

  • when your development processes are not scaling as desired
  • when your runtime systems are not scaling as desired

In most cases, the first one is far more important.

Runtime performance issues can often be solved far more cheaply by throwing extra hardware at them than by redesigning the software. Where a real performance issue exists, focusing just on the bottlenecks usually resolves it without system-wide changes. In fact, refactoring a system based on microservice principles can actually reduce performance unless you are very careful about where you partition the system.

On the other hand, if the problem is that the development team just can’t keep up with the flow of requests for change, then adding more developers is (a) expensive, and (b) quite often ineffective when the development processes aren’t appropriate. Modular systems (moduliths) can help, as can systems composed of multiple components which are released together (still monolithic in nature). However, eventually it may be necessary to have independent release cycles for different features - i.e. microservices. At that point, the question becomes: how many services should the feature set be implemented as?
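
To make the “modulith” idea concrete, here is a minimal sketch of how module boundaries can be enforced inside a single deployable using nothing more than Java packages and visibility. All names (the shop example, OrdersApi, OrderService) are hypothetical, and real projects would typically add build-level enforcement (separate Gradle/Maven modules, or a tool such as ArchUnit) on top of this.

```java
// One deployable unit containing several team-owned modules. Only the public API
// type is visible outside the package; everything else stays package-private, so
// cross-module calls must go through the API and internals can be refactored freely.

// src/main/java/com/example/shop/orders/OrdersApi.java
package com.example.shop.orders;

public interface OrdersApi {
    // Returns the id of the newly placed order.
    String placeOrder(String customerId, String sku, int quantity);
}

// src/main/java/com/example/shop/orders/OrderService.java
package com.example.shop.orders;

// Package-private implementation: invisible to other modules (billing, catalogue, ...),
// so changes here never ripple across team boundaries.
class OrderService implements OrdersApi {
    @Override
    public String placeOrder(String customerId, String sku, int quantity) {
        // ...validate, persist, publish a domain event...
        return java.util.UUID.randomUUID().toString();
    }
}
```

The point is simply that module boundaries (and therefore team boundaries) can exist without separate deployables; splitting into separately released services is a further, more expensive, step up the granularity dial.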

Costs of excessively fine granularity:

  • many things to security-audit
  • many things to security-patch
  • many things to document
  • many things to learn (for a new developer or admin)
  • high inter-component integration costs - including:
    • keeping API stability (at both ends)
    • implementing data synchronization
    • dealing with the lack of atomic transactions (see the sketch after this list)
    • complex testing for inter-component interactions
    • and much more
  • high infrastructure costs (distributed systems generally require more memory and CPU than the same task done monolithically)
  • difficulties applying cooperative domain modelling (each bounded context is spread across multiple codebases)
  • debugging may require analysing inter-process calls
  • refactoring (particularly changes to a domain model) may be a cross-component task, not an intra-component one
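
To make the “lack of atomic transactions” item concrete: an update that would be a single local database transaction inside one service becomes a multi-step conversation once the data is split across services, and every step needs an explicit failure path. The sketch below shows one common shape of that cost - a hand-written compensation step - using entirely hypothetical names (PaymentsClient, StockClient, CheckoutProcess).

```java
// Hypothetical clients for two remote services; in a monolith these two updates
// would simply share one database transaction.
interface PaymentsClient {
    String charge(String customerId, long amountCents); // returns a payment id
    void refund(String paymentId);                       // compensation step
}

interface StockClient {
    void reserve(String sku, int quantity);              // may fail after the charge succeeded
}

class CheckoutProcess {
    private final PaymentsClient payments;
    private final StockClient stock;

    CheckoutProcess(PaymentsClient payments, StockClient stock) {
        this.payments = payments;
        this.stock = stock;
    }

    void checkout(String customerId, String sku, int qty, long amountCents) {
        String paymentId = payments.charge(customerId, amountCents);
        try {
            stock.reserve(sku, qty);
        } catch (RuntimeException e) {
            // There is no atomic rollback across services: we must compensate by hand,
            // and cope with the refund itself possibly failing (retries, alerting, ...).
            payments.refund(paymentId);
            throw e;
        }
    }
}
```

A real system would also have to deal with the compensation step itself failing (retry queues, dead-letter handling, manual intervention) - exactly the kind of complexity that simply does not exist while the data shares one database.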

Costs of excessively coarse granularity:

  • high collaboration costs (team conflicts)
  • high learning cost for the codebase
  • slow evolution
  • slow time-to-market for new features
  • unclear areas of responsibility
  • unclear domain context boundaries

These costs don’t change linearly, however; the items above contain some “tipping points”. In particular, people tend to work best in groups of 10 or fewer, so any codebase that requires tight collaboration in a group larger than that carries a significant cost. There is also a tipping point in codebase complexity: up to some size N, a codebase is reasonably understandable; at twice that size it is much harder to follow; yet at half that size it is not noticeably easier. Therefore going from “too large” to “right-sized” brings a benefit, but going even smaller often adds nothing further in terms of code comprehension while incurring costs in other areas.

In my opinion, the right place to “turn the dial up to” is simply the point at which each codebase (and supporting infrastructure such as database schemas) can be maintained by a single team small enough for its members to know each other well and communicate regularly. Once that point is reached, all of the items listed as “costs of excessively coarse granularity” are pretty much resolved, while taking on as few of the costs¹ from the “fine granularity” list as possible.

Some people (particularly from the early days of the microservice movement) suggest that “finer is always better” and recommend architectures in which each service is extremely small. I disagree completely, seeing very high costs for little benefit. The argument is that having more deployable units is somehow “more decoupled”. However, the finer-grained a service is, the more other services it needs to collaborate with in order to do its work - and those collaborations are themselves couplings, with a much higher price than the kind of coupling that occurs within a single codebase and database. While in theory fine-grained services can be modified or replaced in parallel, in practice their collaborative nature requires changes to multiple components for any significant behavioural change.

In recent years, there has been a trend towards saying things like “moduliths are the new microservices” and “every system should start as a monolith”. I would agree completely; systems should use the coarsest granularity that works for them. Most projects start with a fairly small feature-set and small development team, so a (nicely modular) monolith is indeed a good starting point.

A graph of codebase size vs developer productivity will look something like a bell curve: a fixed set of functionality distributed over many small components will be hard to maintain, as will the same functionality in a single huge codebase. Somewhere in the middle lies an optimum. Of course there are some additional aspects to consider: a “modulith” is more maintainable than a “big ball of mud”, and systems whose components are team-sized or smaller allow more work to proceed in parallel (and are therefore more productive), which may be beneficial regardless of whether that approach is the most cost-effective.

Therefore, turn up the dial on granularity only when the pain of doing so is less than the pain of not doing so. And consider using the term “service-based architecture” rather than “microservice architecture” to avoid confusion.

Of course it is possible to choose different granularities for different parts of the system. Having rapidly-changing parts divided into team-sized pieces while slower-moving parts are coarse-grained cross-team codebases can be the most cost-efficient solution in some cases, particularly when those larger pieces are “legacy code”.

Many software systems have “natural fault lines” along which functionality separates - in DDD terms, subdomains or bounded contexts. Splitting along these lines can be beneficial, while trying to split into finer chunks can be very difficult. Systems with excessively fine granularity can often only be implemented via intensive synchronous communication between components (RPC-style), which has very high costs in terms of development time, performance, and system stability. Coarser-grained systems aligned with domain boundaries naturally have less need for inter-component communication - and can use data replication to reduce or even eliminate any synchronous dependencies which remain. Implementing domain model changes is also far less painful in a coarse-grained system than in a fine-grained one.
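
As a sketch of the data-replication point: rather than calling the customer-owning component synchronously on every order, the ordering component can subscribe to “customer changed” events and keep its own local copy, leaving only an asynchronous dependency. The event shape, in-memory store, and names below are hypothetical; a real system would use a message broker and a proper table.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical event published by the customer-owning component.
record CustomerChanged(String customerId, String name, String billingAddress) {}

// Local read model inside the ordering component: updated asynchronously from
// events, queried synchronously without any cross-service call.
class CustomerReplica {
    private final Map<String, CustomerChanged> byId = new ConcurrentHashMap<>();

    // Called by whatever messaging infrastructure delivers the events
    // (Kafka listener, JMS consumer, etc. - deliberately left out of the sketch).
    void onCustomerChanged(CustomerChanged event) {
        byId.put(event.customerId(), event);
    }

    // Order placement reads the replica; if the event feed is a little behind,
    // the data is slightly stale rather than unavailable.
    Optional<String> billingAddressOf(String customerId) {
        return Optional.ofNullable(byId.get(customerId))
                .map(CustomerChanged::billingAddress);
    }
}
```

The trade-off is that the replica may be slightly stale, which along a well-chosen domain boundary is usually acceptable - and far cheaper than a runtime dependency on another service being up and fast.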

As a concrete example, I worked for a company that provided an online marketplace. With over 30 active back-end developers and 20 to 30 features being released weekly, our monolith became just too painful, so we moved to microservices. But we turned up the dial only a notch or two, with the intention of then re-assessing whether another step to even finer granularity would result in less or more pain. So far, the first refactoring has been sufficient; simply returning to one team per codebase provided a major benefit, and everything finer seems to be cost without obvious benefit. Where your cost/benefit threshold lies depends upon your codebase size, complexity, team size, and rate of activity.

Footnotes

  1. There is of course such a thing as “too large” for a service; as an example, Stripe has a “microservice” with 1400 transitive dependencies and a correspondingly slow build time. There may be benefits in further dividing such a codebase even if it is maintained by a single team.