Architecture for Scale Part I: CQRS

Command query responsibility segregation (CQRS) is an architectural pattern that separates the read and write responsibilities of an application. A simple way to think about it is to imagine two distinct vertical silos within a single application: one handles read requests and the other handles write requests. Taking an application such as Twitter, we can begin to see what each vertical is responsible for. On the read side, when you open Twitter and view your timeline, the read vertical handles the request. Ultimately this boils down to a database query that returns, say, the latest 100 tweets in your timeline. On the write side, posting a new tweet takes the path through the write vertical, which ultimately results in a database write that persists the tweet.
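
To make the two silos concrete, here is a minimal Kotlin sketch of the Twitter example. The names PostTweet, GetTimeline, TweetCommandHandler, TimelineQueryHandler and InMemoryTweetStore are hypothetical, and an in-memory list stands in for the database. The only point being illustrated is that writes and reads travel through separate handlers with separate inputs.

```kotlin
// Hypothetical command handled by the write vertical.
data class PostTweet(val authorId: String, val text: String)

// Hypothetical query handled by the read vertical.
data class GetTimeline(val userId: String, val limit: Int = 100)

data class Tweet(val id: Long, val authorId: String, val text: String)

// In-memory stand-in for the database; both verticals share it in this sketch.
class InMemoryTweetStore {
    private val tweets = mutableListOf<Tweet>()
    private var nextId = 1L

    fun insert(authorId: String, text: String): Tweet {
        val tweet = Tweet(nextId++, authorId, text)
        tweets.add(tweet)
        return tweet
    }

    // Newest first: ids increase monotonically, so a higher id means a later tweet.
    fun latestByAuthors(authors: Set<String>, limit: Int): List<Tweet> =
        tweets.filter { it.authorId in authors }
            .sortedByDescending { it.id }
            .take(limit)
}

// Write vertical: validates the command and persists the tweet. It never serves reads.
class TweetCommandHandler(private val store: InMemoryTweetStore) {
    fun handle(command: PostTweet): Long {
        require(command.text.isNotBlank()) { "Tweet text must not be blank" }
        return store.insert(command.authorId, command.text).id
    }
}

// Read vertical: answers timeline queries. It never mutates state.
class TimelineQueryHandler(
    private val store: InMemoryTweetStore,
    private val following: Map<String, Set<String>>
) {
    fun handle(query: GetTimeline): List<Tweet> =
        store.latestByAuthors(following[query.userId].orEmpty(), query.limit)
}
```

Posting goes through TweetCommandHandler.handle(PostTweet(...)), and viewing a timeline goes through TimelineQueryHandler.handle(GetTimeline(...)); neither handler knows about the other.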

When we boil CQRS down to its low-level implementation, one of the key differentiators from more traditional architectures is the shape of the domain model. In a traditional design a single model serves both reads and writes; in CQRS that model is split in two. Write operations are handled by a command model, which receives commands from an upper (outer) layer and applies the appropriate business logic before persisting to a database. Read operations are handled by a query model, which provides denormalised views optimised for efficient querying and data retrieval. A nice, simple example of a CQRS implementation in Kotlin can be found here[https://github.com/Creditas/kotlin-ddd-sample].
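
A minimal sketch of the two models, assuming the command model publishes an event (here the hypothetical TweetPosted) that a projection consumes to build a denormalised timeline view. Precomputing each follower's timeline when a tweet is posted ("fan-out on write") is just one common way to build such a view, chosen for illustration; it is not taken from the linked sample, and all names are hypothetical.

```kotlin
// Hypothetical event emitted by the command model once its business rules have passed.
data class TweetPosted(val tweetId: Long, val authorId: String, val text: String)

// Command model: owns the business rules and the normalised write-side state.
class TweetCommandModel {
    private val tweets = mutableMapOf<Long, TweetPosted>()
    private var nextId = 1L

    fun post(authorId: String, text: String): TweetPosted {
        require(text.isNotBlank()) { "Tweet text must not be blank" }
        val event = TweetPosted(nextId++, authorId, text)
        tweets[event.tweetId] = event // stands in for persisting to the write store
        return event
    }
}

// Denormalised read row: everything a timeline screen needs, so no joins at read time.
data class TimelineEntry(val tweetId: Long, val authorHandle: String, val text: String)

// Query model: per-follower timelines precomputed from events ("fan-out on write").
class TimelineProjection(
    private val followersOf: Map<String, Set<String>>,
    private val handles: Map<String, String>
) {
    private val timelines = mutableMapOf<String, MutableList<TimelineEntry>>()

    fun apply(event: TweetPosted) {
        // Copy the author's handle into the row now instead of joining a users table later.
        val entry = TimelineEntry(event.tweetId, handles[event.authorId] ?: event.authorId, event.text)
        for (follower in followersOf[event.authorId].orEmpty()) {
            timelines.getOrPut(follower) { mutableListOf() }.add(0, entry) // newest first
        }
    }

    fun timeline(userId: String, limit: Int = 100): List<TimelineEntry> =
        timelines[userId].orEmpty().take(limit)
}
```

The trade-off this illustrates is typical of query models: extra work and duplicated data on the write path in exchange for a read that is a simple lookup.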

As you will already have worked out, CQRS is harder to understand and implement than more traditional architectures. This raises the question of when you should take on the burden of this added complexity. Indicators that CQRS could be the way to go include: your application has differing, complex requirements for reading and writing data; it needs to be highly performant, particularly on the read side; or you need to optimise your data storage and querying strategies independently.

If your requirements match the indicators above, CQRS has several advantages over a simpler architecture. Because the read and write models can be optimised independently, each operation can perform better. Each side can also be scaled independently, so resources can be allocated in a more fine-grained way, which can reduce costs. Finally, CQRS makes applications more flexible: if necessary, we can choose different data storage technologies for the read and write models.

Although CQRS is a great architectural choice for scale, like any architecture it comes with drawbacks. As touched on previously, it is generally more complex than most traditional architectures, both to understand and to implement, largely because of the overhead of maintaining separate models for the read and write sides of the system. Furthermore, CQRS systems are typically eventually consistent: the read model is usually updated after the write has been committed, so there may be a delay between a write operation and the corresponding update in the read model, during which reads return stale data. If your system requires immediate (strong) consistency between a write and subsequent reads, CQRS is probably not the best fit.
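
The consistency window can be made visible with a small sketch. It assumes the write side commits and queues an event, and a separate projector (in practice often an asynchronous message consumer) applies it to the read model later; here the projector is triggered explicitly so the stale read is deterministic. The class and method names are hypothetical.

```kotlin
// Hypothetical event published by the write side.
data class TweetPosted(val tweetId: Long, val authorId: String, val text: String)

class EventuallyConsistentTweets {
    private val writeStore = mutableListOf<TweetPosted>()                // source of truth
    private val pending = ArrayDeque<TweetPosted>()                      // events not yet projected
    private val readModel = mutableMapOf<String, MutableList<String>>()  // authorId -> texts, newest first

    // Write path: commit to the write store and publish the event, then return immediately.
    fun post(authorId: String, text: String): Long {
        val event = TweetPosted(writeStore.size + 1L, authorId, text)
        writeStore.add(event)
        pending.addLast(event)
        return event.tweetId
    }

    // Stands in for an asynchronous projector catching up with the event backlog.
    fun runProjector() {
        while (pending.isNotEmpty()) {
            val event = pending.removeFirst()
            readModel.getOrPut(event.authorId) { mutableListOf() }.add(0, event.text)
        }
    }

    // Read path: sees only what the projector has applied so far.
    fun tweetsBy(authorId: String): List<String> = readModel[authorId].orEmpty()

    // Number of committed writes not yet visible to readers.
    val lag: Int get() = pending.size
}
```

Between post and runProjector, the tweet is durably stored but tweetsBy does not return it yet; that gap is the latency described above.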

One thing we see some clients struggle with when implementing CQRS is choosing database technologies, and in particular whether to use a different technology for each vertical or the same technology for both. Unfortunately there is no one-size-fits-all answer; it really depends on your specific requirements.

In our experience, you should consider different database technologies in scenarios such as the following. The first is when performance optimisation is, or will at some point become, a priority. For example, your requirements may suggest a write throughput so high that a relational database may not be able to cope, and you instead opt for something like Cassandra. On the read side, however, you may have complex query requirements that involve joins across several tables, for which a relational database such as MySQL would be a better choice. The second is when you expect significantly different scaling requirements across the two verticals. If the verticals handle very different workloads, putting all your eggs in one basket can be unnecessarily expensive, because being bound to a single database technology or node forces you to scale both verticals uniformly.
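
One way to keep that choice open is to have each vertical depend only on its own storage interface. The sketch below assumes hypothetical TweetWriteStore and TimelineReadStore interfaces; the in-memory classes are placeholders for where a Cassandra adapter (write side) or a MySQL adapter (read side) could be plugged in, and no real driver API is shown.

```kotlin
// Write-side port: append-only, no joins; a wide-column store could sit behind it.
interface TweetWriteStore {
    fun append(authorId: String, text: String): Long
}

// Read-side port: shaped around queries; a relational database could sit behind it.
interface TimelineReadStore {
    fun upsert(tweetId: Long, authorId: String, text: String)
    fun latestFor(authorIds: Set<String>, limit: Int): List<String>
}

// Hypothetical in-memory stand-ins for the real database adapters.
class InMemoryAppendLog : TweetWriteStore {
    private val log = mutableListOf<Pair<String, String>>()

    override fun append(authorId: String, text: String): Long {
        log.add(authorId to text)
        return log.size.toLong() // position in the log doubles as the tweet id
    }
}

class InMemoryTimelineTable : TimelineReadStore {
    private val rows = mutableMapOf<Long, Pair<String, String>>() // tweetId -> (authorId, text)

    override fun upsert(tweetId: Long, authorId: String, text: String) {
        rows[tweetId] = authorId to text
    }

    override fun latestFor(authorIds: Set<String>, limit: Int): List<String> =
        rows.entries
            .filter { it.value.first in authorIds }
            .sortedByDescending { it.key }
            .take(limit)
            .map { it.value.second }
}

// Each vertical depends only on its own port, so either store can be swapped or scaled alone.
class TweetService(
    private val writes: TweetWriteStore,
    private val reads: TimelineReadStore
) {
    fun post(authorId: String, text: String): Long {
        val id = writes.append(authorId, text)
        reads.upsert(id, authorId, text) // projected inline for brevity; often done asynchronously
        return id
    }

    fun timeline(following: Set<String>, limit: Int = 100): List<String> =
        reads.latestFor(following, limit)
}
```

Swapping either in-memory class for a real adapter would not require changes to TweetService or to the other side.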

If none of the requirements above apply, we recommend keeping things simple and choosing a single data storage technology. The main advantage is that a single data store significantly simplifies development. It can also help with consistency: when both models live in the same database, the read model can typically be brought up to date with less delay than when changes must propagate between different technologies. Furthermore, a single technology makes transactions far simpler: a write and its corresponding read-model update can be committed atomically and durably in one transaction, which is much harder to achieve across two different data stores.
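
To show why a shared store helps, the following sketch keeps a write model (tweets) and a denormalised read model (per-author tweet counts) side by side and updates them in one unit of work. It only simulates in memory what a real database transaction provides (snapshot and restore on failure, under a lock); SingleDatabase, Tx and postTweet are hypothetical names.

```kotlin
// One store holds both the write model and a denormalised read model.
class SingleDatabase {
    private val lock = Any()
    private val tweets = mutableMapOf<Long, Pair<String, String>>() // write model: id -> (author, text)
    private val tweetCounts = mutableMapOf<String, Int>()            // read model: author -> tweet count

    // Operations that are only available inside a transaction.
    inner class Tx {
        fun insertTweet(id: Long, authorId: String, text: String) {
            check(id !in tweets) { "Duplicate tweet id $id" }
            tweets[id] = authorId to text
        }

        fun incrementCount(authorId: String) {
            tweetCounts[authorId] = (tweetCounts[authorId] ?: 0) + 1
        }
    }

    // Simulated transaction: snapshot both tables, run the block, restore the snapshot if it throws.
    fun <T> transaction(block: Tx.() -> T): T = synchronized(lock) {
        val tweetsBefore = tweets.toMap()
        val countsBefore = tweetCounts.toMap()
        try {
            Tx().block()
        } catch (e: Throwable) {
            tweets.clear()
            tweets.putAll(tweetsBefore)
            tweetCounts.clear()
            tweetCounts.putAll(countsBefore)
            throw e
        }
    }

    fun tweetExists(id: Long): Boolean = synchronized(lock) { id in tweets }
    fun tweetCount(authorId: String): Int = synchronized(lock) { tweetCounts[authorId] ?: 0 }
}

// Write and read models change together: either both updates commit or neither does.
fun postTweet(db: SingleDatabase, id: Long, authorId: String, text: String) = db.transaction {
    insertTweet(id, authorId, text)
    incrementCount(authorId)
}
```

With two different data stores, the same guarantee would typically need a distributed transaction or an outbox-style mechanism, which is part of the extra complexity described above.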