Architecture for Scale Part II: Event Sourcing

In the first part of our Architecture for Scale series we discussed CQRS and why it suits applications with intensive scaling requirements. The next pattern we will discuss in the series is event sourcing, an architecture that complements CQRS, and as such we often find the two being used in unison. Event sourcing is an architectural approach that captures and persists the state of an application as a sequence of events. Instead of the traditional approach of storing the latest, greatest state of an entity as a single mutable entry in the data store, event sourcing stores a log of the events that have occurred over time. Essentially, every single thing that happens in a system is stored as an event.
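
At its simplest, an event store is an append-only log. Here is a minimal sketch in TypeScript of what that might look like; the event shape and store interface are illustrative assumptions, not a prescribed schema.

```typescript
// A minimal, illustrative event record; the field names are
// assumptions for this sketch, not a prescribed schema.
interface DomainEvent {
  aggregateId: string; // ties related events together
  type: string;        // e.g. "EditionReleased"
  payload: unknown;    // event-specific data
  occurredAt: Date;
}

// An append-only store: events are only ever added, never updated
// or deleted, which is what makes the log immutable.
class InMemoryEventStore {
  private events: DomainEvent[] = [];

  append(event: DomainEvent): void {
    this.events.push(event);
  }

  // Load the full history for one aggregate, in insertion order.
  load(aggregateId: string): DomainEvent[] {
    return this.events.filter(e => e.aggregateId === aggregateId);
  }
}
```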

Think of some real world entity like a book. A book can have several editions that are released as time passes. Instead of representing this book in the database as a single entity and updating the edition field every time a new one goes to print, with event sourcing we store every change to the book (in this case, each new edition) as a new event. What this means at the data level is that for every change event for the book we expect to see an additional row persisted in the database, all tied together by some aggregate identifier, such as an ISBN. When this data bubbles up to our application code, the current state of the system can be reconstructed by projecting all of these events.
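
A projection is just a fold over the event history. The sketch below, using hypothetical event types and a made-up ISBN, shows how the book's current state could be rebuilt from its events.

```typescript
// Hypothetical book events, tied together by ISBN.
type BookEvent =
  | { isbn: string; type: "BookPublished"; title: string; edition: number }
  | { isbn: string; type: "EditionReleased"; edition: number };

interface BookState {
  isbn: string;
  title: string;
  currentEdition: number;
}

// Rebuild the current state by folding over the events in the
// order they occurred.
function projectBook(events: BookEvent[]): BookState | undefined {
  return events.reduce<BookState | undefined>((state, event) => {
    switch (event.type) {
      case "BookPublished":
        return { isbn: event.isbn, title: event.title, currentEdition: event.edition };
      case "EditionReleased":
        return state && { ...state, currentEdition: event.edition };
    }
  }, undefined);
}

const history: BookEvent[] = [
  { isbn: "978-0-00-000000-0", type: "BookPublished", title: "Some Title", edition: 1 },
  { isbn: "978-0-00-000000-0", type: "EditionReleased", edition: 2 },
];

console.log(projectBook(history)); // { isbn: "978-0-00-000000-0", title: "Some Title", currentEdition: 2 }
```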

Like any architectural decision, whether or not to use event sourcing should be guided by the requirements. Event sourcing works well when we have auditing and compliance requirements, because it provides a complete audit trail of every event that has happened in the system, making it easy to reason about and trace how entities have evolved towards their current state. Many financial systems have these very requirements and as such event sourcing works well in such contexts. If you think about all the transactions in a bank account and how they roll up into a current balance then you can see why this works well; it would make implementing functionality like generating bank statements very easy. Systems that have intensive historic data analysis requirements are also a good fit for event sourcing, as we can apply projections at different points in time to gain insight. Think about an e-commerce system where we would like to home in, retrospectively, on shopping habits across the year's various seasons.
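
To make the bank account example concrete, here is a small sketch (with assumed event shapes) showing that the balance is just a projection over the transaction history, and a statement is the same history filtered by date.

```typescript
// Hypothetical transaction events for a single account. The `at`
// field is assumed to be an ISO-8601 date string.
type AccountEvent =
  | { accountId: string; type: "Deposited"; amount: number; at: string }
  | { accountId: string; type: "Withdrawn"; amount: number; at: string };

// The current balance is nothing more than a projection over the
// full transaction history.
function balance(events: AccountEvent[]): number {
  return events.reduce(
    (total, e) => (e.type === "Deposited" ? total + e.amount : total - e.amount),
    0
  );
}

// A statement for a period is the same history filtered by date;
// ISO-8601 strings compare correctly with plain string comparison.
function statement(events: AccountEvent[], from: string, to: string): AccountEvent[] {
  return events.filter(e => e.at >= from && e.at <= to);
}
```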

Another benefit of event sourcing is the scenario where we want to completely rebuild our application from a data perspective. The obvious question is why we may want to do this. Consider the scenario whereby we have material changes in our non-functional requirements and our current choice of data store technology is no longer fit for purpose. We may have mutable data stored relationally, and our new requirements may be to store the same data in a more performant, distributed manner using NoSQL technology. The only viable approach here would be to embark on a tactical, usually lengthy, data migration project. Typical approaches involve building a dedicated application to perform and sense-check the data migration, which is then thrown away once the migration is complete. Had we elected for an event sourcing architecture in the first place, this task would be far simpler. With all events already sitting in our event store, our only job would be to build a new data layer in our application, conforming to the new data store technology, and thereafter replay all the events back through the application to migrate the read model data to the new data store.
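
In code, such a rebuild amounts to streaming the full event log through the new data layer. This sketch assumes the `DomainEvent` shape from the earlier sketch and a hypothetical `ReadModelWriter` interface for the new store.

```typescript
// DomainEvent as defined in the earlier sketch.
interface DomainEvent {
  aggregateId: string;
  type: string;
  payload: unknown;
  occurredAt: Date;
}

// A hypothetical writer for the new read model; its interface is
// an assumption for this sketch.
interface ReadModelWriter {
  apply(event: DomainEvent): Promise<void>;
}

// Migration is just a replay: stream every event from the store,
// in order, through the new data layer.
async function rebuildReadModel(
  allEvents: AsyncIterable<DomainEvent>,
  writer: ReadModelWriter
): Promise<void> {
  for await (const event of allEvents) {
    await writer.apply(event);
  }
}
```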

Electing for event sourcing as your architecture of choice also brings overheads. Overall there is increased complexity versus more traditional architectures. The data or repository layer in our application code becomes significantly more complex when writing to and reading from the data store, in comparison to more traditional CRUD-style approaches. Querying also becomes more of a challenge, as we need to replay and project many individual events to arrive at the current state of the application. Event sourcing can also lead to increased infrastructure costs. This can be twofold: first, we need to store a greater volume of data overall, as we keep an immutable row for every single event that happens in our system; second, when combining event sourcing with CQRS it can often be the case that you must invest in different data storage technologies for the read and write verticals of the application.

Because event sourcing is an architecture built for scale, many notable tech companies employ it. Netflix has event sourcing as a fundamental part of its system architecture, using it to capture and process events around user interactions, which allows them to analyse user behaviour and optimise content delivery. Amazon utilise event sourcing to track and record events around order processing, inventory management and supply chain operations. Uber is yet another big tech business with event sourcing at the core of its architecture, using it to capture ride requests, driver assignments and payments, the core functions of their primary use case.