Eventual Data Consistency Solution in ServiceComb - part 1

5 minute read

Data consistency is a critical aspect of many systems, especially in cloud and microservice environment. We recently released Saga project under ServiceComb to address data consistency issue. But why is data consistency so important and what is Saga?

Data Consistency of Monolithic Applications

Imagine we are a giant corporation who runs an airline, a car rental company, and a hotel chain. We provide one-stop trip planning experience for our customers, so that they only have to provide a destination and we will book a flight, rent a car, and reserve a hotel room for them. From the business point of view, we have to make sure bookings with all the three are successful to make a successful trip plan, or there will be no plan.

This requirement is easily satisfied with our current monolithic application. We just had to wrap all the three bookings in a single database transaction and they would all be done successfully or none was done.

Monolithic Application

When this feature released, our business was happy, and our customer was happy.

Data Consistency in the Microservice Scenario

In a few years, our corporation is doing so well on this trip planning business and our customers grow over tenfold. As more services provided by our airline, car rental company, and hotel chain, our application and development teams also grow. Now our monolith is so big and complex that not a single person understands how everything works together. What's even worse is that it takes weeks for all the development teams to put together their changes for a new release. Our business is not happy, since our market share is dropping due to the delay of our new features.

We decide to split our monolith into four microservices, flight booking, car rental, hotel reservation, and payment, after several rounds of discussions. Services use their own database and communicate through HTTP requests. They are released according to their own schedule to meet the market needs. But we face a new problem: how do we ensure the original business rule of bookings with all three services must be successful to make a trip plan? A service cannot just access another's database, because it violates the service boundary and services may not use the same database technology.

Service Boundary

Sagas

We found a great paper by Hector & Kenneth on the Internet talking about data consistency issues and this paper mentioned a terminology named Saga.

A saga refers to a long live transaction that can be broken into a collection of sub-transactions that can be interleaved in any way with other transactions. Each sub transactions in this case is a real transaction in the sense that it preserves database consistency. [1]

In case of our business, a trip transaction is a saga which consists of four sub-transactions: flight booking, car rental, hotel reservation, and payment.

Transactions

Saga is also described in Chris Richardson's article: Pattern: Saga.

Chris Richardson is an experienced software architect, author of POJOs in Action and the creator of the original CloudFoundry.com. [3]

Caitie McCaffrey also showed how she applied saga pattern in Halo 4 by Microsoft in her talk.

How Saga Works

The transactions in a saga are related to each other and should be executed as a (non-atomic) unit. Any partial executions of the saga are undesirable, and if they occur, must be compensated for. To amend partial executions, each saga transaction T1 should be provided with a compensating transaction C1.[1]

We define the following transactions and their corresponding compensations for our services according to the rule above:

Service Transaction Compensation
flight booking book flight cancel booking
car rental rent car cancel booking
hotel reservation reserve room cancel reservation
payment pay refund

Once compensating transactions C1, C2, …, Cn-1 are defined for saga T1, T2, …, Tn, then the system can make the following guarantee [1]

  • either the sequence T1, T2, …, Tn (which is the preferable one)
  • or the sequence T1, T2, …, Tj, Cj, …, C2, C1, for some 0 < j < n, will be executed

In another word, with the above defined transaction/compensation pairs, saga guarantees the following business rules are met:

  • all the bookings are either executed successfully, or cancelled if any of them fails
  • if the payment fails in the last step, all bookings are cancelled too

Saga Recovery

Two types of saga recovery were described in the original paper:

  • backward recovery compensates all completed transactions if one failed
  • forward recovery retries failed transactions, assuming every sub-transaction will eventually succeed

Apparently there is no need for compensating transactions in forward recovery, which is useful if sub-transactions will always succeed (eventually) or compensations are very hard or impossible for your business.

Theoretically compensating transactions shall never fail. However, in the distributed world, servers may go down, network can fail, and even data centers may go dark. What can we do in such situations? The last resort is to have fallbacks which may involve manual intervention.

The Restrictions of Saga

That looks promising. Are all long live transactions can be done this way? There are a few restrictions:

  • A saga only permits two levels of nesting, the top level saga and simple transactions [1]
  • At the outer level, full atomicity is not provided. That is, sagas may view the partial results of other sagas [1]
  • Each sub-transaction should be an independent atomic action [2]

In our case, flight booking, car rental, hotel reservation, and payment are naturally independent actions and transaction of each can be performed atomically with their corresponding databases.

We don't need atomicity at the trip transaction level either. One user may book the last seat on a flight which gets cancelled later due to insufficient balance in credit card. Another user may see the flight fully booked and one seat freed up for booking due to the cancellation. He/she can grab the last flight seat and complete the trip plan. It does not really matter to our business.

There are also things to consider on compensations.

The compensating transaction undoes, from a semantic point of view, any of the actions performed by Ti, but does not necessarily return the database to the state that existed when the execution of Ti began. (For instance, if a transaction fires a missile, it may not be possible to undo this action) [1]

However, this is not a problem at all for our business. And it is still possible to compensate for actions hard to undo. For example, a transaction sending an email can be compensated by sending another email explaining the problem.

Summary

Now we have a solution to tackle our data consistency issue with saga. It allows us to either successfully perform all transactions or compensate succeeded ones in case any fails. Although saga does not provide ACID guarantee, it still suits many scenarios where eventual data consistency is enough. How do we design a saga system? Let's address the question in our next blog post.

References

  1. Original Paper on Sagas by By Hector Garcia-Molina & Kenneth Salem
  2. Gifford, David K and James E Donahue, “Coordinating Independent Atomic Actions”, Proceedings of IEEE COMPCON, San Francisco, CA, February, 1985
  3. Chris Richardson: http://www.chrisrichardson.net/
  4. ServiceComb Saga Project: https://github.com/apache/incubator-servicecomb-saga

Tags:

Updated:

Leave a Comment

Your email address will not be published. Required fields are marked *

Loading...