Distributed Transactions - Part 1

Nov 1, 2024 · Manthan Gupta

gif

Ah shit, another blog series from yours truly. It is probably going to be a 3-part blog series where we talk about transactions in a distributed setting. In this part, we will discuss transactions, the meaning of ACID, and Single-Object and Multi-Object operations.

What are transactions?

A transaction is a logical unit that groups several reads and writes. All the reads and writes in a transaction are executed as one operation. Either the entire transaction succeeds or fails. If it fails, it is safe to retry, as all the changes have been reverted inside the transaction.

What is ACID?

Transaction safety guarantees are often described by the acronym ACID (Atomicity, Consistency, Isolation, and Durability). Systems that do not fulfill ACID compliancy are sometimes called BASE (Basically Available, Soft State, and Eventual Consistency). But right now we are only going to discuss ACID.

Atomicity
Atomicity describes what happens if a client wants to make several writes, but an error occurs after some writes have been processed. Some examples of errors can be a process crash, network interruption, disk is full, integrity constraints violated, etc. If the writes are grouped in an atomic transaction, and the transaction fails for some reason then the transaction is aborted, and the writes that have been processed till now are discarded. If a transaction was aborted, then it is safe to retry as no changes have been applied to the database.
Consistency
The idea of consistency is that you have certain statements about your data that should hold true always. The notion of consistency depends on the application’s defined invariants, and it’s the responsibility of the application to define its transactions correctly and preserve consistency. In ACID, consistency is the property of the application whereas atomicity, isolation, and durability are the properties of the database.
Isolation
Isolation defines that multiple transactions executing concurrently are isolated from each other. Each transaction can pretend that it is the only transaction running on the entire database. The database ensures when the transactions have been committed, the results are the same as if they had run serially.
Durability
Durability is a promise that once a transaction is committed successfully, any data that has been written will not be forgotten, even if there is a hardware fault or the database crashes. In a single-node setup, durability means that the data has been written to non-volatile storage such as HDD or SDD. It also involves writing to WAL for recovery. In distributed settings, durability means that the data has been successfully copied to some nodes. The database must wait for these nodes to confirm that the transaction has been committed successfully before reporting success.

Single-Object and Multi-Object Operations

Single-Object Writes

Storage engines universally aim to provide atomicity and isolation on the level of a single object on one node. Atomicity can be implemented using a log for crash recovery. Isolation can be implemented using a lock on each object. Allowing only one thread to access an object at one time. Single-object operations are useful, as they can prevent lost updates when several clients try to write to the same object concurrently. They aren’t exactly transactions in the usual sense of the word.

Transaction is usually understood as a mechanism for grouping multiple operations on multiple objects into one unit of execution. Compare-and-set allows a write to happen only if the value hasn’t been changed concurrently by some other operation. Compare-and-set and other single-object operations are usually known as lightweight transactions.

Multi-Object Transactions

Many distributed data stores have abandoned multi-object transactions because they are difficult to implement across partitions and they can get in the way of performance and high availability. As compared to single-object operation multi-object operates on multiple objects (possibly stored on different nodes).

In a relational data model, a row in one table has a foreign key reference to a row in another table. Similarly, in a graph-like data model, a vertex has an edge to other vertices. The multi-object transaction allows you to ensure that these references remain valid.

Handling errors and aborts

A key feature of a transaction is that it can be aborted and safely retried if an error occurs. ACID databases are based on the idea that if it’s in danger of violating its guarantee of atomicity, consistency, isolation, or durability then it would rather abandon the transaction entirely.

If the transaction succeeded but the network failed while the server tried to acknowledge the successful commit to the client, then retrying the transaction causes duplication unless there is a de-duplication mechanism in place.

Retrying the transaction will make the problem worse if the error is due to overload. In this case, the advice is to use exponential backoff.

If the transaction has side effects outside the database, like sending an email, that may happen even if the transaction is aborted.

Parting Words

This is it for part 1 of the series. In the next one, we will discuss weak isolation levels that you all should find interesting. I will meet you in the next one!

If you liked the blog, then you can support it by buying me a coffee here, sharing on your socials, and following me on X/ Twitter.

References

Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann