Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems
Author: Martin Kleppmann
Year: 2017
Genre: Software Architecture
About This Book
Data is at the centre of many challenges in system design today. Difficult issues need to be figured out, such as scalability, consistency, reliability, efficiency, and maintainability. In addition, we have an overwhelming variety of tools, including relational databases, NoSQL datastores, stream or batch processors, and message brokers. What are the right choices for your application? How do you make sense of all these buzzwords?
In this practical and comprehensive guide, author Martin Kleppmann helps you navigate this diverse landscape by examining the pros and cons of various technologies for processing and storing data. Software keeps changing, but the fundamental principles remain the same. With this book, software engineers and architects will learn how to apply those ideas in practice, and how to make full use of data in modern applications.
Key Insights
- Workload first, then tools: Data models and storage engines should follow access patterns—there’s no one-size-fits-all.
- Storage engine trade-offs: B‑trees vs log‑structured engines, indexing, and compaction affect read/write behaviour and latency.
- Distributed data is a choice of compromises: Replication models, partitioning, and rebalancing shape consistency, availability, and performance.
- Consistency isn’t binary: Replication guarantees range from eventual consistency to linearizability, and transaction isolation (up to serialisability) is a separate axis. Know which guarantees you actually need.
- Consensus and coordination: When and why to use protocols like Raft/Paxos, and the costs they introduce.
- Batch, streams, and logs: Materialised views, idempotency, and log‑centric integration connect systems reliably.
- Evolving schemas and contracts: Backwards/forwards compatibility and migration patterns make change safe.
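
To make the "materialised views" and "idempotency" point concrete, here is a minimal sketch of my own, not code from the book. It assumes a single, ordered log whose entries carry an offset, and a consumer that remembers the last offset it applied. The `BalanceView` class and the event fields (`account`, `amount`) are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class BalanceView:
    """A toy materialised view: account balances derived from an event log."""
    balances: dict = field(default_factory=dict)
    last_offset: int = -1  # highest log offset already applied

    def apply(self, offset: int, event: dict) -> bool:
        # Ignore anything at or below the last applied offset, so that
        # redelivered events do not change the view a second time.
        if offset <= self.last_offset:
            return False
        account = event["account"]
        self.balances[account] = self.balances.get(account, 0) + event["amount"]
        self.last_offset = offset
        return True


# Hypothetical event log; the list index stands in for the log offset.
log = [
    {"account": "alice", "amount": 50},
    {"account": "bob", "amount": 20},
    {"account": "alice", "amount": -30},
]

view = BalanceView()
for offset, event in enumerate(log):
    view.apply(offset, event)

# Simulate at-least-once delivery by replaying the whole log again.
for offset, event in enumerate(log):
    view.apply(offset, event)

print(view.balances)  # {'alice': 20, 'bob': 20}
```

Replaying the log leaves the view unchanged because each offset is applied at most once. Real systems have to deal with multiple partitions and with persisting the offset atomically alongside the view, which this sketch deliberately leaves out.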
Why I Recommend It
It’s the clearest guide to the core trade‑offs in modern data systems. Kleppmann gives you the mental models to pick technologies deliberately, design reliable pipelines, and reason about consistency without hand‑waving.
If you own anything with queues, caches, databases, or streams, this book will pay for itself in better decisions and fewer outages.
