


Designing Data-Intensive Applications Notes

Although it is one of the most important books for the software industry, bridging the gap between distributed systems theory and practical engineering, I struggled to find good summarized reading notes that cover all the key points of the book, so here they are, I hope.

These reading notes are biased towards what to do rather than how it works. The main goal is to be a quick one-page look-up for people wishing to recall some of the details on the fly, or for someone who wishes to recap the highlights of the whole book in less than an hour. However, a fair amount of how-it-works explanation is included.

- Chapter 1: Reliable, Scalable, and Maintainable Applications
- Chapter 2: Data Models and Query Languages
- Chapter 8: The Trouble with Distributed Systems

Part I: Foundations of Data Systems

Chapter 1: Reliable, Scalable, and Maintainable Applications

CPU power is rarely the limiting factor anymore; the size of the data is.

There are many technology options out there, and our task is to figure out the most appropriate tools and approaches for the job: ensuring that the data remains correct and complete, providing good performance, and scaling to handle load increases, despite any internal failures or system degradation.

The system should continue to work correctly, even in the face of faults and human errors.

A fault is one component of the system deviating from its spec, while a failure is when the system as a whole stops working. It's impossible to prevent all faults, but we should try to prevent faults from causing failures by designing fault-tolerance mechanisms.

Hardware redundancy is the first line of defense against hardware faults. It was sufficient for a long time, but as computing demands increase, there is a move toward systems that can tolerate the loss of entire machines by using software fault tolerance as well.

Software faults usually stem from assumptions made about the environment; these assumptions are usually true, until the moment they are not. There is no quick solution to the problem, but the software can constantly check itself for discrepancies while running.

Some approaches for making systems reliable in spite of unreliable human actions include:

- Design abstractions that are minimal and make one thing easy to achieve, but are not so restrictive that people work around them.
- Provide fully featured sandbox environments with real data for testing, without affecting real users.
- Test thoroughly at all levels, from unit tests to whole-system integration tests.
- Make it fast to roll back configuration changes, and provide tools to recompute data.
- Use proper monitoring that shows early warning signals of faults.

As the system grows, there should be reasonable ways of dealing with that growth.
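To make the fault versus failure distinction from the reliability notes concrete, here is a minimal sketch of redundancy with failover. Everything in it is hypothetical and chosen for illustration only: the names `read_with_failover`, `AllReplicasFailed`, `broken_replica`, and `healthy_replica` do not come from the book or from any library. The idea is that one replica raising an error is a fault, and the call as a whole only fails when every replica has faulted.

```python
from typing import Callable, Sequence


class AllReplicasFailed(Exception):
    """Raised when every redundant component has faulted (a system failure)."""


def read_with_failover(replicas: Sequence[Callable[[], str]]) -> str:
    """Try each redundant replica in turn and return the first successful result.

    A single replica raising an exception is treated as a fault; the call
    only fails as a whole when no replica can answer.
    """
    errors = []
    for replica in replicas:
        try:
            return replica()
        except Exception as exc:  # one component deviating from its spec
            errors.append(exc)
    raise AllReplicasFailed(f"{len(errors)} replica(s) faulted")


# Hypothetical replicas used only for this illustration.
def broken_replica() -> str:
    raise ConnectionError("disk died")


def healthy_replica() -> str:
    return "value-from-replica-2"


if __name__ == "__main__":
    # The first replica faults, but the caller still gets an answer.
    print(read_with_failover([broken_replica, healthy_replica]))
```

A real system would also need timeouts, health checks, and some way to keep replicas consistent; this sketch only shows how a fault in one component can be kept from becoming a failure of the whole.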

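The note that software can constantly check itself for discrepancies while running can be sketched as a small in-memory queue that audits its own bookkeeping on every operation. This is only an illustration under stated assumptions: the class `SelfCheckingQueue`, its counters, and the chosen invariant (items put in equals items taken out plus items still waiting) are hypothetical and not taken from the notes.

```python
from collections import deque


class SelfCheckingQueue:
    """In-memory queue that audits its own bookkeeping while running."""

    def __init__(self) -> None:
        self._items = deque()
        self.enqueued = 0  # total items ever put in
        self.dequeued = 0  # total items ever taken out

    def put(self, item: str) -> None:
        self._items.append(item)
        self.enqueued += 1
        self.check_invariant()

    def get(self) -> str:
        item = self._items.popleft()
        self.dequeued += 1
        self.check_invariant()
        return item

    def check_invariant(self) -> None:
        # Items in must equal items out plus items still waiting.
        accounted_for = self.dequeued + len(self._items)
        if self.enqueued != accounted_for:
            raise RuntimeError(
                f"discrepancy: enqueued={self.enqueued}, "
                f"accounted for={accounted_for}"
            )


if __name__ == "__main__":
    queue = SelfCheckingQueue()
    queue.put("a")
    queue.put("b")
    print(queue.get())       # normal operation keeps the invariant true
    queue._items.clear()     # simulate a bug that silently drops an item
    try:
        queue.check_invariant()
    except RuntimeError as exc:
        print(exc)
```

In production the check would more likely raise an alert than an exception, which ties in with the monitoring point above: the discrepancy becomes an early warning signal rather than a silent data loss.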