CAP Theorem in Blockchain
CAP theorem states that it is impossible for a distributed data store to simultaneously provide more than two out of the three guarantees: Consistency, Availability, and Partition. The article focuses on discussing CAP Theorem in Blockchain in detail.
The following topics will be discussed here:
- What is Blockchain?
- What is CAP Theorem?
- History of the CAP theorem
- The CAP in CAP Theorem
- CAP theorem NoSQL database types
- CAP Theorem applied
- CAP Theorem Examples
- Why does Blockchain Violate CAP Theorem?
- What is PACELC Theorem?
What is Blockchain?
Blockchain is a method of storing data that makes it challenging or impossible to alter, hack, or defraud the system. A hash, a sort of immutable cryptographic signature, is used to record transactions on a blockchain, a type of DLT.
- A blockchain is just a group of computer systems connected in a way that allows duplicate copies of each transaction to be kept and distributed throughout the network’s digital ledger.
- Every time a new transaction occurs on the blockchain, a record of it is added to each participant’s ledger, and each block on the chain consists of a number of transactions.
- The term “Distributed Ledger Technology” describes a decentralized database governed by many users (DLT).
What is the CAP Theorem in Blockchain?
CAP Theorem stands for Consistency, Availability, and Partition Tolerance. According to the theory, a distributed system cannot always ensure consistency, availability, and partition tolerance. When things go wrong, we must prioritize at most two distributed system features and trade-offs between them.
- CAP Theorem or Brewer’s theorem states that it is feasible to provide either consistency or availability—but not both—in the event of a network failure on a distributed database, a theory from theoretical computer science about distributed data stores. In other words, according to the CAP theorem, a distributed database system that experiences a partition must choose between Consistency and Availability.
- We must simultaneously communicate over the network and store data among several nodes in a distributed system. A distributed system frequently falls victim to network failures because of its reliance on network calls in a significant way. Tolerance for partitions is crucial. In this situation, we must decide, based on our needs, whether to prioritize consistency or availability.
- With blockchain technology, immediate consistency is frequently sacrificed for availability and partition tolerance. By requiring a specific amount of “confirmations,” blockchain consensus techniques are simply simplified to eventual consistency.
- Network failures can affect any distributed system, hence network partitioning is usually required. There are just two choices remaining in the event of a partition: consistency or availability. The system will return an error or time out if a specific piece of information cannot be guaranteed to be current owing to network segmentation when consistency is chosen above availability. The system will always process the query and attempt to return the most recent version of the data even if it cannot ensure that it is up to date because of network partitioning when availability is preferred over consistency.
- Blockchain is a decentralized database that manages a shared ledger that cannot be altered. Transactions are what make up the shared ledger. Consensus techniques are used to record transactions inside the shared ledger. Sharing distributed transactions naturally raises questions about the CAP theorem.
- Consistency is sacrificed in Blockchain due to the priority given to Availability and Partition Tolerance. In this case, Partition tolerance (P), Availability (A), and Consistency (C) on the blockchain are not attained simultaneously; instead, they are acquired over time.
History of CAP Theorem
- The theory initially surfaced in the fall of 1998, according to computer scientist Eric Brewer of the University of California, Berkeley. Brewer proposed it at the 2000 Symposium on Principles of Distributed Computing as a conjecture. It was first published as the CAP principle in 1999.
- Brewer’s conjecture became a theorem in 2002 after Seth Gilbert and Nancy Lynch of MIT gave a formal proof of it. The CAP theorem is also known as Brewer’s Theorem since Professor Eric A. Brewer introduced it in 2000 during a talk he gave on distributed computing. Professors Seth Gilbert and Nancy Lynch from MIT produced proof of “Brewer’s Conjecture” two years later.
- Brewer clarified some of his positions in 2012, explaining why the frequently used “two out of three” concept may be somewhat misleading because partition management and recovery techniques already exist, so system designers only need to sacrifice consistency or availability in the presence of partitions.
- Brewer also brought up how the definition of consistency in the CAP theorem and the ACID (Atomicity, Consistency, Isolation, Durability) definition is different. Since it was first proposed ten years ago, designers and academics have explored a wide range of unique distributed systems using the CAP theorem.
- It has also been used by the NoSQL movement to argue against conventional databases. The implementation of CAP achieved its goal of exposing designers to a greater variety of systems and tradeoffs; in fact, during the past 10 years, a wide variety of new systems have evolved along with a heated debate about the relative advantages of consistency and availability.
- As a result of its propensity to oversimplify the tensions between attributes, the “2 of 3” formulation was always deceptive. Such subtleties now matter. Only a small portion of the design area is prohibited by CAP: complete availability and consistency in the absence of partitions, which is uncommon.
CAP Theorem in Blockchain
As we know that CAP stands for Consistency, Availability, and Partition Tolerance so, let us understand Consistency, Availability, and Partition Tolerance with the help of some examples. Let us first understand availability as then it will be much easier to understand consistency and partition tolerance then.
Availability means that all clients who request data receive a response even if one or more nodes are down. In a distributed system, every operational node replies to each valid request made to it, to put it another way.
Example: Imagine you are the customer of a well-known vehicle company in your city because of the incredible deals and services it provides. In addition, they provide fantastic customer service, so you can contact them whenever you have questions or issues and get answers right away. The car company is able to connect every consumer who phones to one of its customer service representatives. Any information needed by the customer regarding his cars, such as the service date, the insurance plan, or other details, can be obtained. Because any customer can connect to the business or its operator and obtain information about the user or client, we refer to this as availability.
Consistency means that the duplicated data item will appear in the same copies on all nodes during different transactions. an assurance that each node in a distributed cluster returns the same, most recent, and successful writer. Every client’s perception of the data must be consistent to be considered consistent. Sequential consistency, which is a particularly powerful type of consistency, is referred to as consistency in CAP.
Example: Recently the insurance policy of your car got outdated and you want to update or get a new insurance policy for your car. You decide to call the bank or the insurance company and update it with them. When you call, you connect with an agent. This agent asks you for the relevant details of your previous policy. But once you have put down the phone, you realize that you missed one detail. So you frantically call the agent again. But, this time when you call, you connect with a different agent but then also, they are able to access your records as well and know that you are registering for your new insurance policy. They make the relevant changes in the house number and the rest of the address is the same as the one you told the last operator. We call this Consistency because even though you connect to a different customer care operator, they were able to retrieve the same information.
3. Partition Tolerance
A communication breakdown—a momentary delay or lost connection between nodes—is referred to as a partition in a distributed system. Partition tolerance describes the ability of a cluster to function even in the face of numerous communication failures between system nodes.
Example: Unfortunately, you need to sell your car because it’s old and outdated and you don’t use it very often. So you list all of the specifics about your automobile on a website for sale, and then you get in touch with a buyer. He starts negotiating because he wants to acquire your car and hence wants to complete the agreement process. However, because the bargaining was not mutual, neither of you could sign the agreement. Therefore, we might conclude that the agreement has been breached or that there is no partition tolerance in this situation.
CAP Theorem NoSQL Database Types
Consistency and high availability are not compatible with NoSQL. Eric Brewer initially stated this in his CAP Theorem. We can only accomplish two out of the three guarantees for a database, consistency, availability, and partition tolerance. It is crucial to comprehend the NoSQL database’s constraints. Consistency and high availability are not compatible with NoSQL.
It can be observed from the above diagram that Consistency and Availability are connected by a database CA, Availability and Partition Tolerance by a database AP, and Consistency and Partition Tolerance by a database CP. Let us discuss what CA, AP, and CP mean.
CA: CA database provides availability and consistency among all the nodes. However, it cannot accomplish this if there is a partition between any two system nodes, hence it is unable to provide fault tolerance. For eg: Applications used in banking and finance demand available and consistent data.
AP: AP database means that the system continues to operate even in the presence of node failures. AP-based systems compromise consistency and availability. Non- distributed databases like PostgreSQL uses AP-based database systems.
CP: CP database means that the system continues to operate even though network failures are occurring in the database. CP systems are strongly consistent but they are not properly available.
CAP Theorem Applied
1. Availability over consistency (A+P)
- When availability is preferred over consistency the system will always process the query and attempt to return the most recent version of the data even if it cannot ensure that it is up to date because of network partitioning.
- When accumulating data is a top priority, availability is crucial. Consider factors like user preferences or behavioral data in this situation. In situations like these, you’ll want to record as much data as you can about what a user or customer is doing, but it’s not necessary to keep the database updated all the time. Even when network connections are down, they must still be reachable and available.
- One more reason to adopt a NoSQL database that puts availability over consistency is the rising demand for offline application use.
The following picture clearly shows Availability over Consistency:
2. Consistency over availability (C+P)
- When discussing consistency and availability, a lot of technical jargon might be used, but the core idea is simple: consistency is required if the database’s data must always be current and aligned, even in the event of a network failure.
- When you need several clients to share the same view of the data, consistency is a particular use case where you should prioritize it. Use a database, for instance, that provides consistency and assurance that the data you are looking at is up to date in cases when the network is unreliable or malfunctions while dealing with financial or sensitive information.
- For example, Information about bank accounts must be consistent and accurate because it is sensitive information. Therefore, it is preferable to announce an outage if the most recent accurate information is not accessible than to provide the consumer with inaccurate information.
The following picture clearly shows Consistency over Availability:
CAP Theorem Examples
Example 1: A mobile phone has been designed in such a way that it has space for only one sim card which means no sharing.
Solution: This system guarantees Consistency, Availability, and Tolerance to Partitions.
Example 2: Immediately after sending a message to someone, that individual might not get it.
Solution: With this system, availability and partition tolerance are compromised without compromising consistency, or AP.
Example 3: When we build a form for a group of individuals, the others can only access it once we provide them permission to do so.
Solution: CP, or Consistency and Partition Tolerance without Compromising Availability, is ensured by this system.
The different database of CAP Theorem fits in different software:
- AP: Dynamo, Cassandra, Elastic Search, CouchDB, Riak, MongoDB.
- CP: HBase, MongoDB, Redis, Memcached.
- CA: Postgres, MySQL.
Why does Blockchain Violate CAP Theorem?
- Blockchain obviously violates the CAP theorem. As discussed above both partition tolerance and availability are “income-producing” characteristics.
- If the blockchain system is unavailable, businesses that use it will begin to lose money. In other words, it’s critical to record new transactions on a node in the blockchain system whenever they are submitted, such as when money is transferred from one business to another.
- In the absence of blockchain, the new transaction is lost. This is the reason why Blockchain violates the CAP theorem.
What is PACELC Theorem?
- PACELC Theorem asserts that “otherwise (E), even when the system is functioning normally in the absence of partitions, one has to pick between latency (L) and consistency (C).”
- The PACELC Theorem is an extension of the CAP Theorem and is based on it. The CAP Theorem has the drawbacks of not having three options and of making it difficult to discern between CA and CP. They are also employed as an excuse for ignoring consistency.
- Due to its disregard for latency and performance, the CAP theorem has a serious weakness. PACELC was created as a result. By showing that systems might have a tendency toward either strong consistency or latency sensitivity, the PACELC theorem broadens the scope of the CAP theorem paradigm (pattern).