What is Scalability and How to achieve it – Learn System Design
Previous Parts of this System Design Tutorial:
- What is System Design
- Analysis of Monolithic and Distributed Systems
- Important Key Concepts and Terminologies
Now that we have covered the basics of System Design, it is time to dive deeper into its features and components. One very important concept in System Design is Scalability.
In system design, Scalability is the capacity of a system to adapt its performance and cost to changes in application and processing demands.
The architecture used to build services, networks, and processes is scalable under these two conditions:
- Resources can be added easily when demand/workload increases.
- Resources can be removed easily when demand/workload decreases.
Scalability is essentially a measure of how well the system responds to the addition and removal of resources to meet our requirements. That is why we do a Requirement Analysis of the system in the first phase of the SDLC and make sure the system is adaptable and scalable.
How to achieve Scalability?
Scalability is achieved in systems via two methods:
- Vertical scaling
- Horizontal scaling
Let us now discuss these two methods of scaling systems in greater depth.
What is Vertical Scaling?
Vertical scaling expands a system's capacity by adding more powerful hardware or configuration for better computing or storage. In practice, this means upgrading the processors, adding more RAM, or making other power-increasing changes to a single machine. Multi-core scaling is used here: the load is distributed across the CPU cores and RAM of that one machine.
Pros Of Scaling Up Vertically
- It uses less energy than maintaining multiple servers.
- Requires less administrative work because only one machine must be managed.
- Has lower cooling costs.
- Lower software costs.
- Simpler to implement.
- Preserves application compatibility.
Cons Of Scaling Up Vertically
- There is a high chance of hardware failure, which could result in more serious issues.
- There is little room for system upgrades, and the machine may become a single point of failure (SPOF).
- There is a limit to how much RAM and storage can be added to a single machine.
What is Horizontal Scaling?
A system is scaled horizontally by adding more machines. Several machines are pooled and connected so that the system can handle more requests.
Examples: Cassandra and MongoDB
Pros Of Scaling Out Horizontally
- It is less expensive than scaling up and makes use of smaller systems.
- Simple to upgrade.
- The existence of discrete, multiple systems improves resilience.
- Fault tolerance is simple to manage.
- It supports a linear increase in capacity as machines are added.
Cons Of Scaling Out Horizontally
- The license costs are higher.
- It has a larger footprint inside the data center which increases the cost of utilities like cooling and energy.
- It necessitates more networking hardware.
Remember: Scalable code is often less computationally efficient. This is a bitter truth: because vertical scaling has a limit, we split big, complex computations into a set of small associative operations so that they can be distributed across machines and scaled horizontally.
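To make this concrete, here is a minimal Python sketch (illustrative only; the chunk size and the use of a local process pool are assumptions for the example) of splitting one large computation into small associative operations that independent workers could process in parallel:

```python
from concurrent.futures import ProcessPoolExecutor

def partial_sum(chunk):
    # Each worker computes an independent partial result.
    return sum(chunk)

def distributed_sum(numbers, chunk_size=1000):
    # Split the big job into small chunks that can be handled by
    # separate workers (or, in a real system, separate machines).
    chunks = [numbers[i:i + chunk_size] for i in range(0, len(numbers), chunk_size)]
    with ProcessPoolExecutor() as pool:
        partials = pool.map(partial_sum, chunks)
    # Because addition is associative, combining the partial results gives
    # the same answer as one sequential pass, at the cost of some splitting
    # and merging overhead.
    return sum(partials)

if __name__ == "__main__":
    print(distributed_sum(list(range(10000))))  # 49995000
```

The same idea underlies MapReduce-style processing: each chunk costs a little extra overhead, but the work can be spread across as many machines as needed.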
Let us now summarize both types of scaling in a tabular format:
Vertical Scaling vs. Horizontal Scaling
Now that we have looked into the details of each type of scaling, let us compare them with respect to different parameters:
| Parameter | Horizontal Scaling | Vertical Scaling |
| --- | --- | --- |
| Database | Data is partitioned across multiple machines. | Data resides on a single machine and scaling is done across multiple cores, so the load is divided between the CPU and RAM of that machine. |
| Downtime | Adding machines to the pool results in little to no downtime. | Upgrading a single machine usually requires downtime. |
| Data Sharing | Because of the distributed network structure, sharing data via message passing between nodes is quite complex. | Working on a single machine makes inter-process communication simple, so data sharing is much easier. |
How to avoid failure during Scalability?
As studied above, while designing the architecture of a system we should not design for either extreme: over-provisioning (using more resources than needed) or under-provisioning (using fewer resources than needed) relative to the requirements gathered and analyzed.
There is a catch here: even if we design a seemingly perfect system, failures will still arise (as discussed earlier in the architectural principles for designing systems). Failures exist even in the best-designed systems, but we can prevent them from hampering the system globally by keeping the system redundant and our data replicated, so that nothing is lost.
Let us now understand these two terms in greater depth:
What is Redundancy?
Redundancy is nothing more than the duplication of nodes or components so that, in the event of a node or component failure, the backup node can continue to provide services to consumers. In order to sustain availability, failure recovery, or failure management, redundancy is helpful. The goal of redundancy is to create quick, effective, and accessible backup channels.
It is of two types:
- Active redundancy
- Standby or Passive redundancy
What is Replication?
Replication is the management of data storage in which each piece of data is kept in multiple copies hosted on different servers. It is simply the copying of data between many machines and keeping them synchronized. Replication contributes to increased fault tolerance and reliability by ensuring consistency among the redundant resources.
Also, it is of two types:
- Active replication
- Passive replication
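As a toy illustration only (the class and function names below are made up for this article, not a real replication protocol), the sketch contrasts the two styles: in active replication every replica applies the write itself, while in passive replication a primary applies the write and then pushes the result to its backups.

```python
class Replica:
    """An in-memory stand-in for a storage node."""
    def __init__(self, name):
        self.name = name
        self.store = {}

    def apply(self, key, value):
        self.store[key] = value

def active_replicate(replicas, key, value):
    # Active replication: every replica processes the write itself.
    for replica in replicas:
        replica.apply(key, value)

def passive_replicate(primary, backups, key, value):
    # Passive replication: only the primary handles the write,
    # then forwards the resulting state to its backups.
    primary.apply(key, value)
    for backup in backups:
        backup.apply(key, value)

nodes = [Replica("r1"), Replica("r2"), Replica("r3")]
active_replicate(nodes, "user:42", "Alice")
passive_replicate(nodes[0], nodes[1:], "user:43", "Bob")
print(nodes[1].store)  # {'user:42': 'Alice', 'user:43': 'Bob'}
```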
Later, we will deep dive into the concepts of load balancing and caching from scratch, building on this prerequisite knowledge of redundancy and replication.
Tip: So far we have focused on storing data reliably across servers via replication. But if we look carefully, we have only emphasized scaling the system, and such a system can still be highly inefficient.
This is because we have only been scaling to guard against node failures that lead to a Single Point of Failure in the databases (a SPOF damages the system architecture at both local and global scale) through redundancy, while ignoring the performance bounds that come with scaling, such as:
- Increased latency as per scalability
- Lesser throughput
Scalability Design Principles
Whenever a system is designed, the following principles should be kept in mind to tackle scalability issues:
- Scalability vs Performance: While building a scalable system, the performance of the system should remain proportional to its scalability. When the system is scaled up, performance should improve, and when the performance requirements are low, the system should be scaled down.
- Asynchronous Communication: Communication between the various components of the system should be asynchronous wherever possible, so that one slow or failed component does not block the others.
- Concurrency: This is the same concept as in programming: if our controller needs to issue multiple queries to serve a user request, they are launched concurrently, which drastically reduces the response time (see the sketch after this list).
- Databases: If queries are fired one after another, the overall latency should not keep growing and the database should not become overloaded.
- Eventual Consistency: Eventual consistency is a consistency model used in distributed computing to achieve high availability that informally guarantees that, if no new updates are made to a given data item, eventually all accesses to that item will return the last updated value.
- Denormalization: Third normal form saves disk space at the price of extra computation (joins), and that computation is more expensive than HDD space; it costs not only electricity but also higher latency, which is why scalable systems often denormalize.
- Caching: Caching is an important pillar of scalability; designing for a high cache hit rate, with a sensible eviction policy such as LRU, reduces the load on the backing stores.
- Failures: Not everything in a system can be kept under control; failures occur when we push the system to its performance threshold. Since failures will happen, we design to isolate issues and prevent them from spreading globally.
- Monitoring: Some bugs are hard to reproduce, which is the worst situation because we lack adequate evidence about what actually happened. With monitoring in place, we are constantly recording the system's behaviour, so incidents can be retrospected.
- Capacity balancing: Suppose the load increases tremendously and we now receive 1000 requests that were previously handled comfortably by 20 workers, with an average request time of 100 ms.
With a circuit-breaker timeout of 500 ms, each worker can complete only 500 / 100 = 5 requests within that window, so 20 × 5 = 100 requests succeed and the remaining 900 fail. That is why we sometimes adjust circuit-breaker settings in order to balance capacity in the real world.
- Servers: Small-capacity servers are good when the load curve is smooth, whereas big servers suit heavy computations, which in turn call for monitoring, latency management, and load balancing.
- Deployment: Older code should always be kept available and maintained for any massive, irreversible change that could result in downtime. If that is not possible, break the change into smaller parts. These practices must be followed while deploying as the system architecture scales.
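To illustrate the Concurrency principle from the list above, here is a minimal asyncio sketch (the query names and delays are invented placeholders): three independent queries launched together finish in roughly the time of the slowest one, rather than the sum of all three.

```python
import asyncio

async def fetch(query, delay):
    # Stand-in for a real database or downstream service call.
    await asyncio.sleep(delay)
    return f"result of {query}"

async def handle_request():
    # Launch all independent queries at once; total latency is
    # roughly max(delays) instead of sum(delays).
    return await asyncio.gather(
        fetch("profile", 0.10),
        fetch("orders", 0.08),
        fetch("recommendations", 0.12),
    )

if __name__ == "__main__":
    print(asyncio.run(handle_request()))
```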
How to handle SPOF during Scalability?
In order to build efficient, scalable systems in which multiple copies of data are stored across servers and single points of failure are handled well, we need to learn the two concepts listed below. They help us achieve efficient, scalable systems globally, even across a huge distributed system architecture.
- Load Balancing
- Caching
Now let us cover load balancing in greater depth, followed by caching, to understand scalability more completely.
What is Load Balancing?
Load balancing is a technique for effectively distributing application or network traffic among all nodes in a distributed system. Load balancers are the tools used to perform load balancing.
Load Balancer roles
- Each node receives an equal share of the workload.
- Should keep track of which nodes are unavailable or not in use.
- Effectively manage/distribute work to ensure that it is finished on time.
- Distribution should be done to maximize speed and use all available capacity.
- Load balancers must guarantee high scalability, high throughput, and high availability.
Let us also clarify the ideal conditions under which a load balancer should be used:
- Load balancers can be used for load management when the application has several instances or servers.
- Application traffic is split amongst several servers or nodes.
- Load balancers are crucial for maintaining scalability, availability, and latency in a heavy-traffic environment.
Benefits of Load Balancing
- Optimization: In a heavy traffic environment, load balancers help to better utilize resources and reduce response times, which optimizes the system.
- Improved User Experience: Load balancers assist in lowering latency and raising availability, resulting in a smooth and error-free user experience.
- Prevents Downtime: By keeping track of servers that aren’t working and allocating traffic properly, load balancers provide security and avoid downtime, which also boosts revenue and productivity.
- Flexibility: To ensure efficiency, load balancers can reroute traffic in the event of a failure and work on server maintenance.
- Scalability: Load balancers can use real or virtual servers to deliver responses without any interruption when a web application’s traffic suddenly surges.
Now, geeks, you must be wondering whether there are any pitfalls associated with load balancing.
Challenges to Load Balancing
As we have already discussed, a SPOF is a constraint while developing systems, and the same applies here. A load balancer failure or breakdown may cause the entire system to be suspended and unavailable for a while, which negatively affects the user experience. Client and server communication would be disrupted in the event of a load balancer malfunction. We can employ redundancy to resolve this problem: the system can have both an active and a passive load balancer, and the passive load balancer can take over if the active load balancer fails.
For a better understanding, let us dive into the load-balancing algorithms:
Load Balancing Algorithms
For the effective distribution of load over various nodes or servers, various algorithms can be used. Depending on the kind of application the load balancer must be utilized for, the algorithm should be chosen.
A few load-balancing algorithms are listed below (a small sketch of the Round Robin approach follows the list):
- Round Robin Algorithm
- Weighted Round Robin Algorithm
- IP Hash Algorithm
- Least Connection Algorithm
- Least Response Time
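As a minimal sketch of the first strategy above (the server addresses are placeholders), a Round Robin balancer simply hands each incoming request to the next server in a fixed circular order:

```python
import itertools

class RoundRobinBalancer:
    def __init__(self, servers):
        # Cycle through the servers endlessly in a fixed order.
        self._servers = itertools.cycle(servers)

    def next_server(self):
        return next(self._servers)

balancer = RoundRobinBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
for request_id in range(6):
    print(f"request {request_id} -> {balancer.next_server()}")
```

The weighted variant simply repeats each server in proportion to its capacity, while the other algorithms replace this fixed rotation with hashing or live connection counts.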
Now that we have adequate knowledge of load balancing, let us dive into caching.
What is Caching?
A cache is a high-speed data storage layer that temporarily stores a subset of data so that subsequent requests for that data can be fulfilled more quickly than by accessing the data's original storage location.
Caching is a process by which we reuse data that has already been fetched or computed, by keeping a local copy of it. Caching reduces the number of read calls, API calls, and network I/O calls.
Types Of Cache:
There are basically three types of caches as follows:
- Local cache: Used for a single system, when the cache must be kept in that system's local memory. It is also known as an L1 cache.
- Examples: Memcache and Google Guava Cache
- External cache: Shared across multiple systems; also known as a distributed cache or an L2 cache.
- It is used when the cache must be shared by several systems, so the data is stored in a distributed manner that all servers can access.
- Example: Redis
- Specialized cache: A special type of memory developed to improve the performance of the local and external caches described above. It is also known as an L3 cache.
How does Caching work?
The information stored in a cache is typically saved in hardware that provides quick access, such as RAM (random-access memory), but a cache can also be implemented as a software component. The main objective is to increase data-retrieval performance by avoiding contact with the slower storage layer below.
Note: Applications of caching are:
- CDN (Content Delivery Network)
- Application Server Cache
Benefits of Caching
- Improves performance of the application
- Lower database expenses
- Lessen the Backend’s Load
- Dependable Results
- Get rid of hotspots in databases
- Boost read-through rate (IOPS)
Disadvantages of Caching
- Cache memory is costly and has a finite amount of space.
- The page becomes hefty as information is stored in the cache.
- Sometimes updated information is not displayed as the cache is not updated.
Applications of Caching
- Caching could help reduce latency and increase IOPS for many read-intensive application workloads, including gaming, media sharing, social networking, and Q&A portals.
- Compute-intensive applications that change data sets, such as recommendation engines and simulations for high-performance computing, benefit from an in-memory data layer acting as a cache.
- In these applications, massive data sets must be retrieved in real-time across clusters of servers that can include hundreds of nodes. Due to the speed of the underlying hardware, many programs are severely constrained in their ability to manipulate this data in a disk-based store.
Remember: When and where to use caching?
Case 1: Static Data: If the data is not changing too regularly, caching would be beneficial. We can save the data and use it right away. Caching wouldn’t do much good if the data was changing quickly.
Case 2: Application type: Applications can either be read-intensive or write-intensive. The application that requires a lot of reading would benefit more from caching. Data would change quickly for a write-intensive application, hence caching shouldn’t be used.
Lastly, let us discuss caching strategies to wrap up the concept of caching:
Caching patterns describe how designers incorporate a cache into a system. Write-through and cache-aside are two common techniques.
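Here is a minimal sketch of these two techniques; the load_from_database and save_to_database functions are placeholders standing in for whatever backing store the system actually uses. In cache-aside, the application fetches the value itself on a miss and populates the cache; in write-through, a write updates the store and the cache together.

```python
cache = {}

def load_from_database(key):
    # Placeholder for the real (slow) storage layer.
    return f"value-for-{key}"

def save_to_database(key, value):
    # Placeholder: a real implementation would persist the value here.
    pass

def get(key):
    # Cache-aside: check the cache first ...
    if key in cache:
        return cache[key]            # cache hit
    value = load_from_database(key)  # cache miss: read from the source
    cache[key] = value               # ... then populate the cache
    return value

def put(key, value):
    # Write-through: write to the backing store and the cache together,
    # so reads immediately after a write still see fresh data.
    save_to_database(key, value)
    cache[key] = value
```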
Cache Eviction Strategies
The eviction policy of a cache determines the order in which items are removed when the cache is full, clearing room so that new entries can be added. Common policies are listed below:
- LRU (Least Recently Used)
- LFU (Least Frequently Used)
- FIFO (First In First Out)
- LIFO (Last In First Out)
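To see how an eviction policy behaves in practice, below is a small LRU cache sketch built on Python's OrderedDict (the capacity of 2 is arbitrary); once the cache is full, the least recently used entry is discarded to make room:

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)        # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used entry

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")      # "a" is now the most recently used entry
cache.put("c", 3)   # cache is full, so "b" (least recently used) is evicted
print(list(cache.items))  # ['a', 'c']
```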