Availability

What I learned

Availability is really important. It is the probability that a service is up and working at any given time, usually expressed as a percentage. Customers who pay for a service expect it to be highly available. Availability deserves particular attention when designing critical systems, such as life-supporting hospital software, or widely used systems, such as cloud services.
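As a minimal sketch of this definition, availability can be measured over an observation window as the fraction of time the service was up, times 100. The function name and the example numbers below are made up for illustration.

```python
def availability_percent(uptime_hours: float, downtime_hours: float) -> float:
    """Return availability as a percentage of the total observed time."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("observation window must be positive")
    return 100.0 * uptime_hours / total


# Hypothetical example: a service that was down for 9 hours in a 30-day month (720 hours).
print(f"{availability_percent(711, 9):.2f}%")
```

Here 711 of 720 hours up works out to 98.75%, which is still well short of two nines.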

In the industry, even 85% availability would be unacceptable, so we usually quote availability in "nines." 99% availability is called two nines, and 99.99% is called four nines. What we usually care about is how much downtime that allows per year. For example, two nines allows 1% downtime, which is 3.65 days per year. That is still pretty bad: imagine if YouTube were down that long every year. Five nines (99.999%) is usually regarded as the gold standard in the industry and allows only about 5.26 minutes of downtime per year.
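The downtime figures come from simple arithmetic: n nines means the service may be unavailable for a fraction of 10^-n of the time. A minimal sketch, assuming a 365-day year (leap years ignored):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year


def downtime_per_year_minutes(nines: int) -> float:
    """Allowed downtime per year for an availability of `nines` nines (e.g. 3 -> 99.9%)."""
    unavailability = 10 ** -nines  # 2 nines -> 0.01, i.e. down 1% of the time
    return MINUTES_PER_YEAR * unavailability


for n in range(2, 6):
    print(f"{n} nines: {downtime_per_year_minutes(n):,.2f} minutes of downtime per year")
```

For two nines this gives 5,256 minutes, which is the 3.65 days mentioned above; for five nines it gives about 5.26 minutes.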

1) SLA (service level agreement) - an explicit written contract between a service provider and its customers about the availability of a service. For example, major cloud providers publish SLAs for their services.
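An SLA target can be turned into a downtime budget for its measurement window. Below is a minimal sketch using a hypothetical `Sla` class with a made-up 99.9% monthly target. Real SLAs specify more, such as how downtime is measured and what the customer receives if the target is missed; this only covers the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class Sla:
    """Hypothetical SLA: a target availability over a fixed measurement window."""
    target_percent: float  # e.g. 99.9
    window_minutes: float  # e.g. a 30-day month

    def allowed_downtime_minutes(self) -> float:
        # The unavailable fraction allowed by the target, applied to the window.
        return self.window_minutes * (1 - self.target_percent / 100)

    def is_met(self, downtime_minutes: float) -> bool:
        return downtime_minutes <= self.allowed_downtime_minutes()


monthly = Sla(target_percent=99.9, window_minutes=30 * 24 * 60)
print(f"{monthly.allowed_downtime_minutes():.1f} minutes allowed")
print(monthly.is_met(downtime_minutes=30))
print(monthly.is_met(downtime_minutes=60))
```

A 99.9% target over a 30-day month leaves a budget of about 43.2 minutes, so 30 minutes of downtime meets it and 60 minutes does not.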

2) What parts of our system require high availability? This is something we need to think hard about when designing a system. For example, payment services are critical to the business and always need high availability, while something like a customer information page may not need to be as highly available.

3) How do we make our system highly available? In a nutshell, we avoid single points of failure by adding redundancy. For example, we might run several servers behind a load balancer, and more than one load balancer. Adding machines is the easy part; redundancy only helps if traffic actually fails over to the healthy machines when one of them goes down.
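To see why redundancy helps, consider a simplified model: a group of redundant servers is up as long as at least one server is up. If each server has availability a and failures are independent, the group is down only when all n servers are down at once, so its availability is 1 - (1 - a)^n. This is a sketch under that independence assumption, which real systems only approximate (shared power, networks, or a bad deploy can take several servers down together), and it also assumes failover is instant and itself never fails.

```python
def parallel_availability(component_availability: float, replicas: int) -> float:
    """Availability of `replicas` redundant copies, where the system is up if any copy is up.

    Assumes copies fail independently and that failover between them is perfect.
    """
    if not 0.0 <= component_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if replicas < 1:
        raise ValueError("need at least one replica")
    all_down = (1 - component_availability) ** replicas  # probability every copy is down
    return 1 - all_down


for n in (1, 2, 3):
    print(f"{n} server(s) at 99% each: {100 * parallel_availability(0.99, n):.4f}% available")
```

In this model, two 99% servers together reach 99.99% (four nines) and three reach 99.9999%. The perfect-failover assumption is exactly why the load balancer in front of them must not become a new single point of failure.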
