Our service portfolio is growing, and frustrated users can quickly move to more reliable sites. Beyond this commercial aspect, downtime can also cause serious problems with regulatory agencies. In this post, I’ll give you some simple steps you can use right away to conduct a high availability analysis. A follow-up post will show how to identify single point of failure risks in your critical services.
A highly available system continues its business function even if a sub-service crashes. A single point of failure (SPOF) is not acceptable in such a system, so you should consider redundancy for critical components and fault tolerance in your core system. Usually, a high availability analysis starts with a comparison of expected and actual availability, followed by the identification of single points of failure.
The expected availability is often part of service specifications or contracts with service providers. Alternatively, you can use this formula to calculate it:
Availability = MTBF / (MTBF + MTTR)
Service Time: Time during which the service has to be available for your users
MTTR: Mean time to repair (in the examples below, the total downtime per month)
MTBF: Mean time between failures (Service Time – MTTR)
Let’s assume that our CRM users work with the application 21.5 hours a day, 20 days a month, and that the monthly downtime is 4 hours. What is the availability of our CRM application?
Service Time: 21.5 hours × 60 minutes × 20 days = 25800 minutes per month
MTTR: 240 minutes per month
MTBF: 25800 – 240 = 25560 minutes per month
Availability: 25560 / ( 25560 + 240) = 99.07%
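The calculation above can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `availability` and its parameter names are made up for illustration, and MTTR is treated as the total downtime per month, as in the example.

```python
def availability(service_minutes: float, downtime_minutes: float) -> float:
    """Availability = MTBF / (MTBF + MTTR), with MTBF = Service Time - MTTR."""
    if service_minutes <= 0:
        raise ValueError("service time must be positive")
    if not 0 <= downtime_minutes <= service_minutes:
        raise ValueError("downtime must be between 0 and the service time")
    mtbf = service_minutes - downtime_minutes
    return mtbf / (mtbf + downtime_minutes)


# CRM example: 21.5 hours a day, 20 days a month, 4 hours of downtime per month
service_minutes = 21.5 * 60 * 20      # 25800 minutes per month
crm = availability(service_minutes, 4 * 60)
print(f"CRM availability: {crm:.2%}")  # CRM availability: 99.07%
```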
Our applications consist of many sub-services such as database, middleware, load balancer, network and external service providers. An outage of one load-balanced server may have no impact on availability, while a crash of a single database will often result in downtime. The former is a case of availability in parallel, and the latter is a case of availability in series.
Availability in series: A = Ax × Ay
Availability in parallel: A = 1 – (1 – Ax)²
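To see how much redundancy helps, the two formulas can be put into small helper functions. This is a sketch under assumptions: the names `series` and `parallel` are hypothetical, and the parameter `n` generalizes the two-component formula 1 – (1 – Ax)² to n identical, independent redundant components, which goes beyond what is stated above.

```python
from math import prod


def series(*availabilities: float) -> float:
    """All components are needed: multiply the individual availabilities."""
    return prod(availabilities)


def parallel(a: float, n: int = 2) -> float:
    """n identical redundant components: the system fails only if all n fail.

    n = 2 gives the formula from the text; other values are an assumption.
    """
    if n < 1:
        raise ValueError("need at least one component")
    return 1 - (1 - a) ** n


# Two components with 99% availability each
print(f"in series:   {series(0.99, 0.99):.2%}")   # in series:   98.01%
print(f"in parallel: {parallel(0.99):.2%}")       # in parallel: 99.99%
```

With the same two components, the serial setup is less available than either component alone, while the redundant setup is more available than either.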
Typically, the actual availability of a given system will be calculated in the following four steps:
- Prepare a block diagram of the system
- Develop a reliability model of the system
- Calculate the availability of sub-services
- Determine availability of the entire system
The block diagram will help you to visualize the essential components. Once you’ve created the visual representation of your services, you derive the reliability model by identifying which components are connected in series and which in parallel. Finally, you can calculate the availability of your sub-services and of the entire system.
Let’s continue with our sample CRM application, which consists of three sub-services, each with a service time of 21.5 hours a day, 20 days a month. Sub-service 1 and Sub-service 3 each have an MTTR of 240 minutes per month, and Sub-service 2 has an MTTR of 480 minutes per month.
What is the actual availability of this CRM application?
Step 1: Block Diagram
Our CRM application consists of three sub-services.
Step 2: Reliability Model
We have no load balancer or service redundancy in place, so this is a case of availability in series.
Step 3: Availability per sub-service
S1: 25560 / ( 25560 + 240) = 99.07%
S2: 25320 / ( 25320 + 480) = 98.14%
S3: 25560 / ( 25560 + 240) = 99.07%
Step 4: System Availability
Availability: 99.07% * 98.14% * 99.07% = 96.32%
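Steps 3 and 4 can be checked with a short script. It is a sketch that assumes the numbers from the example (25800 minutes of service time per month, all three sub-services in series). The names `SERVICE_MINUTES`, `downtime`, `per_service` and `system` are chosen for illustration only.

```python
from math import prod

SERVICE_MINUTES = 21.5 * 60 * 20  # 25800 minutes per month

# Monthly downtime (MTTR) per sub-service in minutes, taken from the example
downtime = {"S1": 240, "S2": 480, "S3": 240}

# Step 3: availability per sub-service = MTBF / (MTBF + MTTR)
per_service = {}
for name, mttr in downtime.items():
    mtbf = SERVICE_MINUTES - mttr
    per_service[name] = mtbf / (mtbf + mttr)

# Step 4: no redundancy, so the system is a pure series of its sub-services
system = prod(per_service.values())

for name, a in per_service.items():
    print(f"{name}: {a:.2%}")
print(f"System: {system:.2%}")  # System: 96.32%
```

Note that the system availability is lower than that of the weakest sub-service, which is typical for a chain without redundancy.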
The example above demonstrates how to determine the actual availability, which you can then compare with the expected availability. Keep in mind that measures such as proactive error detection and redundancy improve availability, while factors such as complexity have an adverse impact on system reliability.
In my next post, I will outline a single point of failure analysis and provide a high availability calculation sheet which you can use right away.