Our service portfolio is growing, and frustrated users can quickly move to more reliable sites. Besides this commercial aspect, downtime can also cause serious problems with regulatory agencies. In this post, I'll give you some simple steps you can use right away to conduct a high availability analysis. A follow-up post will show how to identify single point of failure risks in your critical services.

A highly available system continues to perform its business function even if a sub-service crashes. Any single point of failure (SPOF) risk is unacceptable in such a system. You should therefore consider redundancy for critical components and fault tolerance for your core system. A high availability analysis usually starts with a comparison of expected and actual availability, followed by the identification of single points of failure.

Expected Availability

The expected availability is often part of service specifications or contracts with service providers. Alternatively, you can use this formula to calculate it:

Availability = MTBF / (MTBF + MTTR)

Service Time: The time during which the service should be available to your users

MTTR: Mean time to repair (in the example below, the total downtime per month)

MTBF: Mean time between failures (here approximated as Service Time – MTTR)

Let’s assume that users work with our CRM application 21.5 hours a day, 20 days a month, and that the monthly downtime is 4 hours. What is the availability of our CRM application?

Service Time: 21.5 × 60 × 20 = 25800 minutes per month

MTTR: 240 minutes per month

MTBF: 25800 – 240 = 25560 minutes per month

Availability: 25560 / ( 25560 + 240) = 99.07%
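
The calculation above can be sketched in a few lines of Python. This is only a minimal sketch and not part of any official tooling: the function name `availability` and its parameter names are illustrative assumptions, and it takes MTBF as service time minus downtime, exactly as in the example.

```python
def availability(service_minutes: float, downtime_minutes: float) -> float:
    """Availability = MTBF / (MTBF + MTTR), with MTBF = service time - MTTR."""
    if service_minutes <= 0:
        raise ValueError("service time must be positive")
    if not 0 <= downtime_minutes <= service_minutes:
        raise ValueError("downtime must be between 0 and the service time")
    mtbf = service_minutes - downtime_minutes
    return mtbf / (mtbf + downtime_minutes)


# CRM example: 21.5 hours a day, 20 days a month, 4 hours of downtime per month.
service_minutes = 21.5 * 60 * 20   # 25800 minutes
downtime_minutes = 4 * 60          # 240 minutes
crm_availability = availability(service_minutes, downtime_minutes)
print(f"{crm_availability:.2%}")   # prints 99.07%
```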

Actual Availability

Our applications consist of many sub-services such as databases, middleware, load balancers, networks and external service providers. An outage of one load-balanced server will usually have no impact on availability, while a crash of a single database will often result in downtime. The former is called availability in parallel and the latter is called availability in series.

Availability in series:    A = Ax × Ay
Availability in parallel: A = 1 – (1 – Ax)²   (two redundant components, each with availability Ax)
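
As a rough illustration, both rules can be written as small Python helpers. The function names `series` and `parallel` are my own, and the parallel helper uses the usual generalization to any number of redundant components (one minus the product of the individual unavailabilities), which reduces to 1 – (1 – Ax)² for two identical components. It assumes that components fail independently of each other.

```python
from math import prod


def series(*availabilities: float) -> float:
    """All components are required: multiply the individual availabilities."""
    return prod(availabilities)


def parallel(*availabilities: float) -> float:
    """Redundant components: the block fails only if every component fails."""
    return 1 - prod(1 - a for a in availabilities)


a = 0.9907  # availability of a single component, taken from the CRM example

print(f"two in series:   {series(a, a):.4%}")    # 98.1486%
print(f"two in parallel: {parallel(a, a):.4%}")  # 99.9914%
```

The comparison shows why redundancy matters: chaining two such components lowers availability, while running them side by side raises it well above that of a single component.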

Typically, the actual availability of a given system is calculated in the following four steps:

  1. Prepare a block diagram of the system
  2. Develop a reliability model of the system
  3. Calculate the availability of sub-services
  4. Determine availability of the entire system

The block diagram helps you to visualize the essential components. Once you’ve created the visual representation of your services, you derive the reliability model by identifying which components are connected in series and which in parallel. Finally, you can calculate the availability of your sub-services and of the entire system.
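
One possible way to capture such a reliability model in code is a small nested structure that is evaluated recursively. The sketch below is an assumption on my part, not a standard format: a model is either a plain number (the availability of one sub-service) or a tuple of a block type and its children. The system shown, with two load-balanced web servers in front of a single database, and its availability values are hypothetical.

```python
from math import prod


def evaluate(model) -> float:
    """Return the availability of a model built from series/parallel blocks."""
    if isinstance(model, (int, float)):
        return float(model)              # leaf: availability of one sub-service
    kind, children = model
    values = [evaluate(child) for child in children]
    if kind == "series":
        return prod(values)              # every child is required
    if kind == "parallel":
        return 1 - prod(1 - v for v in values)  # fails only if all children fail
    raise ValueError(f"unknown block type: {kind!r}")


# Hypothetical system: two redundant web servers in front of one database.
model = ("series", [
    ("parallel", [0.99, 0.99]),  # load-balanced web servers
    0.995,                       # single database
])
system_availability = evaluate(model)
print(f"{system_availability:.2%}")  # prints 99.49%
```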

Let’s continue with our sample CRM application, which consists of three sub-services, each with a service time of 21.5 hours a day (25800 minutes per month). Sub-service 1 and Sub-service 3 each have an MTTR of 240 minutes per month, and Sub-service 2 has an MTTR of 480 minutes per month.

What is the actual availability of this CRM application?

Step 1: Block Diagram

Our CRM application consists of three sub-services.

[Figure: block diagram]

Step 2: Reliability Model

We have no load balancer or service redundancy in place, so this is a case of availability in series.

[Figure: reliability model]

Step 3: Availability per sub-service

S1: 25560 / ( 25560 + 240) = 99.07%

S2: 25320 / ( 25320 + 480) = 98.14%

S3: 25560 / ( 25560 + 240) = 99.07%

Step 4: System Availability

Availability: 99.07% * 98.14% * 99.07% = 96.32%
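
Steps 3 and 4 can be reproduced with a short Python sketch. The variable names are illustrative assumptions; the numbers are the ones from the example. Because the script multiplies the unrounded values, the result can differ in later decimal places from multiplying the rounded percentages by hand, but it agrees to two decimals here.

```python
from math import prod

SERVICE_MINUTES = 21.5 * 60 * 20  # 25800 minutes of service time per month

# Monthly downtime (MTTR) per sub-service in minutes, as given in the example.
downtime_minutes = {"S1": 240, "S2": 480, "S3": 240}

# Step 3: availability per sub-service, with MTBF = service time - MTTR,
# so MTBF / (MTBF + MTTR) simplifies to (service time - MTTR) / service time.
per_service = {
    name: (SERVICE_MINUTES - mttr) / SERVICE_MINUTES
    for name, mttr in downtime_minutes.items()
}

# Step 4: no redundancy, so the sub-services are combined in series.
system_availability = prod(per_service.values())

for name, value in per_service.items():
    print(f"{name}: {value:.2%}")                # 99.07%, 98.14%, 99.07%
print(f"System: {system_availability:.2%}")      # 96.32%
```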

The example above demonstrates how to calculate the actual availability so that you can compare it with the expected availability. Keep in mind that measures such as proactive error detection and redundancy improve availability, while factors such as complexity have an adverse impact on system reliability.

In my next post, I will outline a single point of failure analysis and provide a high availability calculation sheet which you can use right away.

Posted by JM

Resourceful, solution-focused and intuitive reliability engineer with over 15 years of demonstrated success in architecting, developing and maintaining effective testing and monitoring solutions. Offers a wealth of knowledge and experience surrounding modern application architecture and development of best practices.
