In my experience, a high availability (HA) analysis helps you identify weaknesses in your IT services. Conduct it regularly, because small changes in your landscape can have an enormous impact on the reliability of your business services. Once you’ve identified a gap between expected and actual availability, I highly recommend performing a single point of failure (SPOF) analysis.
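To make that gap concrete, here is a minimal sketch in Python. It assumes you record downtime in minutes over a fixed observation window and defines actual availability as the share of that window in which the service was up. The function names, the 30-day window, the 90 minutes of downtime and the 99.9 % target are hypothetical illustration values, not figures from a real service.

```python
# Minimal sketch: compare measured availability with an expected target.
# Assumption: downtime is recorded in minutes over a fixed observation window.
# All numbers in the example run are hypothetical.


def measured_availability(window_minutes: float, downtime_minutes: float) -> float:
    """Fraction of the observation window during which the service was up."""
    if window_minutes <= 0:
        raise ValueError("observation window must be positive")
    if not 0 <= downtime_minutes <= window_minutes:
        raise ValueError("downtime must lie within the observation window")
    return (window_minutes - downtime_minutes) / window_minutes


def availability_gap(expected: float, actual: float) -> float:
    """A positive result means the service falls short of its target."""
    return expected - actual


if __name__ == "__main__":
    window = 30 * 24 * 60   # a 30-day month in minutes
    downtime = 90           # hypothetical: 90 minutes of recorded outages
    target = 0.999          # hypothetical expected availability (99.9 %)

    actual = measured_availability(window, downtime)
    gap = availability_gap(target, actual)
    print(f"actual availability: {actual:.4%}")
    print(f"gap to target:       {gap:.4%}")
    if gap > 0:
        print("target missed -> time for a SPOF analysis")
```

If the gap is positive, the service missed its target in that window, which is the trigger for the SPOF analysis described in this post.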
Below, I’ll give you some simple steps you can use right away, followed by a complete HA and SPOF analysis example.
An outage of any sub-service affects the reliability of the entire system.
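You can put a number on this effect. Assuming every sub-service is required for the business service to work and that failures are independent, the system is only up when all sub-services are up, so the individual availabilities multiply. The result is always lower than the availability of the weakest sub-service. The values in this sketch are hypothetical.

```python
# Minimal sketch: availability of sub-services chained in series.
# Assumptions: every sub-service is required, and failures are independent,
# so the system availability is the product of the individual availabilities.
from math import prod


def series_availability(availabilities: list[float]) -> float:
    """Availability of components that must all be up at the same time."""
    for a in availabilities:
        if not 0.0 <= a <= 1.0:
            raise ValueError(f"availability out of range: {a}")
    return prod(availabilities)


if __name__ == "__main__":
    # Hypothetical sub-services, e.g. web tier, application server, database.
    parts = [0.999, 0.995, 0.99]
    print(f"weakest sub-service: {min(parts):.4%}")
    print(f"whole system:        {series_availability(parts):.4%}")
```

With these numbers the whole system ends up at roughly 98.4 %, noticeably below the 99 % of its weakest part.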
Therefore, check each of your sub-services for:
- expected failures
- preventive measures
- business impact of an outage
If no preventive measure is available, you’ve identified a single point of failure. Try to eliminate every SPOF as soon as possible to avoid critical outages. Good failure detection and redundancy will help you improve the reliability of your business-critical application.
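The checklist and the SPOF rule can be captured in a small data structure. This is a minimal sketch: each sub-service gets a list of expected failures, each with its preventive measures and business impact, and any failure without a preventive measure is flagged as a SPOF. The class names, field names and sample entries are hypothetical, chosen only for illustration.

```python
# Minimal sketch of the SPOF checklist: one record per expected failure.
# Class names, field names and sample data are hypothetical.
from dataclasses import dataclass, field


@dataclass
class FailureMode:
    expected_failure: str
    preventive_measures: list[str] = field(default_factory=list)
    business_impact: str = ""


@dataclass
class SubService:
    name: str
    failure_modes: list[FailureMode]


def find_spofs(services: list[SubService]) -> list[tuple[str, str]]:
    """Return (sub-service, failure) pairs that have no preventive measure."""
    return [
        (svc.name, mode.expected_failure)
        for svc in services
        for mode in svc.failure_modes
        if not mode.preventive_measures
    ]


if __name__ == "__main__":
    services = [
        SubService("load balancer", [
            FailureMode("node crash", ["active/passive pair"], "short interruption"),
        ]),
        SubService("database", [
            FailureMode("disk failure", ["RAID 1"], "none"),
            FailureMode("server hardware failure", [], "application down"),
        ]),
    ]
    for name, failure in find_spofs(services):
        print(f"SPOF: {name} -> {failure}")
```

Keeping the business impact next to each failure makes it easy to decide which SPOF to tackle first.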
Sample SPOF Analysis
As mentioned above, we use the single point of failure analysis to check our sub-services for potential failures, mitigations, and the business impact of outages.
The sample application used for this analysis consists of three components. The table below shows the results of the SPOF analysis for this application.
We’ve identified a single point of failure because no mitigation is available for hardware failures.
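A common way to close such a gap is redundancy. As a rough sketch, assuming redundant instances fail independently and a single surviving instance is enough to keep the service running, the service is only down when all instances are down at the same time, so the combined availability is 1 − Π(1 − Aᵢ). The 99 % per-server availability below is a hypothetical value, not one taken from the sample analysis.

```python
# Minimal sketch: availability of redundant (parallel) instances.
# Assumptions: instances fail independently and any one of them suffices.
from math import prod


def parallel_availability(availabilities: list[float]) -> float:
    """Availability of redundant instances where any single one suffices."""
    if not availabilities:
        raise ValueError("need at least one instance")
    for a in availabilities:
        if not 0.0 <= a <= 1.0:
            raise ValueError(f"availability out of range: {a}")
    # The service is down only if every instance is down at the same time.
    return 1.0 - prod(1.0 - a for a in availabilities)


if __name__ == "__main__":
    single = 0.99  # hypothetical availability of one server
    print(f"1 server:  {parallel_availability([single]):.4%}")
    print(f"2 servers: {parallel_availability([single, single]):.4%}")
```

This ignores failover time and shared causes of failure such as a common power supply, so treat the result as an optimistic estimate.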
I’ve prepared a high availability and single point of failure calculation matrix which you can use right away.