Thursday 7 February 2008

What does High Availability mean?

The term High Availability was coined many years ago.

It was coined when uptime was defined in percentages. If a server is up all the time, then it is said to be 100% available. That is, it is always available to its users. This is only possible with very specialist hardware and software. If it is down for, say, 1 hour a month for maintenance, then out of the 24 x 365 = 8760 hours in a year it is down for 12. In percentage terms, 12 hours out of 8760 is 0.137%, so it is up, or available, for 99.86% of the time.
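The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name availability_percent is made up for this example, and a year is assumed to be 365 days, as in the text.

```python
# Hours in a non-leap year, matching the 24 x 365 figure in the text.
HOURS_PER_YEAR = 24 * 365  # 8760


def availability_percent(downtime_hours_per_year: float) -> float:
    """Return the percentage of the year the system is up."""
    return 100.0 * (1 - downtime_hours_per_year / HOURS_PER_YEAR)


# One hour of maintenance a month is 12 hours a year.
downtime = 1 * 12
print(f"{availability_percent(downtime):.2f}%")  # prints 99.86%
```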

Often, people talk about three nines or four nines availability, meaning 99.9% or 99.99% availability. There was a time when High Availability was defined in terms of percentage of uptime: if a system had greater than 99.9% availability, it was classed as highly available, and if it had greater than 99.99% availability, it was classed as fault resilient.
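To get a feel for what those percentages allow, the calculation can be turned around to give the permitted downtime per year. The sketch below is illustrative only: allowed_downtime_hours is a name invented for this example, and it assumes a 365-day year and counts all downtime, planned or not.

```python
# Hours in a non-leap year (24 x 365).
HOURS_PER_YEAR = 24 * 365


def allowed_downtime_hours(availability_percent: float) -> float:
    """Return the hours per year a system may be down at a given availability."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100.0)


for label, pct in [("three nines", 99.9), ("four nines", 99.99)]:
    hours = allowed_downtime_hours(pct)
    print(f"{label} ({pct}%): {hours:.2f} hours a year, about {hours * 60:.0f} minutes")
```

Three nines works out at a little under 9 hours of downtime a year, and four nines at a little under an hour.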

Today, I would say that there is no real ‘definition’ of high availability. The term is used to mean that a system is resilient and that all single points of failure (SPOF) have been eliminated. Its failure modes, including those of its networks and applications, are known and well defined. Some of this resilience is achieved by doubling up on hardware such as network cards and power supplies, and some is achieved through software such as LifeKeeper.


I would say that today, the term means that users have their systems available for use when they need them. It requires the administrator to think carefully about downtime and to plan for system recovery should a failure occur. In other words, to plan for the failure of each critical component, have a recovery scenario ready, and know in advance how long the system is likely to be down. It also means that this recovery time is acceptable to your users.