Several readers have responded to a previous article in which I recommended powering down computer rooms to prepare for inevitable emergencies. The respondents stated that they could not power down their systems due to either 24/7/365 or 99.999 percent availability requirements (often referred to as "the five nines").
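To put "the five nines" in concrete terms, the following minimal sketch converts an availability target into the downtime it permits per year. The function name and the 365-day year are assumptions made for illustration; they are not taken from any particular service-level agreement.

```python
# Downtime permitted per year at a given availability target.
# Assumes a 365-day year (leap years ignored) for simplicity.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_budget_minutes(availability_percent: float) -> float:
    """Return the minutes of downtime per year allowed by the target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for target in (99.9, 99.99, 99.999):
    minutes = downtime_budget_minutes(target)
    print(f"{target}% -> {minutes:.2f} minutes/year")
```

Under these assumptions, 99.999 percent availability leaves a little over five minutes of unplanned downtime per year, which is why the way maintenance windows are counted matters so much.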
Without full testing, one can never be 99.999 percent certain that systems are prepared for an incident. With this in mind, users may appreciate a scheduled maintenance window (excluded from the uptime meter). Here are a few points to consider:
If the services provided by a computer room require extremely
high availability, then it stands to reason that the computer
room itself should not be a single point of failure.
Note that system maintenance is critical to meeting uptime
requirements. Paralysis caused by unnecessary restrictions
may ultimately result in downtime.
High availability solutions are expensive. It is prudent
to know which business processes are being supported.
In the course of routine operations it is difficult to truly
quantify the needs of the user community without conducting
research. To determine actual needs, be sure to document
the following:
A functional description of each application
Users of the application (e.g., customer service reps, loan
officers, customers, etc.)
Time periods when users access the application
Business criticality of the application and its respective
systems
The financial impact of two to three hours of downtime per
application
Primary and alternate contacts representing the application's
user community
The redundant infrastructure components supporting the application
(e.g., mirrored disk, clustering)
Whether or not the infrastructure meets the requirements
of the application
The redundant components of the data center (e.g., UPS,
generators, separate power grids, communications lines from
separate central offices)
Whether or not the data center fulfills the requirements
of the applications
Do not rely solely on verbal responses. Verify utilization
through system and application logs. Create a matrix representing
users' needs by application. Gaps in usage will begin to emerge
as candidate maintenance windows. At this stage, the need for a
24/7/365 computer room may still exist.
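The matrix described above can be sketched as follows. The application names and access records are hypothetical placeholders; in practice the records would be extracted from system and application logs, and a longer observation period would be needed before trusting any gap.

```python
from collections import defaultdict

# Hypothetical access records: (application, hour of day 0-23).
# In practice these would be parsed from system and application logs.
access_records = [
    ("loan_origination", 9), ("loan_origination", 14),
    ("customer_portal", 8), ("customer_portal", 20), ("customer_portal", 23),
    ("batch_reporting", 1),
]


def build_usage_matrix(records):
    """Map each application to the set of hours in which it was accessed."""
    matrix = defaultdict(set)
    for app, hour in records:
        matrix[app].add(hour)
    return dict(matrix)


def idle_hours(matrix):
    """Return the hours of the day in which no application was accessed."""
    busy = set().union(*matrix.values())
    return sorted(set(range(24)) - busy)


usage = build_usage_matrix(access_records)
print(idle_hours(usage))
```

Hours that appear idle for every application are only candidates for a maintenance window; they should still be confirmed with the primary and alternate contacts for each user community.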
Throughout this process, an intimate understanding of the user community's needs is gained. One common argument is "We will lose $x million in revenue." Question how that figure was generated. Is it reasonable to assume that users will not return if they are greeted by a maintenance Web page in the middle of the night?
Loss of revenue starts to build a business case for a redundant or resilient solution. If the business cannot bear two to three hours of downtime once a year, determine the best solution to meet those requirements. If only a few applications must be highly available, it may be possible to replicate them to a hot site instead of keeping the entire computer room running 24/7/365. Hot sites need not be an economic burden if a remote company-owned computer room is used. Hot sites also offer obvious disaster recovery benefits. Advise senior management of your findings so that they can make an informed decision.
Once appropriate redundancy/resiliency is in place, there should
be no issues with an annual computer room shutdown. If for
some reason the computer room still cannot be powered down,
this exercise will have created a better understanding of
the user community's needs and made it easier to take down
individual systems for maintenance.
Copyright © 2005 CyberGuard Corporation All Rights Reserved.
Reprinted with Permission