Being able to recover from a major system failure is essential for most businesses today. Beyond implementing technology that supports disaster recovery, the key to recovery is practice. That means periodically meeting with members of the organization to review, step by step, the process the IT department will follow to bring critical systems back after an uncontrolled virus attack, a hurricane, or a major hardware failure. The table top exercise described below is a relatively inexpensive way for an organization to discuss scenarios for systems recovery and identify gaps in its current disaster recovery plan.
The exercise will be most effective if a network diagram is prepared ahead of time that provides the following details for the existing information system:
- Local network geographic locations
- Workstations, network switches, routers, servers, phone systems, and related equipment
- Local and wide area network connections, speeds, and carriers
- Existing contracts for business continuity services
- Matrix of existing IT staff and areas of expertise
Where possible, the network diagram should identify the high-level systems in place in the organization and the degree of criticality of each system to continued business operations in the event of a disaster. The criticality of a system determines its priority in a recovery effort.
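As a rough illustration of how criticality can drive recovery priority, the sketch below sorts a small inventory so the most critical systems come first. The system names, the `System` class, and the 1-to-3 criticality scale are assumptions made for this example, not part of any standard.

```python
from dataclasses import dataclass


@dataclass
class System:
    name: str         # hypothetical system name
    criticality: int  # assumed scale: 1 = most critical, larger = less critical


def recovery_order(systems):
    """Return systems sorted so the most critical are recovered first.

    Ties are broken alphabetically by name so the order is repeatable.
    """
    return sorted(systems, key=lambda s: (s.criticality, s.name))


# Illustrative inventory only; a real list would come from the network diagram.
systems = [
    System("File shares", 3),
    System("Email", 2),
    System("Phone system", 1),
    System("Billing database", 1),
]

for rank, system in enumerate(recovery_order(systems), start=1):
    print(f"{rank}. {system.name} (criticality {system.criticality})")
```

An organization's own scale and tie-breaking rule may differ; the point is only that the ranking is written down before the exercise rather than debated during it.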
In addition, if the organization has a policy on systems availability or the operations of the organization during particular disasters, this policy should be available to staff in the meeting. This policy may identify disasters where the organization will continue to operate, the chain of command to respond to system disasters, and organizational expectations for recovery time objectives.
With the network diagram in hand, the IT/IS department should meet for approximately two hours with its leadership, its senior staff, and representatives from the business units served by the information systems. One participant should be designated to take notes on the course of the meeting, and a different participant should be designated as moderator. Notes should be organized by disaster, and should identify which staff resources were available during each scenario, potential issues with recovery that were identified during the discussion, and open questions about configuration or recoverability.
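For note takers who prefer a fixed template, the following is a minimal sketch of notes organized by disaster, with the three categories described above. The class name, field names, and sample entries are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ScenarioNotes:
    disaster: str                                         # scenario discussed
    staff_available: list = field(default_factory=list)   # who could respond
    recovery_issues: list = field(default_factory=list)   # potential recovery problems
    open_questions: list = field(default_factory=list)    # configuration or recoverability unknowns


def format_notes(all_notes):
    """Render the notes as plain text, one block per disaster scenario."""
    lines = []
    for notes in all_notes:
        lines.append(f"Disaster: {notes.disaster}")
        sections = (
            ("Staff available", notes.staff_available),
            ("Recovery issues", notes.recovery_issues),
            ("Open questions", notes.open_questions),
        )
        for heading, items in sections:
            lines.append(f"  {heading}:")
            # Show an explicit placeholder so empty sections are not overlooked.
            lines.extend(f"    - {item}" for item in (items or ["(none recorded)"]))
    return "\n".join(lines)


# Illustrative entry only.
example = ScenarioNotes(
    disaster="Major hardware failure on the file server",
    staff_available=["Network administrator"],
    recovery_issues=["Restore from backup has never been tested"],
)
print(format_notes([example]))
```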
During the meeting, the following activities should take place:
1. A disaster scenario that will impact business operations should be identified for discussion.
2. Members of the team should identify which systems will be unavailable, the estimated length of the interruption to service, which IT resources will be available to respond to the disaster, and what impact the unavailable systems will have on continued business operations.
3. Members of the team should then work through the organization's stated disaster recovery policy, and identify the operational and technical steps required to recover the affected system or systems.
4. The group should discuss its experience with similar recoveries, and identify potential issues with performing a timely recovery of data. Relevant points include whether such a recovery has ever been tested, and any other known resource limitations on performing a recovery under the scenario.
5. The designated note keeper should record the discussion, and will be responsible for distributing notes of the meeting to all participants.
The process above should be repeated so that the group can address two to three disaster scenarios during the meeting. The total length of the meeting should be limited to two hours.
Following the meeting, a list of issues to be addressed should be created, and this list should form the basis for a project or work plan. Where necessary, a budget for capital and/or ongoing expenses and resources should be developed based on the work plan. Research may need to be conducted on potential technical solutions or workarounds for issues identified during the exercise. In addition, the policies of the organization may need to be reviewed or modified to reflect actual practice in responding to disasters, or to incorporate feedback from the team meeting. A technical testing plan may also be required, based on the findings or on proposed technical solutions to a systems failure.
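One simple way to turn the issue list into the start of a work plan is to record an owner, rough cost estimates, and a testing flag for each issue, then total them. The sketch below assumes those fields; the `Issue` class, the field names, and every figure shown are hypothetical placeholders rather than recommended values.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    description: str
    owner: str           # person responsible for follow-up
    capital_cost: int    # one-time estimate, in whole currency units
    ongoing_cost: int    # estimated cost per year
    needs_testing: bool  # True if a technical test should be planned


def summarize_work_plan(issues):
    """Total the estimates and list the issues that need a technical test."""
    return {
        "issue_count": len(issues),
        "capital_total": sum(i.capital_cost for i in issues),
        "ongoing_total": sum(i.ongoing_cost for i in issues),
        "to_test": [i.description for i in issues if i.needs_testing],
    }


# Placeholder entries for illustration only.
issues = [
    Issue("No tested restore procedure for the file server", "IT manager", 0, 0, True),
    Issue("No spare network switch on site", "Network administrator", 1500, 0, False),
    Issue("Offsite backup service not under contract", "IT manager", 0, 2400, True),
]

summary = summarize_work_plan(issues)
print(summary)
```

Comparing these summaries across successive exercises is one possible way to show whether the list of open issues is shrinking.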
A follow-up table top exercise should then be scheduled, allowing enough time to address the issues identified in the previous exercise. Results from successive table top exercises can be used to demonstrate progress in disaster preparedness.
For more details or information on how to conduct a table top exercise, see National Institute of Standards and Technology (NIST) Special Publication 800-84, section 4-1. Need help organizing or facilitating a table top exercise? Contact us for more information.