The term disaster recovery (DR) is relative because disasters take many forms and occur unexpectedly. IT systems, servers, data and applications are vulnerable to floods, hurricanes, power outages, hardware failures, attacks on servers or networks, and human error. Inadequate planning in technology and business can compromise critical business data and cause substantial financial loss. Most of these risks can be mitigated by developing a comprehensive DR plan, which can allow business operations to be restored within a few hours instead of days. DR plans are aimed at protecting data and other vital assets in the organization. It is important to note that DR planning is unique to every organization.
Consider a scenario in which important notices or information must be sent to a specific set of people at a given time. If IT is down, the cost of downtime can include lost standing with customers, a declining stock price and lost stakeholder confidence. Therefore, an organization's DR plan should identify the potential risks within its IT environment and define steps to mitigate those risks in order to ensure business continuity.
Business continuity (BC) and DR planning is often defined as an ongoing process that is integrated with day-to-day operations. For example, if a C-level executive finds that email is down, or cannot generate a report needed for decision making, that is not tolerable. A DR plan includes certain key elements to ensure efficient and effective restoration of critical business functions in the event of an unplanned disruption. The key elements to consider in BC and DR planning include:
- Assessment of critical applications
- Procedures for backup, restore and data recovery
- Procedures for implementation, testing and maintenance
Some best practices to consider when defining DR plans are:
- Catalogue systems and identify all impacts: BCDR planning starts with ranking applications and data by their criticality and their cost of downtime. The recovery point objective (RPO) and recovery time objective (RTO) for each component must also be understood. For example, the negative impact of losing critical customer records or network connectivity must be understood and planned for. All the systems and services participating in business operations (servers, storage disks, network components, etc.) must be catalogued, with an RTO and a recovery plan defined for each component.
- Involve people in BCDR: Normally, DR responsibility falls to IT or to a single person in the company. If that person is not available during a disaster, the company has all the recovery plans but no one to restore the systems. Therefore, several people from different departments should be trained to handle recovery procedures. It is best to also train people outside the primary data center region, or a team in another region, in application and data recovery, particularly in a hosted environment.
- Ensure redundancies to protect mission-critical data: The plan should ensure that adequate resources are available for backing up data and applications. When using cloud services from data centers, the company must ensure that appropriate SLAs are in place for disaster recovery so that applications and data remain available at all times. SLAs must also define infrastructure redundancy, such as replicating data to another location so that it stays available if the primary data center goes down. Infrastructure redundancies to consider in the plan include power, cooling, telecom, network and other related hardware.
- Include changes to the DR plan as and when they occur: BCDR plans must include all critical business processes and their associated applications, SLAs, data sources and the steps to recover them within their recovery point and recovery time objectives. For example, if a new application is implemented, or if applications and data are migrated to a cloud, the earlier plans become outdated. It is best to keep plans aligned with changes in the operating environment and fully documented.
- Periodically evaluate BCDR plans: BCDR plans should be evaluated periodically by running vulnerability tests and applying appropriate patches to protect systems and infrastructure. New technologies should also be evaluated for their benefits in ensuring business continuity and for developing a new set of SLAs. The evaluation and testing of plans must consider future requirements and ensure a fail-safe infrastructure for the organization.
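As an illustration of the cataloguing practice, the inventory can be kept as structured data so that recovery priorities and backup gaps are computed rather than remembered. The following is a minimal sketch only: the `Component` fields, the component names and all figures are hypothetical and not taken from any particular organization or tool. It orders components by recovery time objective (RTO) and flags any component whose backup interval is longer than its recovery point objective (RPO), meaning a failure could lose more data than the plan tolerates.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """One catalogued system or service (all fields are illustrative)."""
    name: str
    rto_hours: float               # recovery time objective
    rpo_hours: float               # recovery point objective (tolerable data loss)
    downtime_cost_per_hour: float  # estimated business cost while down
    backup_interval_hours: float   # how often backups actually run


def recovery_order(catalogue):
    """Order components for recovery: tightest RTO first, then highest downtime cost."""
    return sorted(catalogue, key=lambda c: (c.rto_hours, -c.downtime_cost_per_hour))


def rpo_gaps(catalogue):
    """Return names of components whose backup interval exceeds their RPO."""
    return [c.name for c in catalogue if c.backup_interval_hours > c.rpo_hours]


# Hypothetical inventory; the figures are made up for illustration.
catalogue = [
    Component("reporting", rto_hours=24, rpo_hours=24,
              downtime_cost_per_hour=200, backup_interval_hours=24),
    Component("customer-db", rto_hours=2, rpo_hours=1,
              downtime_cost_per_hour=5000, backup_interval_hours=4),
    Component("email", rto_hours=2, rpo_hours=4,
              downtime_cost_per_hour=1000, backup_interval_hours=4),
]

for component in recovery_order(catalogue):
    print(component.name, component.rto_hours)
print("RPO gaps:", rpo_gaps(catalogue))
```

Re-running a check like this whenever an application is added or moved is one simple way to keep the plan aligned with the operating environment.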
Even with diligent planning and continuous evaluation of BCDR, IT failures remain common, and hence DR is a continuous process. Companies can consider these best practices in BCDR planning to overcome IT disaster nightmares.