Disaster Recovery

Checklist

What is a disaster?
Define disasters in IT
Local IT, Network/Power outage, ISP/DNS Failure
What is disaster recovery?
Steps for disaster recovery
Scales in terms of sites and infrastructure for disaster recovery
Difference between disaster recovery and backup recovery solutions
Maintenance of the DR setup

Disaster can mean different things to different people, depending on the scenario. In general, I define a disaster as the loss of an asset, on either an economic or an emotional scale, that cannot be undone or repaired easily. Fixing something destroyed by a disaster is not an easy process and cannot be done in a short span of time. Forgetting your wedding anniversary, or sharing your phone with your girlfriend without deleting the messages, are common emotional disasters. Economic disasters need no explanation, and I pray that you never experience one.

The term "disaster" in IT is close to its everyday meaning, but the factors affected have a direct impact on the business, reputation and brand of the company that owns the infrastructure.

Disasters are not only natural events; they also include man-made ones, and in most scenarios IT organizations cannot avoid them. Such disasters can cause a service outage and can be detrimental to the business, the product or the service. The IT wing of every organization with a digital footprint dreads a disaster, as it could mean partial or complete doom for the company.

To list a few events that could be considered a disaster for an IT infrastructure:
Natural disasters

Plans are made to mitigate such disasters, avoid a complete loss or an operational deadlock, and sustain the company through and beyond the event. Even simple events can lead to an outage, and with it loss of revenue, loss of competitive position and inconvenience to customers.

Disaster Recovery (DR) is the plan drawn up to counter any type of disaster that may impact the IT infrastructure. As a part of DR, we tend to look at 4 R's -

Disaster recovery is a test of the complete IT infrastructure, and it reveals the IT standpoint and maturity of an organization in critical times. Planning for disaster recovery requires knowledge of the company's current design and architecture and of its IT footprint. It is safe to say that when a disaster strikes, it does not choose selectively but wipes away anything in its path. Hence, planning for disaster recovery should make use of all available resources under the IT footprint. For this reason, large corporations spread their investments across a wide geographic region that they can maintain for DR. Smaller companies prefer to invest in a shared datacenter and infrastructure to keep their data safe, survive the event and contain the impact to the physical infrastructure alone.

Needless to say, having a global cloud provider as a partner for such DR scenarios helps greatly, as they already possess the global infrastructure required and can stand up a complete organization in a short span of time. They also provide data backup and archival solutions, which add security and redundancy to the organization's own efforts.

Disasters can strike across a radius of up to 70 km, so a group of datacenters in a single region cannot guarantee zero impact to IT. The need for a wider geographic spread arises when we talk about disaster recovery. Clearly, we are talking about something bigger than your Site A and Site B setup; we are talking about Region A and Region B. When you want to implement a complete DR process, having your IT setup in different regions makes for a far better DR experience. It is essential to have at least a scaled-down footprint of the current production infrastructure as part of the DR process. The DR footprint needs to mimic production and run independently, without any need for the live infrastructure. It goes without saying that the network, data and processing power need to remain as close to identical as possible; otherwise a DR plan would not make much sense.

I have strong backup and maintenance plans, that should do good, right?

No, not necessarily. Backup recovery solutions are not automatically disaster recovery solutions. Backups that reside only within the primary infrastructure will not support a DR procedure. The backup needs to be replicated and stored in a separate infrastructure that can recover from it and stand up a partial or in-place infrastructure at any point in time, without any dependencies.

DR Readiness Measures

DR infrastructure requires the same maintenance, patching and updates as the primary infrastructure, so that the two remain in sync at all times and can host the required instances interchangeably without major issues or changes. Both environments could benefit from a single, common software repository and version control system, and from shared access to data backups and archives, networks and network security solutions. Regular DR drills are absolutely mandatory, and the setup needs to be actively tested at all levels to ensure the DR plan can support the entire IT footprint in critical times. Budgets for maintaining a proper DR state should be allocated appropriately to avoid blame games and regret once disaster strikes.
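
Part of a DR drill can be automated. Below is a minimal sketch in Python of a readiness check covering three points raised in this post: the two sites should be further apart than the 70 km radius mentioned earlier, software versions should match between production and DR, and every backup should also exist on the DR side. The Site class, its fields, the sample coordinates, package versions and backup IDs are all made up for illustration; a real check would pull this information from your own inventory and backup tooling.

```python
from dataclasses import dataclass, field
from math import asin, cos, radians, sin, sqrt

# Threshold taken from the 70 km figure in this post; tune it to your own risk model.
MIN_SEPARATION_KM = 70.0
EARTH_RADIUS_KM = 6371.0


@dataclass
class Site:
    """Hypothetical description of one environment (production or DR)."""
    name: str
    lat: float
    lon: float
    packages: dict[str, str] = field(default_factory=dict)  # package -> version
    backup_ids: set[str] = field(default_factory=set)       # backups stored here


def distance_km(a: Site, b: Site) -> float:
    """Great-circle distance between two sites (haversine formula)."""
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = (sin(dlat / 2) ** 2
         + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def readiness_issues(prod: Site, dr: Site) -> list[str]:
    """Return human-readable problems; an empty list means no issue was found."""
    issues = []

    # 1. The sites should not sit inside the same disaster radius.
    d = distance_km(prod, dr)
    if d < MIN_SEPARATION_KM:
        issues.append(f"sites are only {d:.0f} km apart")

    # 2. DR should run the same software versions as production.
    for pkg, version in sorted(prod.packages.items()):
        dr_version = dr.packages.get(pkg)
        if dr_version != version:
            issues.append(f"{pkg}: production has {version}, DR has {dr_version}")

    # 3. Every production backup should also be replicated to the DR side.
    missing = prod.backup_ids - dr.backup_ids
    if missing:
        issues.append(f"backups not replicated to DR: {sorted(missing)}")

    return issues


# Illustrative data only.
prod = Site("region-a", 12.97, 77.59,
            {"nginx": "1.24.0", "postgres": "15.4"}, {"bk-001", "bk-002"})
dr = Site("region-b", 19.08, 72.88,
          {"nginx": "1.24.0", "postgres": "15.2"}, {"bk-001"})

for issue in readiness_issues(prod, dr):
    print(issue)
```

With the sample data, the two sites are far enough apart, so the check should flag only the postgres version mismatch and the backup that never reached the DR side. A check like this does not replace a real drill, but running it on a schedule can catch drift between the two environments before a disaster does.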