Govciooutlook

Disaster Recovery

By Todd Simpson, CIO, U.S. Food and Drug Administration (FDA)

The exponential growth of data and the reliance of business on IT have forced IT to provide services that are available at any time, without interruption. Unfortunately, the datacenters that support the backend infrastructure and services are not exempt from disaster. Most datacenters are built for high availability and can maintain uptime through minor issues, but a major disaster caused by human error, system breakdowns, or natural events can bring the business to its knees. Worst of all, you never know if or when it will happen to you. Thus, it’s important to have both a disaster recovery plan (DRP) and a business continuity plan (BCP). While the DRP addresses the IT element of continuity (e.g., data, application, and infrastructure recovery), the BCP incorporates organizational and human resources issues such as communications plans and crisis management.

According to industry data, software and hardware failures account for about half of unacceptable downtime. Less than a quarter of outages result from major events, such as fires and natural disasters. To minimize downtime and increase productivity, it’s important that agencies incorporate DR scenarios into their service management processes. Doing so should help leadership make better decisions about how to respond to less obvious DR scenarios, when to escalate those incidents, and whether to initiate recovery procedures rather than continue troubleshooting.

The FDA currently operates several data centers for the delivery of its services. Our minimum goal is to maintain redundant infrastructure between two centers and leverage the cloud where it makes sense. Every FDA data center backs up all systems and data daily. In the event of a disaster at one of the sites, we restore to alternate computing infrastructure (the DR site). Finally, defining Recovery Point Objectives (RPOs) and aligning IT with the business enables the FDA to identify and manage the maximum targeted period in which data might be lost from an IT service due to a major incident.
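To make the RPO idea concrete, here is a minimal sketch of how a backup schedule can be checked against an RPO. The function names, timestamps, and RPO values are illustrative assumptions, not part of the FDA's tooling: the data at risk is whatever was written between the last good backup and the incident, and the RPO is the ceiling on that window.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, incident: datetime) -> timedelta:
    """Time span of data written after the last backup and lost in the incident."""
    return incident - last_backup


def meets_rpo(last_backup: datetime, incident: datetime, rpo: timedelta) -> bool:
    """True if the lost-data window stays within the recovery point objective."""
    return data_loss_window(last_backup, incident) <= rpo


if __name__ == "__main__":
    # Hypothetical values: a nightly backup at 02:00 and an incident at 20:00.
    last_backup = datetime(2024, 1, 1, 2, 0)
    incident = datetime(2024, 1, 1, 20, 0)

    print(data_loss_window(last_backup, incident))                # 18:00:00
    print(meets_rpo(last_backup, incident, timedelta(hours=24)))  # True
    print(meets_rpo(last_backup, incident, timedelta(hours=4)))   # False
```

Under these assumptions, a daily backup can satisfy a 24-hour RPO but not a 4-hour one, which is why the RPO agreed with the business drives how often each service must be backed up or replicated.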

The FDA’s disaster recovery solution encompasses a large variety of technologies. Our DRP solution needs to meet security and business requirements, support disaster recovery for core mission-critical applications, support all technologies, provide full redundancy, offer high availability with minimal latency, and be compatible with the existing architecture. As with any DRP exercise, a risk analysis was performed to determine the greatest threats that could impact production. During this analysis, the FDA performed a “current state” inspection of the environment. It found that, by following industry best practices, the FDA maintains an appropriate level of redundancy within each of its data centers. The backup and recovery processes are in line with industry standards and will protect the FDA against most types of data loss.

Typically, you won’t have the unlimited resources that a comprehensive DR response would require. It is important to identify the business-critical applications essential to operations. It’s also important to weigh the risk against the benefits and costs, as you would when solving any business problem. In the absence of a DRP or BCP, there is a significant level of assumed risk. When we commenced our DRP at the FDA, we needed to perform a thorough discovery of all production applications. A business impact analysis (BIA) was completed for each application to set its priority. The applications were categorized into high, medium, and basic disaster recovery tiers, with the initial focus on the high-tier (most critical) applications.
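The tiering step can be sketched in a few lines. This is only an illustration under stated assumptions: the 1-to-10 impact score, the cutoff values, and the application names are all hypothetical, and the article does not describe how the FDA's BIA actually scored applications. The sketch simply maps each application's score to a high, medium, or basic tier and orders recovery work so the high tier comes first.

```python
from dataclasses import dataclass


@dataclass
class Application:
    name: str
    impact_score: int  # hypothetical 1-10 score produced by a BIA


def dr_tier(app: Application, high_cutoff: int = 8, medium_cutoff: int = 5) -> str:
    """Map a BIA score to a DR tier; the cutoffs are illustrative assumptions."""
    if app.impact_score >= high_cutoff:
        return "high"
    if app.impact_score >= medium_cutoff:
        return "medium"
    return "basic"


def recovery_order(apps: list[Application]) -> list[Application]:
    """Order applications by tier (high first), then by descending impact score."""
    rank = {"high": 0, "medium": 1, "basic": 2}
    return sorted(apps, key=lambda a: (rank[dr_tier(a)], -a.impact_score))


if __name__ == "__main__":
    # Hypothetical inventory from the application discovery step.
    inventory = [
        Application("intranet-wiki", 3),
        Application("submissions-portal", 9),
        Application("lab-scheduling", 6),
    ]
    for app in recovery_order(inventory):
        print(dr_tier(app), app.name)
```

Keeping the cutoffs as parameters makes the trade-off explicit: lowering the high-tier cutoff protects more applications but raises the cost of the DR solution.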

Lastly, we explored a dedicated DR site and how it would interact with the cloud. The FDA considered public cloud offerings, commercial data center solutions, private disaster recovery services, and government-hosted options. An extensive analysis was done to compare the solutions, and a short list of options was created. After a thorough site and cost analysis of each solution, it was clear that hosting a solution at one of the FDA facilities would give the FDA the greatest chance of meeting its recovery goals. The final FDA disaster recovery solution will include automated recovery of the critical applications hosted within the FDA.
