Published Articles
 
 
 

May 16

The data recovery expectations gap
In his weekly column, Jim Damoulakis, GlassHouse CTO, writes about the gap between user expectations of data recovery and the capabilities of IT.

http://www.computerworld.com/action/article.do?command=viewArticleBasic&taxonomyName=Storage&articleId=9000503&taxonomyId=19

A data warehouse became corrupted and needed to be recovered. The entire process took three days, and afterward the IT group was pleased to have finished well within the seven-day RTO (Recovery Time Objective) specified for this application in the company's Disaster Recovery Plan.

Unfortunately, the business unit dependent upon the data warehouse either was unaware of the published RTO or simply didn't care about it, and was furious to have been down for so long. They fully expected recovery to happen much sooner.

Frequently, there seems to be a gap between user expectations regarding recovery and IT capabilities. The IT group in the above scenario can point to the published RTO and is certainly justified in believing that it lived up to its commitment. However, in this situation, the RTO clearly did not meet the users' needs.

Actually, the organization in this particular scenario is somewhat unusual in having defined, achievable goals. More often, organizations lack clearly stated RTOs or RPOs (Recovery Point Objectives). In their place are vague assumptions about recoverability, and when RTOs do exist, they tend to be set unrealistically, at levels beyond the actual recovery capabilities of the infrastructure.

Part of the reason is that day-to-day restoration of files on an operational basis is pretty quick in a reasonably functional backup environment. However, recovery to the degree required in a disaster recovery situation is much more complex and can take far longer. Had this been a true disaster, seven days might well have been a perfectly acceptable RTO.

To address this "expectations" gap, I believe that two categories of recovery metrics are needed: one for normal operational recovery and another for more exceptional disaster recovery. Operational RTOs and RPOs are often far shorter than disaster RTOs and RPOs, and should be specified within service level agreements. Needless to say, both sets of metrics must be reviewed periodically and revised when necessary. However critical an application is to the business, the circumstances of an outage greatly affect user expectations about recovery, and IT policies must reflect this.
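As a rough illustration of the two-category idea, the Python sketch below records separate operational and disaster objectives for one application and checks a recovery time against each. Only the seven-day disaster RTO and the three-day recovery come from the scenario above; the operational RTO, both RPO values, and all names (`OutageKind`, `RecoveryObjective`, `DATA_WAREHOUSE_SLA`, `met_rto`) are assumptions made up for the example.

```python
from dataclasses import dataclass
from datetime import timedelta
from enum import Enum


class OutageKind(Enum):
    OPERATIONAL = "operational"  # e.g. corruption or accidental deletion
    DISASTER = "disaster"        # e.g. loss of an entire site


@dataclass(frozen=True)
class RecoveryObjective:
    rto: timedelta  # maximum acceptable time to restore service
    rpo: timedelta  # maximum acceptable window of lost data


# Hypothetical SLA for the data warehouse. Only the seven-day disaster RTO
# comes from the column; every other figure is an illustrative assumption.
DATA_WAREHOUSE_SLA = {
    OutageKind.OPERATIONAL: RecoveryObjective(
        rto=timedelta(hours=8), rpo=timedelta(hours=24)
    ),
    OutageKind.DISASTER: RecoveryObjective(
        rto=timedelta(days=7), rpo=timedelta(hours=48)
    ),
}


def met_rto(sla, kind, actual_recovery):
    """Return True if recovery finished within the RTO for this kind of outage."""
    return actual_recovery <= sla[kind].rto


three_days = timedelta(days=3)
print(met_rto(DATA_WAREHOUSE_SLA, OutageKind.DISASTER, three_days))
print(met_rto(DATA_WAREHOUSE_SLA, OutageKind.OPERATIONAL, three_days))
```

Under these assumed figures, the three-day recovery satisfies the disaster objective but misses the operational one, which is the mismatch the business unit experienced.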

Jim Damoulakis is chief technology officer of GlassHouse Technologies Inc., a leading provider of independent storage services. He can be reached at jimd@glasshouse.com.

 

 

  © Copyright 2001 - 2007 GlassHouse Technologies, Inc. All Rights Reserved.
