Published Articles
 

February 22

Uncovering your data's DNA
In his weekly Computerworld column, Jim Damoulakis, GlassHouse CTO, says that in storage management, data classification continues to be one of the major stumbling blocks.

http://www.computerworld.com/hardwaretopics/storage/story/0,10801,108886,00.html

My current after-hours amusement has been experimenting with the fun new Web site Pandora.com, where you can easily create your own personalized streaming music service.

After entering the name of a song or artist that you like, Pandora may play exactly what you've asked for or offer alternatives that match the stylistic attributes of your request. You can then rate the offering using a TiVo-like thumbs-up or thumbs-down approach, which serves to further refine downstream offerings.

At the heart of Pandora is something called the Music Genome Project. This multiyear data classification effort identified hundreds of musical attributes, devised a taxonomy and then classified thousands of songs according to that taxonomy. The result is that when you ask for a song, its attributes can be used to identify others that may also appeal to you.
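As a rough illustration of attribute-based matching, the sketch below ranks songs by how many attributes they share with a requested song. The song names, the attribute labels and the use of Jaccard similarity are all assumptions made for illustration; nothing here describes how Pandora actually computes its matches.

```python
# Hypothetical catalog: each song maps to a set of illustrative attributes.
CATALOG = {
    "song_a": {"acoustic guitar", "minor key", "slow tempo", "male vocal"},
    "song_b": {"acoustic guitar", "minor key", "slow tempo", "female vocal"},
    "song_c": {"synthesizer", "major key", "fast tempo", "female vocal"},
}


def jaccard(a, b):
    """Shared attributes divided by all attributes across both sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_songs(requested, catalog=CATALOG):
    """Rank every other song by attribute overlap with the requested one."""
    target = catalog[requested]
    scores = [
        (title, jaccard(target, attrs))
        for title, attrs in catalog.items()
        if title != requested
    ]
    # Highest overlap first.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


for title, score in similar_songs("song_a"):
    print(f"{title}: {score:.2f}")
```

The point of the analogy is that once items are described by a shared set of attributes, comparing them becomes mechanical.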

In storage management, data classification continues to be one of the major stumbling blocks. Whether you're focused on information life-cycle management, tiered storage or just building good storage management practices, a critical step is identifying how important your data is. While this prospect strikes fear into the hearts of many storage managers, it needn't be overwhelming.

The first step is to determine which attributes to focus on. Unlike with music, we need to deal with only a few general categories:

  • Data availability, including parameters such as uptime (scheduled and unscheduled) and performance

  • Data retention, including retention time, retrieval time, and retrieval frequency

  • Data protection, including operational recovery time objectives (RTO) and recovery point objectives (RPO), disaster recovery RTO/RPO, and security

The policies and metrics established for each of these categories can form the basis of a taxonomy. I can't overstress that an application-needs perspective rather than a storage technology perspective must be the basis of such an effort. Also, don't get bogged down in nuances. From a storage management perspective, only a handful of metrics are likely to result in actionable differences.
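One way to picture such a taxonomy is as a small record of application requirements plus a few coarse rules that turn those requirements into a service class. The sketch below is only that: the field names, thresholds, class names and sample applications are hypothetical and would need to come from your own policies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppRequirements:
    """Application-level needs, one or two metrics per category."""
    name: str
    uptime_pct: float      # data availability
    retention_years: int   # data retention
    rto_hours: float       # data protection: recovery time objective
    rpo_hours: float       # data protection: recovery point objective


def classify(app):
    """Map requirements to a service class using a few coarse thresholds.

    Thresholds and class names are illustrative assumptions only.
    """
    if app.uptime_pct >= 99.9 and app.rto_hours <= 1 and app.rpo_hours <= 1:
        return "critical"
    if app.rto_hours <= 24:
        return "important"
    return "standard"


# Hypothetical applications, described by what they need rather than
# by the storage technology they currently sit on.
apps = [
    AppRequirements("order-entry", 99.95, 7, 0.5, 0.25),
    AppRequirements("file-shares", 99.0, 3, 12, 24),
    AppRequirements("test-lab", 95.0, 1, 72, 48),
]

for app in apps:
    print(f"{app.name}: {classify(app)}")
```

Note that the rules start from what each application requires and use only a few thresholds, which is usually enough to separate data that needs different treatment.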

It probably makes sense to initially focus on a single category, such as data retention, perhaps with the goal of developing an archiving strategy. The greatest challenge is likely to be establishing the necessary cross-functional working relationships. This initial classification framework and methodology can then pave the way for follow-on phases.
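To make a retention-first pass concrete, the sketch below flags data sets as archive candidates when they have not been accessed for a while and are rarely retrieved. The data set names, dates, the 180-day idle window and the retrieval limit are all assumptions for illustration, not recommended values.

```python
from datetime import date, timedelta


def archive_candidates(datasets, today, idle_days=180, max_retrievals=2):
    """Return names of data sets that look safe to move to an archive tier.

    Each entry in `datasets` is a tuple of
    (name, last_access_date, retrievals_in_past_year).
    """
    cutoff = today - timedelta(days=idle_days)
    return [
        name
        for name, last_access, retrievals in datasets
        if last_access < cutoff and retrievals <= max_retrievals
    ]


# Hypothetical inventory, as it might be gathered from application owners.
inventory = [
    ("closed-invoices", date(2022, 11, 1), 0),
    ("crm-database", date(2024, 6, 20), 400),
    ("old-design-docs", date(2023, 6, 15), 5),
]

# An arbitrary reference date keeps the example reproducible.
print(archive_candidates(inventory, today=date(2024, 6, 30)))
```

In practice the inputs would come from conversations with application owners, which is where the cross-functional relationships matter.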

Jim Damoulakis is chief technology officer of GlassHouse Technologies Inc., a leading provider of independent storage services. He can be reached at jimd@glasshouse.com.

 

 

  © Copyright 2001 - 2008 GlassHouse Technologies, Inc. All Rights Reserved.
