Published Articles
 
 
 

July 18

Perhaps high-capacity disk isn't such a good thing
Every organization has large quantities of aging unstructured file system data
Computerworld Opinion by GlassHouse CTO Jim Damoulakis

http://www.computerworld.com/action/article.do?command=viewArticleBasic&articleId=9027282

Remember the classic sci-fi TV show The Twilight Zone? One recurring theme of that series was to go back in time and imagine what would have happened if some event or person had never existed. Imagine what the world would be like today if cheap storage did not exist -- in other words, what if disk drives hadn't experienced the enormous increase in capacity and decline in price of the past 20 years?

For one thing, data management practices would have evolved very differently. If you adhere to the truism that necessity is the mother of invention, data retention and purging practices and capabilities would likely be far more advanced than they are today. One downside might have been that the online mass-media revolution of iPods, YouTube, et al., might never have occurred. On the other hand, for corporate IT, given that only truly valuable data would be retained, perhaps supporting e-discovery would not be the challenge it is today.

The reality is that, thanks to abundant cheap disk space, we've become storage gluttons and are desperately playing catch-up in areas such as indexing and classification. Driven by the dual needs to address e-discovery liabilities and to control data run-rate costs, serious efforts are now under way within organizations to gain better control over data. While the starting point was e-mail, analysts have long been trumpeting both the risks and opportunities associated with unstructured data.

Every organization has large quantities of unstructured file system data, much of it aging and untouched, and only a relatively small percentage of it has any significant value. Anxious to follow up on e-mail archiving successes, vendors have introduced improved versions of products that relocate file data to lower-cost storage while still providing access if and when it's needed. However, much of this capability is driven by metadata attributes -- file type, owner, last access, etc. -- not by intrinsic value, and therefore solves only part of the problem.
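
To make the metadata-driven approach concrete, here is a minimal Python sketch that lists files whose last-access time is older than a chosen threshold, which is the kind of candidate list a relocation tool might start from. The function name find_stale_files and the max_idle_days parameter are illustrative assumptions, not taken from any vendor's product, and last-access times can be unreliable on file systems mounted with options such as noatime.

```python
import os
import time
from pathlib import Path


def find_stale_files(root, max_idle_days, now=None):
    """Return sorted (path, idle_days) pairs for files under root whose
    last-access time is more than max_idle_days in the past.

    Hypothetical helper for illustration only: it looks at metadata
    (last access), never at what the file actually contains.
    """
    now = time.time() if now is None else now
    cutoff_seconds = max_idle_days * 86400
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                info = path.stat()
            except OSError:
                # File vanished or is unreadable; skip it.
                continue
            idle_seconds = now - info.st_atime
            if idle_seconds > cutoff_seconds:
                stale.append((str(path), int(idle_seconds // 86400)))
    return sorted(stale)
```

A call such as find_stale_files("/shares/projects", 365) (the path is a placeholder) would list files untouched for more than a year. Note that nothing in the selection reflects what the files are worth, which is exactly the limitation described above.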

Classification based on actual content has always been problematic, but is essential to meet data liability concerns. Some organizations use dedicated document management applications, a largely user-driven manual effort that can be effective, but is often costly and complex. Over the past few years, products have been introduced to index and classify unstructured data based on content. Through the natural maturation process, these technologies have evolved to a point where they are now worth consideration, particularly in environments with significant e-discovery issues.
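
As a rough illustration of what classifying by content means, as opposed to classifying by metadata, the following Python sketch assigns categories to a piece of text using keyword patterns. The RULES table and the classify_text function are hypothetical; commercial indexing and classification products are considerably more sophisticated, and this is not a description of how any of them work.

```python
import re

# Hypothetical rule table: category name -> regular expressions that
# suggest a document belongs to that category.
RULES = {
    "legal": [r"\bsubpoena\b", r"\blitigation\b", r"\bcontract\b"],
    "financial": [r"\binvoice\b", r"\bpurchase order\b"],
}


def classify_text(text, rules=RULES):
    """Return the sorted list of categories with at least one pattern
    matching the text (case-insensitive)."""
    matched = []
    for category, patterns in rules.items():
        if any(re.search(p, text, re.IGNORECASE) for p in patterns):
            matched.append(category)
    return sorted(matched)
```

Under these assumed rules, a memo mentioning a litigation hold would be tagged "legal" regardless of its file type, owner, or age, which is the distinction that matters for e-discovery.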

Comprehensive management of unstructured data actually requires a combination of all three approaches: user tagging and classification, indexing and automated content classification, and metadata-based relocation. While the proportions of each will vary significantly based on organizational needs, we are at a point where the unstructured data problem should no longer be ignored.

Jim Damoulakis is chief technology officer at GlassHouse Technologies Inc., a leading provider of independent storage services. He can be reached at jimd@glasshouse.com.


 

 

  © Copyright 2001 - 2007 GlassHouse Technologies, Inc. All Rights Reserved.
