Published Articles
 
 
 

December 5

New data classification tools drill deeper, but still lag
Jim Damoulakis, GlassHouse CTO, explains why data classification is still far from automatic.

http://www.computerworld.com/action/article.do?command=viewArticleBasic&articleId=9005651

Last year, I undertook the process of ripping my collection of more than 700 CDs to disk. Proper categorization and tagging are a major part of this effort, and fortunately tools like iTunes can look up album information and populate metadata fields. However, the online databases often contain inconsistencies and inaccuracies. As a classical music buff, I spent hours reconciling and retagging variants like "Beethoven," "L. van Beethoven," "Ludwig von Beethoven" and "Beethoven, Ludwig van." Even this inconsequential bit of data organization consumed a good deal of time.

Over the past three years, companies have eagerly undertaken grandiose data management initiatives, like information life-cycle management (ILM), only to have that initial enthusiasm wane due to the seemingly insurmountable obstacle of examining and classifying data. Without a strong commitment and active participation by a cross-functional team from business, legal/compliance and storage, it is impossible to do data classification effectively. Beyond this hurdle comes the truly daunting task of performing the classification itself: applying policy, managing the classification metadata, and actually moving the data (and then continuing to do so for the new information that arrives in torrents daily). It takes but a few nanoseconds to realize that an application is required to accomplish this on anything but the smallest scale.

Needless to say, this represents a business opportunity, and several companies have entered the data (or information) classification fray. Unlike earlier file-based SRM applications that examined only file system metadata attributes (e.g., creation date, last access, owner, document type), newer tools can also inspect content, identify patterns, and generally provide greater insight into the purpose and value of a particular file. Some of these applications provide extensive indexing capabilities as well, and some can be customized for specific industries.
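To make the distinction concrete, here is a minimal Python sketch of the two approaches. It is an illustration only, not how any particular product works: the FileRecord type, the one-year staleness threshold, the category labels and the keyword rules are all hypothetical.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileRecord:
    """Hypothetical stand-in for a file plus the attributes a tool might collect."""
    path: str
    owner: str
    last_access: datetime
    text: str  # extracted file content


# Hypothetical content rules; a real tool would use far richer, industry-specific ones.
CONTENT_RULES = {
    "contract": re.compile(r"\b(agreement|hereinafter|indemnif\w+)\b", re.IGNORECASE),
    "financial": re.compile(r"\b(invoice|purchase order|balance sheet)\b", re.IGNORECASE),
}


def classify_by_metadata(rec: FileRecord, now: datetime, stale_after_days: int = 365) -> str:
    """Metadata-only view: looks at last access time and nothing else."""
    age = now - rec.last_access
    return "archive-candidate" if age > timedelta(days=stale_after_days) else "active"


def classify_by_content(rec: FileRecord) -> list[str]:
    """Content view: returns every label whose pattern appears in the text."""
    return sorted(label for label, pattern in CONTENT_RULES.items() if pattern.search(rec.text))
```

The metadata check can say only that a file is old. The content check is what suggests the same old file may be a contract that should not be moved or deleted casually.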

However, several challenges remain. First, most of these applications focus on file-based data, yet much of the most important application data resides in databases. Setting this aside, the issue remains that, in most cases, data tagging and validation still require a human to make a decision. Although applications have come a long way in identifying patterns that look like Social Security numbers and applying context logic to distinguish "Let's call Sue Smith" from "Let's sue Smith," the effort of data classification is still far from automatic. It still requires a substantial commitment that must be weighed in the context of compliance and litigation demands.
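A rough Python sketch shows why a human often stays in the loop. It assumes a simple hyphenated number format and a hypothetical list of context words, and it is far cruder than what commercial tools do: a match with supporting context is rated "high," while a bare match is only flagged "needs-review."

```python
import re

# Matches the common NNN-NN-NNNN layout only; unhyphenated numbers are ignored here.
SSN_PATTERN = re.compile(r"\b(\d{3})-(\d{2})-(\d{4})\b")

# Hypothetical context cues that make a match more believable.
CONTEXT_WORDS = ("ssn", "social security")


def find_ssn_candidates(text: str, window: int = 40) -> list[dict]:
    """Return pattern matches with a rough confidence label for each."""
    results = []
    lowered = text.lower()
    for m in SSN_PATTERN.finditer(text):
        area, group, serial = m.groups()
        # Structural check: these values are never issued as Social Security numbers.
        if area in ("000", "666") or area.startswith("9") or group == "00" or serial == "0000":
            continue
        # Context check: look for a cue word within `window` characters of the match.
        nearby = lowered[max(0, m.start() - window): m.end() + window]
        has_context = any(word in nearby for word in CONTEXT_WORDS)
        results.append({
            "match": m.group(0),
            "confidence": "high" if has_context else "needs-review",
        })
    return results
```

Anything labeled "needs-review" could just as easily be a part number or an order ID, which is exactly the kind of decision that still lands on a person's desk.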

Jim Damoulakis is chief technology officer of GlassHouse Technologies Inc., a leading provider of independent storage services. He can be reached at jimd@glasshouse.com.

 

 

  © Copyright 2001 - 2008 GlassHouse Technologies, Inc. All Rights Reserved.
