Sunday, February 15, 2015

What Is Unstructured Data?

by Brenda J. Christie



In the world of Big Data, "unstructured data" is usually considered data that does not sit neatly in columns and rows.  Most traditional mainframe relational databases, such as DB2, consist of tables defined using rows and columns.  For illustration purposes, consider the types of functions typically performed using an Excel spreadsheet, whose data is laid out in rows and columns.  Using this layout, it is possible to sum numeric information, such as currency or square feet.  It is possible to count the number of occurrences of a given value.  It is possible to arrive at an average value, as well as a standard deviation, etc.  This is possible because like values appear together in the same row or column.  With the results of such manipulations, companies are able to make decisions concerning inventory and identify products which are selling below expectations.  Medical researchers are able to identify statistical anomalies, slight deviations in study results, and so on.
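As a rough illustration of why a row and column layout makes these calculations easy, here is a minimal Python sketch.  The sales rows, column names and values are hypothetical, invented purely for this example.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical sales data: every row has the same columns,
# so like values always sit in the same place.
rows = [
    {"product": "widget", "region": "East", "amount": 120.0},
    {"product": "gadget", "region": "West", "amount": 80.0},
    {"product": "widget", "region": "West", "amount": 100.0},
    {"product": "gizmo", "region": "East", "amount": 60.0},
]

# Pull out one column; this works because each row has an "amount".
amounts = [row["amount"] for row in rows]

total = sum(amounts)        # sum of the currency column
average = mean(amounts)     # average value
deviation = stdev(amounts)  # sample standard deviation

# Count the number of occurrences of each value in the "product" column.
counts = Counter(row["product"] for row in rows)

print(total, average, round(deviation, 2), counts["widget"])
```

Each calculation is a single line because the layout guarantees where the values are.  No searching or interpretation is needed.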

Unstructured data turns this type of quick analysis on its head.  The same ease of aggregating, averaging, counting, etc. is not available when working with unstructured data.  Unstructured data, which consists of human-generated data coming from email, video, text messaging, multimedia, blog content, web content, presentations, etc., is a different animal, and as such has led to the development of new tools for reclaiming and surpassing the analytical capabilities traditionally available through a row and column layout.
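To make the contrast concrete, here is a small Python sketch of what happens when the same kind of information arrives as free-form text.  The messages and the pattern used to find amounts are hypothetical, and the pattern is deliberately simplistic; it is an assumption for illustration, not a description of how any real tool works.

```python
import re

# Hypothetical free-form messages; there are no columns to select from.
messages = [
    "Hey, just paid $120 for the widget, totally worth it!",
    "the gadget was 80 dollars... not sure I'd buy again",
    "Loved the video review of the gizmo",
]

# Simplistic pattern (an assumption): matches "$120" or "80 dollars".
AMOUNT = re.compile(r"\$(\d+(?:\.\d+)?)|(\d+(?:\.\d+)?)\s+dollars")

def extract_amounts(text):
    """Return any dollar amounts the pattern can find in the text."""
    # findall yields one (group1, group2) pair per match; one is empty.
    return [float(a or b) for a, b in AMOUNT.findall(text)]

# An extraction step must run before anything can be summed.
extracted = [extract_amounts(message) for message in messages]
total = sum(amount for found in extracted for amount in found)

print(extracted, total)
```

Even in this tiny case, the amounts have to be located before they can be added, and the third message yields nothing at all.  Other phrasings, such as "eighty bucks", would be missed entirely by a pattern like this one.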

In a May 1, 2013 Gartner blog article, "Big Content:  The Unstructured Side of Big Data," Darin Stewart reports that unstructured data makes up roughly 80% of a company's total information assets.  There is value in being able to search for trends and patterns within the different forms of unstructured data.  Consider the U.S. Census, for example.  Following a Census, government agencies, whether local, state or federal, have often provided more services: built more schools, provided funds for more hospitals, invested in better roads and transportation.  This is possible because they were able to aggregate Census survey results to determine overall demand for services.  In other words, the data the survey produced resulted in actionable items.  The data, however, followed a particular data model.  Unstructured data does not follow a particular model, especially when the sources are different.  For example, combining thousands of text messages with thousands of videos would not necessarily yield any clear patterns, because the two media are different.  Such is the problem confronted by companies today:  how to unify all the data created by humans through text messages, likes and tweets in order to capitalize on trends and patterns.

Over the next several weeks, we will examine how this problem has been addressed.  This will include the mainframe's role in this journey, how it has evolved to meet its latest challenge, and who its partners and competitors are.

Join us for the ride.

Bye for now.

Brenda J. Christie
