It’s Time to Hop into the Deep Water of Big Data

Guest Blogger: John Mancini

John Mancini is an author, speaker and respected leader of the AIIM global community. He is a catalyst in technology adoption and an advocate for the new generation of experts. John predicts that the next three years will generate more change in the way we deploy enterprise technologies, and in whom we trust with this task, than the previous two decades did. He is President and CEO of AIIM.


I see three primary drivers for all of the current interest in Big Data and Big Content.

There is no better place to start your thinking about Big Data and Big Content than the 2011 IDC/EMC Digital Universe study. Per Digital Universe: “The world’s information is doubling every two years. In 2011 the world will create a staggering 1.8 zettabytes. By 2020 the world will generate 50 times the amount of information and 75 times the number of ‘information containers’ while IT staff to manage it will grow less than 1.5 times.”

Driver #1 — Clearly something must change if we are going to manage exponentially increasing volumes of information with finite resources.

Now there are some that will say, “Well that surely overstates the problem before us because the information in the Digital Universe must surely be personal information — not the kind of information that organizations are concerned about.” However, per IDC, “while 75% of the information in the Digital Universe is generated by individuals, enterprises have some liability for 80% of information in the Digital Universe at some point in its digital life.”

Driver #2 — There is information management risk to organizations associated with how they manage this torrent of information.

To make things even more complicated, all of the information being created by individuals has organizational value in that it provides an outside-in perspective on the organization. If mined effectively, this information can create market opportunities and first mover advantage for those that understand what the mass of data is trying to tell them. (Bill Schmarzo has a great post on this.)

Driver #3 — There is organizational value and insight that will be gathered by someone with the right tools, techniques, and vision.

As we built Systems of Record in our organizations over the last three decades, we focused first on getting structured data under control. By structured data, I mean all of the hard data that neatly fits into the rows and columns of a database. For many organizations, the journey to get this information under control is largely complete (although far from perfect).

The second half of the Systems of Record Journey focused on getting the unstructured information in our organizations under some semblance of control.  This journey started first with mission critical processes and the documents associated with them, and later expanded to a broader focus (driven largely by legal and compliance concerns) on creating strategies and systems to govern a broader body of unstructured information.  This journey is largely incomplete for organizations, although significant progress has been made.

Enter the world of Big Data and Big Content and Systems of Engagement. As we expanded the focus of our information management goals from cost reduction and risk mitigation to value creation, these Systems of Engagement (engagement with partners, with employees, and with customers) dramatically expanded the volume, velocity, and variety of information that must be managed.

As a result, information managers in our organizations are being asked to sort through two largely inconsistent priorities, and to keep the lights on at the same time. First, they are being asked, for litigation, risk, compliance and sheer storage cost reasons, to get rid of everything that doesn’t need to be kept. Second, they are being asked to preserve, at least temporarily, vast volumes of Big Data and Big Content in order to analyze this massive digital landfill and extract value and insight.

On the “risk” side of the equation, the volume of information coming at us is making it clear that manual information retention and disposition processes simply extended from the world of Systems of Record will no longer suffice. Aside from the sheer enormity of the task, a lack of clarity about what content is valuable is the main obstacle, along with the fear of getting it wrong and a sense that there is no immediate ROI from getting rid of outdated information.

The reality in most organizations is that traditional approaches to information disposition are a joke, and it’s not for lack of effort. It was never realistic to assume that knowledge workers would assist in manually classifying documents according to a complex records retention schedule, and it is equally unrealistic to assume that we will manage the fire hose of data and unstructured ephemeral social content with the same degree of records rigor that we applied to retaining a life insurance policy for the life of the policy holder.

But big data is about more than managing information-related risk. Organizations are increasingly realizing that there is value hidden away in their digital landfills. McKinsey believes that big data can create significant value for the world economy, enhancing the productivity and competitiveness of companies and the public sector and creating substantial economic surplus for consumers.

The core difference between the “high-value per byte information” in Systems of Record and the new “low-value per byte information” generated by Systems of Engagement is that this new information tends to have value in the aggregate or as it is interpreted rather than intrinsically. In other words, it is easy to see the value in storing a document or a piece of data that documents a specific transaction or process. It is more difficult — and it has been too expensive in the past — to do so with vast quantities of digital flotsam and jetsam that have value only as they are aggregated and analyzed.
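As a rough illustration of value "in the aggregate," here is a minimal Python sketch. The records, product names, and sentiment scores are all hypothetical and are not drawn from any study cited here. No single mention says much on its own, but a simple average per product starts to show a pattern.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical low-value-per-byte records: (product, sentiment score in [-1, 1]).
# These values are invented purely for illustration.
MENTIONS = [
    ("router", -0.5), ("router", -1.0), ("router", 0.0),
    ("modem", 0.5), ("modem", 1.0),
]


def average_sentiment(mentions):
    """Group mentions by product and return the mean sentiment for each."""
    by_product = defaultdict(list)
    for product, score in mentions:
        by_product[product].append(score)
    return {product: mean(scores) for product, scores in by_product.items()}


SUMMARY = average_sentiment(MENTIONS)
print(SUMMARY)
```

Real Systems of Engagement data would be far larger and messier, and the scoring itself is the hard part. The sketch only shows why the aggregation step, not any individual record, is where the insight appears.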

We can now finally get at the long-standing, vexing problem in analyzing unstructured information, the lack of . Advances in semantics, search, and content and text analytics are now making analysis of large amounts of unstructured information practical for the first time. Improvements in our ability to generate “” are revolutionary and will transform the way we look at vast quantities of unstructured information. In addition, for the first time, natural language processing and visualization technologies are moving the analysis of all of this data and information out of technical back rooms and into the executive suite.

Per AIIM, over 70% of organizations can envision a Big Data or Big Content use case that would be “very useful” or “spectacular” for their organization. Case management and BPM applications become far more effective as the hidden customer intelligence in digital landfills is added to the equation. Big Data opens up new ways of modeling risk and failure prediction, analyzing customer churn, and analyzing web and web advertising effectiveness.

So where does an organization start? 

I think a great place to start is with the following ten recommendations from the report:

  1. Ask blue-sky questions of your business such as “if only we knew…” or “if we could predict…” or “if we could measure…”  Consider how useful that might be to the business before thinking about how it can be done or at what cost. 
  2. Play those questions off against the data you already have, data you could collect, or data that you could source elsewhere. 
  3. Include in your thinking structured transactional data, semi-structured logs and files, and text-based or rich media content. 
  4. Incoming communications from your customers, outbound communications to your customers, and what customers (or employees) are saying about you on social sites can all be useful for monitoring sentiment, heading off issues and analyzing trends.
  5. Consider high volume streams such as telemetry, geo-location, voice, video, news feeds, till transactions, web clicks, or any combination of these.
  6. If your content is currently digital landfill spread across disparate file shares and content systems, consider how this could be rationalized prior to any big data projects.
  7. Content access for both search and analytics, and if necessary, content migration to dedicated big data storage, can be facilitated by unified data access products.
  8. Don’t be tempted to rush into in-house developments or specific point-solutions without considering wider, more universal analytics platforms.
  9. Consider SaaS and cloud deployment as a faster way to acquire experience and focus activity.
  10. Big data is much more about the breadth of the analysis and the insights that can be achieved than it is about the size of the data and the underlying database technology.

Big Data, Big Content, and Analytics use cases and successes are clearly in their early stages.  But they are coming.  And soon. Organizations need to make sure that they are prepared for the deep water of Big Data.
