Bigger is Not Necessarily Better

Mike Strange recognizes that some big data projects actually need small amounts of data to gain useful insight while others require massive amounts. (03:00)

The term big data is popular, but it perhaps creates the wrong connotation. The value of big data lies in the insights we gain from investigation, not in the data itself, or even the tools. Some of the most useful insight comes from targeted analysis, often with small amounts of information. It is often true that patterns are more statistically significant when considering large volumes of information, but volume is not the goal – insight is the goal. This is not always clear to our big data teams, and we, as leaders, can help.

Pariveda has done a number of big data projects, and we have seen them in all shapes and sizes. Some big data projects actually need small amounts of data to gain useful insight. Others require massive amounts. We recently completed two such projects, both of which started with a hypothesis and, upon reflection, taught us some valuable lessons. In one case, we analyzed the effectiveness of marketing campaigns, correlating results with environmental and customer-oriented factors. Using a shockingly small amount of “trial” data, we found several correlations – some obvious and a couple of surprises. We then applied machine learning techniques to demonstrate that it is possible to predict the impact of a marketing campaign within a few percentage points. This is a huge finding – reached by prototyping against a tightly-defined goal using a small subset of data. Once you prove a concept like that, you can expand the volume of data – incorporating even more interesting correlations such as seasonality or longer-term economic cycles.
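To make the “start small, then expand” idea concrete, here is a minimal sketch of that kind of prototype: fit a simple regression on a handful of campaign trials, then predict the lift of a proposed campaign before running it. The feature names, the numbers, and the least-squares model are all illustrative assumptions, not the actual project’s data or method.

```python
import numpy as np

# Hypothetical "trial" campaigns: each row is one small experiment.
# Columns: discount (%), email reach (thousands), weekend launch (0/1).
# All values are synthetic, chosen for illustration only.
X = np.array([
    [10.0, 50.0, 0.0],
    [20.0, 80.0, 1.0],
    [15.0, 60.0, 0.0],
    [25.0, 90.0, 1.0],
    [ 5.0, 40.0, 0.0],
    [30.0, 95.0, 1.0],
])
# Observed sales lift for each trial, in percentage points (synthetic).
y = np.array([3.6, 7.0, 4.9, 8.3, 2.3, 9.45])

# Ordinary least squares with an intercept column appended.
A = np.hstack([X, np.ones((len(X), 1))])
coef, _, rank, _ = np.linalg.lstsq(A, y, rcond=None)

# Predict the lift of a proposed new campaign before spending on it.
candidate = np.array([18.0, 70.0, 1.0, 1.0])  # trailing 1.0 = intercept
predicted_lift = candidate @ coef
print(f"Predicted lift: {predicted_lift:.1f} percentage points")
```

Six rows of data are obviously too few for production use, but they are enough to test whether the correlations exist at all – which is the point of the small prototype. Once the concept is proven, the same model can be refit on larger volumes with richer features such as seasonality.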

We often see large data stores, sometimes called data warehouses, that have all the hallmarks of a hard-to-watch episode of Hoarders – large amounts of overlapping, loosely-organized information collected for the purpose of collection. Collecting was the goal; organization, analysis and output were envisioned as future extensions. Some organizations spend years on “data in” and never get to “data out”. Today’s expectations of rapid innovation demand more.

I think the lesson is this. Many big data programs would benefit from defined targets – value propositions – which establish the guideposts of progress. In the example above, we sought to predict effectiveness through correlation of patterns in marketing programs. Other big data efforts could focus on personalization by identifying patterns of consumer behavior, assessing cannibalization, determining price elasticity, assessing distribution channels, or attacking the elusive problem of consumer identity. In any of these examples, it is more important to focus on the goal than on the BIGness of your big data program.

On the other hand, depending on the situation, it may be constraining to apply the level of focus described above. Sometimes we must collect, correlate, aggregate and analyze large amounts of data – applying research-style thinking to the investigation. Sometimes we don’t know exactly what we are looking for. For example, if we had applied only goal-oriented thinking to medical research, we might never have discovered penicillin. But in a commercial setting, we are often bounded by the realities of time, money and people. In these settings, it may be more effective to show incremental progress through a more goal-oriented approach.

It actually sounds funny to say “we are driving a big data initiative, and intentionally starting small” – but sometimes that is the right way.
