Big data (part 2)

February 4, 2015[1]

A recent column in Forbes[2] kicked off a discussion with a friend of mine, Tim Powell[3]. We’ve both been looking at “big data” and how it not only fails to contribute to competitive intelligence and other analyses, but may even inhibit them.

Let me start with the column:

It points out that the availability of increased data – sports’ own big data – has made sports writers more timid in their predictions. The reason given is that the writers are concerned about having their own opinions thrown back at them when the data is later analyzed and their predictions turn out to be wrong.

The article pointed out an analogous situation: doctors who, fearing future lawsuits, might elect for a diagnosis and treatment that is more data driven than one based on their experience and related highly trained instincts.

Rich Karlgaard, the column’s author, also pointed to the case of Starbucks in the early 1990s, when it had hit a “slow patch”. Management there was looking to gather more data to figure out what was wrong, but the number two at Starbucks instead went into the field and talked (gasp!) to employees; he found the problem was one of attitude, among new employees as well as older ones. As the column concludes, “Trust your eyes and ears. The data are your tools not your master.”

That is all true, but the problems of dealing with big data can often be traced to a fundamental flaw that can be expressed this way:

We only deal with that which we can measure or already have measured (so Starbucks’ executives never would have talked to employees, but simply would have collected more irrelevant data). That in turn means, we measure only what we have or what we can get, instead of seeking to determine first what it is that we need – turning the analytical process on its head.

This, I think, is one of the major problems that we in competitive intelligence and others face in dealing with the world of big data. Not all data is quantitative or digital – some is qualitative or non-digital. But big data proponents – think the NSA – operate in a world where data is only what can be collected, stored on a computer, and analyzed from there. So, in the case of the NSA, it focuses on collecting, storing, and analyzing communications data. Why? Because it can[4].

The problem becomes that people collect data because they can collect it or because maybe, someday, perhaps, they might need it (ignoring the whole issue of half-life, which is a nice way of saying that some data goes bad pretty quickly, as well as the problem of having too much noise in the data).

The right approach: determine what the question is before you then determine what data might be useful to help craft an answer. Today, big data seems too often driven the other way – determine what answer can be provided, and then attempt to drive the end users to produce a question that can be answered.

[1] Part 1 was posted July 24, 2013.

[2] Rich Karlgaard, “Data Wimps”, Forbes, Feb. 9, 2015.

[3] The Knowledge Agency.

[4] This practice is indirectly challenged by the conclusions in Erik Dahl, Intelligence and Surprise Attack: Failure and Success from Pearl Harbor to 9/11 and Beyond (Georgetown Univ. Press, 2013). His tabulation of 227 terrorist plots and cases finds that human intelligence is by far the most common reason for the failure of a plot or effort, accounting for approximately 50% of the cases. Interestingly, signals intelligence falls behind both overseas intelligence and unrelated law enforcement efforts as a reason for failure, accounting for approximately 10% of the cases.


3 Comments on “Big data (part 2)”

  1. [Pingback quoting the post.]

  2. tripkrant says:

    A couple points:

    – Yes, there is data that is qualitative or non-digital – but it can also be digitized. Likewise, a HUMINT report fed into a reporting system is at a minimum going to generate metadata; it can also be run through natural language processing to extract entities, relationships, topics, and sentiment.

    – Yes, NSA “operate[s] in a world where data is only what can be collected and stored on a computer and analyzed from there.” So do CIA and every other intelligence service in the world. The sinews of any intelligence service are its files – should they still be using drawers full of 3×5 cards?

    – Yes, data is collected because it can be collected, and because it might be needed in the future. And yes, data half-life is an issue – but that does not mean the data is without intelligence value. It can be used for verification and validation purposes in the future, and it can be used to establish baselines so you can spot the difference that makes a difference.

    – Determining what the question is before determining the data to collect – in other words, the classical intelligence cycle – works poorly in a VUCA world. It’s puzzle-solving in a world of dynamic mysteries, to use Gregory Treverton’s metaphor.

    You seem like a prolific reader; I would be happy to send you some material you will find interesting.

    Best,
    Trip

