
How AI can untap the dark data goldmine

Businesses are sitting on a potential goldmine of untapped data. How FDs and CFOs can use AI to take advantage of their hidden assets

As with dark matter, we don't know exactly how much of our data is "dark", but we know that it is most of it. Gartner, the IT consultancy, defines dark data as “the information assets organisations collect, process and store during regular business activities, but generally fail to use for other purposes.”

Kirsten Gillon, technical manager for the ICAEW IT Faculty, also warns that the operational cost of dark data is a growing drag on the business if it is not managed. "This is really about helping organisations to think about what they can do with all that data."

One creative use of dark data is to apply machine learning or AI to historical or unstructured data to make predictions. Already, the National Business Research Institute reports that 61% of businesses said they had implemented AI in 2017.

Datumize, a dark data start-up, estimates that two-thirds of a company's data is dark (IBM goes further, estimating that 80% of all data is dark or unstructured, rising to 93% by 2020). One of Datumize's applications, says Carlota Feliu, its marketing director, is to control risk and optimise performance by tracking the movement of people and assets on a company's premises. The source: neglected, hard-drive-clogging wi-fi router logs, the ultimate dark data. "Imagine you're a warehousing company. Having all the information about the movements of your workers and assets can help you to define optimal paths, or improve your spatial organisation," she explains.
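
As a rough illustration of the warehousing example, the Python sketch below counts how often devices move from one zone to another, based on wi-fi association records. It is not Datumize's method. The log format (timestamp, device ID, access point), the access point names and the mapping of access points to zones are all assumptions invented for this example.

```python
from collections import Counter

# Hypothetical log lines: "ISO timestamp,device id,access point" (invented format).
LOG_LINES = [
    "2018-05-01T09:00:00,dev1,ap-dock",
    "2018-05-01T09:05:00,dev1,ap-aisle3",
    "2018-05-01T09:02:00,dev2,ap-dock",
    "2018-05-01T09:09:00,dev2,ap-aisle3",
    "2018-05-01T09:15:00,dev1,ap-packing",
]

# Assumed mapping from access point to warehouse zone.
AP_TO_ZONE = {"ap-dock": "dock", "ap-aisle3": "aisle 3", "ap-packing": "packing"}


def count_transitions(lines, ap_to_zone):
    """Count zone-to-zone moves per device, processing events in time order."""
    # ISO 8601 timestamps sort correctly as plain text.
    events = sorted(line.split(",") for line in lines)
    last_zone = {}
    transitions = Counter()
    for _timestamp, device, access_point in events:
        zone = ap_to_zone[access_point]
        previous = last_zone.get(device)
        if previous is not None and previous != zone:
            transitions[(previous, zone)] += 1
        last_zone[device] = zone
    return transitions


if __name__ == "__main__":
    for (start, end), count in count_transitions(LOG_LINES, AP_TO_ZONE).most_common():
        print(f"{start} -> {end}: {count}")
```

The most frequent transitions show which routes carry the most traffic, which is the kind of evidence a warehouse operator would need before redrawing paths or rearranging stock.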

Dark data is often unstructured and noisy. Social media data, for example, has limited value for predictive analytics because the valuable signal (sentiment or meaning) is often drowned in noise. Most organisations therefore focus on what social media says about them, or use it only for customer service. At the other end of the scale, Steve King, the CEO of Black Swan Data, has built a "social prediction" application for PepsiCo that ingests 50 million pieces of data every day from Twitter, Instagram, blogs, forums and reviews, as well as sales data. It uses this to predict future demand for more than 1,000 ingredients, using customer views of 72 benefits and 52 themes, in five of Pepsi's markets.

But how do you know which dark data will be useful, and which should be deleted, archived or ignored? King has three recommendations. First, the plunging cost of cloud storage means you don't need to throw away your dark data unless you are sure it is too inconsistent or incomplete to be useful in future. Instead, store it in the cloud in a compliant way.

Second, try to be disciplined about how the data is collected and stored, so that it will be accessible, even if that means some investment. Some data remains dark simply because its quality is too poor to create useful insight or train an AI. As an example, Black Swan has worked with more than 70 airlines to help predict what customers will want on a flight, from films to food. The data science was robust, but most of the investment was in cleaning up dark data. "Around 60% of the food on airplanes is thrown away, but the data on this was badly kept: different spreadsheets, different data formats, which meant that 90% of our project was getting the data into a format so that we could do something with it," King explains.
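
To give a sense of the clean-up work King describes, here is a minimal Python sketch. It is not Black Swan's method. It assumes two hypothetical spreadsheet exports that record the same food-waste figures with different column names, date formats and units, and maps both onto one consistent set of fields. Every field name and value is invented for illustration.

```python
from datetime import datetime

# Hypothetical rows from two differently formatted spreadsheets (invented data).
SHEET_A = [{"Flight": "AB123", "Date": "03/05/2018", "Waste (kg)": "12.5"}]
SHEET_B = [{"flight_no": "ab124", "day": "2018-05-04", "waste_g": "9800"}]


def normalise_a(row):
    """Sheet A: DD/MM/YYYY dates, weight already in kilograms."""
    return {
        "flight": row["Flight"].strip().upper(),
        "date": datetime.strptime(row["Date"], "%d/%m/%Y").date().isoformat(),
        "waste_kg": float(row["Waste (kg)"]),
    }


def normalise_b(row):
    """Sheet B: ISO dates, weight recorded in grams."""
    return {
        "flight": row["flight_no"].strip().upper(),
        "date": datetime.strptime(row["day"], "%Y-%m-%d").date().isoformat(),
        "waste_kg": float(row["waste_g"]) / 1000,
    }


def combine(sheet_a, sheet_b):
    """Merge both sheets into one list with a shared schema, sorted by date."""
    rows = [normalise_a(r) for r in sheet_a] + [normalise_b(r) for r in sheet_b]
    return sorted(rows, key=lambda r: (r["date"], r["flight"]))


if __name__ == "__main__":
    for record in combine(SHEET_A, SHEET_B):
        print(record)
```

Each new source format needs its own small conversion function, which is one reason this stage can absorb so much of a project's effort.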

Finally, and most importantly, to create value, focus on outcomes, not applications. That means working collaboratively and keeping the focus on the most important business problems, rather than the most interesting AI applications. "This means an accountant needs to know enough about data to understand what's being talked about," Gillon warns.

At this point, "you can effectively take a walk through all your dark data until you find something that will help," King says. "But don't be distracted along the way. Stay true to finding that outcome."

Originally published in Economia, May 2018.

Author: Tim Phillips