
Data Analytics Community

Data ingestion: past, present, and future

Author: Franki Hackett, Head of Audit & Ethics, Engine B

Published: 12 Jan 2022

We’ve been talking about data ingestion in audit for several years and have come face to face with the significant problems it poses. But a firmer grasp of the realities of those problems, combined with advances in AI, suggests a solution is finally within reach.

Past – firms go it alone

We’ve been talking about the possibilities of data and analytics in accountancy and audit for several decades, and putting data to practical use for more than a decade. The expectation now is that businesses are run, and audits are performed, on data. More tools enter the market every year supporting businesses to ‘leverage their data’ or helping audit firms cut through directly to risks. But the first and most significant barrier all of these technologies come up against is the difficulty of getting data into one tidy, standard format so that analysis can happen at all. Especially in audit, where the firm has no control over how data is collected or stored, initial experiments in data science were quickly prevented from scaling by the difficulties of data collection and preparation. Attention turned to technologies that could prepare data automatically. So in 2022, where are we on data ingestion in audit?

Initially, audit firms with the ambition to use data tackled the data ingestion process individually. They quickly found that the problem was harder than expected. The initial approach was to build a data ‘map’ from an ERP system, for example Sage 200, to the specific fields required by an analytic. Often this map was produced by a specialist in the data science team, who would then build a tool which ingested the data for the audit team. The audit team then immediately ran up against problems such as:

  • The client was using Sage 200, but they were using a different version of Sage 200 to the one that had been mapped
  • The client used the right version of Sage 200, but had implemented it in a bespoke way which meant the map didn’t work
  • The client had used a standard implementation of Sage 200 which matched the map, but had extracted the data in a different way to the way the ingestion tool was expecting
  • When the client extracted the data correctly in period 9, they extracted it differently in period 12
  • Even when the right report was coming out of the system, sometimes the columns were in a different order, or had different names
  • The client or the auditor had renamed the file, which meant the software wouldn’t accept the upload
  • The client exported the file in a different file format to the one the team expected, for example Excel rather than CSV
  • The client had used a standard implementation of Sage 200, but applied some unusual accounting treatments, which meant the map didn’t collect the data needed for the test because it was stored somewhere unexpected

And after all of that, even if the data was mapped accurately for that analytic, it often couldn’t be used anywhere else in the audit, because other tools required different inputs. Firms that wanted to use analytics either hired large (often offshore) teams to perform manual data preparation, or built direct connectors for clients big enough and technologically savvy enough to allow a secure outside connection for data – but this luxury was reserved for only a few firms and a handful of their clients. Firms that could not afford the staff costs of these activities were often left out in the cold, until the arrival of data extraction firms.
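To make the failure modes above concrete, here is a minimal sketch of the kind of defensive loader an ingestion tool needs: it tolerates columns arriving in a different order or under different names by normalising headers against a table of known aliases, and fails loudly when a required field can’t be located. The field names and aliases are invented for illustration – they are not taken from any real Sage 200 export.

```python
import csv
import io

# Hypothetical alias table: each canonical field and the raw header
# names it has been seen under across different client extracts.
HEADER_ALIASES = {
    "transaction_date": {"transaction_date", "trans date", "date"},
    "account_code": {"account_code", "nominal code", "acct"},
    "amount": {"amount", "value", "net amount"},
}

def normalise_header(raw_name):
    """Map a raw column name onto a canonical field, or None if unknown."""
    cleaned = raw_name.strip().lower()
    for canonical, aliases in HEADER_ALIASES.items():
        if cleaned in aliases:
            return canonical
    return None

def load_ledger(csv_text):
    """Read a ledger extract into rows keyed by canonical field names.

    Raises ValueError if a required field cannot be located, mirroring
    the 'map didn't work' failures described above.
    """
    reader = csv.reader(io.StringIO(csv_text))
    raw_header = next(reader)
    positions = {}
    for idx, name in enumerate(raw_header):
        canonical = normalise_header(name)
        if canonical is not None:
            positions[canonical] = idx
    missing = set(HEADER_ALIASES) - set(positions)
    if missing:
        raise ValueError(f"unmapped required fields: {sorted(missing)}")
    return [
        {field: row[idx] for field, idx in positions.items()}
        for row in reader
    ]
```

Even a sketch like this only covers two of the failure modes listed above – which is exactly why firm-by-firm, hand-built mappings proved so hard to scale.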

Present – technology firms go it alone

In the mid-2010s, companies started arriving on the scene which promised to automate the process of risk assessment and analytics for audit firms, with many promising some form of data ingestion. Some focused on particular client systems, getting as much data as they could out of Sage and Xero, for example. Others focused on bringing in other useful data – like bank statements or balances. Others offered rich risk analysis. But all promised to bring analytics to audit at a price point which made it accessible to all firms.

Initially though, many audit firms found they had the same frustrations with these tools: even where a connector to a specific ERP was promised, it didn’t work in practice on the many clients who had set up their system differently or extracted information differently to how the technology expected. The valuable features of these tools were hard to exploit because reliably getting data into them was painful and error-prone. Some analytics companies turned to manually ingesting data, increasing turnaround times and reducing the insight and control auditors have over the end-to-end data process. Auditors who adopted one technology were also often stuck with that vendor, finding technology systems which didn’t speak to each other, and outputs and documentation approaches which didn’t play well together. More firms were using analytics, but many found this period very painful. This is largely where we still are – some tech firms offer connectors for some systems, but most audit firms still find a significant chunk of their clients have data which is very tricky to ingest into tools.

Future – collaboration shows benefits

Over the past couple of years though, a handful of developments in the audit data space have started to suggest we might be near to solving the problem. The first is the ICAEW’s foresight in pulling together a group of audit firms of all sizes and the data model experts at Engine B to collaborate on creating an audit Common Data Model. This Common Data Model gives a standardised structure for audit data which is fully open source, meaning it is free to use for any audit firm or technology company, and it was built collaboratively by a large group of audit firms, meaning it contains what’s needed for almost any audit methodology. This opened the way for tech firms and audit firms to collaborate and share knowledge about data extraction and analytics, helping us learn together how to solve data problems.

Since then, Engine B and others have been using the Common Data Model as the basis for an alternative approach to the data ingestion problem. Starting from the point of view that audit data is weird and diverse, why not accept that instead of fighting it? Tools are beginning to emerge which put the power back in the hands of the auditor, so an auditor can upload their data from their client and decide for themselves how it should be mapped to fit into the Common Data Model. AI is being used to make predictions and recommendations to support this process, and each time an auditor makes a new map, the system gets smarter. Tech companies are working collaboratively with audit firms to design tools where every time an auditor uploads data which isn’t yet recognized by the system and maps it, the system gets a little bit better at ingesting data from all other systems, helping auditors at other firms move closer to using data in audit.
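The recommendation idea described above can be sketched very simply. The toy function below suggests which Common Data Model field a previously unseen client column probably maps to, using plain string similarity, and lets previously confirmed auditor decisions take priority – a crude stand-in for the ‘system gets smarter each time’ behaviour. The CDM field names here are invented for the example; a real system would use the published model and far richer AI than a string match.

```python
import difflib

# Illustrative CDM target fields (invented for this sketch, not the
# published audit Common Data Model).
CDM_FIELDS = [
    "journal_entry_date",
    "journal_entry_amount",
    "account_number",
    "account_description",
    "preparer_id",
]

def suggest_mapping(client_column, known_mappings=None):
    """Return the most likely CDM field for a raw client column name.

    known_mappings holds mappings an auditor has already confirmed;
    these take priority over the similarity-based guess.
    """
    known_mappings = known_mappings or {}
    key = client_column.strip().lower().replace(" ", "_")
    if key in known_mappings:
        return known_mappings[key]
    matches = difflib.get_close_matches(key, CDM_FIELDS, n=1, cutoff=0.4)
    return matches[0] if matches else None
```

Each confirmed mapping enlarges `known_mappings`, so the next auditor who uploads a similarly shaped extract gets a better first suggestion – the collaborative flywheel the article describes.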

And as other products, such as MindBridge and Microsoft’s Power BI, can now take the Common Data Model format as an input, more and more tools become accessible to firms which can’t afford to hire thousands of people to do data preparation.

Data ingestion is such a simple concept – getting data from the client to the auditor in a re-usable format – but it has been a really hard problem to solve. Fortunately, now that we’re close to solving it, a whole new world of possibilities for data in audit is opening up.

*The views expressed are the author’s and not ICAEW’s.