Data extraction quality getting it right every time

As accountants and auditors move towards increasingly sophisticated tools to access and analyse data, it’s easy to think that we can automate our way out of the evidence quality problems that plague more manual forms of audit. But as Franki Hackett, Data Analytics Community Advisory Group Chair explains, this is by no means certain.

If we copy our client’s database or use an API or some form of direct connector to extract data, we don’t face the same risks of errors or inadvertent manipulation of data as with the client sending us their records manually, do we? This view unfortunately relies on a tendency called ‘automation bias’, where human beings are likely to over-trust the outcomes of computer processes. It’s essential, especially as auditors, that we retain our professional scepticism of any data coming from a client, and that we interrogate the quality of data we’re extracting no matter the source or method of transfer. While the risk of error is typically higher with a manual extraction, this doesn’t mean there is no risk of error when using more automated extraction tools. And it really matters – as our accounting processes and audit tests become more reliant on data, they also become more reliant on the quality of that data.

As auditors, many of us aren’t trained in good data quality principles, and so often we feel we don’t know where to start. This checklist, though not by any means exhaustive, gives a starting guide to some of the things accountants should look to build into their approach to confirm the quality of extracted data. Some technology, like Engine B’s Integration Engine, will automate many of the key checks, making this even easier.

Matching checks

Matching checks confirm that what you’re getting out from the client matches what’s in their system. This is about ensuring that the data extraction has run as intended, that no filters were inadvertently applied, and that the integrity of the data has been maintained. You can often do these checks either sat with the client logged in to the system or with a simple query of the database. Matching checks include:

Record numbers: extracted data has the same number of records (rows) as there are rows in the database. For example, if the client has 150 suppliers in their system, you expect to see 150 suppliers in the supplier balances list
Totals: The net totals on the system match the net totals in the extracted data, for example your receivables list matches the balance of receivables in the client’s system. Do note of course – a General Ledger population should net to zero, so if it doesn’t that generally not a good sign, but if it does, that doesn’t necessarily mean your data is complete either!
Absolute values: the absolute value on transactions in the system matches that in your data extract. This can be more useful when checking GL data, or if possible, checking total debits and credits separately to each other.
Dates and filters checks: when performing the above checks it’s essential to ensure that the dates and filters applied to your checking data are what you want for your extracted data. If your year end is 31/12 and your client shows you the value in the front end of their system as at 10/01 then you cannot assess whether the data you have extracted is accurate to your year-end date. Also be aware of different types of dates – it is not uncommon for clients to inadvertently extract data based on the dates that entries were created in the system, when from an audit perspective you typically require data to be extracted based on the accounting date, which may not be the same.

Depending on the method by which data has been received, it may be pertinent to perform some basic checks on the validity of data in key columns such as dates and amounts. This can help identify any basic issues with so-called ‘data shifting’ where the data has been extracted fully, but the column and row integrity hasn’t been maintained (typically due to non-standard or unexpected characters in the dataset). Blank values in fields where you don’t expect blanks, or non-numeric values in fields that should be numeric, are tell-tale signs.

Reconciliation checks

Reconciliation is a classic tool in the accountant’s toolbox, and it’s probably the best tool we have for assessing data completeness and accuracy. Reconciliations can include:

This year’s opening to last year’s closing balance – have values been introduced in the gap between years? Were there any post-close or off-system adjustments?
Ledgers to subledgers, and vice versa
Master Data to detail data. For example, confirming customer balances against opening balances and receivables transactions in year.
Completeness of reconciliation: are there categories of activity, account codes, customers, suppliers, or anything else which appear in one source of data that are missing from somewhere else you might expect it. I once had a client that had excluded an entire category of potentially reputation-impacting provisions with no in-year transactions from its trial balance, but identifiable in a provisions listing.

Remember – reconciliations should typically be performed two-way. In other words, check that all the values in your primary dataset agree to your comparison dataset, and also check that all the values in your comparison dataset agree to your primary dataset. This ensures that your primary dataset has neither too few, nor too many records.

Sense checks

Sense checks are checks you can perform using your knowledge of the client and your own nous to assess whether data is good quality. If you were receiving data in a traditional way and then preparing it manually, these are things you would do anyway almost instinctively – when we use technology to extract and prepare data we need to build these back in.

Overall values checks: does the overall value of transactions on the data match your expectation of this client? I once audited a client whose annual net expenditure was around £20m, but who were showing netting off transactions of more than £16bn in their General Ledger. That was a good indicator that something was wrong in the data!
Everything included checks: how many account codes are appearing in the TB you have extracted? And how many do you expect based on your knowledge of the client? Often the first sign that something is missing is a sense that the data isn’t big enough or complex enough for the entity you’re working with.
Richness checks: Some clients provide very little data – a GL will be no more than the date, the transaction number, the amount and the account code. Others, especially more sophisticated clients, will have much richer data with details of who entered and approved transactions, descriptions, and foreign currency amounts. If your assessment of client controls tells you they are relying on automated approvals within their system, you should expect this data to appear. Confirm for yourself whether the richness of your data reflects the reality of your client.
Language and character checks: if your client operates in multiple jurisdictions and has staff working in languages like Chinese, Arabic, or Russian, characters in these languages would most likely appear in your data. If they are solely based in one country with limited overseas trade, it would be highly unusual to find e.g. Russian characters in their system. Looking at the presence or absence of foreign language characters can help you see if a multinational is excluding data from some countries, or identify foreign transactions which are not expected in a mostly onshore organisation.

It's critically important to remember that data quality checks are not subject to the kinds of tolerance that we might apply, especially in audit, to other tests. This is simply because the checks we are performing here are about the assessment of the population, from which our audit tests are then performed. There is no ‘immaterial’ level of poor completeness, because you can’t assess materiality without knowing the size of the whole population. Differences that ‘net off’ are, by definition, a sign that you probably have incomplete data! There’s no ‘trivial’ level of reconciliation error above rounding issues, because if something is missing you cannot know how big the missing thing is. And even rounding issues should start to raise alarm bells if there’s more than a handful of them, given most modern finance systems correct for rounding differences automatically.

If you miss something in a data quality check, you cannot rely on any subsequent testing. Get your data quality checks right though, and you can have confidence that the outcomes of your later testing are on firm foundations. It’s all about producing good quality, reliable, evidence – exactly as audits should be.

About the author

Franki Hackett, Head of Audit and Ethics, Engine B

Benefits of membership

Becoming a member

Pay fees and subscriptions

Member rewards

Support throughout your career

Digital learning materials via BibliU

Take a look at ICAEW training films

Volunteering roles

Advertise with ICAEW

"How to guides" for ACA students

Exam resources

Digital learning materials

My online training file

Student Insights

UK groups and societies

Worldwide support and services

Strengthening trust in the profession

Regulatory applications

Data extraction quality: getting it right every time

Matching checks

Reconciliation checks

Sense checks

About the author

Add Verified CPD Activity

Introducing AddCPD, a new way to record your CPD activities!

Add this page to your CPD activity

Step 1 of 3

Download recorded

Download not recorded

Add this page to your CPD activity

Step 2 of 3

Add activity to my record

Step 3 of 3

Activity added

An error has occurred
Please try again