There are many scenarios in which you will have regular reports that need transforming or processing. As an auditor it could be receiving general ledger reports from clients which need to be fed into another system; in a finance team you might be dealing with reports which need to be analysed each month.
PowerQuery allows you to build data workflows within Excel, enabling you to process these reports quickly.
In the video below I run through how to standardise the formatting of a general ledger from a popular accounting package. The example used isn’t too complicated so it’s a good introduction to the thought process I use to standardise the data.
Process overview
Working with a general ledger scenario
- Deal with any data points which are shown as headings in your report
- In the video these were the account code and account description fields. We want this data alongside the transaction rows so when we filter out everything else we retain that data about each transaction.
- Reduce data rows to only transaction rows
- This involves filtering out report headers and totals. It is typically done by identifying a field with characteristics unique to the transaction rows. In the video several fields could have been used. I have found that date columns are often a good starting point, but do check your data and don't make assumptions. In the video I had to follow this up with a second filter to remove opening balance figures.
- Create additional columns as necessary
- Sometimes this will be splitting out account code and description columns. It could be splitting out date and time columns (depending on your requirements). In the video we created a Net column from our Debit and Credit columns.
- Standardise column naming and column orders
- Assuming your data feeds into something else you’ll want to standardise the column names and the order.
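As a rough sketch, the four steps above might come together into an M query like the one below. The source, step and column names are illustrative assumptions and will differ from your own report:

```m
let
    // Hypothetical source: a table named "GLReport" in the current workbook
    Source = Excel.CurrentWorkbook(){[Name = "GLReport"]}[Content],

    // 1. Bring heading data alongside transactions: fill the account
    //    code and description down into the rows beneath each heading
    FilledDown = Table.FillDown(Source, {"Account Code", "Account Description"}),

    // 2. Keep only transaction rows: typing the date column turns
    //    headers and totals into errors, which are then removed
    Typed = Table.TransformColumnTypes(FilledDown, {{"Date", type date}}),
    TransactionsOnly = Table.RemoveRowsWithErrors(Typed, {"Date"}),

    // 3. Create any additional columns, for example a Net column
    WithNet = Table.AddColumn(TransactionsOnly, "Net",
        each ([Debit] ?? 0) - ([Credit] ?? 0), type number),

    // 4. Standardise column names and order in one step
    Result = Table.SelectColumns(WithNet,
        {"Date", "Account Code", "Account Description", "Net"})
in
    Result
```

Each step corresponds to one of the bullets above; in practice you would build these through the PowerQuery editor and the M code would be generated for you.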
Useful tips
When importing the data you want to minimise modifications in Excel before using PowerQuery as these will need to be performed manually every time you want to reuse your document. In theory anything you need to do to make your data easier to work with will likely be available in PowerQuery anyway.
I used a data type change on a date column to effectively remove the header rows and blank rows. Remember that filters don't work well with errors in your column, so remove the errors before applying a filter. If you have an error you can click into the cell to understand why it is an error. Don't remove errors without first understanding what is causing them, as you could accidentally remove relevant data.
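In M, that tip amounts to a type change followed by an error removal. A minimal sketch with hypothetical sample data (note that nulls convert without error, so blank rows need their own filter):

```m
let
    // Hypothetical sample: a repeated header row, a blank row, two transactions
    Source = Table.FromRecords({
        [Date = "Date", Amount = "Amount"],
        [Date = null, Amount = null],
        [Date = "01/02/2024", Amount = "100"],
        [Date = "03/02/2024", Amount = "250"]
    }),
    // Changing the column type turns anything that isn't a date into an error
    Typed = Table.TransformColumnTypes(Source, {{"Date", type date}}),
    // Inspect the errors first, then remove the offending rows
    NoErrors = Table.RemoveRowsWithErrors(Typed, {"Date"}),
    // Nulls convert to null rather than an error, so filter blanks separately
    NoBlanks = Table.SelectRows(NoErrors, each [Date] <> null)
in
    NoBlanks
```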
In the video we create a Net field using the general principle [Debit] - [Credit]. This only works if every row of Debit and Credit contains a number; if either contains a null or a text string the formula will create errors. Two methods were discussed, each with its own advantages and disadvantages.
- Using the coalesce operator ?? which lets you supply a fallback value whenever a field is null. Rewriting the formula as ([Debit] ?? 0) - ([Credit] ?? 0) enables the Net field to be calculated on rows with missing values. It can however cause values to be created in the Net field which don't make sense on non-transaction rows, for example where you have the word Total in the Debit field with a number in the Credit field.
- Using the Replace Values function over the two fields to convert null to 0, enabling the original custom formula to work. This involves an extra step to set up (albeit a very quick one) and you'll still get value errors on rows containing field headings or totals, for example.
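The two approaches look like this as individual steps inside an M query (step names are illustrative; column names follow the video):

```m
// Option 1: coalesce nulls to zero inside the custom column formula
WithNet1 = Table.AddColumn(Source, "Net",
    each ([Debit] ?? 0) - ([Credit] ?? 0), type number),

// Option 2: replace null with 0 in both columns first,
// then the original subtraction works unchanged
Replaced = Table.ReplaceValue(Source, null, 0,
    Replacer.ReplaceValue, {"Debit", "Credit"}),
WithNet2 = Table.AddColumn(Replaced, "Net",
    each [Debit] - [Credit], type number)
```

Option 1 keeps everything in one step; option 2 leaves errors on header and total rows, which can actually be useful if you plan to remove rows with errors anyway.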
You can clean up your resulting data very quickly by combining your field order and field selection steps into one. Selecting the fields you want, in order, using Ctrl+click and then right-clicking to choose Remove Other Columns will do both in a single step. Where you are trying to standardise the data structure you may have fields which aren't present in every dataset, and adding a final parameter to the end of the generated step will simplify things: MissingField.UseNull. This creates a column of null data for each missing expected field, which is much easier than creating multiple custom columns to pad out your data into the correct format.
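Remove Other Columns generates a Table.SelectColumns step, and MissingField.UseNull is its optional third argument (the column names below are illustrative):

```m
// Selects and orders columns in one step; any column missing from a
// particular dataset comes through as all nulls instead of raising an error
Result = Table.SelectColumns(Previous,
    {"Date", "Account Code", "Description", "Net"},
    MissingField.UseNull)
```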
What else can you do
Load from file – You can load data directly from a linked file without having to copy and paste data into a tab first.
Load from folder – PowerQuery can load entire folders worth of files at once, even multiple tabs per file if you need to automate in that way.
Load from a database directly – potentially the best automation option is to hook in directly to the source system so that all manual downloading steps can be removed.
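For reference, these three options correspond to different M source functions. The file paths, server and database names below are purely hypothetical:

```m
// Load from file: read a workbook directly from disk
FromFile = Excel.Workbook(File.Contents("C:\Reports\ledger.xlsx"), true),

// Load from folder: one row per file, with contents in the Content column
FromFolder = Folder.Files("C:\Reports\Monthly"),

// Load from a database: for example SQL Server
FromDb = Sql.Database("server-name", "FinanceDB")
```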
Reusable knowledge
PowerBI – If you want to use this popular visualisation tool you'll need to learn how to navigate PowerQuery to get your data formatted and imported.
Dataflows in Microsoft Fabric – Dataflows are cloud-powered PowerQuery procedures enabling broader automation with Power Automate, allowing you to reuse the data skills honed via Excel.
Archive and Knowledge Base
This archive of Excel Community content from the ION platform will allow you to read the content of the articles but the functionality on the pages is limited. The ION search box, tags and navigation buttons on the archived pages will not work. Pages will load more slowly than a live website. You may be able to follow links to other articles but if this does not work, please return to the archive search. You can also search our Knowledge Base for access to all articles, new and archived, organised by topic.