ICAEW.com works better with JavaScript enabled.
Exclusive

Automated text search solution

Author: Simon Hurst

Published: 25 Jan 2022

Exclusive content
Access to our exclusive resources is for specific groups of students, users, subscribers and members.
You might have thought that the Christmas pantomime word search was finally behind us, but no, it isn't. It also serves to demonstrate the versatility of Power Query and follows on from one of David's recent 'Tip of the Week' posts on using Power Query for word frequency analysis.

The quest involved searching through a pantomime script to try and find as many Excel function names as possible. A function could be spread across multiple words and no function names that incorporated full stops were included. The intention was that people would wile away an hour or two, examining the text closely in order to triumphantly locate over 70 different function names. Of course, given this is the Excel Community, I was reasonably confident that some members would take the shortcut of using Excel itself to automate the search process. Given that the pantomime included a shameless claim that Power Query was the answer to all Excel problems, it seems only right to use Power Query to solve the challenge.

Because function names could be split across words, the first task was to turn all of our separate paragraphs and words into a single stream of text omitting all punctuation. So, starting with our pantomime paragraphs pasted into an Excel Table, we use The Data Ribbon tab, Get & Transform Data group, From Table/Range (formerly: From Sheet, even more formerly: From Table/Range) command to load our paragraphs into the Power Query editor. Next, we need to merge all the separate paragraphs into a single 'cell'. One way to do this is to first change the Data Type of our Text column to Text, then use the Transform Ribbon tab, Table group, Transpose command to convert our single column 7, 7 row table into a single row, spread across seven columns. We can then select Column1 and hold down the Shift key while clicking on the Column7 heading in order to select all columns. We then just right-click in any column heading and choose Merge Columns to create a single stream of text. Although our text is now in a single 'cell', potential function names are still separated by spaces or punctuation characters:

There are several possible ways to remove these. Perhaps the simplest is to create multiple Replace steps by right-clicking the column heading and choosing Replace Values and then, in each step, specifying one of the characters to remove and replacing it with nothing:
Excel screenshot

This is certainly simple to do, but does create lots of additional steps. A more elegant alternative is to add a Custom Column using the Add Column Ribbon tab, Custom Column command. You then need to type in the formula:

= Text.Remove([Merged],{" ".."@"})

The first part of the formula is fairly obvious: Text.Remove removes text characters. We first specify the column that we want to remove the characters from: [Merged]. The second argument can be a single character that we want to remove, but it can also be a list of characters. We use {} to indicate that we are using a list and then create our list as being all the characters (in ASCII number order) from the space character to the @. To do this we put the first and last characters in double quotes and use .. to indicated that we want to include all the characters in between. To demonstrate, we've just created a blank query that generates the list:

Excel screenshot

Adding this list as the second argument in our Text.Remove formula will remove all the characters in the list from our text as required.

We are making the rather broad assumption that no function name that might occur in our script includes any of those characters. If we wanted to be more specific, we could create a custom list of characters. We use Enter Data from the New Query group of the Home Ribbon tab to create a new, ad-hoc, table. We can then type each character into a separate row or our single column. Finally, we select our column and choose Transform Ribbon tab, Any Column group, Convert to List:

Excel screenshot

We have given our newly-created table the name RemChar. Note that we can add to and edit the contents of this table by clicking on the gear icon to the right of the Source step. We could then use this list in place of {" ".."@"} in our Text.Remove formula:

= Text.Remove([Merged],RemChar)

Of course, we could instead create our list of characters as an Excel Table and read that into Power Query for use in the same way.

Having removed the unwanted characters using any of these methods, we then need to convert our text to UPPERCASE by right-clicking in our column heading and choosing Transform, UPPERCASE. Finally, we will extract our block of text as a value by right-clicking on the text itself and choosing Drill Down. This allows us to refer to the value in another query, just by using its name.

Now all we need to do is to check every single function name in Excel to see if it occurs anywhere within our text. Fortunately, Microsoft has included a list of all function names in alphabetical order on its website.

We can use the Get &Transform Data, From Web command and paste in our URL:

Excel screenshot
After clicking Ok, in the Navigator screen we can choose Table0 to access our list of functions and then choose Transform Data:
Excel screenshot

Next, we remove the description column, then replace the ' function' suffix with nothing to remove it. Strangely, at the time of writing, the alphabetical list includes the recent function LET() but not the slightly more recent LAMBDA() function. One way to address this is to use Enter Data again, but this time to create a table, rather than just a list. This table contains a single column and, for the moment, just a single row containing the word LAMBDA. We can then append this table to the table of alphabetical function names. As with our script text, we need to convert all our names to upper case. Now we just need to check each of our function name rows to see whether our script text contains that name. To do this, we add a Custom Column and enter the formula as:

= Text.Contains(#"Table2",[Function name])

We now have a column of TRUEs and FALSEs that we can filter just to show the functions that were found somewhere in our script text:

Excel community

We can load this to a Table in or Excel workbook.

Although this gets us pretty close to our list of function names, some manual intervention is still necessary to exclude function names that are only included as a part of another function name such as AVERAGE within DAVERAGE in accordance with the rules of the original challenge.

Join the Excel Community

Do you use Excel in your organisation? Are you using it to its maximum potential? Develop your skills and minimise spreadsheet risk with our Excel resources. Membership is open to everyone - non ICAEW members are also welcome to join.

Excel polaroid
Archive and Knowledge Base

This archive of Excel Community content from the ION platform will allow you to read the content of the articles but the functionality on the pages is limited. The ION search box, tags and navigation buttons on the archived pages will not work. Pages will load more slowly than a live website. You may be able to follow links to other articles but if this does not work, please return to the archive search. You can also search our Knowledge Base for access to all articles, new and archived, organised by topic.