Welcome to our new Excel Tips and Tricks! Building on our ‘Tip of the Week’ series we’ll continue to bring you the latest and greatest Excel functionality you didn’t know existed or weren’t sure how to use. We’ll continue to index these posts so you can find them alongside the Tip of the Week archive as a one-stop-shop for all your Excel trickery.
For our first article we’ve got a bit of an Easter special… Given Microsoft decided to ditch ‘Easter Eggs’ in its software some years ago, Easter-themed blog content is rather slim pickings these days. So, if you’ll pardon the tenuous link, we’re going to go Developer level to talk a little about Excel file types and explore the inner workings of an XLSX file – under the (Easter) bonnet so to speak.
Be warned: one wrong move could irretrievably break your Excel file so take a copy of any file you try this on!
XLS vs XLSX vs XLSM
Back in the olden days (by which we mean, pre-Excel 2007), there was really only one Excel file type – XLS – a binary file format developed by Microsoft in the late 1980s.
While it worked very well, it had many limitations (not least the row limit of 65,536). So with Office 2007, Microsoft introduced new file formats across the entire suite, developing the ‘Office Open XML’ format. It brought a number of advantages, including smaller file sizes that could handle larger volumes of information, and by being an ‘open’ format, it became possible for other software to interact with Office documents in a way that had never previously been possible. XLSX was born – alongside DOCX, PPTX and others.
Except, it isn’t quite that simple. Microsoft wanted to more clearly distinguish files that contained macros, which gives us the offshoot file types ending in ‘M’. The difference between XLSX and XLSM files is just that the former cannot have any macros embedded in them, whereas the latter might have macros. This is an important consideration from a security perspective, given most malicious Excel files will be macro-enabled.
But what’s XLSB?
OK, there’s another file type to mention briefly – the Excel Binary type. This is in essence a modern version of the original ‘XLS’ file type – files are small, fast and can typically handle much larger data volumes than XLSX, but as they don’t use the XML format are less well supported by proprietary tools, and like the old XLS files there is no distinction between macro and non-macro XLSB files. We won’t spend any more time on XLSB files, other than to say their internal structure is identical to XLSX and XLSM bar the format of their contents. What’s inside an XLSX or XLSM file?
What’s inside an XLSX or XLSM file?
This is the fun part – XLSX and XLSM files are really ZIP files. Literally. As you can see here, you can change the file extension from XLSX to ZIP, and open it like a ZIP file:
In this example XLSX file, we have two worksheets, one of which has the TODAY() function in it:
When you change to a ZIP file extension, you can still force Excel to open it, but I wouldn’t recommend doing this. Far more interesting is to explore it as a ZIP file.
So what’s in it? Well, mostly, XML files. There’s typically 1 file and 3 folders:
- _rels is the ‘Relationships’ folder, which contains the definition of the package i.e. the XML schema
- docProps are the document properties – the file owner, creation and last modified date, what version of Excel the file was saved in and some other details…
- xl is where the actual content is stored which we’ll dig into in a moment
- [Content Types].xml is basically an index of the XML files in the ZIP package
- You’ll also see in the GIF above an extra ‘customXml’ folder – this is where some additional content is stored to support other applications. In my case this is substantially related to the Office 365 SharePoint integration.
Where is my data?
Everything in the cells and sheets can be found in the xl folder, and the structure of this folder will depend on what features you have in your workbook. We haven’t got time to go into every permutation here today, but the key pointers are:
- Each worksheet has a separate XML file in the ‘worksheets’ folder, by sheet ID. You will see details of formulas here, showing the cell reference, formula and current saved value:
However, note that the text on our other worksheet isn’t stored in the same way:
These are references to the ‘sharedStrings’ XML which allows the workbook to store string data only once, making it space efficient when the same text strings are used multiple times in a workbook.
- The theme folder stores the formatting themes in relation to the workbook (fonts, colours etc.)
- The workbook XML file is the lookup between sheet IDs and sheet names
- Beyond this, you will find additional folders and XML files for pivot tables, charts, tables, drawings, data models, queries/connections, slicers/timelines and any cached data (e.g. data that is part of a data model but not found in the worksheets)
What about Macros?
XLSM files are essentially identical, but the xl folder contains an additional file “vbaProject.bin”. These are binary files and, like XLSB content, aren’t really designed to be opened outside of Excel, but it is possible to open in a text editor and view the relevant code:
(Details of the button itself are stored in the drawings folder)
Why it Matters - File Security
Understanding the inner workings of XLSX files comes to the fore with file security. There are five main levels on which you can apply security to a workbook:
- Protecting the file for opening (through save options)
- Protecting the file for modifying (through save options)
- Protecting the Sheet (through the ‘Review’ tab)
- Protecting the Workbook (through the ‘Review’ tab)
- Protecting the Macro for viewing (through the VB Editor)
The only level at which setting a password makes it impossible to access/edit the data within the workbook via the XML files is the first, as this encrypts the entire workbook, and the file cannot be opened as a ZIP. All other protections have no impact on the accessibility of the file contents when treated as a ZIP, other than the storing of hashed password details within XML files.
Therefore, if you have an Excel file with sensitive data stored in it, the only way to truly restrict access to it is to password protect the entire file or set access restrictions at the folder level. This is another good reason why Excel’s built-in protections should not be considered a robust solution to workbook security.
XLSX, XLSM (and XLSB) files are perhaps the Easter Eggs we lost when Microsoft stopped putting secret mini-games in Excel – they’re a veritable treasure trove of XML content and it is possible to get lost for hours digging around in them and playing the “if I do this in Excel, what happens in the XML” game. But behind all this is an important point – the only way to really make your spreadsheet secure, is to stop people accessing it completely!