Leveraging the Companies House API using Python

Author:

Published: 25 Sep 2023

Application Programming Interfaces (APIs) offer the profession significant opportunities to bring third-party data into existing systems and processes, enhancing them and adding value. They also provide scope for integration, for extending current system functionality, or for embedding new systems and software more seamlessly into your existing ecosystem.

This is all information we have seen and been privy to in recent webinars and articles, and I want to build on that by providing a practical example of how you can leverage Python to connect to a familiar resource – Companies House.

The mechanics behind this can be used to connect with any API.

Assumptions

I am assuming that readers of this will have a basic understanding of Python. For those that don’t, there is a wealth of material out there, and for those that prefer something more structured, then courses such as the ICAEW Data Analytics Certificate really are a useful way to upskill.

To begin with, I recommend you have the following:

  • Code editor – I personally use VS Code but there are plenty of other good options available. While Python in Excel is coming soon, this is likely to have limited functionality for managing and debugging code, so a code editor will still be useful even if you ultimately intend to execute the Python within Excel.
  • A virtual environment set up – this helps manage projects by localising your Python installation and packages to your specific project – see here for more information

Required packages

To begin with, let’s install the following packages:

  • requests – used for making requests to the API
  • pandas – used for data analysis and manipulation
  • python-dotenv – used to load our environment variables (see below)

You can install all three packages at once by running pip install requests pandas python-dotenv in a terminal with your virtual environment activated:

[Image: screenshot of the pip install command]

Using pip also ensures any dependencies for these packages are installed automatically.

Companies House API

Before diving into the code, let’s look at how we get set up with the Companies House API so we can start using it to obtain our data.

This is a RESTful API. REST stands for Representational State Transfer: you make requests to an endpoint to retrieve or modify representations of data using standard HTTP request methods such as GET and POST.

In order to access the API, you must register a Companies House user account, and create an application to obtain an API key – details on how to do this are here.

When creating your application, add an application name and a description, and ensure you select the Live environment – this will allow you to make requests for actual company data live on the register.

When the application has been created, you can create a new key. Give your key a name and description and then ensure you select the REST option.

Your API key is unique to you and should be kept safe. It is good practice to store it in a .env file along with any other credentials – these files are kept locally and should be excluded from code repositories like GitHub (for example, by listing them in .gitignore). This allows you to have multiple files that may be specific to your environment, manage sensitive information consistently and maintain its security.

[Image: screenshot of an example .env file]
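
As a minimal sketch of how the key might be read back once it is in the environment: the helper name get_api_key is hypothetical, COH_API_KEY is the variable name used later in this article, and the code assumes load_dotenv() from python-dotenv (or your shell) has already placed the variable into the process environment. Failing early with a clear message tends to be easier to debug than a rejected request further down the line.

```python
import os


def get_api_key(name: str = "COH_API_KEY") -> str:
    """Return the API key from the environment, failing early if it is missing.

    Assumes load_dotenv() (or the shell) has already set the variable.
    """
    key = os.getenv(name)
    if not key:
        raise RuntimeError(f"{name} is not set; check your .env file")
    return key
```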

Making our first request

Now that we have our API key and our working directory set up, we can create our first request. This will be to obtain a company’s profile from the register. Create a file called main.py and add the following code, which I will explain below:
[Image: screenshot of the main.py code]

Before looking at the output, let’s run through what the above code is doing:

  • Lines 1-4 import our required packages
  • Line 6 loads our .env file containing our API key
  • Line 8 sets the api_key variable based on the value in our .env file under the COH_API_KEY key, where our Companies House API key is stored
  • Lines 10-15 create a GET request to obtain the company information for the company with registration number 00000006, passing our API key into the Authorization header
  • Line 17 prints the response of our request in JSON format (see below)

The above shows an example of Basic HTTP authentication, where you send your API key, or sometimes username and password, in the header of your request to authenticate.
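
To make the mechanics of Basic authentication concrete, here is a small standard-library sketch of how such a header is put together. The function name basic_auth_header is hypothetical, and the sketch assumes the API key is sent as the username with an empty password, which should be confirmed against the Companies House authentication documentation.

```python
import base64


def basic_auth_header(api_key: str, password: str = "") -> dict:
    """Build the Authorization header used by HTTP Basic authentication."""
    # Basic auth is "username:password", base64-encoded and prefixed with "Basic ".
    credentials = f"{api_key}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

The requests package builds the same kind of header for you when you pass auth=(api_key, "") to a request. Note that base64 is an encoding, not encryption, so the key is only protected in transit when the request is sent over HTTPS.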

Our output from the response is in JSON (JavaScript Object Notation) format – the de facto method of transferring data in web applications:

[Image: screenshot of the formatted JSON response]

The response is made up of key-value pairs, so if I wanted to access the company status based on the above, I would use the notation data["company_status"], where data is the JSON response. You will note that the output from the print statement does not look like the screenshot above – this is because I have formatted the response using the JSON Formatter extension for VS Code.
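
The key lookup can be tried without calling the API at all. The sketch below uses a hypothetical, heavily trimmed response (real responses contain many more keys) and the standard json module, which can also pretty-print the data if you do not have a formatter extension to hand.

```python
import json

# Hypothetical, heavily trimmed example; not real register data.
raw = '{"company_name": "EXAMPLE LIMITED", "company_number": "00000006", "company_status": "active"}'

data = json.loads(raw)  # response.json() performs this step for you

status = data["company_status"]  # square brackets raise KeyError if the key is absent
print(status)

# .get() returns None (or a default you supply) instead of raising,
# which is handy for keys that are not present in every response.
cessation_date = data.get("date_of_cessation")

# indent=2 gives a readable, indented layout.
print(json.dumps(data, indent=2))
```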

API documentation

API documentation will often come in a standard format that adheres to the OpenAPI Specification, a standard for describing HTTP-based APIs.

It’s crucial to read these when attempting to connect to any API, as they will give critical information for your application such as authentication, request parameters and response models. Let’s dig into one of the endpoints (i.e., one of the resources made available via the API) and how to read the documentation:
[Image: screenshot of the API documentation for an endpoint]

This gives you everything you need to create a request to this endpoint:

  • How to construct the URL
  • The relevant HTTP method for the request
  • The meaning of the response status codes (more on this later)
  • The response model

The response model gives you the breakdown of the response you expect to see from a successful request. This can then be used to extract the relevant data you need from the JSON that is returned. In the above case, the response model can be seen here, including the structure of the response, and the meaning of the key-value pairs that are included.

Object-orientated programming (OOP)

Python supports OOP, and without going into too much detail, it allows us to build classes that can create objects. These classes contain properties and methods that are common to the objects we want to create.

For example, you will note from browsing the API documentation that the base URL for the requests is the same, and that we need to pass our API key into every single request we make. We can therefore create a class whose properties and methods hold these common elements, which will help simplify and organise our code.

So, let’s set up a CompaniesHouseAPI class that will hold our common properties and methods for interacting with the API:

[Image: screenshot of the CompaniesHouseAPI class code]

The above produces the same output as before, and the code works as follows:

  • Line 9 defines the CompaniesHouseAPI class using the class keyword
  • Line 10 is our class constructor method, which takes 2 arguments: self is a reference to the instance of the class, and api_key is the API key which we need to pass in when instantiating the class
  • Lines 11-12 initialise the values of api_key and base_url, which are then accessed throughout the class by using self.api_key and self.base_url respectively
  • Line 14 defines our get_request method, which takes the endpoint URL as an argument
  • Line 15 uses the endpoint argument to build a full request URL using the base_url property initialised earlier
  • Line 17 then creates a GET request by passing in the full URL and authentication header
  • Lines 19-20 are some basic error checking, where we test whether the request was successful by comparing the status_code to successful response codes and, if it fails, raise an error (see the requests package documentation for details) – I strongly advise building error handling into your application, and this is a very basic example
  • Line 21 returns the response if successful
  • Line 24 creates a new CompaniesHouseAPI object denoted c and passes in our API key from our .env file
  • Line 25 prints the JSON response of a request to the company information endpoint

Whilst for this individual request there appears to be more code than earlier, we will soon find out why structuring it in this way helps when we start adding more methods to extract information from different endpoints.
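
A minimal sketch of a class along the lines described above might look like this. It is a reconstruction from the walkthrough rather than the exact listing, so the line numbers quoted will not match. The base URL shown, the build_url helper and the timeout value are assumptions, and raise_for_status() is used here as one simple way to raise an error on an unsuccessful response.

```python
import requests


class CompaniesHouseAPI:
    """Small wrapper holding the API key and base URL shared by every request."""

    def __init__(self, api_key: str):
        self.api_key = api_key
        # Assumed base URL; confirm it against the Companies House documentation.
        self.base_url = "https://api.company-information.service.gov.uk"

    def build_url(self, endpoint: str) -> str:
        """Hypothetical helper: join the base URL and an endpoint path."""
        return f"{self.base_url}{endpoint}"

    def get_request(self, endpoint: str) -> requests.Response:
        """Send a GET request and raise requests.HTTPError on a 4xx/5xx response."""
        response = requests.get(
            self.build_url(endpoint),
            headers={"Authorization": self.api_key},
            timeout=10,  # assumed value; avoids hanging forever on a stalled connection
        )
        response.raise_for_status()
        return response
```

With a valid key, usage would be along the lines of c = CompaniesHouseAPI(api_key) followed by print(c.get_request("/company/00000006").json()).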

Common directors

One example use case is looking for companies that have directors in common. To do this, we need to perform a few different steps:

  1. Obtain a list of directors for our entity
  2. Identify all companies they are listed as directors for
  3. Combine our results

We can identify our directors by using the Officers endpoint. So, let’s add a new method called get_officers, which builds on our get_request method and our new OOP-style setup:

[Image: screenshot of the get_officers method]

This method does the following all in 4 lines of code:

  • Line 28 defines the method and the company_number argument that should be passed in
  • Line 29 constructs the endpoint we wish to call – note that I exclude the base URL, which is used in the get_request method and initialised in our class constructor
  • Line 30 returns the JSON response of our request

Running the code will produce a response defined by the response model here. Delving into the response, we see that there is a difference between the number of officers in the response (total_results) and the items shown (items_per_page).

Re-reading the endpoint information, we can see that there are in fact query parameters (these appear in the URL itself) that can be passed in to specify both the items_per_page and start_index to essentially navigate through the pages of the response.
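
To see what "appearing in the URL itself" means in practice, the sketch below builds an officers URL with both query parameters using only the standard library. The function name officers_url and the base URL are assumptions, and the page size of 35 is purely illustrative; the requests package performs the same encoding for you when you supply a params dictionary.

```python
from urllib.parse import urlencode

# Assumed base URL; confirm it against the Companies House documentation.
BASE_URL = "https://api.company-information.service.gov.uk"


def officers_url(company_number: str, start_index: int = 0, items_per_page: int = 35) -> str:
    """Return the officers endpoint URL with pagination query parameters appended."""
    query = urlencode({"items_per_page": items_per_page, "start_index": start_index})
    return f"{BASE_URL}/company/{company_number}/officers?{query}"


print(officers_url("00000006", start_index=35))
```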

In my experience, a lot of APIs will use pagination and allow you to set the maximum number of results you wish to return in a single request and will often have either an indication of the page of the results, or a link to the next page that can be subsequently called for a complete response.

So to deal with this, I am going to amend my get_request method to build in a check for pagination and pass the relevant values for start_index into the request, once again referring to the requests package documentation.

The result is this:

[Image: screenshot of the updated code with build_api_request and pagination handling]

I have added a new build_api_request method which then feeds into my get_request method, from which I check and deal with pagination. The new elements of the code operate as follows:

  • Lines 16-18 define the new build_api_request method, which takes the following arguments:
    o http_method: i.e. GET, POST, PUT etc
    o endpoint: the URL endpoint used in the request
    o **kwargs: short for keyword arguments, this lets me pass any additional named arguments straight through to the request method of the requests package – I can then supply extra arguments such as params (used here) as and when needed without having to refactor build_api_request
  • Line 20 sets the headers argument to contain our API key – this ensures that it is included and is correctly set to the environment variable defined earlier
  • Line 22 uses the request method of the requests package to build a request using the specified HTTP method
  • Line 30 creates an initial request using the build_api_request method and obtains the JSON representation of the response
  • Line 33 checks whether the items_per_page key is present in the response, as this indicates that the response is paginated
  • Lines 35-37 store the items_per_page, start_index and total_results from the initial response
  • Line 39 creates a loop that continues only while the start position of the next page does not exceed the total number of results available, since requesting beyond that would result in an invalid request
  • Line 40 then sets our start_index to the first position on the next page – note that the results are zero-indexed
  • Lines 42-44 then create a new request, passing the updated start_index into the params argument
  • Line 45 then adds the new results to the existing ones to build a complete list

The get_officers method remains unchanged except that it now returns the JSON object. Running it now returns the complete set of results, which can be verified by calling len() on the result: this gives the number of elements in the list or, in this context, the number of officers associated with the company.
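
The pagination loop described above can be sketched independently of the HTTP layer. In this illustration, fetch_all_items and fetch_page are hypothetical names: fetch_page stands in for whatever function requests one page starting at a given start_index and returns the parsed JSON. The response keys (items, items_per_page, start_index, total_results) are those discussed above. This is one way to structure the loop, not a copy of the original listing.

```python
from typing import Callable


def fetch_all_items(fetch_page: Callable[[int], dict]) -> list:
    """Collect the items from every page of a paginated response."""
    first = fetch_page(0)
    items = list(first.get("items", []))

    # No items_per_page key means the response is not paginated.
    if "items_per_page" not in first:
        return items

    per_page = first["items_per_page"]
    total = first["total_results"]
    # Results are zero-indexed, so the next page starts at start_index + per_page.
    start_index = first.get("start_index", 0) + per_page

    # The per_page > 0 guard prevents an endless loop on an unexpected response.
    while per_page > 0 and start_index < total:
        page = fetch_page(start_index)
        items.extend(page.get("items", []))
        start_index += per_page

    return items
```

Keeping the loop separate from the request itself also makes it straightforward to test with a stand-in function that serves pages from an in-memory list.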

Manipulating JSON using Pandas

To complete the first stage of the Common Directors use case, we need to filter the response from get_officers to show active directors only. To do this, I’m first going to load the data into a Pandas DataFrame, and then use the Pandas API to filter and manipulate it like so:
[Image: screenshot of the code that loads the officers into a Pandas DataFrame]

The above leverages the get_officers method and does the following:

  • Line 56 converts our JSON response into a DataFrame – the json_normalize method is very powerful, but with nested lists and dictionaries in responses it isn’t always as simple as above – I highly recommend reading this article for how to handle these scenarios
  • Lines 57-63 define the column names I want to keep based on the response model – note the 4th element – this is a result of the flattening that occurs in the json_normalize method with default arguments, and you will see in the response model definition that the links key represents nested dictionaries
  • Lines 64-66 drop the other columns by using sets to get the difference between the columns present and those I wish to keep
  • Lines 68-71 return the DataFrame filtered for active directors only, with the resigned_on column removed as it is no longer required
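
The flattening and filtering steps can be tried on a few made-up records. Everything in the sample data below is hypothetical, the function name active_directors is an assumption, and the sketch treats an officer as an active director when officer_role is "director" and resigned_on is empty, which is one reasonable reading of the filter described above rather than the exact original code.

```python
import pandas as pd

# Hypothetical officer records shaped like the "items" list of the response.
sample_items = [
    {"name": "DOE, Jane", "officer_role": "director", "appointed_on": "2015-01-01",
     "links": {"officer": {"appointments": "/officers/abc123/appointments"}}},
    {"name": "ROE, Richard", "officer_role": "director", "appointed_on": "2010-06-01",
     "resigned_on": "2018-03-31",
     "links": {"officer": {"appointments": "/officers/def456/appointments"}}},
    {"name": "SMITH, Alex", "officer_role": "secretary", "appointed_on": "2012-09-15",
     "links": {"officer": {"appointments": "/officers/ghi789/appointments"}}},
]


def active_directors(items: list) -> pd.DataFrame:
    """Flatten officer records and keep directors who have not resigned."""
    # Nested dictionaries are flattened into dot-separated column names.
    df = pd.json_normalize(items)

    keep = ["name", "officer_role", "appointed_on", "links.officer.appointments", "resigned_on"]
    # Set difference: every column present that is not in the keep list.
    df = df.drop(columns=list(set(df.columns) - set(keep)))

    # If nobody has resigned, the column will not exist yet.
    if "resigned_on" not in df.columns:
        df["resigned_on"] = pd.NA

    is_active_director = (df["officer_role"] == "director") & df["resigned_on"].isna()
    return df[is_active_director].drop(columns=["resigned_on"]).reset_index(drop=True)


print(active_directors(sample_items))
```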

The results of this method will now provide a lovely DataFrame like so:

[Image: screenshot of the resulting DataFrame]

We now have part 1 of our use case working. We now want to identify other companies that the directors are also directors for. This is where our links.officer.appointments column comes in handy, as this provides a direct link to the appointments for that director, which could be passed directly into our get_request method.
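
As a rough sketch of how the remaining steps might fit together, the function below follows each director's appointments link and keeps the companies that more than one of them is appointed to. All names here are hypothetical. fetch_items stands in for a function that calls the API with a link and returns the list of appointment items, and the appointed_to and company_number keys are assumptions that should be checked against the response model for the appointments endpoint.

```python
from collections import defaultdict
from typing import Callable


def common_directorships(appointment_links: dict, fetch_items: Callable[[str], list]) -> dict:
    """Map each company number to its directors, keeping companies shared by two or more.

    appointment_links maps a director's name to their appointments link.
    """
    directors_by_company = defaultdict(set)

    for director, link in appointment_links.items():
        for item in fetch_items(link):
            # Assumed response shape: each item names the company appointed to.
            company_number = item.get("appointed_to", {}).get("company_number")
            if company_number:
                directors_by_company[company_number].add(director)

    return {
        number: sorted(directors)
        for number, directors in directors_by_company.items()
        if len(directors) > 1
    }
```

You would normally also exclude the company you started from, since every director in the list shares that one by definition.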

Next steps

This article was written just to show you a practical example of how to create reusable code for the purposes of connecting to an API and covered the following:

  • Secure storage of your API keys
  • Leveraging the requests package
  • Basic authentication
  • The importance of API documentation
  • Using Python classes and class constructors
  • Dealing with pagination
  • Parsing data into Pandas DataFrames

There are plenty of ways to approach the example in this article, and my method is just one of many that works and feels understandable.

There are also plenty of things I haven’t taken into consideration, e.g., rate limiting that is very common on APIs and needs to be managed.

I have created a GitHub repository with this code that can be found here. If you found this article interesting and would like to see a build of the full use case, or even some other use cases, then please reach out (contact details below) and I might write a new one and update the repository.

Either way, feel free to use the code in the repo and build out your own use cases.

All of the above techniques can be applied to any API, although authentication methods can vary. Using these you can look to bring other data sources into your workflows or integrate other systems and software where providers offer this.

For example, at Circit we offer our APIs to our audit customers so they can automate bank confirmation request workflows, or easily extract bank transaction data from their customers into their testing using our Open Banking powered Verified Transactions module.

Further tips and tricks

For those that have lasted this long, firstly, congratulations! And, secondly, a few things that I generally use, or look at when writing my code:

  • Docstrings – really good way of documenting what a method does, the arguments used etc – there is a brilliant VS Code extension called autoDocstring that I generally use – I have actually added these into the repo
  • Formatter – I always use and configure a code formatter to make my code more readable; my preferred one for Python is Black, which also integrates seamlessly with VS Code
  • Comment, comment and comment your code – having been part of a team that shares code, and knowing how hard it can be to understand code you’ve not seen before, I recommend getting into the habit of commenting
  • Error handling and logging is crucial – build it in from the beginning

About the author

Sam is a Chartered Accountant and Auditor, but with a keen interest in technology. He has previously managed an audit portfolio whilst leading an Innovations team building internal analytical tools for audit and providing automation and analytics to clients. He now works at Circit as a Product Manager overseeing Circit’s Verified Transactions, Verified Insights and Verified Analytics modules.

Reach out to Sam at sam.bonser@circit.io.
