
Using Raphtory to find fishy behaviour on Companies House 🎣

Rachel Chan · April 25, 2023



In the UK, the requirements to register a new company are few. Practically anyone aged 16 or over can own and manage a UK limited company, and it only takes a few minutes to register a new company on Companies House. This has brought about a rise in serial company formations and dissolutions by individuals.

One example is a 92-year-old woman named Barbara Kahan, who has 22,777 company appointments to her name, all filed at an inconspicuous London address: 2 Woodberry Grove, London, N12 0DR. Few would guess that this leafy, suburban building was being used to register shell companies involved in fraud, money laundering and political corruption.

With company formation agents advertising their services for as little as the price of two McDonald’s Big Macs, the three-hour company formation service has become an open door to criminal activity. Amid the influx of new companies registering in the UK, it is hardly surprising that companies formed for criminal purposes have gone unnoticed. More needs to be done to tackle this issue in the UK.


Fortunately, Raphtory can be used to catch the bad eggs among the hundreds of thousands of companies registering in the UK. Raphtory is a powerful analytics tool for large-scale graph analysis. With Raphtory, it takes a few seconds to turn Companies House data into insights into fishy company behaviour.

In this blog, we will scrape information about all the companies that Barbara Kahan has been a director of and use Raphtory to analyse this data. Follow along in your Python notebook of choice as we unveil the dark secrets lying within the UK’s company registry.

Follow along with our Jupyter Notebook

We have uploaded the full Jupyter notebook for this tutorial to our GitHub, which you can find by clicking here. Feel free to pull this example from GitHub or write a fresh notebook on your local machine.

How to collect Companies House Data

We are in luck, as Companies House provides a REST API. At Pometry, we have built several crawlers that scrape the Companies House website, giving us direct access to the data we want. Currently, we have three crawlers: one made specifically to scrape Barbara Kahan’s companies for this blog post and tutorial, another for grabbing Persons with Significant Control information, and the last for grabbing Company Director information. All our crawlers output JSON data, ready to be loaded into a Raphtory graph for analysis. We have made the crawlers installable via pip and explain how to use them below.

How to use the Companies House crawler

Getting your Companies House API key

Before scraping the Companies House website, you will need to create an account on the Companies House Developer Hub:

After logging into your account, create an application where your API keys will be stored:

Once created, go into your application and create a new REST API key. This key will be used to authenticate your scrape requests:



Make sure you select REST when creating your application:



Copy your API key, which will be used to scrape the Companies House website:



You are now ready to install the crawler and start scraping the Companies House website.
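For readers curious about what the key is used for: the Companies House REST API authenticates requests with HTTP Basic auth, sending the API key as the username and leaving the password empty. The sketch below only builds the header with the standard library and sends no request. The helper name `basic_auth_header` is hypothetical, and we assume the crawler handles this step internally, so you do not need it to follow the tutorial.

```python
import base64


def basic_auth_header(api_key: str) -> dict:
    """Build an HTTP Basic auth header: key as username, empty password."""
    # Basic auth encodes "username:password"; the password part is left blank.
    token = base64.b64encode(f"{api_key}:".encode("ascii")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Hypothetical usage: pass these headers to the HTTP client of your choice.
headers = basic_auth_header("YOUR API KEY HERE")
```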

Installing and running our Companies House crawler

Install the crawler using pip:

pip install -i https://test.pypi.org/simple/ cohospider


Go into a Python terminal and run the following code:

from spiders import BarbaraSpiderRun


runner = BarbaraSpiderRun(key="YOUR API KEY HERE")


runner.start()


Our crawler will start to scrape the Companies House API, finding all of Barbara’s company data. Once finished, all your data can be found in the data/aqWJlHS4_rJSJ7rLgTK49iO4gAg folder in your root directory. We can now start the analysis using Raphtory.

Analysing the data with Raphtory

Install Raphtory via pip:

pip install raphtory


Open a Python Terminal of your choice. We use Jupyter Notebook for this example. Import all the dependencies needed for this example:


We use the Python JSON library to parse the JSON files output by the crawler. With the parsed values, we can create a Raphtory graph and add them to it via the add_edge() function.


Enter the directory path to your JSON files in the path_to_json variable. It should look something like this: ~/companies_house_scraper/tutorial/data/aqWJlHS4_rJSJ7rLgTK49iO4gAg:


Iterate through all the JSON files (there are many files since the crawler works by crawling page by page) and add values to your Raphtory graph via add_edge() function:
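A minimal sketch of this loop, using only the standard library, might look like the following. It is not the notebook's code. It assumes each file holds one JSON object with an `items` list whose entries carry `name`, `appointed_on`, `resigned_on` and `appointed_to.company_name`, which mirrors the Companies House officer-appointments response; the crawler's actual output may be shaped differently. The function name `read_appointments` is hypothetical.

```python
import json
import os


def read_appointments(path_to_json):
    """Yield one dict per appointment found in the JSON files of a folder."""
    for file_name in sorted(os.listdir(path_to_json)):
        if not file_name.endswith(".json"):
            continue  # ignore anything that is not a JSON page
        with open(os.path.join(path_to_json, file_name)) as f:
            page = json.load(f)
        # ASSUMPTION: one JSON object per file, with an "items" list.
        for item in page.get("items", []):
            yield {
                "director": item["name"],
                "company": item["appointed_to"]["company_name"],
                "appointed_on": item["appointed_on"],
                # Missing when no resignation has been recorded.
                "resigned_on": item.get("resigned_on"),
            }


# Usage (not run here): each record becomes one director -> company edge,
# added with the add_edge() call described in this post.
# for rec in read_appointments(os.path.expanduser(path_to_json)):
#     ...
```

Note that a path starting with `~` needs `os.path.expanduser()` before it can be passed to `os.listdir()`.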

Quick overview of Barbara’s companies using Raphtory

With the Raphtory API, we can quickly find statistics from our data about Barbara’s company ownership history.

Create a list of director names to see how many different names the director goes by:


Finding the number of companies formed by the director:


Seeing the earliest and latest company formations this director has made:
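To make the three statistics above concrete, here is a plain-Python sketch over a list of `(time, director, company)` tuples. It deliberately avoids the Raphtory API, so it shows the logic rather than the library calls used in the notebook. The edge list is placeholder data invented for illustration, not Companies House records, and `ms_to_date` is a hypothetical helper.

```python
from datetime import datetime, timezone

# Placeholder edges as (time_ms, director_name, company_name); NOT real data.
edges = [
    (1388534400000, "DIRECTOR NAME A", "COMPANY ONE LTD"),
    (1392163200000, "DIRECTOR NAME B", "COMPANY TWO LTD"),
    (1420070400000, "DIRECTOR NAME A", "COMPANY THREE LTD"),
]


def ms_to_date(ms):
    """Convert epoch milliseconds to an ISO date string (UTC)."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc).date().isoformat()


# Distinct spellings of the director's name (the source side of each edge).
director_names = sorted({director for _, director, _ in edges})

# Number of distinct companies (the target side of each edge).
company_count = len({company for _, _, company in edges})

# Earliest and latest appointment times seen in the data.
earliest = ms_to_date(min(t for t, _, _ in edges))
latest = ms_to_date(max(t for t, _, _ in edges))
```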


There is a plethora of methods in the Raphtory API that give you an overview of your graph data. These are just a few to demonstrate how easy it is to access this information with Raphtory.

Using Raphtory properties to filter suspicious companies

The date that the director resigned from the company can be accessed via the edge property. This is the API for adding properties to edges in Raphtory:

g.add_edge(time, source, target, {'property_name': property_value})

Edges and vertices in Raphtory can carry any number of properties to store extra information. However, we have kept it simple for this example. Properties have enabled us to store the resignation date in two formats: date-time format and epoch timestamp format.
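As a sketch of how the two formats could be produced, the snippet below converts a `YYYY-MM-DD` date string into epoch milliseconds and builds a properties dictionary holding both. The date format, the millisecond unit, the UTC time zone and the property names `resigned_on` and `resigned_on_epoch` are all assumptions made for illustration; the notebook may use different names.

```python
from datetime import datetime, timezone


def to_epoch_ms(date_str):
    """Convert a 'YYYY-MM-DD' date to milliseconds since the Unix epoch (UTC)."""
    dt = datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)


def resignation_properties(resigned_on):
    """Store the resignation date both as text and as an epoch timestamp."""
    return {
        "resigned_on": resigned_on,                     # hypothetical name
        "resigned_on_epoch": to_epoch_ms(resigned_on),  # hypothetical name
    }


props = resignation_properties("2014-02-12")
```

Keeping the epoch form alongside the readable one makes later numeric comparisons, such as "resigned more than a year after appointment", a simple subtraction.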


It would be unusual for a company formation agent to help their client set up a company and then stay on as director, as opposed to immediately handing the director title to the client. This can be an indication of a criminal using the company formation agent as a front for their dishonest activities.
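The check itself is a comparison between the appointment time and the resignation time. The sketch below expresses it in plain Python, independent of Raphtory, over placeholder tuples that are not real records. It assumes both times are epoch milliseconds, takes a year as 365 days, and treats appointments with no recorded resignation as unknown rather than suspicious; each of these is a choice you may want to change.

```python
ONE_YEAR_MS = 365 * 24 * 60 * 60 * 1000  # 365 days in milliseconds


def stayed_longer_than_a_year(appointed_ms, resigned_ms):
    """True if the director resigned more than 365 days after appointment."""
    if resigned_ms is None:
        return False  # no resignation recorded; left out of this check
    return resigned_ms - appointed_ms > ONE_YEAR_MS


# Placeholder (company, appointed_ms, resigned_ms) tuples; NOT real data.
appointments = [
    ("COMPANY A LTD", 1392163200000, 1392249600000),  # resigned after 1 day
    ("COMPANY B LTD", 1392163200000, 1451606400000),  # resigned in 2016
    ("COMPANY C LTD", 1392163200000, None),           # no resignation date
]

long_stays = [
    name
    for name, appointed, resigned in appointments
    if stayed_longer_than_a_year(appointed, resigned)
]
```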





As you can see from the above code snippet, Barbara stayed on at 859 companies for longer than one year. Let’s delve deeper into when these companies were created and exactly who these companies belonged to.

Create a line plot visualisation over time with Raphtory

We can use a Raphtory function called .rolling() with a window size of 10000000000 milliseconds (around 4 months). This enables us to “roll” through all the windows/views, counting the number of companies the director was assigned to over time.

Roll through the graph with a window of 10000000000 milliseconds:


For each view, count the number of edges:
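To show what rolling and counting amounts to, here is a plain-Python emulation that buckets timestamps into consecutive, non-overlapping windows and counts the events in each. It is not Raphtory's implementation, and the window alignment in the library may differ; `rolling_counts` is a hypothetical helper written for this sketch.

```python
WINDOW_MS = 10_000_000_000  # about 115.7 days, i.e. roughly 4 months


def rolling_counts(timestamps_ms, window_ms):
    """Count events in consecutive windows [start, start + window_ms)."""
    if not timestamps_ms:
        return []
    last = max(timestamps_ms)
    window_start = min(timestamps_ms)
    counts = []
    while window_start <= last:
        window_end = window_start + window_ms
        # Half-open interval: the start is included, the end is excluded.
        n = sum(window_start <= t < window_end for t in timestamps_ms)
        counts.append((window_end, n))
        window_start = window_end
    return counts


# Toy example with a window of 10 time units.
example = rolling_counts([0, 5, 10, 25], 10)
```

Each `(window_end, count)` pair gives one point for a line plot: time on the x-axis and the number of appointments in that window on the y-axis.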


Create the line plot visualisation with the Seaborn library:

Now that we can see several spikes in our graph, especially between 2012 and 2016, we can further investigate this window of time.

Using windows to filter particular timepoints of interest

One of the spikes in the line plot above is in 2014. To investigate this further, we use the .window() function, which takes a start and an end time. We will look at a window from 01-01-2014 to 01-01-2015.
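Because the graph's timestamps are epoch values, the window bounds need to be expressed in the same unit. The sketch below computes the two bounds for 2014 in epoch milliseconds, assuming UTC and millisecond timestamps. The `in_window` helper is hypothetical and assumes the start is inclusive and the end exclusive.

```python
from datetime import datetime, timezone

# Window bounds for the year 2014, as epoch milliseconds in UTC.
start_ms = int(datetime(2014, 1, 1, tzinfo=timezone.utc).timestamp() * 1000)
end_ms = int(datetime(2015, 1, 1, tzinfo=timezone.utc).timestamp() * 1000)


def in_window(t_ms, start, end):
    """True if t_ms falls in [start, end); ASSUMES an exclusive end bound."""
    return start <= t_ms < end


# These two numbers are what you would pass as the start and end times.
bounds = (start_ms, end_ms)
```

Narrowing to a shorter span, such as two weeks in February 2014, only requires changing the two dates.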

There seems to be a spike between 2014-02-01 and 2014-02-15. We create a window for this spike to investigate further.

There are big spikes on 2014-02-11 and 2014-02-12. Now that we have specific time points, we can find out the names of the companies in these spikes by visualising the edges of the graph at these time points.

Dynamic visualisation of your graph in Raphtory

To visualise specific dates, we first create a window which includes the time point we want. Below, we have created a window that only includes the date 2014-02-12. We then filter for edges where the company formation agent (Barbara) has stayed as director at the company for longer than a year. Lastly, we use Raphtory’s .to_pyvis() function to create a dynamic visualisation of the edges. In this way, we can clearly see the companies where the company formation agent has stayed as director for longer than a year.


The visualisation will appear in a file called nx.html which can be opened in a web browser.

A screenshot of the dynamic visualisation
If you would like your graph in a list of vertices and edges, you can call methods such as .vertices() and .edges().



In just a few minutes, we have transformed a large volume of company data into interesting insights on potentially fishy behaviour.

Analysing data temporally has been difficult in the past, but Raphtory makes it incredibly easy. Rather than looking at data in a static manner, taking time into account can give you a better understanding of your data, which may be beneficial to your own use cases.

If you would like to run further analysis and algorithms at scale in a production environment, drop the team at Pometry a message, and they will be more than happy to help.


