Category: expeditions
-
GCP BigQuery Scripting
BigQuery stored procedures are incredibly useful. Google defines them as being “a collection of statements that can be called from other queries or other stored procedures. A procedure can take input arguments and return values as output.” Lovely! So we can declare some variables and then feed them as input arguments to our stored procedure…
-
Data analysis & asking the right questions
In mid-2023 AI was becoming a hot topic! ChatGPT launched on 30 November 2022 and took (at least parts of) society by storm. And for the first time in my life I would walk into my local café and hear people chatting about AI, arguing about AI, marvelling at AI… As part of my MSc…
-
Using the Design Science Research framework with human-in-the-loop techniques
One of the most useful aspects that I felt came out of my MSc research was combining Design Science Research principles (often used in business application settings) with human-in-the-loop techniques (useful where no labelled data is available). By combining these techniques the following benefits (highlighted in green) were enabled: I have compiled a short summary…
-
Building a knowledge graph from news – final MSc project
Today I finally received my MSc certificate from the University of London 🎉. By way of celebration I thought I would make my final (unabridged) project publicly available. There are 3 main components: The full project repo, which contains all the code that was used to build a knowledge graph from 2081 articles published by…
-
TensorFlow classification of 475 bird species
For this project I followed the universal workflow of machine learning as described in Deep Learning with Python (1st edition) by François Chollet. It is a classic text which builds the student’s understanding of neural networks ‘brick by brick’ and was the first book that really gave me a good understanding of where to start with neural…
-
Computational literary analysis
Introduction: The inspiration for this project (which I completed in early 2022) was a call for applications to UC Berkeley that I came across on the topic of NLP for computational literary analysis, and specifically how one might develop computational models for the plot of a novel. The brief suggests that the concept of ‘plot’…
-
Improving customer satisfaction using Bayesian networks
Specification / Background: A typical IT support environment is governed by service level agreements (SLAs) that define expected levels of service in terms of a variety of metrics such as ‘minimum first response time’ or ‘maximum resolution time’ for each ticket logged. However, adherence to these basic standards does not necessarily result in customer satisfaction. An…
-
Predicting churn with PySpark
I decided to tackle the Expresso churn prediction challenge on the Zindi platform during the Big data analysis module of my degree for a couple of reasons. The full project can be viewed in my GitHub repo. The Expresso brief: According to Zindi, “Expresso is an African telecommunications company that provides customers…
-
Using human-in-the-loop techniques
Many machine-learning tasks rely on the availability of a labelled dataset for training and tuning. But how do we go about evaluation when the dataset we have is not labelled? This is exactly the situation I found myself facing during my final MSc project. I chose to experiment with building a knowledge graph from news…
-
My thesis in a podcast
I knew that, once my thesis was completed, people might ask me about the research I did for it, or be interested to know more. At the same time, academic writing can be very dry, so I was pondering more accessible ways to present the material. I came across Hannah Fry’s Google DeepMind podcast where she interviews…
