Tag: ai

  • GCP BigQuery Scripting

    BigQuery stored procedures are incredibly useful. Google defines them as being “a collection of statements that can be called from other queries or other stored procedures. A procedure can take input arguments and return values as output.” Lovely! So we can declare some variables and then feed them as input arguments to our stored procedure…

  • Data analysis & asking the right questions

    In mid-2023 AI was becoming a hot topic! ChatGPT launched on 30 November 2022 and took (at least parts of) society by storm. And for the first time in my life I would walk into my local café and hear people chatting about AI, arguing about AI, marvelling at AI… As part of my MSc…

  • The dissonance between AI and learning

    Yesterday one of my colleagues showcased a very creative idea she helped her young son implement for the school fête. Instead of selling sweets or cakes or lemonade, she helped him set up a song generator: “step up and we’ll generate a song just for you!” The concept was deceptively simple: give a short prompt about…

  • TensorFlow classification of 475 bird species

    For this project I followed the universal workflow of machine learning as described in Deep Learning with Python (1st edition) by François Chollet. It is a classic text which builds the student’s understanding of neural networks ‘brick by brick’ and was the first book that really gave me a good understanding of where to start with neural…

  • Representing text by counting

    Natural language processing algorithms work with numbers, not text. So how can we convert strings of text into numbers that are representative of the meaning of that text? Some of the simplest methods (which can be surprisingly effective for some applications) are those that count words in various ways. In this article I’ll unpack the…

  • Improving customer satisfaction using Bayesian networks

    Specification. Background. A typical IT support environment is governed by service level agreements (SLAs) that define expected levels of service in terms of a variety of metrics such as ‘minimum first response time’ or ‘maximum resolution time’ for each ticket logged. However, adherence to these basic standards does not necessarily result in customer satisfaction. An…

  • Predicting churn with PySpark

    I decided to tackle the Expresso churn prediction challenge on the Zindi platform during the course of the Big data analysis module of my degree for a couple of reasons. The full project can be viewed in my GitHub repo. The Expresso brief: According to Zindi, “Expresso is an African telecommunications company that provides customers…

  • Using human-in-the-loop techniques

    Many machine-learning tasks rely on the availability of a labelled dataset for training and tuning. But how do we go about evaluation when the dataset we have is not labelled? This is exactly the situation I found myself facing during my final MSc project. I chose to experiment with building a knowledge graph from news…

  • My thesis in a podcast

    I knew that, once completed, people might ask me about the research I did for my thesis, or be interested to know more. At the same time, academic writing can be very dry, so I was pondering more accessible ways to present the material. I came across Hannah Fry’s Google DeepMind podcast where she interviews…