Author: shotlefttodatascience

  • The art and science of planning A/B tests

    A/B testing is a popular technique for comparing two versions of a feature (A and B) to assess which will be more successful. It is widely used in the tech industry to provide a quantitative basis for decision-making, for example: This article will consider the following hypothetical scenario: Your stakeholder wants to test a new…

  • A buffet of LLM themes

    In the past couple of years there has been a plethora of developments as researchers improved the base performance of large language models (LLMs), found novel methods to fine-tune and enhance them for specific purposes, and came up with new ways to build them into workflows. Keeping up with all the concepts and jargon can…

  • GCP BigQuery Scripting

    BigQuery stored procedures are incredibly useful. Google defines them as being “a collection of statements that can be called from other queries or other stored procedures. A procedure can take input arguments and return values as output.” Lovely! So we can declare some variables and then feed them as input arguments to our stored procedure…

  • Data analysis & asking the right questions

    In mid-2023 AI was becoming a hot topic! ChatGPT launched on 30 November 2022 and took (at least parts of) society by storm. And for the first time in my life I would walk into my local café and hear people chatting about AI, arguing about AI, marvelling at AI… As part of my MSc…

  • Using the Design Science Research framework with human-in-the-loop techniques

    One of the most useful aspects that I felt came out of my MSc research was combining Design Science Research principles (often used in business application settings) with human-in-the-loop techniques (useful where no labelled data is available). By combining these techniques the following benefits (highlighted in green) were enabled: I have compiled a short summary…

  • Building a knowledge graph from news – final MSc project

    Today I finally received my MSc certificate from the University of London 🎉. By way of celebration I thought I would make my final (unabridged) project publicly available. There are 3 main components: The full project repo, which contains all the code that was used to build a knowledge graph from 2081 articles published by…

  • Systems design concepts

    Ashish Pratap Singh’s article “System Design was HARD until I Learned these 30 Concepts” is just so well-written and intuitive that I simply had to save it as a postcard for my future self. Some of these concepts I’ve come across directly through data science, others I have absorbed almost by osmosis over the years…

  • The dissonance between AI and learning

    Yesterday one of my colleagues showcased a very creative idea she helped her young son implement for the school fête. Instead of selling sweets or cakes or lemonade, she helped him set up a song generator: “step up and we’ll generate a song just for you!” The concept was deceptively simple: give a short prompt about…

  • Limit theorems explained

    Before we dive into the theorems let’s tackle a concept one often sees in statistics: the notion of independent, identically distributed (iid) random variables. Whether we’re drawing a sample from a population or conducting a series of experiments like coin flips, we can assess whether iid holds true or not as follows: Independent? Here we…

  • TensorFlow classification of 475 bird species

    For this project I followed the universal workflow of machine learning as described in Deep Learning with Python (1st edition) by François Chollet. It is a classic text which builds the student’s understanding of neural networks ‘brick by brick’ and was the first book that really gave me a good understanding of where to start with neural…