Author: shotlefttodatascience
-
GCP BigQuery Scripting
BigQuery stored procedures are incredibly useful. Google defines them as being “a collection of statements that can be called from other queries or other stored procedures. A procedure can take input arguments and return values as output.” Lovely! So we can declare some variables and then feed them as input arguments to our stored procedure…
-
Data analysis & asking the right questions
In mid-2023 AI was becoming a hot topic! ChatGPT launched on 30 November 2022 and took (at least parts of) society by storm. And for the first time in my life I would walk into my local café and hear people chatting about AI, arguing about AI, marvelling at AI… As part of my MSc…
-
Using the Design Science Research framework with human-in-the-loop techniques
One of the most useful aspects that I felt came out of my MSc research was combining Design Science Research principles (often used in business application settings) with human-in-the-loop techniques (useful where no labelled data is available). By combining these techniques the following benefits (highlighted in green) were enabled: I have compiled a short summary…
-
Building a knowledge graph from news – final MSc project
Today I finally received my MSc certificate from the University of London 🎉. By way of celebration I thought I would make my final (unabridged) project publicly available. There are 3 main components: The full project repo, which contains all the code that was used to build a knowledge graph from 2081 articles published by…
-
Systems design concepts
Ashish Pratap Singh’s article “System Design was HARD until I Learned these 30 Concepts” is just so well-written and intuitive that I simply had to save it as a postcard for my future self. Some of these concepts I’ve come across directly through data science, others I have absorbed almost by osmosis over the years…
-
The dissonance between AI and learning
Yesterday one of my colleagues showcased a very creative idea she helped her young son implement for the school fête. Instead of selling sweets or cakes or lemonade, she helped him set up a song generator: “step up and we’ll generate a song just for you!” The concept was deceptively simple: give a short prompt about…
-
Limit theorems explained
Before we dive into the theorems let’s tackle a concept one often sees in statistics: the notion of independent, identically distributed (iid) random variables. Whether we’re drawing a sample from a population or conducting a series of experiments like coin flips, we can assess whether iid holds true or not as follows: Independent? Here we…
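To make the coin-flip case concrete, here is a minimal sketch of simulated iid flips. It assumes a fair coin and uses hypothetical helper names (`flip_coins`, `sample_mean`) that are not from the article. Every flip comes from the same distribution and does not depend on earlier flips, so the sample mean should drift towards 0.5 as the number of flips grows.

```python
import random


def flip_coins(n, p_heads=0.5, seed=0):
    """Simulate n coin flips (1 = heads, 0 = tails).

    Each flip uses the same probability p_heads (identically distributed)
    and is drawn without reference to earlier flips (independent).
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [1 if rng.random() < p_heads else 0 for _ in range(n)]


def sample_mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


if __name__ == "__main__":
    # Illustrative sample sizes only; larger n should sit closer to 0.5.
    for n in (10, 100, 10_000):
        print(n, sample_mean(flip_coins(n)))
```

Drawing cards from a deck without replacement would break the independence assumption, because each draw changes the odds of the next one.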
-
TensorFlow classification of 475 bird species
For this project I followed the universal workflow of machine learning as described in Deep Learning with Python (1st edition) by François Chollet. It is a classic text which builds the student’s understanding of neural networks ‘brick by brick’ and was the first book that really gave me a good understanding of where to start with neural…
-
Regex basics
Regex comes up all the time in NLP, and it’s worth having an understanding of the basics. In recent times the quickest way to construct a regex is to go ‘Hey <favourite LLM>, make me a regex to do xyz’ and yet it is unsatisfying not to understand the construction of the provided regex –…
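As a small illustration of reading a regex piece by piece, here is a sketch using Python's built-in `re` module. The pattern, the `find_dates` helper and the sample sentence are made up for this example and are not taken from the article.

```python
import re

# Hypothetical pattern for ISO-style dates such as 2022-11-30:
#   \b       word boundary, so we do not match inside a longer number
#   (\d{4})  exactly four digits, captured as the year
#   -        a literal hyphen
#   (\d{2})  exactly two digits, captured as the month, then the day
DATE_PATTERN = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def find_dates(text):
    """Return a (year, month, day) tuple of strings for each date-like match."""
    return DATE_PATTERN.findall(text)


if __name__ == "__main__":
    sample = "Released 2022-11-30, revised 2023-06-15."  # made-up sample text
    print(find_dates(sample))
```

Note that the pattern only checks the shape of the text: it would also accept an impossible date such as 2023-99-99, so real validation needs a further step.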
-
Representing text by counting
Natural language processing algorithms work with numbers, not text. So how can we convert strings of text into numbers that are representative of the meaning of that text? Some of the simplest methods (which can be surprisingly effective for some applications) are those that count words in various ways. In this article I’ll unpack the…
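A minimal sketch of the counting idea, assuming a simple lowercase tokenizer and a vocabulary built from the documents themselves. The helper names (`tokenize`, `count_vectors`) and the two toy sentences are hypothetical and not from the article.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def count_vectors(docs):
    """Turn each document into a list of word counts over a shared vocabulary.

    Returns (vocab, vectors), where vocab is an alphabetically sorted list of
    words and vectors[i][j] is how often vocab[j] occurs in docs[i].
    """
    tokenized = [tokenize(doc) for doc in docs]
    vocab = sorted({token for tokens in tokenized for token in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)  # missing words count as 0
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors


if __name__ == "__main__":
    docs = ["the cat sat on the mat", "the dog sat"]  # toy examples
    vocab, vectors = count_vectors(docs)
    print(vocab)
    for vector in vectors:
        print(vector)
```

Word order is discarded in this representation, which is why it is often called a bag of words.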
