Author: shotlefttodatascience
-
Regex basics
Regex comes up all the time in NLP, and it’s worth understanding the basics. These days the quickest way to construct a regex is to go ‘Hey <favourite LLM>, make me a regex to do xyz’, and yet it is unsatisfying not to understand the construction of the provided regex –…
-
Representing text by counting
Natural language processing algorithms work with numbers, not text. So how can we convert strings of text into numbers that are representative of the meaning of that text? Some of the simplest methods (which can be surprisingly effective for some applications) are those that count words in various ways. In this article I’ll unpack the…
-
Computational literary analysis
Introduction The inspiration for this project (which I completed in early 2022) was a call for applications to UC Berkeley that I came across on the topic of NLP for computational literary analysis, and specifically how one might develop computational models for the plot of a novel. The brief suggests that the concept of ‘plot’…
-
Improving customer satisfaction using Bayesian networks
Specification Background A typical IT support environment is governed by service level agreements (SLAs) that define expected levels of service in terms of a variety of metrics such as ‘minimum first response time’ or ‘maximum resolution time’ for each ticket logged. However, adherence to these basic standards does not necessarily result in customer satisfaction. An…
-
Predicting churn with PySpark
I decided to tackle the Expresso churn prediction challenge on the Zindi platform during the course of the Big data analysis module of my degree for a couple of reasons: The full project can be viewed in my GitHub repo: The Expresso brief According to Zindi “Expresso is an African telecommunications company that provides customers…
-
Using human-in-the-loop techniques
Many machine-learning tasks rely on the availability of a labelled dataset for training and tuning. But how do we go about evaluation when the dataset we have is not labelled? This is exactly the situation I found myself facing during my final MSc project. I chose to experiment with building a knowledge graph from news…
-
My thesis in a podcast
I knew that, once completed, people might ask me about the research I did for my thesis, or be interested to know more. At the same time, academic writing can be very dry, so I was pondering more accessible ways to present the material. I came across Hannah Fry’s Google DeepMind podcast where she interviews…
-
Studying data science through the University of London
I came across the University of London’s MSc Data Science program towards the end of 2020. At the time my Dad was fighting off Covid – and because I had also been exposed, I was quarantined with him and my Mom for three weeks while we waited to see whether we might also have contracted…
-
Dealing with impostor syndrome
I’m afraid that I do not write this article from the standpoint of having cracked the problem! But I do have some thoughts, which I’m jotting down here – notes to my future self when impostor syndrome rears its ugly head, as it surely will! Impostor syndrome is a psychological phenomenon where, despite often overwhelming…
-
Can I get there from here?
This started out as an experimental journey… I had already transitioned from music to teaching to SAP consulting. Would it be possible for the next leg of the journey to take me into the world of data science? At the outset I was not at all sure, but in the spirit of taking a sho’t left, I made a start, just…
