Tag: python
-
Regex basics
Regex comes up all the time in NLP, and it’s worth understanding the basics. These days the quickest way to construct a regex is to go ‘Hey <favourite LLM>, make me a regex to do xyz’, and yet it is unsatisfying not to understand the construction of the provided regex –…
-
Computational literary analysis
Introduction: The inspiration for this project (which I completed in early 2022) was a call for applications to UC Berkeley that I came across on the topic of NLP for computational literary analysis, and specifically how one might develop computational models for the plot of a novel. The brief suggests that the concept of ‘plot’…
-
Predicting churn with PySpark
I decided to tackle the Expresso churn prediction challenge on the Zindi platform during the course of the Big data analysis module of my degree for a couple of reasons. The full project can be viewed in my GitHub repo. The Expresso brief: According to Zindi, “Expresso is an African telecommunications company that provides customers…
-
Data structures for deep learning
In 2020 I completed the Udacity Deep Learning Nanodegree, which focuses on implementing a variety of deep learning architectures using PyTorch. At the outset, it’s pretty fundamental to understand the data structures you’ll be encountering as inputs to and outputs from your neural network architecture. What I noticed was that plenty of the issues encountered…
-
KMeans clustering
Clustering is a type of unsupervised learning. We humans would perhaps think of it as ‘categorization’. For example, if I gave you a bag of red, blue and white balls and asked you to sort them (without telling you how) you would probably naturally gravitate towards sorting them by colour as this would be the…
-
How to – Principal Component Analysis
I always like to understand concepts well before I use them (which is good because it’s the right thing to do, but bad because it slows me down a lot!), so it was with great excitement that I came across Matt Brems’ article A One-Stop Shop for Principal Component Analysis recently. If you read this…
-
Regex is a thing worth knowing :)
Regex can do amazing things with data cleanups – it’s basically a must-use. But it’s also tricky to keep in your head if you don’t use it frequently… Here are 3 great reference and test resources that can help: https://docs.python.org/3/howto/regex.html https://regexr.com/ https://regexone.com/
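To give a flavour of the kind of cleanup meant here, below is a minimal sketch using Python’s built-in re module. The function name clean_text and the sample string are made up for illustration and don’t come from any particular project; treat it as one possible approach rather than the definitive way to do it.

```python
import re


def clean_text(raw: str) -> str:
    """Tidy messy text (hypothetical helper, for illustration only)."""
    # Replace any run of whitespace (spaces, tabs, newlines) with one space.
    text = re.sub(r"\s+", " ", raw)
    # Remove whitespace that sits directly before a punctuation mark.
    text = re.sub(r"\s+([,.;:!?])", r"\1", text)
    # Drop leading and trailing whitespace.
    return text.strip()


print(clean_text("  Hello ,   world !\n\nRegex   is handy . "))
# → Hello, world! Regex is handy.
```

Pasting either pattern into a tester such as regexr.com is a quick way to see what each piece matches before running it on real data.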
-
Getting results vs Understanding
Alexander Pope is famously quoted as saying: A little learning is a dangerous thing; drink deep, or taste not the Pierian spring: there shallow draughts intoxicate the brain, and drinking largely sobers us again. I’ve been thinking about these words the past few days as I worked on my latest challenge: a text classifier using…
-
Learn Python Challenge on Kaggle
I signed up for this 7-day challenge to test my knowledge, and it’s been an absolute delight! As a newbie, when I find myself on StackOverflow reading discussions about “the most Pythonic way” to do something, I usually feel a bit left out… I’ll just be happy if I can do it any darned way…
-
Small simple datasets for practising
It’s all very well downloading complex datasets from Kaggle and similar sources to play with – they’re amazing for learners because the data is always less clean than you would have hoped, more complex than you anticipated, and every bit as interesting as promised. BUT if you’re learning a new concept it’s easier to have…
