Author: shotlefttodatascience

  • My #GirlDad Tribute

    With the recent death of Kobe Bryant and his daughter in a tragic helicopter accident, many have been reflecting on what it means to be a #GirlDad. Thinking about it, I realized that I wanted to pay tribute to my own dad, who did a great job of raising his girl in a time…

  • Data structures for deep learning

    In 2020 I completed the Udacity Deep Learning Nanodegree, which focuses on implementing a variety of deep learning architectures using PyTorch. At the outset, it’s pretty fundamental to understand the data structures you’ll be encountering as inputs to and outputs from your neural network architecture. What I noticed was that plenty of the issues encountered…

  • KMeans clustering

    Clustering is a type of unsupervised learning. We humans would perhaps think of it as ‘categorization’. For example, if I gave you a bag of red, blue and white balls and asked you to sort them (without telling you how), you would probably naturally gravitate towards sorting them by colour as this would be the…

  • How to – Principal Component Analysis

    I always like to understand concepts well before I use them (which is good because it’s the right thing to do, but bad because it slows me down a lot!), so it was with great excitement that I came across Matt Brems’ article A One-Stop Shop for Principal Component Analysis recently. If you read this…

  • If you buy one book in 2020…

    …make it Grokking Deep Learning by Andrew Trask! This gem of a book breaks deep learning down to its smallest component parts and then builds up your understanding from there. It’s the equivalent of stripping your car down to nuts and bolts and then re-building it: at the end, you will know to a certainty…

  • Tutorial: BigQuery arrays and structs

    The first time I encountered the BigQuery export schema this year my heart sank: arrays and structs were not something covered in my SQL intro course! But having spent a few months extracting data like this I’ve come to appreciate the logic. These are all the ‘notes to self’ I wish I’d had at the…

  • Finding relationships between words

    I’ve spent the past couple of weeks exploring how to find relationships between words with the skip-gram word2vec model so I was pretty fired up to share some of what I’d learned! Here are some of the intuitions I covered… What task have I been working on? I have a large number of news articles,…

  • Avoiding for loops in Pandas

    There will be times when you are tempted to loop through rows or columns in Pandas to achieve your results – and the lesson I keep learning is: don’t do it! Every time I’m tempted to write a for loop with Pandas data I find myself clock-watching and cursing… 9 times out of 10 there…

  • Emerging from Data Science Intensive

    In September & October I was fortunate enough to attend the Data Science Intensive Program (DSI) in Cape Town. In a word: WOW! The program brought together 16 students from 7 African countries for 8 very intensive weeks with an ambitious goal: To ensure that anyone who completes the DSI is able to contribute significant…

  • Data science for good – Kaggle challenge

    This was the first big data project I tackled on my own. I picked the Center for Policing Equity challenge on Kaggle for three reasons: I love maps and I love the idea that data scientists can significantly improve our world, in addition to improving the bottom lines of big corporates. And this is exactly the…