python – Sho't left to data science

AnalyticsVidhya ♥︎

May 25, 2020

—

by

There are very few mailing lists that I remain subscribed to AND faithfully open on a regular basis. But Analytics Vidhya always has something interesting, something complicated explained simply, a step-by-step tutorial… Take, for example, the article I received this morning, 10 matplotlib tricks. I so totally need this – it seems like a…

Visualizing data overlaps

May 19, 2020

—

by

shotlefttodatascience

in python

An example use case is this: you have a list of customers who have bought the various products that you sell. You want to know where the overlaps are, for example: How many customers who bought the Blue Widget also bought the Green Widget? Or what percentage of customers who bought the Blue Widget also…
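
A minimal sketch of the overlap question above, using plain Python sets. The customer IDs and the `overlap` helper are made up for illustration and are not taken from the original post.

```python
# Hypothetical purchase data: product name -> set of customer IDs (invented for this sketch)
purchases = {
    "Blue Widget": {"c1", "c2", "c3", "c4"},
    "Green Widget": {"c3", "c4", "c5"},
}


def overlap(a, b):
    """Return (number of customers who bought both, % of a's buyers who also bought b)."""
    both = purchases[a] & purchases[b]  # set intersection = customers in both lists
    pct = 100 * len(both) / len(purchases[a]) if purchases[a] else 0.0
    return len(both), pct


count, pct = overlap("Blue Widget", "Green Widget")
print(f"{count} customers bought both ({pct:.0f}% of Blue Widget buyers)")
```

Note that the percentage depends on which product you start from: the same two customers are half of the Blue Widget buyers but two thirds of the Green Widget buyers.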

KMeans clustering

Jan 3, 2020

—

by

shotlefttodatascience

in machine learning, python

Clustering is a type of unsupervised learning. We humans would perhaps think of it as ‘categorization’. For example, if I gave you a bag of red, blue and white balls and asked you to sort them (without telling you how) you would probably naturally gravitate towards sorting them by colour as this would be the…
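
The ball-sorting idea can be sketched with scikit-learn's `KMeans`. The RGB values below are invented for the example, and asking for three clusters is an assumption that only works because we already know the bag holds three colours.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up RGB values for a bag of red, blue and white balls
balls = np.array([
    [250, 10, 10], [240, 20, 15], [255, 0, 5],           # reddish
    [10, 10, 250], [20, 15, 240], [0, 5, 255],           # bluish
    [250, 250, 250], [245, 240, 255], [255, 255, 245],   # whitish
])

# No labels are supplied: the algorithm groups the balls by similarity alone
model = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = model.fit_predict(balls)

print(labels)  # one cluster number per ball; the numbering itself is arbitrary
```

The cluster numbers carry no meaning on their own. What matters is that balls of the same colour end up sharing a number.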

How to – Principal Component Analysis

Jan 3, 2020

—

by

shotlefttodatascience

in data preparation, machine learning, python

I always like to understand concepts well before I use them (which is good because it’s the right thing to do, but bad because it slows me down a lot!), so it was with great excitement that I came across Matt Brems’ article A One-Stop Shop for Principal Component Analysis recently. If you read this…

Avoiding for loops in Pandas

Dec 30, 2018

—

by

shotlefttodatascience

in python

There will be times when you are tempted to loop through rows or columns in Pandas to achieve your results – and the lesson I keep learning is: Don’t do it! Every time I’m tempted to write a for loop with Pandas data I find myself clock-watching and cursing… 9 times out of 10 there…
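
One common alternative to a row loop is a vectorized column operation. The sketch below uses a made-up DataFrame with hypothetical `price` and `qty` columns to show both approaches side by side. It is an illustration of the general idea, not the example from the original post.

```python
import pandas as pd

# Invented data for illustration
df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [1, 2, 3]})

# Row-by-row loop: gives the right answer but gets slow on large frames
totals_loop = []
for _, row in df.iterrows():
    totals_loop.append(row["price"] * row["qty"])

# Vectorized: one expression applied to whole columns at once
df["total"] = df["price"] * df["qty"]

print(df)
```

Both versions compute the same totals. The vectorized form hands the work to Pandas internals instead of running Python code once per row.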

Pandas dataframe styling – cool!

Aug 23, 2018

—

by

shotlefttodatascience

in python

I always like to visualize data and see the detail if possible so it was with great joy that I stumbled across DataFrame.style this morning. Here is an example of how it helps us to visualize some Titanic survival rates by sex and passenger class: The Pandas documentation itself is pretty comprehensive, but if you’re looking…

Regex is a thing worth knowing :)

Aug 19, 2018

—

by

shotlefttodatascience

in python, tools

Regex can do amazing things for data cleanup, so it is basically a must-use. But it is also tricky to keep in your head if you don’t use it frequently… Here are 3 great reference and test resources that can help: https://docs.python.org/3/howto/regex.html https://regexr.com/ https://regexone.com/
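
As a small taste of regex for data cleanup, here is a sketch using Python's built-in `re` module. The messy phone-number strings and the `digits_only` helper are made up for illustration.

```python
import re

# Invented examples of the same phone number typed three different ways
raw = ["  Tel: (011) 555-1234 ", "tel 011.555.1234", "0115551234"]


def digits_only(text):
    """Remove every character that is not a digit (\\D matches any non-digit)."""
    return re.sub(r"\D", "", text)


cleaned = [digits_only(s) for s in raw]
print(cleaned)
```

After cleaning, all three entries collapse to the same string, which makes duplicates easy to spot.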

Getting results vs Understanding

Jul 3, 2018

—

by

shotlefttodatascience

in machine learning, python

Alexander Pope is famously quoted as saying: “A little learning is a dangerous thing; drink deep, or taste not the Pierian spring: there shallow draughts intoxicate the brain, and drinking largely sobers us again.” I’ve been thinking about these words the past few days as I worked on my latest challenge: a text classifier using…

Learn Python Challenge on Kaggle

Jun 20, 2018

—

by

shotlefttodatascience

in python

I signed up for this 7-day challenge to test my knowledge, and it’s been an absolute delight! As a newbie, when I find myself on StackOverflow reading discussions about “the most Pythonic way” to do something, I usually feel a bit left out… I’ll just be happy if I can do it any darned way…

My magician’s wand works!

May 25, 2018

—

by

shotlefttodatascience

in python

This week I’m literally feeling like a magician! My first real classifier attempt: with a month’s worth of emails to the Service Desk, and sklearn.naive_bayes, I can tell with 96% certainty which incidents should be assigned to Team A and which to Team B. MAGIC!
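
A minimal sketch of what such a classifier might look like. The post only names `sklearn.naive_bayes`, so the choice of `MultinomialNB` with a `CountVectorizer` is an assumption, and the six example emails are invented. A toy set like this says nothing about the 96% figure, which came from a month of real emails.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented Service Desk emails and the team each one was assigned to
emails = [
    "password reset not working",
    "cannot log in to my account",
    "locked out need password reset",
    "printer is out of toner",
    "printer paper jam on third floor",
    "new printer needs installing",
]
teams = ["Team A", "Team A", "Team A", "Team B", "Team B", "Team B"]

# Turn each email into word counts, then fit a Naive Bayes model on those counts
clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(emails, teams)

prediction = clf.predict(["please reset my password"])[0]
print(prediction)
```

On real data you would hold back part of the emails as a test set and measure accuracy there, rather than trusting predictions on the messages the model was trained on.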

Category: python