Sho't left to data science

NaN values in Pandas .map()

Jan 12, 2018

—

An Excel-related conundrum

Yesterday was a perfect example of how you don’t really know anything until you start trying to apply your knowledge. My day job has gazillions of occasions where I vlookup thing1 against thing2 to join spreadsheet data so I’ve been trying to do this in pandas rather than Excel, just for practice… all good so far, I did several examples that worked like a bomb: until yesterday! So here is the error I encountered and the solution – TRIVIAL but ANNOYING!

import pandas as pd
# I import a file containing my data as a dataframe:
div1 = pd.read_excel("Division import1.xlsx")

On the left, the data in Excel, on the right the data in Python – all well you think, but notice the warning about the 2 numeric values in the Division column in Excel:

# Then I import a file containing my lookup values:
names = pd.read_excel("Division name import.xlsx", index_col = "Division", squeeze = True)

# Then I do the mapping:
div1["Division"].map(names)

And I get these pesky NaN values!

I’m afraid being a newbie I spent over an hour trying to work out what the issue was! There may be a way to detect this once its in Pandas (I haven’t figured that one out yet – tips welcome :)), but once I fixed my source data in Excel (by using General formatting instead of Text formatting) then everything went smoothly from there.