An Excel-related conundrum
Yesterday was a perfect example of how you don’t really know anything until you start trying to apply your knowledge. My day job has gazillions of occasions where I vlookup thing1 against thing2 to join spreadsheet data so I’ve been trying to do this in pandas rather than Excel, just for practice… all good so far, I did several examples that worked like a bomb: until yesterday! So here is the error I encountered and the solution – TRIVIAL but ANNOYING!
import pandas as pd # I import a file containing my data as a dataframe: div1 = pd.read_excel("Division import1.xlsx")
On the left, the data in Excel, on the right the data in Python – all well you think, but notice the warning about the 2 numeric values in the Division column in Excel:
# Then I import a file containing my lookup values: names = pd.read_excel("Division name import.xlsx", index_col = "Division", squeeze = True) # Then I do the mapping: div1["Division"].map(names)
And I get these pesky NaN values!
I’m afraid being a newbie I spent over an hour trying to work out what the issue was! There may be a way to detect this once its in Pandas (I haven’t figured that one out yet – tips welcome :)), but once I fixed my source data in Excel (by using General formatting instead of Text formatting) then everything went smoothly from there.