There will be times when you are tempted to loop through rows or columns in Pandas to get your results, and the lesson I keep learning is: don’t do it!
Every time I’m tempted to write a for loop over Pandas data I find myself clock-watching and cursing… 9 times out of 10 there is another way, and here are 2 of my favourite recipes, enjoy :).
Some sample data
import numpy as np
import pandas as pd

# Eight events, each with a Type, an Event label and two random statistics, sorted by Type
df = pd.DataFrame({'Type': ['click', 'buy', 'click', 'buy',
                            'click', 'buy', 'click', 'click'],
                   'Event': ['one', 'one', 'two', 'three',
                             'two', 'two', 'one', 'three'],
                   'Statistic 1': np.random.randn(8),
                   'Statistic 2': np.random.randn(8)}).sort_values("Type")
The power of groupby() and lambda
Let’s say that for each Type in our dataframe, we want to create a sequenced list of Events. Nothing easier:
df_grouped = df.groupby("Type")
df_list_events = df_grouped['Event'].apply(lambda x: list(x)).to_frame().reset_index()
df_list_events
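Because the sample data above uses random statistics, here is a minimal, self-contained sketch of the same groupby idea on a small fixed frame. The `sample` values and the `events_by_type` name are made up for illustration, and `agg(list)` is used here as an equivalent of the `apply(lambda x: list(x))` call above.

```python
import pandas as pd

# Small, fixed sample (hypothetical values) so the result is reproducible.
sample = pd.DataFrame({
    "Type": ["buy", "buy", "buy", "click", "click"],
    "Event": ["one", "two", "two", "one", "three"],
})

# Collect each group's events into one list per Type,
# keeping the row order found within each group.
events_by_type = sample.groupby("Type")["Event"].agg(list).reset_index()

print(events_by_type)
```

The result has one row per Type, with the Event column holding a Python list for each group.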
The power of itertools and lambda
And now let’s say we’re satisfied with our lists, but we’d like to de-dupe wherever the same event occurs in adjacent positions (e.g. “two, two” above should be reduced to “two”).
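import itertools

# itertools.groupby yields one (key, group) pair per run of equal adjacent items; keep only the keys
df_list_events["Deduped_Events"] = df_list_events["Event"].apply(lambda x: [k for k, g in itertools.groupby(x)])
df_list_events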
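To see what the itertools step does on its own, here is a small sketch on a plain Python list. The helper name `dedupe_adjacent` is hypothetical and only wraps the same comprehension used above.

```python
import itertools


def dedupe_adjacent(events):
    """Collapse each run of equal adjacent items down to a single item."""
    # itertools.groupby starts a new group every time the value changes,
    # so taking only the keys keeps one item per run.
    return [key for key, _run in itertools.groupby(events)]


deduped = dedupe_adjacent(["one", "two", "two", "one"])
print(deduped)  # ['one', 'two', 'one']
```

Note that only adjacent repeats are removed: the second "one" survives because it is not next to the first.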
And let’s not forget the magic of list comprehension
Let’s say we want to add a “U-” prefix to each of our Deduped_Events:
df_list_events["Deduped_Events"] = df_list_events["Deduped_Events"].apply(lambda x: ["U-" + str(k) for k in x])
df_list_events
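The same prefixing step can also be written with a small named function instead of a lambda, which some readers may find easier to reuse. This is only a sketch: the `add_prefix` helper and the two-row `frame` are invented for illustration.

```python
import pandas as pd


def add_prefix(events, prefix="U-"):
    """Return a new list with the prefix added to every event."""
    return [prefix + str(event) for event in events]


# Hypothetical frame with one list of events per row.
frame = pd.DataFrame({"Deduped_Events": [["one", "two"], ["one", "three"]]})

# apply() calls add_prefix once per row, passing that row's list.
frame["Deduped_Events"] = frame["Deduped_Events"].apply(add_prefix)

print(frame)
```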