Avoiding for loops in Pandas


There will be times when you are tempted to loop through rows or columns in Pandas to get your results, and the lesson I keep learning is: don’t do it!

Every time I’m tempted to write a for loop over Pandas data, I find myself clock-watching and cursing. Nine times out of ten there is another way, and here are a few of my favourite recipes. Enjoy :)

Some sample data


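import numpy as np
import pandas as pd

# kind="mergesort" is a stable sort, so events keep their original order within each Type
df = pd.DataFrame({'Type': ['click', 'buy', 'click', 'buy',
                            'click', 'buy', 'click', 'click'],
                   'Event': ['one', 'one', 'two', 'three',
                             'two', 'two', 'one', 'three'],
                   'Statistic 1': np.random.randn(8),
                   'Statistic 2': np.random.randn(8)}).sort_values("Type", kind="mergesort")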


The power of groupby() and lambda

Let’s say that for each Type in our dataframe, we want to create a sequenced list of Events. Nothing easier:


df_grouped = df.groupby("Type")
df_list_events = df_grouped['Event'].apply(lambda x: list(x)).to_frame().reset_index()
df_list_events
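
To make the result easy to check, here is a minimal sketch of the same recipe on fixed values rather than random ones. The names `events`, `grouped_lists` and `looped` are my own and purely illustrative. The explicit loop is there only to show what the groupby replaces.

```python
import pandas as pd

# Same Type/Event values as the sample data, without the random statistics.
events = pd.DataFrame({
    "Type": ["click", "buy", "click", "buy", "click", "buy", "click", "click"],
    "Event": ["one", "one", "two", "three", "two", "two", "one", "three"],
})

# groupby keeps the original row order inside each group,
# so each list follows the order in which the events appear.
grouped_lists = events.groupby("Type")["Event"].apply(list).reset_index()

# The row-by-row loop this replaces, shown only for comparison.
looped = {}
for _, row in events.iterrows():
    looped.setdefault(row["Type"], []).append(row["Event"])

print(grouped_lists)
```

Both approaches build the same lists. The groupby version leaves the iteration to Pandas, so there is no Python-level loop over rows to write or maintain.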


The power of itertools and lambda

And now let’s say we’re satisfied with our lists, but we’d like to de-dupe where adjacent events occur (e.g. “two, two” above should be reduced to “two”).


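import itertools
# itertools.groupby groups consecutive equal items; keeping each group's key drops adjacent repeats
df_list_events["Deduped_Events"] = df_list_events["Event"].apply(lambda x: [k for k, _ in itertools.groupby(x)])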
df_list_events
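
It can help to see what `itertools.groupby` does on a plain list, outside Pandas. The sketch below uses a hypothetical helper, `dedupe_adjacent`, which is just a name for illustration. The point it shows is that only neighbouring repeats are collapsed; a value that comes back later in the list is kept.

```python
import itertools


def dedupe_adjacent(items):
    """Collapse runs of equal neighbours, keeping one item per run."""
    return [key for key, _ in itertools.groupby(items)]


sample = ["one", "two", "two", "one", "three"]

# Only the adjacent "two, two" is reduced; the second "one" survives.
adjacent_only = dedupe_adjacent(sample)
print(adjacent_only)  # ['one', 'two', 'one', 'three']

# For contrast: dropping every repeat, wherever it occurs, while
# keeping first-seen order (dicts preserve insertion order in Python 3.7+).
all_repeats = list(dict.fromkeys(sample))
print(all_repeats)  # ['one', 'two', 'three']
```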


And let’s not forget the magic of list comprehension

Let’s say we want to add a “U-” prefix to each of our Deduped_Events:


df_list_events["Deduped_Events"] = df_list_events["Deduped_Events"].apply(lambda x: ["U-" + str(k) for k in x])
df_list_events
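
As a rough end-to-end sketch, the steps can be run together on fixed values so the final column is predictable. The names `events` and `result` are illustrative, and the f-string is simply an alternative to `"U-" + str(k)`.

```python
import itertools

import pandas as pd

# Fixed Type/Event values mirroring the sample data (no random columns).
events = pd.DataFrame({
    "Type": ["click", "buy", "click", "buy", "click", "buy", "click", "click"],
    "Event": ["one", "one", "two", "three", "two", "two", "one", "three"],
})

# Step 1: one ordered list of events per Type.
result = events.groupby("Type")["Event"].apply(list).reset_index()

# Step 2: collapse adjacent repeats within each list.
result["Deduped_Events"] = result["Event"].apply(
    lambda seq: [key for key, _ in itertools.groupby(seq)]
)

# Step 3: add the "U-" prefix with a list comprehension.
result["Deduped_Events"] = result["Deduped_Events"].apply(
    lambda seq: [f"U-{item}" for item in seq]
)

print(result)
```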

