Sho't left to data science

Avoiding for loops in Pandas

Dec 30, 2018


There will be times when you are tempted to loop through rows or columns in Pandas to get your results, and the lesson I keep learning is: don’t do it!

Every time I’m tempted to write a for loop over Pandas data, I find myself clock-watching and cursing. Nine times out of ten there is another way, and here are two of my favourite recipes. Enjoy :)

Some sample data

	import numpy as np
	import pandas as pd
	# Eight sample rows with random statistics, sorted by Type
	df = pd.DataFrame({'Type': ['click', 'buy', 'click', 'buy',
	                            'click', 'buy', 'click', 'click'],
	                   'Event': ['one', 'one', 'two', 'three',
	                             'two', 'two', 'one', 'three'],
	                   'Statistic 1': np.random.randn(8),
	                   'Statistic 2': np.random.randn(8)}).sort_values("Type")

The power of groupby() and lambda

Let’s say that for each Type in our dataframe, we want to create a sequenced list of Events. Nothing easier:

	df_grouped = df.groupby("Type")
	df_list_events = df_grouped['Event'].apply(lambda x: list(x)).to_frame().reset_index()
	df_list_events
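
To make the result easy to check by eye, here is a minimal sketch of the same idea on a small, fixed dataset. The `events` frame and its values are made up for illustration, and `agg(list)` is used as an alternative spelling of the lambda above rather than a replacement for it.

```python
import pandas as pd

# Hypothetical, fixed sample so the output is reproducible
events = pd.DataFrame({
    "Type": ["buy", "buy", "buy", "click", "click"],
    "Event": ["one", "three", "two", "one", "two"],
})

# One row per Type, with that group's Events collected into a list in row order
events_by_type = events.groupby("Type")["Event"].agg(list).reset_index()
print(events_by_type)
```

Each list keeps the order in which the rows appear in the frame, so sort the frame first if the sequence matters.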

The power of itertools and lambda

And now let’s say we’re happy with our lists, but we’d like to de-dupe adjacent repeats (e.g. “two, two” above should be reduced to “two”).

	import itertools
	# Keep one key per run of equal adjacent events
	df_list_events["Deduped_Events"] = df_list_events["Event"].apply(lambda x: [k for k, _ in itertools.groupby(x)])
	df_list_events
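
It helps to see what `itertools.groupby` does on a plain list before trusting it inside `apply`. The sketch below uses a hypothetical helper, `dedupe_adjacent`, which is not part of the original recipe. It shows that only neighbouring repeats are collapsed, while a value that comes back later is kept.

```python
import itertools


def dedupe_adjacent(items):
    """Collapse runs of equal neighbours, keeping one item per run."""
    # groupby yields (key, run) pairs for each run of consecutive equal items
    return [key for key, _ in itertools.groupby(items)]


deduped = dedupe_adjacent(["one", "two", "two", "one", "three"])
print(deduped)  # ['one', 'two', 'one', 'three']
```

Note that the second "one" survives because it is not next to the first. If you want to drop every repeat regardless of position, this is not the right tool.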

And let’s not forget the magic of list comprehension

Let’s say we want to add a “U-” prefix to each of our Deduped_Events:

	df_list_events["Deduped_Events"] = df_list_events["Deduped_Events"].apply(lambda x: ["U-" + str(k) for k in x])
	df_list_events
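
As a small, self-contained sketch of the same pattern, the example below applies the prefix to a hand-written Series of lists. The `deduped_events` data is invented for illustration, and the result goes into a new variable so the original lists stay available for comparison.

```python
import pandas as pd

# Hypothetical Series where each cell holds a list of events
deduped_events = pd.Series([["one", "three", "two"], ["one", "two", "one"]])

# The list comprehension runs once per cell and prefixes every element
prefixed_events = deduped_events.apply(
    lambda events: ["U-" + str(event) for event in events]
)
print(prefixed_events.tolist())
```

The `str()` call is only a safety net here: it lets the same line work if some events are numbers rather than strings.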
