Python Pandas Transform: Preform operations on groups and concatenate the results

Python Pandas Transform are used to Preform operations on groups and concatenate the results. Learn more about them in this guide.

Simple transform

First, Let’s create a dummy dataframe

We assume that a customer can have n orders, an order can have m items, and items can be ordered more multiple times

orders_df = pd.DataFrame()
orders_df['customer_id'] = [1,1,1,1,1,2,2,3,3,3,3,3] orders_df['order_id'] = [1,1,1,2,2,3,3,4,5,6,6,6]
orders_df['item'] = ['apples', 'chocolate', 'chocolate', 'coffee', 'coffee', 'apples', 'bananas', 'coffee', 'milkshake', 'chocolate', 'strawberry', 'strawberry']
And this is how the dataframe looks like: print(orders_df)

customer_id order_id item

0 1 1 apples

1 1 1 chocolate

2 1 1 chocolate

3 1 2 coffee

4 1 2 coffee

5 2 3 apples

6 2 3 bananas

7 3 4 coffee

8 3 5 milkshake

9 3 6 chocolate

10 3 6 strawberry

11 3 6 strawberry

.

.

Now, we will use pandas transform function to count the number of orders per customer

First, we define the function that will be applied per customer_id count_number_of_orders = lambda x: len(x.unique())
And now, we can transform each group using the logic defined above
orders_df['number_of_orders_per_cient'] = ( # Put the results into a new column that
is called 'number_of_orders_per_cient'
orders_df # Take the original dataframe
.groupby(['customer_id'])['order_id'] # Create a separate group for each
customer_id & select the order_id
.transform(count_number_of_orders)) # Apply the function to each group
separately
Inspecting the results …
print(orders_df)

customer_id order_id item number_of_orders_per_cient

0 1 1 apples 2

1 1 1 chocolate 2

2 1 1 chocolate 2

3 1 2 coffee 2

4 1 2 coffee 2

5 2 3 apples 1

6 2 3 bananas 1

7 3 4 coffee 3

8 3 5 milkshake 3

9 3 6 chocolate 3

10 3 6 strawberry 3

11 3 6 strawberry 3

Python Pandas Transform: Multiple results per group

Using transform functions that return sub-calculations per group

In the previous example, we had one result per client. However, functions returning different values for the group can also be applied.

Create a dummy dataframe orders_df = pd.DataFrame()
orders_df['customer_id'] = [1,1,1,1,1,2,2,3,3,3,3,3] orders_df['order_id'] = [1,1,1,2,2,3,3,4,5,6,6,6]
orders_df['item'] = ['apples', 'chocolate', 'chocolate', 'coffee', 'coffee', 'apples',
'bananas', 'coffee', 'milkshake', 'chocolate', 'strawberry', 'strawberry']
Let's try to see if the items were ordered more than once in each orders
First, we define a function that will be applied per group
def multiple_items_per_order(_items):
Apply .duplicated, which will return True is the item occurs more than once. multiple_item_bool = _items.duplicated(keep=False) return(multiple_item_bool)
Then, we transform each group according to the defined function
orders_df['item_duplicated_per_order'] = ( # Put the results into a new column
orders_df # Take the orders dataframe
.groupby(['order_id'])['item'] # Create a separate group for each
order_id & select the item
.transform(multiple_items_per_order)) # Apply the defined function to each
group separately
Inspecting the results …
print(orders_df)

customer_id order_id item item_duplicated_per_order

0 1 1 apples False

1 1 1 chocolate True

2 1 1 chocolate True

3 1 2 coffee True

4 1 2 coffee True

5 2 3 apples False

6 2 3 bananas False

7 3 4 coffee False

8 3 5 milkshake False

9 3 6 chocolate False

10 3 6 strawberry True

11 3 6 strawberry True

Learn More

Leave a Comment