Python Pandas Transform are used to Preform operations on groups and concatenate the results. Learn more about them in this guide.
Simple transform
First, Let’s create a dummy dataframe
We assume that a customer can have n orders, an order can have m items, and items can be ordered more multiple times
orders_df = pd.DataFrame()
orders_df['customer_id'] = [1,1,1,1,1,2,2,3,3,3,3,3] orders_df['order_id'] = [1,1,1,2,2,3,3,4,5,6,6,6]
orders_df['item'] = ['apples', 'chocolate', 'chocolate', 'coffee', 'coffee', 'apples', 'bananas', 'coffee', 'milkshake', 'chocolate', 'strawberry', 'strawberry']
And this is how the dataframe looks like: print(orders_df)
customer_id order_id item
0 1 1 apples
1 1 1 chocolate
2 1 1 chocolate
3 1 2 coffee
4 1 2 coffee
5 2 3 apples
6 2 3 bananas
7 3 4 coffee
8 3 5 milkshake
9 3 6 chocolate
10 3 6 strawberry
11 3 6 strawberry
.
.
Now, we will use pandas transform function to count the number of orders per customer
First, we define the function that will be applied per customer_id count_number_of_orders = lambda x: len(x.unique())
And now, we can transform each group using the logic defined above
orders_df['number_of_orders_per_cient'] = ( # Put the results into a new column that
is called 'number_of_orders_per_cient'
orders_df # Take the original dataframe
.groupby(['customer_id'])['order_id'] # Create a separate group for each
customer_id & select the order_id
.transform(count_number_of_orders)) # Apply the function to each group
separately
Inspecting the results …
print(orders_df)
customer_id order_id item number_of_orders_per_cient
0 1 1 apples 2
1 1 1 chocolate 2
2 1 1 chocolate 2
3 1 2 coffee 2
4 1 2 coffee 2
5 2 3 apples 1
6 2 3 bananas 1
7 3 4 coffee 3
8 3 5 milkshake 3
9 3 6 chocolate 3
10 3 6 strawberry 3
11 3 6 strawberry 3
Python Pandas Transform: Multiple results per group
Using transform functions that return sub-calculations per group
In the previous example, we had one result per client. However, functions returning different values for the group can also be applied.
Create a dummy dataframe orders_df = pd.DataFrame()
orders_df['customer_id'] = [1,1,1,1,1,2,2,3,3,3,3,3] orders_df['order_id'] = [1,1,1,2,2,3,3,4,5,6,6,6]
orders_df['item'] = ['apples', 'chocolate', 'chocolate', 'coffee', 'coffee', 'apples',
'bananas', 'coffee', 'milkshake', 'chocolate', 'strawberry', 'strawberry']
Let's try to see if the items were ordered more than once in each orders
First, we define a function that will be applied per group
def multiple_items_per_order(_items):
Apply .duplicated, which will return True is the item occurs more than once. multiple_item_bool = _items.duplicated(keep=False) return(multiple_item_bool)
Then, we transform each group according to the defined function
orders_df['item_duplicated_per_order'] = ( # Put the results into a new column
orders_df # Take the orders dataframe
.groupby(['order_id'])['item'] # Create a separate group for each
order_id & select the item
.transform(multiple_items_per_order)) # Apply the defined function to each
group separately
Inspecting the results …
print(orders_df)