import pandas as pd
df = pd.read_csv("W1.csv")
df

df.set_index(keys=["OrderID","customerID"]).sort_index()

Works a little like groupby()¶

x = df.groupby(["OrderID","customerID"])
x.size()

OrderID  customerID
9990.0   1234.0        1
9991.0   1234.0        1
9992.0   2345.0        1
9993.0   3456.0        1
9994.0   4567.0        1
dtype: int64

df.set_index(keys=["OrderID","customerID"]).index

MultiIndex([(9991.0, 1234.0),
            (9992.0, 2345.0),
            (9993.0, 3456.0),
            (8888.0,    nan),
            (9994.0, 4567.0),
            (   nan,  123.0),
            (9990.0, 1234.0)],
           names=['OrderID', 'customerID'])

df.set_index(keys=["OrderID","customerID"]).index.names

FrozenList(['OrderID', 'customerID'])

index_col on dataframe level is the same as .set_index()¶

import pandas as pd
df = pd.read_csv("W1.csv", index_col=("OrderID","customerID"))
df

	customerID	OrderID	Product	Company_Name
0	1234.0	9991.0	laptop	Dell
1	2345.0	9992.0	desktop	HP
2	3456.0	9993.0	monitor	LG
3	NaN	8888.0	trackpad	Apple
4	4567.0	9994.0	keyboard	Anker
5	123.0	NaN	NaN	Apple
6	1234.0	9990.0	tablet	Apple

		Product	Company_Name
OrderID	customerID
8888.0	NaN	trackpad	Apple
9990.0	1234.0	tablet	Apple
9991.0	1234.0	laptop	Dell
9992.0	2345.0	desktop	HP
9993.0	3456.0	monitor	LG
9994.0	4567.0	keyboard	Anker
NaN	123.0	NaN	Apple

		Product	Company_Name
OrderID	customerID
9991.0	1234.0	laptop	Dell
9992.0	2345.0	desktop	HP
9993.0	3456.0	monitor	LG
8888.0	NaN	trackpad	Apple
9994.0	4567.0	keyboard	Anker
NaN	123.0	NaN	Apple
9990.0	1234.0	tablet	Apple

Data Engineering

Friday, June 5, 2020

Pandas: multiindex

Works a little like groupby()¶

index_col on dataframe level is the same as .set_index()¶

No comments:

Post a Comment

Pandas: SQL Like pandas operations

Report Abuse