Friday, June 5, 2020

Pandas: multiindex

pandas_multiindexes
In [39]:
import pandas as pd
df = pd.read_csv("W1.csv")
df
Out[39]:
customerID OrderID Product Company_Name
0 1234.0 9991.0 laptop Dell
1 2345.0 9992.0 desktop HP
2 3456.0 9993.0 monitor LG
3 NaN 8888.0 trackpad Apple
4 4567.0 9994.0 keyboard Anker
5 123.0 NaN NaN Apple
6 1234.0 9990.0 tablet Apple
In [10]:
df.set_index(keys=["OrderID","customerID"]).sort_index()
Out[10]:
Product Company_Name
OrderID customerID
8888.0 NaN trackpad Apple
9990.0 1234.0 tablet Apple
9991.0 1234.0 laptop Dell
9992.0 2345.0 desktop HP
9993.0 3456.0 monitor LG
9994.0 4567.0 keyboard Anker
NaN 123.0 NaN Apple

Works a little like groupby()

In [29]:
x = df.groupby(["OrderID","customerID"])
x.size()
Out[29]:
OrderID  customerID
9990.0   1234.0        1
9991.0   1234.0        1
9992.0   2345.0        1
9993.0   3456.0        1
9994.0   4567.0        1
dtype: int64
In [12]:
df.set_index(keys=["OrderID","customerID"]).index
Out[12]:
MultiIndex([(9991.0, 1234.0),
            (9992.0, 2345.0),
            (9993.0, 3456.0),
            (8888.0,    nan),
            (9994.0, 4567.0),
            (   nan,  123.0),
            (9990.0, 1234.0)],
           names=['OrderID', 'customerID'])
In [17]:
df.set_index(keys=["OrderID","customerID"]).index.names
Out[17]:
FrozenList(['OrderID', 'customerID'])

index_col on dataframe level is the same as .set_index()

In [32]:
import pandas as pd
df = pd.read_csv("W1.csv", index_col=("OrderID","customerID"))
df
Out[32]:
Product Company_Name
OrderID customerID
9991.0 1234.0 laptop Dell
9992.0 2345.0 desktop HP
9993.0 3456.0 monitor LG
8888.0 NaN trackpad Apple
9994.0 4567.0 keyboard Anker
NaN 123.0 NaN Apple
9990.0 1234.0 tablet Apple
In [ ]:
 

No comments:

Post a Comment

Pandas: SQL Like pandas operations

Pandas's SQL Like operations such as WHERE clause. = != >= str.contains() & | .isin() .isnull() .notnull() ....