Friday, June 5, 2020

Pandas: str.split()

str.split.expand.true

How to use str.split() with expand=True to split the values in a cell into multiple columns, and how to limit the number of splits with the n parameter.

In [5]:
import pandas as pd
df = pd.read_csv("str_split_expand_true.csv")
df
Out[5]:
          name              email
0   john,doe j       jdj@dell.com
1   mary,ann d      mad@apple.com
2  baku,raju b  brj@microsoft.com
In [38]:
df["name"].str.split(",", expand=True)
Out[38]:
      0       1
0  john   doe j
1  mary   ann d
2  baku  raju b
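
The n parameter is not exercised by the CSV above, because each name contains only one comma. As a rough sketch of how the limit behaves, the snippet below uses a hypothetical Series with two commas per value (made-up data, not the post's file) and compares an unlimited split with n=1.

```python
import pandas as pd

# Hypothetical values with two commas each (not from str_split_expand_true.csv).
names = pd.Series(["john,doe,j", "mary,ann,d", "baku,raju,b"])

# Without n, every comma is a split point: three columns.
all_parts = names.str.split(",", expand=True)

# With n=1, only the first comma is used: two columns,
# and the remainder of the string stays together in column 1.
first_split = names.str.split(",", n=1, expand=True)

print(all_parts)
print(first_split)
```

With n=1 the second column keeps the rest of the string, including the remaining comma.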
Assign the result back to the DataFrame as two new columns:
In [35]:
df[["First Name", "Last Name"]] = df["name"].str.split(",", expand=True)
df
Out[35]:
          name              email First Name Last Name
0   john,doe j       jdj@dell.com       john     doe j
1   mary,ann d      mad@apple.com       mary     ann d
2  baku,raju b  brj@microsoft.com       baku    raju b

Drop the now-redundant original column:

In [34]:
df.drop("name", axis="columns")
Out[34]:
               email First Name Last Name
0       jdj@dell.com       john     doe j
1      mad@apple.com       mary     ann d
2  brj@microsoft.com       baku    raju b
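
Note that drop() returns a new DataFrame and leaves df unchanged unless the result is assigned. A small sketch of that, rebuilding the rows shown above inline so it runs without the CSV (the variable name trimmed is only for illustration):

```python
import pandas as pd

# Inline copy of the rows displayed in the post, so no CSV file is needed.
df = pd.DataFrame({
    "name": ["john,doe j", "mary,ann d", "baku,raju b"],
    "email": ["jdj@dell.com", "mad@apple.com", "brj@microsoft.com"],
})
df[["First Name", "Last Name"]] = df["name"].str.split(",", expand=True)

# drop() hands back a new DataFrame; df still contains "name".
trimmed = df.drop("name", axis="columns")

print(list(df.columns))
print(list(trimmed.columns))
```

To make the change stick on df itself, assign the result back, for example df = df.drop("name", axis="columns").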

A lot can be done with the new columns from this point: nunique, value_counts, conversion to category, and so on.

In [60]:
df = pd.read_csv("str_split_expand_true.csv")
df[["fname","lname"]] = df["name"].str.split(",", expand=True)
df[["lname","minit"]] = df["lname"].str.split(" ", expand=True)
df["fname"].value_counts()
Out[60]:
baku    1
mary    1
john    1
Name: fname, dtype: int64
In [61]:
df["fname"].nunique()
Out[61]:
3
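
The category conversion mentioned earlier is not shown in the cells above. One possible way to do it, sketched on an inline Series holding the same three first names rather than on the CSV:

```python
import pandas as pd

# Same first names as the fname column above, built inline.
fname = pd.Series(["john", "mary", "baku"], name="fname")

# Convert the string column to the categorical dtype.
as_cat = fname.astype("category")

print(as_cat.dtype)
print(list(as_cat.cat.categories))
```

A categorical column stores each distinct value once, which tends to pay off when a column has few unique values repeated across many rows.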
In [68]:
import pandas as pd
df = pd.read_csv("str_split_expand_true.csv")
df["name"].str.split(",").get(0)
Out[68]:
['john', 'doe j']
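
The .get(0) call above is Series.get, so it looks up the row labelled 0 and returns that row's whole list. If the goal is instead the first piece of every row, .str.get(0) is one option. A short sketch contrasting the two, again on inline data:

```python
import pandas as pd

names = pd.Series(["john,doe j", "mary,ann d", "baku,raju b"], name="name")
parts = names.str.split(",")  # each row holds a list of pieces

# Series.get: fetch the row with index label 0 (a single list).
first_row = parts.get(0)

# .str.get: take element 0 from the list in every row.
first_tokens = parts.str.get(0)

print(first_row)
print(first_tokens.tolist())
```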
