astype()¶

info()¶

nunique()¶

sort_values()¶

rank()¶

Convert a column's data type with the astype() method¶

import pandas as pd
df = pd.read_csv("test5.csv")
df

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8 entries, 0 to 7
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   number   8 non-null      float64
 1   name     8 non-null      object 
 2   species  8 non-null      object 
dtypes: float64(1), object(2)
memory usage: 320.0+ bytes

import pandas as pd
df = pd.read_csv("test5.csv")
# astype() returns a new Series and has no inplace parameter,
# so this line on its own leaves df unchanged
df["number"].astype("int")
# assign the result back to the column to keep the conversion
df["number"] = df["number"].astype("int")
df.info()
df[["name","number"]]

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8 entries, 0 to 7
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   number   8 non-null      int64 
 1   name     8 non-null      object
 2   species  8 non-null      object
dtypes: int64(1), object(2)
memory usage: 320.0+ bytes

import pandas as pd
df = pd.read_csv("test5.csv")
# astype() has no inplace parameter, so the result must be assigned somewhere.
# Watch out: assigning it to df instead of df["number"] replaces the
# entire DataFrame with the converted Series
df = df["number"].astype("int")
df

0    1
1    2
2    3
3    4
4    5
5    6
6    7
7    8
Name: number, dtype: int64
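
The same contrast can be reproduced without the CSV file. The sketch below rebuilds the eight rows printed in this post as an in-memory DataFrame (a stand-in for test5.csv, which is an assumption about the file's contents) and uses "int64" instead of "int" so the resulting dtype is explicit. The variable names are only for illustration.

```python
import pandas as pd

# Stand-in for test5.csv, rebuilt from the rows printed in this post
df = pd.DataFrame({
    "number": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "name": ["joe", "john", "mike", "didi", "aaron", "boo", "ziggy", "balou"],
    "species": ["human", "human", "human", "cat", "human", "dog", "dog", "gorilla"],
})

# astype() returns a new Series; the column inside df is untouched
converted = df["number"].astype("int64")
dtype_before = str(df["number"].dtype)

# Assigning back to the column keeps the DataFrame and swaps the dtype
df["number"] = converted
dtype_after = str(df["number"].dtype)

# Assigning this to the name df instead would leave only a Series
only_series = df["number"].astype("int64")

print(dtype_before, dtype_after, type(df).__name__, type(only_series).__name__)
```

Note that astype("int64") raises an error when the column contains missing values, so this pattern assumes every row has a number, as the info() output above reports.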

Using the nunique() method, like SELECT COUNT(DISTINCT column) in SQL¶

import pandas as pd
df = pd.read_csv("test5.csv")
# count the distinct values in the column
df["species"].nunique()

4

Using unique() to list the distinct values, similar to pd.Categorical in the next section. Categorical is called on pd, while nunique() (n for number) and unique() are called on a DataFrame column¶

import pandas as pd
df = pd.read_csv("test5.csv")
# list the distinct values themselves rather than counting them
df["species"].unique()

array(['human', 'cat', 'dog', 'gorilla'], dtype=object)
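
As a rough sketch of how the two methods relate, the example below uses the species values from this post plus one missing value that is added purely for illustration. By default nunique() skips missing values, while unique() would list them, so the count and the length of the list can differ.

```python
import pandas as pd

species = pd.Series(
    ["human", "human", "human", "cat", "human", "dog", "dog", "gorilla"],
    name="species",
)

distinct_values = species.unique().tolist()  # values in order of first appearance
distinct_count = species.nunique()           # like SELECT COUNT(DISTINCT species)

# Hypothetical extra row with a missing value, not part of the post's data
with_gap = pd.Series(
    ["human", "human", "human", "cat", "human", "dog", "dog", "gorilla", None]
)
count_without_missing = with_gap.nunique()           # missing value ignored
count_with_missing = with_gap.nunique(dropna=False)  # missing value counted

print(distinct_values, distinct_count, count_without_missing, count_with_missing)
```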

Using Categorical to list the categories, like SELECT DISTINCT in SQL¶

import pandas as pd
df = pd.read_csv("test5.csv")
pd.Categorical(df["species"].unique())

[human, cat, dog, gorilla]
Categories (4, object): [cat, dog, gorilla, human]
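
Categorical also works on the raw column without calling unique() first, because it derives the categories itself. A minimal sketch, using the species values from this post as a plain list, shows the two pieces it stores: the sorted categories and an integer code per row.

```python
import pandas as pd

species = ["human", "human", "human", "cat", "human", "dog", "dog", "gorilla"]
cat = pd.Categorical(species)

categories = list(cat.categories)  # distinct values, sorted alphabetically
codes = cat.codes.tolist()         # position of each row's value in categories

print(categories)
print(codes)
```

The categories line matches the "Categories (4, object)" output above; the codes are what pandas stores per row instead of repeating the strings.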

Using .sort_values(), like ORDER BY in SQL¶

sort_values() needs at least one argument, by, which names the column to sort on. Pass a list of columns for a multi-column sort. It also supports inplace.¶

import pandas as pd
df = pd.read_csv("test5.csv")
# ascending takes a boolean, not the string "False"
df.sort_values(by="name", ascending=False)

# multi-column sort: name descending, then species ascending
df.sort_values(by=["name", "species"], ascending=[False, True])
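
The three points above (a required by argument, multi-column sorting, and inplace) can be sketched on an in-memory copy of the name and species columns from this post. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["joe", "john", "mike", "didi", "aaron", "boo", "ziggy", "balou"],
    "species": ["human", "human", "human", "cat", "human", "dog", "dog", "gorilla"],
})

# Single column, descending: like ORDER BY name DESC
by_name_desc = df.sort_values(by="name", ascending=False)
first_three = by_name_desc["name"].head(3).tolist()

# Two columns with one direction each: ORDER BY species ASC, name DESC
multi = df.sort_values(by=["species", "name"], ascending=[True, False])
multi_names = multi["name"].tolist()

# inplace=True reorders df itself and returns None
result = df.sort_values(by="name", inplace=True)
names_after_inplace = df["name"].tolist()

print(first_three)
print(multi_names)
print(result, names_after_inplace)
```

Because the inplace call returns None, writing df = df.sort_values(..., inplace=True) would discard the DataFrame.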

Using .rank() for Top-N analysis, like RANK() in SQL¶

import pandas as pd
df = pd.read_csv('test5.csv')
# rank every column of the DataFrame independently; non-numeric (string) columns are ranked too, in alphabetical order
df.rank()

# rank a single column of the DataFrame, largest value first
df["number"].rank(ascending=False)

0    8.0
1    7.0
2    6.0
3    5.0
4    4.0
5    3.0
6    2.0
7    1.0
Name: number, dtype: float64
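
When a column has repeated values, rank() has to decide how to number the ties. The sketch below applies three of the documented method options to the species values from this post; the default, "average", is what produces fractional ranks such as 6.5 for the four "human" rows.

```python
import pandas as pd

# object dtype, matching what info() reports for this column in the post
species = pd.Series(
    ["human", "human", "human", "cat", "human", "dog", "dog", "gorilla"],
    dtype="object",
)

average_rank = species.rank(method="average").tolist()  # ties share the mean rank (default)
min_rank = species.rank(method="min").tolist()          # ties share the lowest rank, like SQL RANK()
dense_rank = species.rank(method="dense").tolist()      # no gaps after ties, like SQL DENSE_RANK()

print(average_rank)
print(min_rank)
print(dense_rank)
```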

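# Top-N style query: rank the column, keep 3 ranks, and write them back over the original column
# head(3) takes the first three rows by position; they hold ranks 1 to 3 only because test5.csv is already sorted by number
df["number"] = df["number"].rank(ascending=True).head(3)
# the rows that were not selected now hold NaN, so dropna() leaves them out of the output
df.dropna()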
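
An alternative sketch of the Top-N idea filters on the rank value itself instead of on row position, so it does not depend on the data being pre-sorted. It uses the name and number values from this post, ranks the largest number first, and keeps the rank in a separate column (named "rank" here purely for illustration) rather than overwriting number.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["joe", "john", "mike", "didi", "aaron", "boo", "ziggy", "balou"],
    "number": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
})

# Rank 1 is the largest number, like RANK() OVER (ORDER BY number DESC)
df["rank"] = df["number"].rank(ascending=False, method="min")

# Keep rows whose rank is 3 or better, then order them by rank
top3 = df[df["rank"] <= 3].sort_values("rank")
top3_names = top3["name"].tolist()

print(top3)
```

With method="min", tied values share a rank, so the filter can return more than three rows when there are ties at the cutoff.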

Output of df[["name","number"]] from the astype() example above:

	name	number
0	joe	1
1	john	2
2	mike	3
3	didi	4
4	aaron	5
5	boo	6
6	ziggy	7
7	balou	8

Data Engineering

Tuesday, May 5, 2020

Pandas: Common 1


Output of df right after reading test5.csv:

	number	name	species
0	1.0	joe	human
1	2.0	john	human
2	3.0	mike	human
3	4.0	didi	cat
4	5.0	aaron	human
5	6.0	boo	dog
6	7.0	ziggy	dog
7	8.0	balou	gorilla

Output of df.rank():

	number	name	species
0	1.0	5.0	6.5
1	2.0	6.0	6.5
2	3.0	7.0	6.5
3	4.0	4.0	1.0
4	5.0	1.0	6.5
5	6.0	3.0	2.5
6	7.0	8.0	2.5
7	8.0	2.0	4.0
