Convert a column's data type with the astype() method
In [68]:
import pandas as pd
df = pd.read_csv("test5.csv")
df
Out[68]:
In [69]:
df.info()
In [70]:
import pandas as pd
df = pd.read_csv("test5.csv")
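# astype() returns a converted copy; on its own this line leaves df unchanged
df["number"].astype("int")
# astype() has no inplace parameter, so assign the result back to the same column
# Do not assign it to df itself (df = df["number"].astype("int")):
# that would replace the whole DataFrame with a Series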
df["number"] = df["number"].astype("int")
df.info()
df[["name","number"]]
Out[70]:
In [71]:
import pandas as pd
df = pd.read_csv("test5.csv")
df["number"].astype("int")
# astype() has no inplace parameter, so the result has to be assigned somewhere
# Watch out: this assigns the result to the variable holding the entire DataFrame,
# so after the conversion df is a Series containing only the "number" column
df = df["number"].astype("int")
df
Out[71]:
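The difference between the two cells above can be reproduced without the CSV file. This is a minimal sketch: the small DataFrame is a hypothetical stand-in for test5.csv (whose contents are not shown here), and the names `converted` and `only_series` are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for test5.csv; the real file's contents are not shown.
df = pd.DataFrame({
    "name": ["rex", "tom", "kiwi"],
    "species": ["dog", "cat", "bird"],
    "number": [1.0, 2.0, 3.0],
})

# Correct: convert one column and assign it back to that same column.
converted = df.copy()
converted["number"] = converted["number"].astype("int")

# Pitfall: the result of astype() on one column is a Series,
# so assigning it to the DataFrame variable would lose every other column.
only_series = df["number"].astype("int")

print(type(converted).__name__, list(converted.columns))
print(type(only_series).__name__)
```

Working on a copy keeps the original `df` untouched, which makes it easy to compare the dtypes before and after.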
Using the nunique() method, like SELECT COUNT(DISTINCT column) in SQL
In [72]:
import pandas as pd
df = pd.read_csv("test5.csv")
# count the distinct values in the column
df["species"].nunique()
Out[72]:
Using unique() to list the distinct values, similar to pd.Categorical below. On a column, nunique() (n for number) returns the count and unique() returns the values
In [73]:
import pandas as pd
df = pd.read_csv("test5.csv")
# list the distinct values in the column
df["species"].unique()
Out[73]:
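One detail worth checking is how the two methods treat missing values. The sketch below uses a hypothetical Series in place of the "species" column; by default nunique() skips missing values, while unique() keeps them.

```python
import pandas as pd

# Hypothetical data standing in for the "species" column of test5.csv.
species = pd.Series(["dog", "cat", "dog", None, "bird", "cat"])

# Like COUNT(DISTINCT species): missing values are skipped by default.
distinct_count = species.nunique()

# Pass dropna=False to count the missing value as its own entry.
distinct_count_with_na = species.nunique(dropna=False)

# Distinct values in order of first appearance, missing value included.
distinct_values = species.unique()

print(distinct_count, distinct_count_with_na)
print(distinct_values)
```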
Using pd.Categorical to list the categories, like SELECT DISTINCT in SQL
In [74]:
import pandas as pd
df = pd.read_csv("test5.csv")
pd.Categorical(df["species"].unique())
Out[74]:
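A small sketch of how the Categorical result differs from plain unique(), again with hypothetical data: unique() keeps the order of first appearance, whereas the categories of a Categorical are sorted when the values can be sorted.

```python
import pandas as pd

# Hypothetical data standing in for the "species" column of test5.csv.
species = pd.Series(["dog", "cat", "dog", "bird", "cat"])

cat = pd.Categorical(species.unique())

print(list(species.unique()))  # values in order of first appearance
print(list(cat))               # the same values, held as a Categorical
print(list(cat.categories))    # the category labels, sorted
```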
Using .sort_values() like ORDER BY in SQL
It needs at least one argument, the column to sort by. It can also sort on multiple columns, and it supports inplace.
In [80]:
import pandas as pd
df = pd.read_csv("test5.csv")
# ascending must be a boolean, not the string "False"
df.sort_values(by="name", ascending=False)
Out[80]:
In [87]:
# multi-column sort with a separate ascending flag for each column
df.sort_values(by=["name", "species"], ascending=[False,True])
Out[87]:
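The two sorts can be tried on a small in-memory frame. This is only a sketch with made-up rows (the contents of test5.csv are not shown); the SQL in the comments is the rough equivalent of each call.

```python
import pandas as pd

# Hypothetical rows; only the column names follow the examples above.
df = pd.DataFrame({
    "name": ["tom", "rex", "tom", "kiwi"],
    "species": ["dog", "dog", "cat", "bird"],
})

# ORDER BY name DESC
by_name = df.sort_values(by="name", ascending=False)

# ORDER BY name DESC, species ASC
by_both = df.sort_values(by=["name", "species"], ascending=[False, True])

# sort_values() returns a new DataFrame; df keeps its original row order
# unless inplace=True is passed.
print(by_both)
print(df)
```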
Using .rank() for Top-N analysis, like RANK() in SQL
In [89]:
import pandas as pd
df = pd.read_csv('test5.csv')
# rank every column of the DataFrame independently; interestingly, non-numeric columns can be ranked as well
df.rank()
Out[89]:
In [92]:
# rank a single column of the DataFrame
df["number"].rank(ascending=False)
Out[92]:
In [99]:
# Top-N analysis as in SQL: rank the column, then keep only the rows whose rank is 3 or better
# (.head(3) would keep the first 3 rows of the frame, not the 3 best-ranked rows)
# with ascending=True, rank 1 is the smallest value; use ascending=False to rank the largest first
df["number_rank"] = df["number"].rank(ascending=True)
df[df["number_rank"] <= 3]
Out[99]:
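A Top-N query can be sketched end to end on made-up data. The frame, the `number_rank` column and the variable names below are hypothetical; `method="min"` is chosen here because it behaves like SQL's RANK() on ties, and `nlargest()` is shown as a shortcut when the rank itself is not needed.

```python
import pandas as pd

# Hypothetical data; only the "number" column name follows the examples above.
df = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e"],
    "number": [10, 50, 30, 20, 40],
})

# Rank 1 is the largest value because ascending=False.
df["number_rank"] = df["number"].rank(ascending=False, method="min")

# Keep the three best-ranked rows, ordered by rank.
top3 = df[df["number_rank"] <= 3].sort_values("number_rank")

# Shortcut that returns the same rows without computing a rank column.
top3_short = df.nlargest(3, "number")

print(top3)
```

Filtering on the rank keeps tied rows together, whereas `nlargest()` always returns exactly three rows by default.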