Default and original CSV¶
import pandas as pd
df = pd.read_csv("test.csv")
df.style
Using only one column, in this case the email column¶
import pandas as pd
df = pd.read_csv("test.csv", usecols = ["email"])
df
Selecting specific columns¶
Note 1: There are two ways to select a single column: df.column and df["column"]. The first form cannot select a column whose name contains a space.
Note 2: The first two outputs below are Series, so they are displayed differently from the tabular DataFrame format. The third output is a DataFrame, selected by passing a list of column names, hence the two pairs of square brackets. The list of column names can also be assigned to a variable first.
import pandas as pd
df = pd.read_csv("test.csv")
display(df.name)
display(df["email"])
display(df[["email","name"]])
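The two notes above can be checked with a small in-memory frame. This is only a sketch: the data and the "full name" column are made up for illustration and are not part of test.csv.

```python
import pandas as pd

# Hypothetical stand-in for test.csv; "full name" is an invented column
# whose name contains a space.
df = pd.DataFrame({
    "name": ["Ann", "Bob"],
    "email": ["ann@example.com", "bob@example.com"],
    "full name": ["Ann Lee", "Bob Ray"],
})

by_attribute = df.name        # attribute access returns a Series
by_key = df["full name"]      # bracket access also works when the name has a space
cols = ["email", "name"]      # list of column names kept in a variable
by_list = df[cols]            # selecting with a list returns a DataFrame

print(type(by_attribute).__name__)  # Series
print(type(by_key).__name__)        # Series
print(type(by_list).__name__)       # DataFrame
```

Attribute access has no way to spell "full name", so the bracket form is the safer habit.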
Using a custom column as the index, in this case name¶
import pandas as pd
df = pd.read_csv("test.csv", index_col="name")
df
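One reason to use a column as the index is label-based lookup with .loc. The sketch below assumes a tiny CSV held in a string instead of test.csv; the rows are invented for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for test.csv.
csv_text = "name,email\nAnn,ann@example.com\nBob,bob@example.com\n"

# index_col moves the name column out of the data and into the index.
df = pd.read_csv(io.StringIO(csv_text), index_col="name")

# Rows can now be looked up by name instead of by position.
bob_email = df.loc["Bob", "email"]
print(bob_email)
```

After this, name is no longer a regular column, so df["name"] would raise a KeyError; use df.index instead.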
Hiding the index. The keyword is hide: the index still exists but is not displayed, which is different from not using an index.¶
import pandas as pd
df = pd.read_csv("test.csv")
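# For pandas 1.4 or later. Older versions use df.style.hide_index(), which was removed in pandas 2.0
df.style.hide(axis="index")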
Useful count method. value_counts() works roughly like SELECT columnA, COUNT(columnA) FROM table GROUP BY columnA ORDER BY 2 DESC; NULL values are left out by default.¶
import pandas as pd
df = pd.read_csv("test.csv")
df["name"].value_counts()
** Notice the difference between .value_counts() and .count(). .value_counts() returns one count per distinct value, like a GROUP BY count. .count() returns the number of non-NULL values in the column, like "select count(name) from table".
df["name"].count()
*** Counting the non-NULL values in each row (NULL/NaN are excluded). For example, index 0 has 2 values and index 6 has none, since both of its columns are NULL.
df.count(axis="columns")
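The three counting calls can be compared side by side on a small frame. A minimal sketch, assuming made-up data rather than the contents of test.csv:

```python
import pandas as pd

# Hypothetical data; None marks a missing (NULL) cell.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Ann", None],
    "email": ["ann@example.com", None, "ann2@example.com", None],
})

per_value = df["name"].value_counts()   # one count per distinct name, NULL excluded
non_null = df["name"].count()           # number of non-NULL names in the column
per_row = df.count(axis="columns")      # number of non-NULL cells in each row

print(per_value)
print(non_null)
print(per_row.tolist())
```

Here "Ann" is counted twice, the column holds three non-NULL names, and the last row has zero non-NULL cells.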
Dealing with NULL rows. By default, dropna() drops every row that contains a NULL¶
import pandas as pd
df = pd.read_csv("test.csv")
# how="all" removes a row only if ALL of its values are NULL
# how="any" (the default) removes a row if ANY of its values is NULL
# inplace=True modifies the dataframe itself instead of returning a new one
# keywords: all, any and inplace
df.dropna(how="all")
# drop every row that has at least one NULL. dropna() with no arguments is the same as dropna(how="any")
#df.dropna()
df.dropna(how="any")
# drop every column that contains at least one NULL. Here it drops both columns
print(df.dropna(axis = 1))
# axis="columns" is the same as axis=1: drop every column that contains a NULL
# by default (axis=0), dropna drops rows instead
df.dropna(axis = "columns")
# drop a row only if the name column is NULL in that row
df.dropna(subset=["name"])
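The dropna() variants above can be compared on one small frame. This is a sketch on invented two-column data, not on test.csv, so the row counts are specific to this example.

```python
import pandas as pd

# Hypothetical data: row 0 is complete, rows 1 and 2 are partly NULL, row 3 is all NULL.
df = pd.DataFrame({
    "name": ["Ann", None, "Cy", None],
    "email": ["ann@example.com", "x@example.com", None, None],
})

all_null_dropped = df.dropna(how="all")      # only row 3 is removed
any_null_dropped = df.dropna(how="any")      # only row 0 survives
cols_dropped = df.dropna(axis="columns")     # both columns contain a NULL, so both go
name_required = df.dropna(subset=["name"])   # rows 1 and 3 are removed

print(len(all_null_dropped), len(any_null_dropped))
print(cols_dropped.shape)
print(name_required.index.tolist())
print(len(df))  # df is untouched because inplace was not used
```

Each call returns a new dataframe, so the results have to be assigned to a variable (or inplace=True passed) for the change to stick.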
fillna() replaces NULL values with another value¶
# fill every NULL cell with a value
df.fillna(value="NULLIFIED")
# fill only one or more selected columns: list the column names
# this returns a new dataframe with just those columns; df itself is not changed
df[["name"]].fillna(value="FILL ME")
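Selecting columns and then calling fillna() returns only the selected columns. When the goal is to fill one column but keep the whole dataframe, one option is to pass a dict to fillna(). A minimal sketch on made-up data:

```python
import pandas as pd

# Hypothetical data with one NULL in each column.
df = pd.DataFrame({
    "name": ["Ann", None],
    "email": [None, "bob@example.com"],
})

# The dict maps column name to fill value; columns not listed are left alone.
result = df.fillna(value={"name": "FILL ME"})

print(result["name"].tolist())
print(result["email"].isna().sum())   # email still has its NULL
print(df["name"].isna().sum())        # the original df is unchanged
```

As with dropna(), the filled frame is a new object, so assign it back to df if the change should persist.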
Colorize the null cells¶
import pandas as pd
df = pd.read_csv("test.csv")
#df.style
df.style.format(None, na_rep="NULL", subset=["name","email"]).highlight_null("orange")