Default and original cvs¶

import pandas as pd
df = pd.read_csv("test.csv")
df.style

Using only 1 column . in this case email column¶

import pandas as pd
df = pd.read_csv("test.csv", usecols = ["email"])
df

Selecting specific columns¶

Note 1: 2 ways to select specific columns df.column and df["column"]. The first alternative would have trouble selecting columns with space in between column name.

Note 2: also watch out for the first 2 outputs, they are Series. The output format look different instead of tabular format. The third output is an list/array hence with the 2 square brackets. One can assign the list of array into a variable.

import pandas as pd

df = pd.read_csv("test.csv")
display(df.name)
display(df["email"])
display(df[["email","name"]])

0        Adam, Bobo
1         John, Doe
2     Mary, Johnson
3       Zikie, Bath
4     Beth, Matties
5    Dale, Carnegie
6               NaN
7        Vlad, Didi
8               NaN
9      Mike, Typoon
Name: name, dtype: object

0            adam@it.com
1           johnd@it.com
2           maryh@it.com
3    zikieb@software.com
4    bethmat@company.com
5         DaleC@dale.com
6                    NaN
7           vladd@ru.com
8     noname@emailme.com
9       miket@runner.com
Name: email, dtype: object

Using custom column as index .. in this case using name¶

import pandas as pd
df = pd.read_csv("test.csv", index_col="name")
df

Hide away an Index .. key word is hide, this is different than not using the index.¶

import pandas as pd
df = pd.read_csv("test.csv")
df.style.hide_index()

Useful count method. Works like select columnA, count() from table having count() > 1 group by columnA.¶

import pandas as pd
df = pd.read_csv("test.csv")
df["name"].value_counts()

Dale, Carnegie    1
Mary, Johnson     1
Vlad, Didi        1
Beth, Matties     1
Mike, Typoon      1
Adam, Bobo        1
Zikie, Bath       1
John, Doe         1
Name: name, dtype: int64

** notice the different between .value_counts() and count(). .value_counts() will be like group count, count is counting how many rows like "select count() from table"

df["name"].count()

8

*** counting how many values excluding NULL/NaN within the rows. For example, Index 0 has 2 values, Index 6 has none, since it has both columsn of NULLS

df.count(axis="columns")

0    2
1    2
2    2
3    2
4    2
5    2
6    0
7    2
8    1
9    2
dtype: int64

Dealing with NULL rows. dropna() will drop any rows with NULL¶

import pandas as pd
df = pd.read_csv("test.csv")
# how = all, means how you want panda to remove. All means, you want entire row removed only if only ALL of them NULL
# the default is how=any. Means, if any of the records is NULL, entire row will be removed
# inplace = True, this will overwirte the new result sets to the dataframe
# keyword all, any and inplace
df.dropna(how="all")

#drop the entire rows if any of the row have nulls. dropna() is default and same as dropna(how=any)
#df.dropna()
df.dropna(how="any")

#drop entire column as long as there is a null within. Here it dropped both columns
print(df.dropna(axis = 1))

Empty DataFrame
Columns: []
Index: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

#drop any columns as long as there are nulls within the columms. By default, the dropna drop the rows that have nulls
#this is the same as axis=1
df.dropna(axis = "columns")

#drop a row ONLY if that column have nulls
df.dropna(subset=["name"])

fillna() to fill a NULL value into some other value¶

#filling null cells with something
df.fillna(value="NULLIFIED")

# pick only a column or multiple columns to fillna. Just list out the column List
df[["name"]].fillna(value="FILL ME")

Colorize the null cells¶

import pandas as pd
df = pd.read_csv("test.csv")
#df.style
df.style.format(None, na_rep="NULL", subset=["name","email"]).highlight_null("orange")

Data Engineering

Tuesday, May 5, 2020

Pandas: Common 1

Default and original cvs¶

Using only 1 column . in this case email column¶

Selecting specific columns¶

Using custom column as index .. in this case using name¶

Hide away an Index .. key word is hide, this is different than not using the index.¶

Useful count method. Works like select columnA, count() from table having count() > 1 group by columnA.¶

Dealing with NULL rows. dropna() will drop any rows with NULL¶

fillna() to fill a NULL value into some other value¶

Colorize the null cells¶

No comments:

Post a Comment

Pandas: SQL Like pandas operations

Report Abuse

	name	email
0	Adam, Bobo	adam@it.com
1	John, Doe	johnd@it.com
2	Mary, Johnsom	maryh@it.com
3	Zikie, Bath	zikieb@software.com
4	Beth, Matties	bethmat@company.com
5	Dale, Carnegie	DaleC@dale.com
6		nan
7	Vlad, Didi	vladd@ru.com
8		noname@emailme.com
9	Mike, Typoon	miket@runner.com

	email
name
Adam, Bobo	adam@it.com
John, Doe	johnd@it.com
Mary, Johnson	maryh@it.com
Zikie, Bath	zikieb@software.com
Beth, Matties	bethmat@company.com
Dale, Carnegie	DaleC@dale.com
NaN	NaN
Vlad, Didi	vladd@ru.com
NaN	noname@emailme.com
Mike, Typoon	miket@runner.com