Wednesday, May 27, 2020

Pandas: Bad column definition breaks column selection

A stray space in a column name can break column selection even though the DataFrame output looks fine.
In [16]:
import pandas as pd
df = pd.read_csv("bad_column_definition.csv")
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   customerID  2 non-null      int64
 1    OrderID    2 non-null      int64
dtypes: int64(2)
memory usage: 160.0 bytes
In [17]:
df
   customerID   OrderID
0        1234      9999
1        2345      8888
In [18]:
df["customerID"]
0    1234
1    2345
Name: customerID, dtype: int64


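Selecting OrderID, however, bombs out. Why? The OrderID column name has a leading space (note the extra indent in the column listing above), so the actual name is " OrderID". pandas looks for an exact match on "OrderID", finds none, and raises a KeyError.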
In [19]:
df["OrderID"]
KeyError                                  Traceback (most recent call last)
/Library/Python/3.7/site-packages/pandas/core/indexes/ in get_loc(self, key, method, tolerance)
   2645             try:
-> 2646                 return self._engine.get_loc(key)
   2647             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'OrderID'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-19-0c1ed995e0d1> in <module>
----> 1 df["OrderID"]

/Library/Python/3.7/site-packages/pandas/core/ in __getitem__(self, key)
   2798             if self.columns.nlevels > 1:
   2799                 return self._getitem_multilevel(key)
-> 2800             indexer = self.columns.get_loc(key)
   2801             if is_integer(indexer):
   2802                 indexer = [indexer]

/Library/Python/3.7/site-packages/pandas/core/indexes/ in get_loc(self, key, method, tolerance)
   2646                 return self._engine.get_loc(key)
   2647             except KeyError:
-> 2648                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2649         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2650         if indexer.ndim > 1 or indexer.size > 1:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'OrderID'
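One way to confirm and work around the problem is sketched below. It is a minimal example under assumptions: the contents of bad_column_definition.csv are not shown in this post, so csv_text is a hypothetical stand-in with a space after the comma in the header row. Listing the column names as a Python list shows the hidden space, and either stripping the names after loading or passing skipinitialspace=True to read_csv should make the lookup succeed.

```python
import io

import pandas as pd

# Hypothetical stand-in for bad_column_definition.csv (the real file is not shown).
# Note the space after the comma in the header row.
csv_text = "customerID, OrderID\n1234,9999\n2345,8888\n"

df = pd.read_csv(io.StringIO(csv_text))

# Printing the names as a list exposes the leading space that the table view hides.
raw_columns = df.columns.tolist()
print(raw_columns)  # ['customerID', ' OrderID']

# Option 1: strip whitespace from every column name after loading.
df.columns = df.columns.str.strip()
print(df["OrderID"].tolist())  # [9999, 8888]

# Option 2: ask the parser to skip spaces that follow a delimiter.
df2 = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)
print(df2.columns.tolist())  # ['customerID', 'OrderID']
```

Stripping the names after loading works regardless of how the DataFrame was built, while skipinitialspace=True only helps when the space comes right after the delimiter in the file.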

