
Data Structures

*class* pandas.**DataFrame**(*data=None*, *index=None*, *columns=None*, *dtype=None*, *copy=False*)

Two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Arithmetic operations align on both row and column labels. Can be thought of as a dict-like container for Series objects. The primary pandas data structure.
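A minimal sketch of label alignment, using two small made-up frames `a` and `b` whose labels only partly overlap. Arithmetic produces the union of row and column labels, and any cell that does not exist in both inputs becomes NaN. The last line shows the dict-like side: selecting a column by key gives a Series.

```python
import pandas as pd

# Two small frames whose row and column labels only partly overlap (made-up data)
a = pd.DataFrame({"x": [1, 2], "y": [3, 4]}, index=["r1", "r2"])
b = pd.DataFrame({"y": [10, 20], "z": [5, 6]}, index=["r2", "r3"])

# Addition aligns on both axes: the result has the union of the labels,
# and only the cell present in both frames ("r2", "y") gets a number (4 + 10).
total = a + b
print(total)

# Dict-like access: each column is a Series
col = a["x"]
```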

Parameters

  • data : numpy ndarray (structured or homogeneous), dict, or DataFrame

    Dict can contain Series, arrays, constants, or list-like objects

    Changed in version 0.23.0: If data is a dict, argument order is maintained for Python 3.6 and later.

  • index : Index or array-like

    Index to use for the resulting frame. Defaults to a RangeIndex if the input data carries no index information and no index is provided.

  • columns : Index or array-like

    Column labels to use for the resulting frame. Defaults to a RangeIndex (0, 1, 2, …, n - 1 for n columns) if no column labels are provided.

  • dtype : dtype, default None

    Data type to force. Only a single dtype is allowed. If None, the dtype is inferred.

  • copy : boolean, default False

    Copy data from inputs. Only affects DataFrame / 2d ndarray input
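A short sketch of the parameters above, with arbitrary example values: a dict keeps its key order as the column order, a 2-D ndarray can be given explicit `index` and `columns`, both axes fall back to a RangeIndex when no labels are supplied, and `dtype` forces one type across all columns.

```python
import numpy as np
import pandas as pd

# From a dict: keys become column labels, in insertion order
df1 = pd.DataFrame({"b": [1, 2], "a": [3, 4]})

# From a 2-D ndarray with explicit row (index) and column labels
arr = np.arange(6).reshape(2, 3)          # [[0, 1, 2], [3, 4, 5]]
df2 = pd.DataFrame(arr, index=["r1", "r2"], columns=["c1", "c2", "c3"])

# Without labels, both axes default to a RangeIndex (0, 1, 2, ...)
df3 = pd.DataFrame(arr)

# dtype forces a single type on every column
df4 = pd.DataFrame(arr, dtype="float64")
```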

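Note: Most manipulation methods, such as drop and set_index, return a new DataFrame and leave the original unchanged. Either assign the result back to the same variable or explicitly pass inplace=True where the method supports it. Assigning to a column (df["Continent"] = ...) is different: it modifies the DataFrame directly.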
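A minimal illustration of the note, on a throwaway one-column frame: `drop` leaves the original untouched until the result is assigned back, while column assignment changes the frame directly.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

smaller = df.drop(index=0)     # new DataFrame with 2 rows
rows_after_drop = len(df)      # still 3: df itself was not modified

df = df.drop(index=0)          # assign back to keep the change

df["b"] = df["a"] * 10         # column assignment modifies df directly
```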

Examples

import pandas as pd
d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=d)
df
   col1  col2
0     1     3
1     2     4

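df = pd.read_csv("http://pythonhow.com/supermarkets.csv")   # load from a CSV file or URL
df = pd.read_json('supermarkets.json')                      # or load from a JSON file
df = df.set_index('ID')   # set_index returns a new DataFrame, so assign it back
df.shape                  # returns a tuple (num_of_rows, num_of_columns)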
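The same loading steps can be tried without network access by reading from in-memory text. The CSV and JSON snippets below are hypothetical stand-ins, not the contents of the supermarkets files; `io.StringIO` simply lets `read_csv` and `read_json` treat a string as a file.

```python
import io
import pandas as pd

# Hypothetical stand-in for supermarkets.csv (made-up rows)
csv_text = """ID,Address,City
1,1 Example St,Springfield
2,2 Sample Ave,Springfield
"""

df = pd.read_csv(io.StringIO(csv_text))
df = df.set_index("ID")      # set_index returns a new frame, so assign it back

n_rows, n_cols = df.shape    # ID is now the index, so it is not counted as a column

# read_json works the same way; a list of records becomes one row per record
jdf = pd.read_json(io.StringIO('[{"ID": 1, "City": "Springfield"}]'))
```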

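loan_data_backup = df.copy()    # independent copy of df
df.columns.values               # NumPy array of the column labels
df.info()                       # prints the index, column dtypes, non-null counts and memory usage
df['col_name'].unique()         # distinct values in one column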
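A small check of these inspection calls on a made-up frame: editing the copy does not touch the original, `columns.values` gives the labels as an array, and `unique()` lists distinct values in order of first appearance.

```python
import pandas as pd

df = pd.DataFrame({"col_name": ["a", "b", "a", "c"], "n": [1, 2, 3, 4]})

backup = df.copy()                   # deep copy by default
backup.loc[0, "n"] = 100             # changing the copy...
first_n = df.loc[0, "n"]             # ...leaves the original value (1) in place

labels = df.columns.values           # NumPy array of column labels
uniques = df["col_name"].unique()    # distinct values, in order of appearance
```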

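# Select rows and columns from the dataframe
df.loc[:, "Country"]    # all rows of the "Country" column, selected by label
df.iloc[3, 1:4]         # row at position 3, columns at positions 1 to 3 (end excluded)
df.iloc[3, 4]           # a single value by position (.ix was removed in pandas 1.0; use .loc or .iloc)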
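A sketch contrasting label-based `.loc` with position-based `.iloc` on a made-up frame with string row labels, so the two kinds of lookup cannot be confused.

```python
import pandas as pd

df = pd.DataFrame(
    {"Country": ["A", "B", "C"], "Score": [10, 20, 30]},
    index=["x", "y", "z"],
)

countries = df.loc[:, "Country"]       # label-based: all rows, one column -> Series
row_y = df.loc["y"]                    # label-based: one row -> Series
first_two = df.iloc[0:2, 1]            # position-based: rows 0-1 of column 1
cell = df.iloc[2, 1]                   # single value by position
cell_by_label = df.loc["z", "Score"]   # the same value by label
```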

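# Delete rows and columns from the dataframe (drop returns a new DataFrame)
df.drop("332 Hill St", axis=0)      # drop the row labelled "332 Hill St" (the index must contain that label, e.g. after df.set_index('Address'))
df.drop(df.columns[0:3], axis=1)    # drop the first three columns
df.columns                          # Index of all the column labels of the dataframe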
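The `index=` and `columns=` keywords are an equivalent and more explicit way to write the same drops. This sketch uses a made-up frame indexed by hypothetical street addresses and confirms that the original frame keeps its shape.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 2], "B": [3, 4], "C": [5, 6], "D": [7, 8]},
    index=["332 Hill St", "1 Main St"],   # hypothetical address labels
)

no_row = df.drop(index="332 Hill St")        # same as df.drop("332 Hill St", axis=0)
no_cols = df.drop(columns=df.columns[0:3])   # same as df.drop(df.columns[0:3], axis=1)
```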

# Add a column to the dataframe
df["Continent"] = df.shape[0] * ["North America"]  # creates a new column Continent with "North America" in every row

df["Continent"] = df["Country"] + "," + "North America"  # each value is the row's Country followed by ",North America"
df["Address"] = df["Address"] + ", " + df["City"] + ", " + df["State"] + ", " + df["Country"]  # overwrites Address with the address, city, state and country joined by ", "
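A sketch of these column additions on a two-row frame with made-up address data. It also shows that a plain scalar is broadcast to every row, which gives the same result as building a list of length `df.shape[0]`. The column name `Label` is hypothetical, used so the original `Continent` column stays visible.

```python
import pandas as pd

df = pd.DataFrame({
    "Address": ["1 Main St", "2 Oak Ave"],
    "City": ["Springfield", "Portland"],
    "State": ["IL", "OR"],
    "Country": ["USA", "USA"],
})

df["Continent"] = "North America"                     # scalar broadcast to every row
df["Label"] = df["Country"] + "," + "North America"   # element-wise string concatenation
df["Address"] = (
    df["Address"] + ", " + df["City"] + ", " + df["State"] + ", " + df["Country"]
)
```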

df.T # Transpose of the dataframe
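A minimal example of `df.T` on a small all-integer frame: the column labels become the row index, the row index becomes the column labels, and transposing twice gives back the original.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4], "col3": [5, 6]})
t = df.T   # rows become columns and columns become rows
```

When the columns have different dtypes, the transposed frame has to hold them in a common dtype, which is often `object`.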