Data Structures

*class* pandas. **DataFrame** (*data=None*,*index=None*, *columns=None*, *dtype=None*, *copy=False*)

Two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Arithmetic operations align on both row and column labels. Can be thought of as a dict-like container for Series objects. The primary pandas data structure.

Parameters

data : numpy ndarray (structured or homogeneous), dict, or DataFrame

Dict can contain Series, arrays, constants, or list-like objects

Changed in version 0.23.0: If data is a dict, argument order is maintained for Python 3.6 and later.
index : Index or array-like

Index to use for resulting frame. Will default to RangeIndex if no indexing information part of input data and no index provided
columns : Index or array-like

Column labels to use for resulting frame. Will default to RangeIndex (0, 1, 2, …, n) if no column labels are provided
dtype : dtype, default None

Data type to force. Only a single dtype is allowed. If None, infer
copy : boolean, default False

Copy data from inputs. Only affects DataFrame / 2d ndarray input

Note: All the manipulation operation creates a new dataframe and doesn't change the original dataframe, so either instantiate the dataframe back to the same variable, or explicitely pass inplace=True, if available.

Examples

d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=d)
df
col1  col2
0     1     3
1     2     4

df = pandas.read_csv("http://pythonhow.com/supermarkets.csv")
df = pandas.read_json('supermarkets.json')
df.set_index('ID')
df.shape # returns a tuple with (num_of_rows, num_of_columns)

loan_data_backup = df.copy()
df.columns.values
df.info()
df_columns['col_name'].unique()

# Extract row, column from the dataframe
df.loc[:,"Country"])
df.iloc[3,1:4]
df.ix[3,4]

# Delete row, column from the dataframe
df.drop("332 Hill St", 0)
df.drop(df.columns[0:3],1)
df.columns # returns list of all the column of the dataframe

# Add row column into the dataframe
df["Continent"] = df.shape[0]*[North America"] # creates a new column Continent and set all the values of the rows to "North America"

df["Continent"] = df["Country"] + "," + "North America"
df["Address"] = df["Address"] + ", " + df["City"] + ", " + df["State"] + ", " + df["Country"] #update the column Continent, with all addition of all the values specified in the Column.

df.T # Transpose of the dataframe

Parameters​

Examples​

Parameters

Examples