`EDAhelper.preprocess`

preprocess reads data in formats such as txt, json, and csv and returns it as a pandas.DataFrame. To use preprocess in a project:

# import function
from EDAhelper.EDAhelper import preprocess
import pandas as pd

Read csv data from a URL

file_path = 'https://raw.githubusercontent.com/pandas-dev/pandas/main/doc/data/titanic.csv'
df = preprocess(file_path)
df.iloc[:5, :5]

	PassengerId	Survived	Pclass	Name	Sex
0	1	0	3	Braund, Mr. Owen Harris	male
1	2	1	1	Cumings, Mrs. John Bradley (Florence Briggs Th...	female
2	3	1	3	Heikkinen, Miss. Laina	female
3	4	1	1	Futrelle, Mrs. Jacques Heath (Lily May Peel)	female
4	5	0	3	Allen, Mr. William Henry	male

Read local data

file_path = '../tests/data_preprocess.csv'
preprocess(file_path)

	Unnamed: 0	col_1	col_2	col_3
0	0	NaN	1.0	a
1	1	1.0	2.0	b
2	2	1.0	NaN	c
3	3	3.0	5.0	d
4	4	0.0	NaN	NaN

Read data with different methods for handling missing values

preprocess(file_path, method='mean', index_col=0)

	col_1	col_2	col_3
0	1.25	1.000000	a
1	1.00	2.000000	b
2	1.00	2.666667	c
3	3.00	5.000000	d
4	0.00	2.666667	NaN

preprocess(file_path, method='median', index_col=0)

	col_1	col_2	col_3
0	1.0	1.0	a
1	1.0	2.0	b
2	1.0	2.0	c
3	3.0	5.0	d
4	0.0	2.0	NaN
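
For comparison, the filled values in the two outputs above can be reproduced with plain pandas. This is only a sketch of equivalent behavior, not the code inside preprocess: it assumes that method='mean' and method='median' fill each numeric column with that column's statistic and leave non-numeric columns such as col_3 alone, which is what the outputs suggest. The CSV text is inlined from the displayed table so the example does not depend on the local file.

```python
import io

import pandas as pd

# Values copied from the data_preprocess.csv output shown above,
# inlined so the example is self-contained.
csv_text = """,col_1,col_2,col_3
0,,1.0,a
1,1.0,2.0,b
2,1.0,,c
3,3.0,5.0,d
4,0.0,,
"""

raw = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Assumed equivalent of method='mean': fill gaps in numeric columns
# with the column mean. col_3 is not numeric, so its NaN stays.
mean_filled = raw.fillna(raw.mean(numeric_only=True))

# Assumed equivalent of method='median': same idea with the column median.
median_filled = raw.fillna(raw.median(numeric_only=True))

print(mean_filled)
print(median_filled)
```

The column mean of col_1 is (1 + 1 + 3 + 0) / 4 = 1.25 and the mean of col_2 is (1 + 2 + 5) / 3 ≈ 2.666667, which matches the values that appear in place of NaN in the method='mean' output.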

Read data with extra pandas settings

preprocess(file_path, read_func=pd.read_csv, index_col=1)

	Unnamed: 0	col_2	col_3
col_1
NaN	0	1.0	a
1.0	1	2.0	b
1.0	2	NaN	c
3.0	3	5.0	d
0.0	4	NaN	NaN
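
The call above suggests that read_func selects the pandas reader and that extra keyword arguments such as index_col are passed on to it. How preprocess does this internally is not shown here, so the sketch below uses a hypothetical helper, read_with, purely to illustrate that forwarding pattern with in-memory data.

```python
import io

import pandas as pd


def read_with(source, read_func=pd.read_csv, **kwargs):
    """Hypothetical helper (not part of EDAhelper): call read_func on
    source and pass any extra keyword arguments straight through."""
    return read_func(source, **kwargs)


# CSV: index_col=1 makes the second column (col_1) the index.
csv_text = ",col_1,col_2,col_3\n0,,1.0,a\n1,1.0,2.0,b\n"
by_col_1 = read_with(io.StringIO(csv_text), read_func=pd.read_csv, index_col=1)

# JSON: the same helper works with a different pandas reader.
json_text = '[{"col_1": 1, "col_3": "a"}, {"col_1": 3, "col_3": "d"}]'
from_json = read_with(io.StringIO(json_text), read_func=pd.read_json)

print(by_col_1)
print(from_json)
```

Accepting the reader as a parameter keeps the loading step independent of the file format, so any option the chosen pandas reader understands can be supplied by the caller.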
