EDAhelper.column_stats
Obtain summary statistics for one or more columns (count, mean, median, mode, Q1, Q3, variance, and standard deviation), together with the correlation and covariance matrices, in table format.
# import function
from EDAhelper.column_stats import column_stats
import pandas as pd
import numpy as np
import statistics
# load data
data = pd.read_csv("https://raw.githubusercontent.com/pandas-dev/pandas/main/doc/data/iris.data")
columns = ['SepalLength', 'SepalWidth']
data.head()
| | SepalLength | SepalWidth | PetalLength | PetalWidth | Name |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
Generate summary table, correlation matrix, and covariance matrix
column_stats takes two arguments: a data set and a list of column names as strings. It returns the summary table, the correlation matrix, and the covariance matrix, in that order.
column_stats(data, columns)
(        Column  Count   Mean  Median  Mode   Q1   Q3    Var  Stdev
 0  SepalLength  150.0  5.843     5.8   5.0  5.1  6.4  0.686  0.828
 1   SepalWidth  150.0  3.054     3.0   3.0  2.8  3.3  0.188  0.434,
              SepalLength  SepalWidth
 SepalLength     1.000000   -0.109369
 SepalWidth     -0.109369    1.000000,
              SepalLength  SepalWidth
 SepalLength     0.685694   -0.039268
 SepalWidth     -0.039268    0.188004)
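To make the three parts of that output easier to read, here is a minimal sketch that produces a result of the same shape using only pandas. It is not the EDAhelper implementation: the function name `summarize_columns` and the `toy` data frame are hypothetical, and the real `column_stats` may compute its quantiles, mode, or rounding differently.

```python
import pandas as pd


def summarize_columns(data, columns):
    """Hypothetical stand-in for column_stats, built on plain pandas.

    Returns a tuple: (summary table, correlation matrix, covariance matrix).
    """
    subset = data[columns]
    summary = pd.DataFrame({
        "Column": columns,
        "Count": subset.count().values,
        "Mean": subset.mean().round(3).values,
        "Median": subset.median().values,
        # mode() can return several values; keep the smallest one
        "Mode": [subset[c].mode().iloc[0] for c in columns],
        "Q1": subset.quantile(0.25).values,
        "Q3": subset.quantile(0.75).values,
        # pandas uses the sample variance and standard deviation (ddof=1)
        "Var": subset.var().round(3).values,
        "Stdev": subset.std().round(3).values,
    })
    return summary, subset.corr(), subset.cov()


# Small made-up data set; column "b" is exactly 5 - "a"
toy = pd.DataFrame({
    "a": [1.0, 2.0, 2.0, 4.0],
    "b": [4.0, 3.0, 3.0, 1.0],
    "label": ["w", "x", "y", "z"],
})

# Unpack the tuple into its three tables
summary, corr, cov = summarize_columns(toy, ["a", "b"])
print(summary)
print(corr)
print(cov)
```

Assuming `column_stats` returns a tuple, as the printed output above suggests, the same unpacking pattern (`summary, corr, cov = column_stats(data, columns)`) would give each table its own variable instead of one nested display.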