EDAhelper.column_stats

Obtain summary statistics for one or more columns: count, mean, median, mode, Q1, Q3, variance, and standard deviation in a summary table, plus correlation and covariance matrices for the selected columns.

# import function
from EDAhelper.column_stats import column_stats
import pandas as pd
import numpy as np
import statistics
# load data
data = pd.read_csv("https://raw.githubusercontent.com/pandas-dev/pandas/main/doc/data/iris.data")
columns = ['SepalLength', 'SepalWidth']

data.head()
   SepalLength  SepalWidth  PetalLength  PetalWidth         Name
0          5.1         3.5          1.4         0.2  Iris-setosa
1          4.9         3.0          1.4         0.2  Iris-setosa
2          4.7         3.2          1.3         0.2  Iris-setosa
3          4.6         3.1          1.5         0.2  Iris-setosa
4          5.0         3.6          1.4         0.2  Iris-setosa

Generate summary table, correlation matrix, and covariance matrix

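column_stats takes two arguments: a data set (here a pandas DataFrame) and a list of column names as strings. As the output below shows, it returns a tuple of three tables: the summary statistics, the correlation matrix, and the covariance matrix.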

column_stats(data, columns)
(        Column  Count   Mean  Median  Mode   Q1   Q3    Var  Stdev
 0  SepalLength  150.0  5.843     5.8   5.0  5.1  6.4  0.686  0.828
 1   SepalWidth  150.0  3.054     3.0   3.0  2.8  3.3  0.188  0.434,
              SepalLength  SepalWidth
 SepalLength     1.000000   -0.109369
 SepalWidth     -0.109369    1.000000,
              SepalLength  SepalWidth
 SepalLength     0.685694   -0.039268
 SepalWidth     -0.039268    0.188004)
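
To make the meaning of each reported statistic concrete, here is a minimal sketch that recomputes the same kinds of quantities using only Python's standard statistics module. It is not the EDAhelper implementation. The helper names summarize, sample_covariance, and pearson_correlation are hypothetical, the input is just the five rows shown by data.head() above rather than the full data set, and the quartile method (linear interpolation) is an assumption. The n - 1 denominator for variance and covariance is also an assumption, though it is consistent with the Var and covariance values in the output above.

```python
import statistics

# First five rows from data.head() above, not the full 150-row data set.
sample = {
    "SepalLength": [5.1, 4.9, 4.7, 4.6, 5.0],
    "SepalWidth": [3.5, 3.0, 3.2, 3.1, 3.6],
}


def summarize(values):
    """Hypothetical helper: the per-column statistics named in the summary table."""
    # Assumption: quartiles are linearly interpolated over the observed range.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "Count": len(values),
        "Mean": statistics.mean(values),
        "Median": statistics.median(values),
        # When every value is unique, mode() returns the first value seen.
        "Mode": statistics.mode(values),
        "Q1": q1,
        "Q3": q3,
        "Var": statistics.variance(values),  # sample variance (n - 1)
        "Stdev": statistics.stdev(values),   # sample standard deviation
    }


def sample_covariance(xs, ys):
    """Hypothetical helper: sample covariance with an n - 1 denominator."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)


def pearson_correlation(xs, ys):
    """Hypothetical helper: covariance scaled by both standard deviations."""
    return sample_covariance(xs, ys) / (statistics.stdev(xs) * statistics.stdev(ys))


summary = {name: summarize(values) for name, values in sample.items()}
cov = sample_covariance(sample["SepalLength"], sample["SepalWidth"])
corr = pearson_correlation(sample["SepalLength"], sample["SepalWidth"])

for name, stats in summary.items():
    print(name, {key: round(value, 3) for key, value in stats.items()})
print("covariance:", round(cov, 4))
print("correlation:", round(corr, 4))
```

Because the sketch uses only five rows, its numbers will differ from the column_stats output above, which is computed over all 150 rows.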