Q&A 10 How do you get summary statistics for numeric variables in Python and R?

10.1 Explanation

Summary statistics provide a quick overview of your numeric data. They help you understand:

  • Central tendency (mean, median)
  • Spread (min, max, standard deviation, quartiles)
  • Distribution shape and potential outliers

Both languages can compute these statistics for every column of a dataset in a single call: describe() from the pandas library in Python, and the built-in summary() function in R. This is a useful early step when assessing data quality and before visualization or modeling.
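To make the terms above concrete, here is a minimal sketch that computes each statistic by hand with Python's standard-library statistics module (Python 3.8+) for a short list of hypothetical values chosen only for illustration. statistics.stdev uses the sample (n - 1) denominator, and method="inclusive" interpolates linearly between data points. As far as I know these match the defaults behind pandas' describe(), but treat that correspondence as an assumption to verify on your own data.

```python
import statistics

# Hypothetical sample values, used only to illustrate each statistic.
values = [4.3, 5.0, 5.1, 5.8, 6.4, 7.9]

def summarize(data):
    """Return a dict of basic summary statistics for a list of numbers."""
    # quantiles(n=4) returns the three cut points: Q1, Q2 (median), Q3.
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "count": len(data),
        "mean": statistics.mean(data),
        "std": statistics.stdev(data),  # sample standard deviation (n - 1)
        "min": min(data),
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": max(data),
    }

for name, value in summarize(values).items():
    print(f"{name:>5}: {value:.3f}")
```

For these six values the mean is 5.75 and the median is 5.45, the average of the two middle values.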


10.2 Python Code

import pandas as pd

# Load the dataset
df = pd.read_csv("data/iris.csv")

# Get summary statistics for all numeric columns
summary = df.describe()
print(summary)
       sepal_length  sepal_width  petal_length  petal_width
count    150.000000   150.000000    150.000000   150.000000
mean       5.843333     3.057333      3.758000     1.199333
std        0.828066     0.435866      1.765298     0.762238
min        4.300000     2.000000      1.000000     0.100000
25%        5.100000     2.800000      1.600000     0.300000
50%        5.800000     3.000000      4.350000     1.300000
75%        6.400000     3.300000      5.100000     1.800000
max        7.900000     4.400000      6.900000     2.500000

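💡 df.describe() returns count, mean, std, min, 25%, 50% (median), 75%, and max for each numeric column. By default it skips non-numeric columns such as species, and std is the sample standard deviation (n - 1 denominator).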
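describe() also accepts a few parameters worth knowing. The sketch below uses a tiny hypothetical DataFrame (five iris-style rows) so it runs without data/iris.csv. percentiles sets which quantiles are reported (the median is always added), include="all" adds unique, top, and freq rows for non-numeric columns such as species, and .T transposes the result so each variable becomes a row.

```python
import pandas as pd

# Hypothetical toy data standing in for data/iris.csv.
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 7.0, 6.3],
    "petal_length": [1.4, 1.4, 1.3, 4.7, 6.0],
    "species": ["setosa", "setosa", "setosa", "versicolor", "virginica"],
})

# Default: numeric columns only.
print(df.describe())

# Custom percentiles; 50% (the median) is included automatically.
print(df.describe(percentiles=[0.1, 0.9]))

# Include non-numeric columns too (adds unique, top, freq rows).
print(df.describe(include="all"))

# Transpose so each variable is a row, which reads better with many columns.
print(df.describe().T)
```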

10.3 R Code

library(readr)

# Load the dataset
df <- read_csv("data/iris.csv")

# Get summary statistics
summary(df)
  sepal_length    sepal_width     petal_length    petal_width   
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
 Median :5.800   Median :3.000   Median :4.350   Median :1.300  
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
   species         
 Length:150        
 Class :character  
 Mode  :character  
                   
                   
                   

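💡 For numeric columns, summary() in R reports the minimum, 1st quartile, median, mean, 3rd quartile, and maximum (plus an NA's count when values are missing). Unlike df.describe(), it does not report the count or standard deviation, and it summarizes character columns such as species by Length, Class, and Mode instead of skipping them.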
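If you want pandas output laid out like R's summary(), one option (a sketch, not a built-in pandas equivalent) is to select and relabel rows of describe(). Both tools should default to the same linear-interpolation quartile rule (R's quantile() type 7), so the numbers ought to agree. The toy values below are hypothetical stand-ins for the iris file, and the R-style labels are simply strings chosen to mimic R's printout.

```python
import pandas as pd

# Hypothetical toy data standing in for data/iris.csv.
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 7.0, 6.3],
    "petal_length": [1.4, 1.4, 1.3, 4.7, 6.0],
})

# Rows in the order R's summary() prints them, with R-style labels.
r_order = ["min", "25%", "50%", "mean", "75%", "max"]
r_labels = ["Min.", "1st Qu.", "Median", "Mean", "3rd Qu.", "Max."]

r_style = df.describe().loc[r_order]
r_style.index = r_labels
print(r_style.round(3))
```

In R itself, the missing pieces can be computed separately, for example with sd() for the standard deviation.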


✅ These summaries give you a solid first look at the data distribution and can guide further steps like filtering, normalization, or visualization.
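As one example of using the quartiles to guide filtering, a common rule of thumb (Tukey's fences) flags values more than 1.5 times the interquartile range (IQR = Q3 - Q1) below Q1 or above Q3 as potential outliers. The sketch below applies that rule to a single column; the helper name iqr_outlier_mask, the data, and the 1.5 multiplier are illustrative assumptions, not a universal threshold.

```python
import pandas as pd

def iqr_outlier_mask(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask that is True where a value lies outside Tukey's fences."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (series < lower) | (series > upper)

# Hypothetical values; 9.5 is deliberately far from the rest.
s = pd.Series([4.9, 5.0, 5.1, 5.2, 5.3, 9.5])
mask = iqr_outlier_mask(s)
print(s[mask])   # values flagged as potential outliers
print(s[~mask])  # values kept after filtering
```

Flagged values are candidates for a closer look, not automatic deletions; whether to drop them depends on the data and the question.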