Q&A 28 How do you use violin plots to compare groups across a categorical variable?

28.1 Explanation

Violin plots combine the summary statistics of boxplots with the distribution shape provided by kernel density estimation. They show:

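  • The median and interquartile range (like a boxplot), usually drawn as a small box or quartile lines inside the violin
  • A smooth density curve, mirrored on both sides, whose width shows where observations concentrate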
  • Symmetry, skewness, and modality within each group
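
To make the two ingredients concrete, here is a minimal sketch that computes, for a single group, the quartiles and the density curve a violin would display. It uses only the standard library. The helper names (`gaussian_kde`, `violin_ingredients`), the sample values, and the choice of Scott's rule for the bandwidth are assumptions made for illustration; plotting libraries may use different defaults and quantile conventions.

```python
import math
import statistics


def gaussian_kde(values, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate of `values` at each grid point."""
    n = len(values)
    if bandwidth is None:
        # Scott's rule of thumb for 1-D data (an assumption; libraries may differ)
        bandwidth = statistics.stdev(values) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
        for x in grid
    ]


def violin_ingredients(values, grid_size=50):
    """Return the quartiles and the density curve that one violin would display."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    low, high = min(values), max(values)
    step = (high - low) / (grid_size - 1)
    grid = [low + i * step for i in range(grid_size)]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "grid": grid,
        "density": gaussian_kde(values, grid),
    }


# Hypothetical measurements for a single group
sample = [4.9, 5.0, 5.1, 5.1, 5.4, 5.8, 6.0, 6.3, 6.4, 7.0]
parts = violin_ingredients(sample)
print(f"median={parts['median']:.2f}, IQR=[{parts['q1']:.2f}, {parts['q3']:.2f}]")
```

The quartiles correspond to the inner box, and `density` evaluated along `grid` corresponds to the outline that gets mirrored to form the violin.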

Violin plots are especially useful when:

  • You want to see whether distributions are symmetric or skewed
  • You are comparing more than two groups
  • You want to spot multimodal patterns (multiple peaks)
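
The multimodality point can be illustrated with a small sketch: two hypothetical groups share the same median, yet a density estimate shows one peak for the first and two for the second, which a boxplot alone would not reveal. The data, the helper names (`kde_curve`, `count_peaks`), and the hand-picked bandwidth of 0.5 are all assumptions chosen for illustration. The number of visible peaks depends on the bandwidth, so treat the count as a rough diagnostic rather than a test.

```python
import math
import statistics


def kde_curve(values, grid, bandwidth):
    """Gaussian kernel density estimate of `values` evaluated on `grid`."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
        for x in grid
    ]


def count_peaks(density):
    """Count strict local maxima in a sequence of density values."""
    return sum(
        1
        for i in range(1, len(density) - 1)
        if density[i - 1] < density[i] > density[i + 1]
    )


BANDWIDTH = 0.5                       # fixed by hand for this illustration
grid = [i * 0.1 for i in range(101)]  # evaluation points from 0.0 to 10.0

# Hypothetical groups with the same median but different shapes
one_cluster = [4.0, 4.4, 4.7, 4.9, 5.0, 5.0, 5.0, 5.0, 5.1, 5.3, 5.6, 6.0]
two_clusters = [1.5, 1.8, 2.0, 2.0, 2.2, 2.5, 7.5, 7.8, 8.0, 8.0, 8.2, 8.5]

for name, values in [("one_cluster", one_cluster), ("two_clusters", two_clusters)]:
    peaks = count_peaks(kde_curve(values, grid, BANDWIDTH))
    print(f"{name}: median={statistics.median(values):.1f}, peaks={peaks}")
```

In the second group the median falls in a region with no observations at all, which is exactly the situation where the violin's shape adds information that the median and quartiles cannot.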

28.2 Python Code

# ✅ Load libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load dataset
df = pd.read_csv("data/iris.csv")

# Violin plot: sepal length by species
plt.figure(figsize=(6, 5))
sns.violinplot(
    data=df,
    x="species",
    y="sepal_length",
    hue="species",               # seaborn 0.13+ warns if palette is passed without hue
    palette="viridis",
    legend=False                 # the x-axis already labels each species
)
plt.title("Violin Plot: Sepal Length by Species")
plt.xlabel("Species")
plt.ylabel("Sepal Length")
plt.tight_layout()
plt.show()

28.3 R Code

# ✅ Load libraries
library(tidyverse)

# Load dataset
df <- read_csv("data/iris.csv", show_col_types = FALSE)

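# Violin plot: sepal length by species
ggplot(df, aes(x = species, y = sepal_length, fill = species)) +
  geom_violin(trim = FALSE) +                                      # extend the density tails past the observed range
  geom_boxplot(width = 0.1, fill = "white", outlier.shape = NA) +  # median and IQR inside each violin
  labs(title = "Violin Plot: Sepal Length by Species", x = "Species", y = "Sepal Length") +
  theme_minimal() +
  theme(legend.position = "none")                                  # the x-axis already labels each species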