Q&A 38 How do you visualize correlation between numerical variables?
38.1 Explanation
A correlation heatmap provides a quick overview of linear relationships between numerical variables in a dataset. It helps:
- Identify strong correlations (positive or negative)
- Detect multicollinearity for modeling
- Choose appropriate features for further analysis
Correlation values range from –1 to +1:
- +1: perfect positive correlation
- –1: perfect negative correlation
- 0: no linear relationship
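To make the endpoints of that range concrete, here is a minimal sketch using pandas on made-up values (the series `x` and the three derived series are illustrative, not taken from the iris data). It also shows that a coefficient of 0 only rules out a linear relationship: the third series depends entirely on `x`, yet its correlation with `x` is zero.

```python
import pandas as pd

# Hypothetical toy values, chosen so the results are easy to verify by hand
x = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Series.corr computes the Pearson correlation by default
r_pos = x.corr(2 * x + 1)       # increasing linear function of x
r_neg = x.corr(-3 * x + 10)     # decreasing linear function of x
r_zero = x.corr((x - 3) ** 2)   # symmetric parabola: depends on x, but not linearly

print(f"positive: {r_pos:.2f}")
print(f"negative: {r_neg:.2f}")
print(f"parabola: {abs(r_zero):.2f}")
```

The parabola case is why a near-zero cell in a heatmap is worth a scatter plot before concluding that two variables are unrelated.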
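Heatmaps encode these values as color: with a diverging palette, hue shows the sign and intensity shows the magnitude. Each cell can optionally be labeled with its coefficient.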
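For datasets with many columns, reading strong correlations off a heatmap by eye gets tedious, so it can help to list them directly from the correlation matrix. The sketch below assumes a small made-up DataFrame (columns `a`, `b`, `c`) and an arbitrary cutoff of 0.8; neither comes from the original example, and a suitable threshold depends on the modeling task.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: b is almost a linear function of a, c is unrelated to both
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "b": [2.1, 3.9, 6.2, 8.1, 9.8, 12.1],
    "c": [5.0, 1.0, 4.0, 2.0, 6.0, 3.0],
})

corr = df.corr()

# Keep only the upper triangle (k=1 excludes the diagonal) so each pair appears once
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Reshape to one entry per column pair, then keep pairs whose absolute
# correlation exceeds the cutoff (NaN entries fail the comparison and drop out)
threshold = 0.8  # illustrative cutoff, not a universal rule
pairs = upper.stack()
strong_pairs = pairs[pairs.abs() > threshold]

print(strong_pairs)
```

Pairs that show up here are candidates for multicollinearity checks, for example dropping or combining one of the two features before fitting a linear model.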
38.2 Python Code
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Load dataset
df = pd.read_csv("data/iris.csv")
# Compute correlation matrix (numeric columns only)
corr = df.select_dtypes(include="number").corr()
# Set figure size and plot; fix the color scale to [-1, 1] so that 0 maps to the neutral midpoint
plt.figure(figsize=(6, 5))
sns.heatmap(corr, annot=True, cmap="coolwarm", fmt=".2f", square=True,
            vmin=-1, vmax=1, linewidths=0.5, cbar_kws={"shrink": 0.8})
# Customize plot
plt.title("Correlation Heatmap")
plt.tight_layout()
plt.show()
38.3 R Code
library(readr)
library(ggcorrplot)
# Load dataset
df <- read_csv("data/iris.csv")
# Compute correlation matrix (numeric columns only, selected by type rather than position)
corr <- cor(df[sapply(df, is.numeric)])
# Plot correlation heatmap
ggcorrplot(corr, method = "circle", lab = TRUE, lab_size = 3,
           colors = c("red", "white", "blue"),  # low, mid, high
           title = "Correlation Heatmap", ggtheme = ggplot2::theme_minimal())