Q&A 9 How do you check for missing values in Python and R?
9.1 Explanation
Before performing any analysis or modeling, it’s important to check for missing values. These can cause errors, affect summary statistics, or bias machine learning models if not handled properly.
Python and R represent missing values differently:
- In Python (pandas), missing values typically appear as NaN (Not a Number)
- In R, they are represented as NA
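As a small illustration of the Python side, the sketch below shows that pandas flags both `np.nan` and `None` as missing in a numeric Series. The toy values are hypothetical and are not taken from the iris dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not from iris): one NaN and one None
values = pd.Series([1.0, np.nan, None, 4.0])

# isna() (alias: isnull()) flags both NaN and None as missing
missing_mask = values.isna()
print(missing_mask.tolist())

# NaN never compares equal to anything, including itself,
# so an equality test cannot be used to find missing values
print(np.nan == np.nan)
```

This is why the checks in this section rely on `isnull()` rather than on comparisons with `==`.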
We’ll use built-in functions to:
- Detect if any values are missing
- Count missing values by column
- (Optionally) summarize total missing values across the dataset
9.2 Python Code
import pandas as pd
# Load the standardized dataset
df = pd.read_csv("data/iris.csv")
# Check if there are any missing values
print("Any missing values?", df.isnull().values.any())
# Count missing values by column
print("\nMissing values per column:")
print(df.isnull().sum())
# Optional: total number of missing entries
print("\nTotal missing values:", df.isnull().sum().sum())
Any missing values? False

Missing values per column:
sepal_length    0
sepal_width     0
petal_length    0
petal_width     0
species         0
dtype: int64

Total missing values: 0
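Because the iris file has no gaps, every count above is zero. To see what the same checks look like when values are actually missing, here is a minimal sketch on a made-up frame. The `toy` data and its contents are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with deliberate gaps (not the iris data)
toy = pd.DataFrame({
    "sepal_length": [5.1, np.nan, 4.7, 4.6],
    "species": ["setosa", None, "setosa", None],
})

# Count of missing values per column
missing_counts = toy.isna().sum()
print(missing_counts)

# Share of missing values per column (mean of a boolean mask)
missing_share = toy.isna().mean()
print(missing_share)

# Rows that contain at least one missing value
rows_with_missing = toy[toy.isna().any(axis=1)]
print(rows_with_missing)
```

Reporting the share alongside the count can make it easier to judge whether a column is still usable.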
9.3 R Code
library(readr)
# Load the standardized dataset
df <- read_csv("data/iris.csv")
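# Check if any missing values exist
any(is.na(df))
[1] FALSE
# Count missing values by column
colSums(is.na(df))
sepal_length sepal_width petal_length petal_width species
0 0 0 0 0
# Optional: total number of missing entries
sum(is.na(df))
[1] 0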
✅ Once you’ve identified missing values, the next step is to decide how to handle them, for example by removing, imputing, or flagging them.
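The three options can be sketched in pandas as follows. This is only an illustration on a hypothetical toy frame. The median is one possible imputation choice among many, and the column name `petal_length_missing` is made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with gaps (not the iris data)
toy = pd.DataFrame({
    "petal_length": [1.4, np.nan, 1.3, 1.5],
    "petal_width": [0.2, 0.2, np.nan, 0.2],
})

# Option 1: remove every row that has at least one missing value
dropped = toy.dropna()

# Option 2: impute each column with its own median (an assumed strategy)
imputed = toy.fillna(toy.median())

# Option 3: keep the data as is and add a boolean indicator column
flagged = toy.assign(petal_length_missing=toy["petal_length"].isna())

print(dropped)
print(imputed)
print(flagged)
```

Which option fits depends on how much data is missing and why. Dropping rows is simplest but discards information, while imputing keeps all rows at the cost of introducing estimated values.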