Q&A 9 How do you check for missing values in Python and R?

9.1 Explanation

Before performing any analysis or modeling, it’s important to check for missing values. These can cause errors, affect summary statistics, or bias machine learning models if not handled properly.

Python and R represent missing values differently:

  • In Python (pandas), missing values typically appear as NaN (Not a Number); None and NaT (for dates) are also treated as missing
  • In R, they are represented as NA; is.na() also returns TRUE for NaN
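
To make the Python side concrete, here is a minimal sketch of which values pandas counts as missing. The series below are made up for illustration and are not part of the iris data. Note that empty or whitespace-only strings are not treated as missing.

```python
import numpy as np
import pandas as pd

# Hypothetical values, chosen only to illustrate the different missing markers
s_float = pd.Series([1.0, np.nan, None])                  # None becomes NaN in a float Series
s_text = pd.Series(["a", None, "c"], dtype="object")      # None stays None in an object Series
s_date = pd.to_datetime(pd.Series(["2024-01-01", None]))  # None becomes NaT

float_missing = s_float.isna().tolist()
text_missing = s_text.isna().tolist()
date_missing = s_date.isna().tolist()

# Empty or whitespace-only strings are ordinary values, not missing ones
blank_missing = pd.Series(["", " "], dtype="object").isna().tolist()

print(float_missing, text_missing, date_missing, blank_missing)
```

If a dataset encodes gaps as empty strings or placeholder codes, those need to be converted to NaN before the checks in this section will find them.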

We’ll use built-in functions to:

  • Detect if any values are missing
  • Count missing values by column
  • (Optionally) summarize total missing values across the dataset

9.2 Python Code

import pandas as pd

# Load the standardized dataset
df = pd.read_csv("data/iris.csv")

# Check if there are any missing values
print("Any missing values?", df.isnull().values.any())

# Count missing values by column
print("\nMissing values per column:")
print(df.isnull().sum())

# Optional: total number of missing entries
print("\nTotal missing values:", df.isnull().sum().sum())

Any missing values? False

Missing values per column:
sepal_length    0
sepal_width     0
petal_length    0
petal_width     0
species         0
dtype: int64

Total missing values: 0
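
Because the iris file has no gaps, every count above is zero. As a rough illustration of what the same checks report when values are missing, the sketch below uses a small made-up table (the `sample` frame and its values are hypothetical, not taken from the dataset). It uses `isna()`, which is the same method as `isnull()` under another name.

```python
import numpy as np
import pandas as pd

# Hypothetical five-row sample with gaps inserted on purpose (not the real iris data)
sample = pd.DataFrame({
    "sepal_length": [5.1, np.nan, 4.7, 4.6, 5.0],
    "sepal_width": [3.5, 3.0, np.nan, np.nan, 3.6],
    "species": ["setosa", "setosa", None, "setosa", "setosa"],
})

# Detect whether anything is missing
any_missing = bool(sample.isna().values.any())

# Count missing values by column, then across the whole table
per_column = sample.isna().sum()
total_missing = int(per_column.sum())

# Fraction of missing values per column (mean of the True/False mask)
share_missing = sample.isna().mean()

# Rows that contain at least one missing value
rows_with_gaps = sample[sample.isna().any(axis=1)]

print("Any missing values?", any_missing)
print(per_column)
print("Total missing values:", total_missing)
print(share_missing)
print(rows_with_gaps)
```

The per-column share is often more useful than the raw count when columns have many rows, since it shows at a glance which columns are mostly empty.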

9.3 R Code

library(readr)

# Load the standardized dataset
df <- read_csv("data/iris.csv")

# Check if any missing values exist
any(is.na(df))

[1] FALSE

# Count missing values per column
colSums(is.na(df))

sepal_length  sepal_width petal_length  petal_width      species
           0            0            0            0            0

# Optional: total number of missing values
sum(is.na(df))

[1] 0

✅ Once you’ve identified missing values, the next step is to decide how to handle them: remove the affected rows, impute replacement values, or flag them for later review.
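
The three options can be sketched in pandas as follows. This is only an illustration under assumptions: the `df_gaps` frame is made up, and using the median for imputation is one common choice rather than a recommendation for every dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps (not the real iris file)
df_gaps = pd.DataFrame({
    "petal_length": [1.4, np.nan, 1.3, 1.5],
    "species": ["setosa", "setosa", None, "setosa"],
})

# Option 1: remove every row that has at least one missing value
dropped = df_gaps.dropna()

# Option 2: impute a numeric column with its median (an assumed strategy)
imputed = df_gaps.copy()
median_length = imputed["petal_length"].median()
imputed["petal_length"] = imputed["petal_length"].fillna(median_length)

# Option 3: keep the data as is and add an indicator column
flagged = df_gaps.copy()
flagged["petal_length_missing"] = flagged["petal_length"].isna()

print(dropped)
print(imputed)
print(flagged)
```

Removing rows is simplest but discards data, imputing keeps the rows at the cost of inventing values, and flagging preserves the information that a value was absent.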