Q&A 11 How do you filter rows based on a condition in Python and R?

11.1 Explanation

Filtering is one of the most important skills in data wrangling. It allows you to isolate subsets of data that meet certain conditions — for example:

  • Observations above or below a threshold
  • Specific categories (e.g., only one species)
  • Logical combinations (e.g., long petals and wide sepals)

In both Python and R, filtering uses logical expressions that evaluate to True or False (TRUE or FALSE in R) for each row; only the rows where the expression is true are kept.

In this example, we’ll filter rows where:

sepal_length > 5.0
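
Before running the condition on the full dataset, it can help to see the row-by-row evaluation on its own. The following is a minimal sketch on a tiny hand-made frame; the four values are illustrative only and the names mask and kept are assumptions introduced here, not part of the original example.

```python
import pandas as pd

# Tiny hand-made frame (illustrative values, not the full iris file)
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 5.4, 5.0],
    "species": ["setosa", "setosa", "setosa", "setosa"],
})

# The comparison produces one boolean per row
mask = df["sepal_length"] > 5.0
print(mask.tolist())  # [True, False, True, False]

# Indexing with the mask keeps only the rows marked True
kept = df[mask]
print(kept["sepal_length"].tolist())  # [5.1, 5.4]
```

Note that the comparison is strict, so the row with sepal_length equal to 5.0 is dropped.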


11.2 Python Code

import pandas as pd

# Load dataset
df = pd.read_csv("data/iris.csv")

# Filter rows where sepal_length > 5.0
filtered_df = df[df["sepal_length"] > 5.0]

# View result
print(filtered_df.head())

# Confirm number of rows
print("Filtered rows:", filtered_df.shape[0])

    sepal_length  sepal_width  petal_length  petal_width species
0            5.1          3.5           1.4          0.2  setosa
5            5.4          3.9           1.7          0.4  setosa
10           5.4          3.7           1.5          0.2  setosa
14           5.8          4.0           1.2          0.2  setosa
15           5.7          4.4           1.5          0.4  setosa
Filtered rows: 118

11.3 R Code

library(readr)
library(dplyr)

# Load dataset
df <- read_csv("data/iris.csv")

# Filter rows where sepal_length > 5.0
filtered_df <- df %>%
  filter(sepal_length > 5.0)

# View result
head(filtered_df)

# A tibble: 6 × 5
  sepal_length sepal_width petal_length petal_width species
         <dbl>       <dbl>        <dbl>       <dbl> <chr>  
1          5.1         3.5          1.4         0.2 setosa 
2          5.4         3.9          1.7         0.4 setosa 
3          5.4         3.7          1.5         0.2 setosa 
4          5.8         4            1.2         0.2 setosa 
5          5.7         4.4          1.5         0.4 setosa 
6          5.4         3.9          1.3         0.4 setosa 

# Confirm number of rows
nrow(filtered_df)

[1] 118

✅ Filtering is the gateway to conditional analysis: you can combine multiple conditions with the logical operators & (and) and | (or), which work in both pandas and dplyr, and pass the result on to visualizations or summaries.
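
As a rough illustration of combining conditions in pandas, the sketch below uses a small hand-made frame. The values and the thresholds 4.5 and 3.0 are hypothetical, chosen only to show how "long petals and wide sepals" might be expressed; they are not taken from the dataset analysis above.

```python
import pandas as pd

# Small hand-made frame (illustrative values only)
df = pd.DataFrame({
    "sepal_width": [3.5, 2.8, 3.3, 3.0],
    "petal_length": [1.4, 4.0, 6.0, 5.1],
    "species": ["setosa", "versicolor", "virginica", "virginica"],
})

# AND: both conditions must hold.
# Each comparison needs its own parentheses because & binds tighter than >.
long_and_wide = df[(df["petal_length"] > 4.5) & (df["sepal_width"] > 3.0)]

# OR: at least one condition must hold
either = df[(df["petal_length"] > 4.5) | (df["sepal_width"] > 3.0)]

# Category filter: keep a single species
only_virginica = df[df["species"] == "virginica"]

print(long_and_wide.index.tolist())   # [2]
print(either.index.tolist())          # [0, 2, 3]
print(only_virginica.index.tolist())  # [2, 3]
```

In pandas the Python keywords and / or do not work element-wise on columns, which is why & and | are used here.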