Q&A 11 How do you filter rows based on a condition in Python and R?
11.1 Explanation
Filtering is one of the most important skills in data wrangling. It allows you to isolate subsets of data that meet certain conditions — for example:
- Observations above or below a threshold
- Specific categories (e.g., only one species)
- Logical combinations (e.g., long petals and wide sepals)
In both Python and R, filtering uses logical expressions that evaluate to True or False (TRUE or FALSE in R) for each row. Only the rows where the expression is true are kept.
In this example, we’ll filter rows where:
sepal_length > 5.0
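To make the idea of a per-row logical expression concrete, here is a minimal sketch. It assumes a small five-row table typed in by hand (the values are made up for illustration and are not read from data/iris.csv), so it runs without any file on disk.

```python
import pandas as pd

# Hypothetical five-row sample, typed in for illustration only
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 5.4, 5.0],
    "species": ["setosa"] * 5,
})

# The comparison is evaluated row by row and returns a boolean Series (the "mask")
mask = sample["sepal_length"] > 5.0
print(mask.tolist())        # [True, False, False, True, False]

# Indexing the DataFrame with the mask keeps only the rows where it is True
kept = sample[mask]
print(kept.index.tolist())  # [0, 3]
```

Note that the last row (exactly 5.0) is dropped, because the condition is strictly greater than 5.0.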
11.2 Python Code
import pandas as pd
# Load dataset
df = pd.read_csv("data/iris.csv")
# Filter rows where sepal_length > 5.0
filtered_df = df[df["sepal_length"] > 5.0]
# View result
print(filtered_df.head())
# Confirm number of rows
print("Filtered rows:", filtered_df.shape[0])
    sepal_length  sepal_width  petal_length  petal_width species
0            5.1          3.5           1.4          0.2  setosa
5            5.4          3.9           1.7          0.4  setosa
10           5.4          3.7           1.5          0.2  setosa
14           5.8          4.0           1.2          0.2  setosa
15           5.7          4.4           1.5          0.4  setosa
Filtered rows: 118
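The printed index (0, 5, 10, ...) shows that pandas keeps the original row labels after filtering. The sketch below, again on a small hand-typed table rather than the real file, shows two other common ways to write the same filter (`.loc` and `DataFrame.query`) and how `reset_index(drop=True)` renumbers the result from 0. Treat it as one possible style, not the only correct one.

```python
import pandas as pd

# Hypothetical four-row sample, typed in for illustration only
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 5.4, 5.0],
    "sepal_width": [3.5, 3.0, 3.9, 3.6],
})

# Three equivalent spellings of the same condition
by_mask = sample[sample["sepal_length"] > 5.0]
by_loc = sample.loc[sample["sepal_length"] > 5.0]
by_query = sample.query("sepal_length > 5.0")

print(by_mask.equals(by_loc) and by_mask.equals(by_query))  # True

# Filtering keeps the original labels; reset_index(drop=True) renumbers from 0
print(by_mask.index.tolist())                      # [0, 2]
renumbered = by_mask.reset_index(drop=True)
print(renumbered.index.tolist())                   # [0, 1]
```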
11.3 R Code
library(readr)
library(dplyr)
# Load dataset
df <- read_csv("data/iris.csv")
# Filter rows where sepal_length > 5.0
filtered_df <- df %>%
  filter(sepal_length > 5.0)
# View result
head(filtered_df)
# Confirm number of rows
nrow(filtered_df)
# A tibble: 6 × 5
  sepal_length sepal_width petal_length petal_width species
         <dbl>       <dbl>        <dbl>       <dbl> <chr>
1          5.1         3.5          1.4         0.2 setosa
2          5.4         3.9          1.7         0.4 setosa
3          5.4         3.7          1.5         0.2 setosa
4          5.8         4            1.2         0.2 setosa
5          5.7         4.4          1.5         0.4 setosa
6          5.4         3.9          1.3         0.4 setosa
[1] 118
✅ Filtering is the starting point for conditional analysis: you can combine multiple conditions with logical operators and pipe the filtered result into visualizations or summaries.
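As a sketch of combining conditions in pandas, the example below uses a small hand-typed table (hypothetical values, not the real iris file) and an arbitrary cutoff of 3.0 for a "wide" sepal, chosen only for illustration. In pandas, each condition goes in its own parentheses and conditions are joined with `&` (and) or `|` (or); in dplyr, the equivalent is to pass several conditions to `filter()` separated by commas.

```python
import pandas as pd

# Hypothetical five-row sample, typed in for illustration only
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.3, 5.8, 7.0],
    "sepal_width": [3.5, 3.0, 3.3, 2.7, 3.2],
    "species": ["setosa", "setosa", "virginica", "virginica", "versicolor"],
})

# Logical AND: wrap each condition in parentheses and join with &
long_and_wide = sample[(sample["sepal_length"] > 5.0) & (sample["sepal_width"] > 3.0)]
print(long_and_wide.index.tolist())  # [0, 2, 4]

# Category filter: keep only the listed species
only_some = sample[sample["species"].isin(["setosa", "versicolor"])]
print(only_some.index.tolist())      # [0, 1, 4]

# Feed the filtered rows into a summary: mean sepal length per species
summary = long_and_wide.groupby("species")["sepal_length"].mean()
print(summary)
```

Using Python's `and` / `or` keywords between two Series raises an error, which is why the element-wise `&` and `|` operators are used here.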