Q&A 13 How do you create a new variable in Python and R?
13.1 Explanation
Creating new variables, a core part of feature engineering, lets you extract more insight from your data. A new variable can be based on:
- Arithmetic between columns
- Logical comparisons
- Conditional rules
In this example, we’ll create a new column called petal_ratio, calculated as:
petal_length / petal_width
This ratio can help distinguish species based on shape.
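The three kinds of derived variable listed above can be sketched in pandas as follows. This is a minimal illustration rather than a recommended recipe: the tiny data frame is hand-typed, and the column names long_petal and ratio_group, as well as the cutoffs 1.4 and 6.0, are arbitrary assumptions chosen only to show the pattern.

```python
import numpy as np
import pandas as pd

# Hand-typed sample rows (illustrative only, not loaded from data/iris.csv)
df = pd.DataFrame({
    "petal_length": [1.4, 1.3, 1.5, 1.7],
    "petal_width": [0.2, 0.2, 0.2, 0.4],
})

# Arithmetic between columns
df["petal_ratio"] = df["petal_length"] / df["petal_width"]

# Logical comparison: a True/False flag (the 1.4 cutoff is an arbitrary assumption)
df["long_petal"] = df["petal_length"] > 1.4

# Conditional rule: a label derived from the ratio (the 6.0 cutoff is also arbitrary)
df["ratio_group"] = np.where(df["petal_ratio"] >= 6.0, "high", "low")

print(df)
```

np.where is one of several ways to express a two-way rule; for rules with more than two branches, np.select or pandas' cut may be a better fit.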
13.2 Python Code
import pandas as pd
# Load dataset
df = pd.read_csv("data/iris.csv")
# Create a new column
df["petal_ratio"] = df["petal_length"] / df["petal_width"]
# Preview result
print(df[["petal_length", "petal_width", "petal_ratio"]].head())
   petal_length  petal_width  petal_ratio
0           1.4          0.2          7.0
1           1.4          0.2          7.0
2           1.3          0.2          6.5
3           1.5          0.2          7.5
4           1.4          0.2          7.0
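The division above works as long as petal_width is never zero. With float columns, pandas does not raise an error on division by zero; it returns inf, which can quietly distort later summaries. A hedged sketch of one possible guard is shown below. The zero-width row is made up for illustration, and turning infinite values into missing values is just one option among several.

```python
import numpy as np
import pandas as pd

# Illustrative rows; the last one has a made-up zero width to trigger the edge case
df = pd.DataFrame({
    "petal_length": [1.4, 1.7, 1.0],
    "petal_width": [0.2, 0.4, 0.0],
})

# Float division by zero yields inf rather than raising an exception
ratio = df["petal_length"] / df["petal_width"]

# Replace infinite results with NaN so they are treated as missing values
df["petal_ratio"] = ratio.replace([np.inf, -np.inf], np.nan)

print(df)
```

Whether a zero denominator should become a missing value, a capped value, or a dropped row depends on the analysis, so treat this as a starting point.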
13.3 R Code
library(readr)
library(dplyr)
# Load dataset
df <- read_csv("data/iris.csv")
# Create a new column
df <- df %>%
  mutate(petal_ratio = petal_length / petal_width)
# Preview result
df %>%
  select(petal_length, petal_width, petal_ratio) %>%
  head()
# A tibble: 6 × 3
  petal_length petal_width petal_ratio
         <dbl>       <dbl>       <dbl>
1          1.4         0.2        7
2          1.4         0.2        7
3          1.3         0.2        6.5
4          1.5         0.2        7.5
5          1.4         0.2        7
6          1.7         0.4        4.25
✅ Creating new variables gives you more ways to explore and model your data. It’s a key step in both exploratory data analysis (EDA) and machine learning pipelines.