Q&A 13 How do you create a new variable in Python and R?

13.1 Explanation

Creating new variables, a core part of feature engineering, lets you extract more insight from your data. A new variable can be based on:

Arithmetic between columns
Logical comparisons
Conditional rules
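As a rough sketch, the three kinds of derived variable listed above might look like this in pandas. The six rows are copied from the iris preview shown later in this section. The column names petal_area, is_wide and length_class, and the cutoffs 0.3 and 1.5, are assumptions made up for illustration, not part of the dataset.

```python
import numpy as np
import pandas as pd

# Six rows taken from the iris preview in this section
df = pd.DataFrame({
    "petal_length": [1.4, 1.4, 1.3, 1.5, 1.4, 1.7],
    "petal_width": [0.2, 0.2, 0.2, 0.2, 0.2, 0.4],
})

# Arithmetic between columns (petal_area is a hypothetical feature)
df["petal_area"] = df["petal_length"] * df["petal_width"]

# Logical comparison: yields a True/False column (0.3 is an arbitrary cutoff)
df["is_wide"] = df["petal_width"] > 0.3

# Conditional rule: pick a label per row (1.5 is an arbitrary cutoff)
df["length_class"] = np.where(df["petal_length"] >= 1.5, "long", "short")

print(df)
```

np.where suits a two-way rule; for rules with more than two outcomes, np.select or pd.cut are common alternatives.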

In this example, we’ll create a new column called petal_ratio, calculated as:

petal_length / petal_width

This ratio describes petal shape (how long a petal is relative to its width), which can help distinguish species.
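One thing to watch with any ratio is a zero denominator. In pandas, dividing a float column by zero does not raise an error; it produces inf (or NaN for 0 / 0). The sketch below uses made-up rows, with a zero width added on purpose, and shows one possible way to turn those infinities into missing values. Whether NaN is the right replacement depends on your analysis.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; the last one has a zero width to show the edge case
df = pd.DataFrame({
    "petal_length": [1.4, 1.7, 1.5],
    "petal_width": [0.2, 0.4, 0.0],
})

# Plain division: the third row becomes inf instead of raising an error
raw_ratio = df["petal_length"] / df["petal_width"]

# Replace infinities with NaN so later summaries are not distorted
df["petal_ratio"] = raw_ratio.replace([np.inf, -np.inf], np.nan)

print(df)
```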

13.2 Python Code

import pandas as pd

# Load dataset
df = pd.read_csv("data/iris.csv")

# Create a new column
df["petal_ratio"] = df["petal_length"] / df["petal_width"]

# Preview result
print(df[["petal_length", "petal_width", "petal_ratio"]].head())

   petal_length  petal_width  petal_ratio
0           1.4          0.2          7.0
1           1.4          0.2          7.0
2           1.3          0.2          6.5
3           1.5          0.2          7.5
4           1.4          0.2          7.0
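If you prefer a chainable style closer to dplyr's mutate, DataFrame.assign is an alternative to bracket assignment. It returns a new DataFrame and leaves the original unchanged. This is a minimal sketch that builds the rows inline (values copied from the previews in this section) rather than reading data/iris.csv; the name with_ratio is just an illustrative choice.

```python
import pandas as pd

# Rows copied from the previews in this section
df = pd.DataFrame({
    "petal_length": [1.4, 1.4, 1.3, 1.5, 1.4, 1.7],
    "petal_width": [0.2, 0.2, 0.2, 0.2, 0.2, 0.4],
})

# assign returns a copy with the extra column; df itself is not modified
with_ratio = df.assign(
    petal_ratio=lambda d: d["petal_length"] / d["petal_width"]
)

print(with_ratio.head())
```

Using a lambda means the expression is evaluated against the DataFrame at that point in a method chain, which helps when earlier steps in the chain have filtered or added columns.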

13.3 R Code

library(readr)
library(dplyr)

# Load dataset
df <- read_csv("data/iris.csv")

# Create a new column
df <- df %>%
  mutate(petal_ratio = petal_length / petal_width)

# Preview result
df %>%
  select(petal_length, petal_width, petal_ratio) %>%
  head()

# A tibble: 6 × 3
  petal_length petal_width petal_ratio
         <dbl>       <dbl>       <dbl>
1          1.4         0.2        7   
2          1.4         0.2        7   
3          1.3         0.2        6.5 
4          1.5         0.2        7.5 
5          1.4         0.2        7   
6          1.7         0.4        4.25

✅ Creating new variables gives you more ways to explore and model your data. It is a key step in both exploratory data analysis (EDA) and machine learning pipelines.