Q&A 20 How do you convert variable types in a dataset?
20.1 Explanation
In earlier steps, we created a small test dataset to explore variable types. Now, let’s switch to a more realistic dataset: data/iris.csv, which was loaded and inspected in the Exploratory Data Analysis (EDA) section.
This dataset contains numerical features (sepal/petal dimensions) and a categorical target (species). We’ll demonstrate how to:
- Convert the species column to a categorical or factor type
- Confirm numeric columns are correctly typed
- Prepare variables for modeling and visualization
20.2 Python Code
# ✅ Import libraries
import pandas as pd
# Load the dataset
df = pd.read_csv("data/iris.csv")
# Convert species to a categorical variable
df["species"] = df["species"].astype("category")
# Confirm types
print("\nVariable/Feature type\n", df.dtypes)
print("\nSpecies type\n", df["species"].cat.categories)
Variable/Feature type
sepal_length float64
sepal_width float64
petal_length float64
petal_width float64
species category
dtype: object
Species type
Index(['setosa', 'versicolor', 'virginica'], dtype='object')
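Printing dtypes confirms the types only by eye. As a minimal sketch, the check on the numeric columns can also be done programmatically. The small `sample` frame below is a hypothetical stand-in that reuses the column names of data/iris.csv. Its values, including the deliberately bad "n/a" entry, are illustrative and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical stand-in for data/iris.csv: same column names, made-up rows.
sample = pd.DataFrame({
    "sepal_length": ["5.1", "4.9", "n/a"],  # numbers stored as text, one unparseable
    "sepal_width": [3.5, 3.0, 3.2],
    "species": ["setosa", "setosa", "virginica"],
})

numeric_cols = ["sepal_length", "sepal_width"]

# Coerce each column that should be numeric; unparseable entries become NaN
for col in numeric_cols:
    sample[col] = pd.to_numeric(sample[col], errors="coerce")

# Convert the target to a categorical type, as in the main example
sample["species"] = sample["species"].astype("category")

# Verify the result programmatically instead of reading the dtype listing
all_numeric = all(pd.api.types.is_numeric_dtype(sample[c]) for c in numeric_cols)
n_missing = int(sample["sepal_length"].isna().sum())

print(sample.dtypes)
print("All numeric:", all_numeric, "| values coerced to NaN:", n_missing)
```

With errors="coerce", a malformed entry becomes a missing value rather than raising an exception, so it is worth counting the NaN values afterwards to see how many entries failed to parse.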
20.3 R Code
# ✅ Load modern tools
library(tidyverse)
# Load iris dataset
df <- read_csv("data/iris.csv", show_col_types = FALSE)
# Convert species to factor
df <- df %>%
  mutate(species = as.factor(species))
# Inspect types
glimpse(df)
# List the factor levels
levels(df$species)
Rows: 150
Columns: 5
$ sepal_length <dbl> 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4, 4.…
$ sepal_width <dbl> 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1, 3.7, 3.…
$ petal_length <dbl> 1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5, 1.5, 1.…
$ petal_width <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2, 0.2, 0.1, 0.2, 0.…
$ species <fct> setosa, setosa, setosa, setosa, setosa, setosa, setosa, s…
[1] "setosa" "versicolor" "virginica"