Q&A 6 How do you load a pre-cleaned dataset in Python and R?

6.1 Explanation

Loading a dataset is one of the first steps in any data analysis project — especially when you’re working from a previously saved, cleaned version.

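In this guide, we assume that the cleaned dataset has already been saved as a CSV file in your data/ folder (iris_seaborn.csv for the Python example, iris_rbase.csv for the R example). We'll now load it using: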

Python: via the pandas library and its read_csv() function
R: using the readr package and its read_csv() function

6.2 File Paths

Using consistent relative file paths (like `data/*.csv`) helps keep the code reproducible across environments, as long as it is run from the same project root.
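One way to make the path convention explicit is to build the path in a single place and fail early if the file is missing. The sketch below is only an illustration: `DATA_DIR` and `dataset_path` are hypothetical names introduced here, and it assumes the script is run from the project root that contains the data/ folder.

```python
from pathlib import Path

# Hypothetical helper names (not part of pandas or of this guide's code).
# Assumes the working directory is the project root containing data/.
DATA_DIR = Path("data")


def dataset_path(filename: str, data_dir: Path = DATA_DIR) -> Path:
    """Return the path to a file in the data folder, or raise if it is missing."""
    path = Path(data_dir) / filename
    if not path.is_file():
        # Report the working directory, since a wrong one is the usual cause
        raise FileNotFoundError(
            f"Expected dataset at {path} (working directory: {Path.cwd()})"
        )
    return path
```

With a helper like this, the load step could be written as `pd.read_csv(dataset_path("iris_seaborn.csv"))`, so a wrong working directory produces a clear error message instead of a confusing one later on.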

6.3 Python Code

import pandas as pd

# Load the pre-cleaned iris dataset
df = pd.read_csv("data/iris_seaborn.csv")

# Preview the data
print(df.head())

# Confirm the shape
print("Rows and columns:", df.shape)

   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa
Rows and columns: (150, 5)
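Beyond previewing the first rows and the shape, it can help to confirm that the loaded table has the columns you expect and no missing values. The following is a minimal sketch, not part of the original workflow: `validate_loaded` and `EXPECTED_COLUMNS` are hypothetical names, the column list is taken from the preview output above, and a two-row in-memory sample stands in for the real file so the example runs on its own.

```python
import io

import pandas as pd

# Assumed column names, copied from the preview output above
EXPECTED_COLUMNS = [
    "sepal_length", "sepal_width", "petal_length", "petal_width", "species",
]


def validate_loaded(df: pd.DataFrame, expected_columns=EXPECTED_COLUMNS) -> None:
    """Raise ValueError if expected columns are absent or contain missing values."""
    absent = [col for col in expected_columns if col not in df.columns]
    if absent:
        raise ValueError(f"Missing columns: {absent}")
    n_missing = int(df[expected_columns].isna().sum().sum())
    if n_missing:
        raise ValueError(f"Found {n_missing} missing values")


# Two rows copied from the preview, used in place of data/iris_seaborn.csv
sample_csv = io.StringIO(
    "sepal_length,sepal_width,petal_length,petal_width,species\n"
    "5.1,3.5,1.4,0.2,setosa\n"
    "4.9,3.0,1.4,0.2,setosa\n"
)
sample = pd.read_csv(sample_csv)

validate_loaded(sample)  # passes silently when the checks succeed
print(sample.dtypes)     # inspect the inferred column types
```

In practice you would call `validate_loaded(df)` on the DataFrame returned by `pd.read_csv("data/iris_seaborn.csv")` rather than on the sample.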

6.4 R Code

library(readr)

# Load the pre-cleaned iris dataset
df <- read_csv("data/iris_rbase.csv")

# Preview the data
head(df)

# A tibble: 6 × 5
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
         <dbl>       <dbl>        <dbl>       <dbl> <chr>  
1          5.1         3.5          1.4         0.2 setosa 
2          4.9         3            1.4         0.2 setosa 
3          4.7         3.2          1.3         0.2 setosa 
4          4.6         3.1          1.5         0.2 setosa 
5          5           3.6          1.4         0.2 setosa 
6          5.4         3.9          1.7         0.4 setosa

# Dimensions
cat("Rows and columns:", dim(df), "\n")

Rows and columns: 150 5

✅ Once loaded, you’re ready to continue with data wrangling, visualization, or modeling — all based on your clean, reusable dataset.