Q&A 5 How do you save a dataset in Python and R?

5.1 Explanation

Once you’ve cleaned or prepared a dataset, it’s good practice to save it in a standard format like CSV. This allows you to:

Preserve your cleaned version for future use
Avoid repeating preprocessing steps
Share your data with others or load it in different tools

In this example, we’ll load sample datasets that ship with Python and R libraries, then save them as CSV files in the data/ folder using the pandas DataFrame method to_csv() in Python and write_csv() from the readr package in R.

5.2 Python Code

import os
import pandas as pd  # not called directly; both loaders below return pandas DataFrames
import seaborn as sns
from sklearn import datasets

# Create data folder
os.makedirs("data", exist_ok=True)

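# Save seaborn's iris dataset (load_dataset downloads it on first use, so this needs internet access)
df_iris = sns.load_dataset("iris")
df_iris.to_csv("data/iris_seaborn.csv", index=False)  # index=False leaves out the row index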

# Save sklearn's iris as well (optional); its 'target' column holds integer class codes, not species names
iris_sklearn = datasets.load_iris(as_frame=True).frame
iris_sklearn.to_csv("data/iris_sklearn.csv", index=False)

print("Datasets saved successfully.")

print(iris_sklearn.head())

Datasets saved successfully.
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  \
0                5.1               3.5                1.4               0.2   
1                4.9               3.0                1.4               0.2   
2                4.7               3.2                1.3               0.2   
3                4.6               3.1                1.5               0.2   
4                5.0               3.6                1.4               0.2   

   target  
0       0  
1       0  
2       0  
3       0  
4       0
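
The Python code above passes index=False to to_csv(). As a minimal sketch of why, the snippet below saves the same frame with and without that argument and compares the columns that come back. The tiny DataFrame, the file names, and the temporary folder are stand-ins invented for this illustration, not part of the example above.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for a cleaned dataset (not the real iris data).
df = pd.DataFrame(
    {
        "sepal_length": [5.1, 4.9, 4.7],
        "species": ["setosa", "setosa", "setosa"],
    }
)

# Write into a temporary folder so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    with_index_path = os.path.join(tmp_dir, "with_index.csv")
    no_index_path = os.path.join(tmp_dir, "no_index.csv")

    df.to_csv(with_index_path)             # default: row index is written as the first column
    df.to_csv(no_index_path, index=False)  # row index is left out

    cols_with_index = list(pd.read_csv(with_index_path).columns)
    cols_no_index = list(pd.read_csv(no_index_path).columns)

print(cols_with_index)  # ['Unnamed: 0', 'sepal_length', 'species']
print(cols_no_index)    # ['sepal_length', 'species']
```

Without index=False, the row index comes back as an extra "Unnamed: 0" column when the file is reloaded, which is usually not wanted for a plain 0..n-1 index.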

5.3 R Code

# Load necessary libraries
library(readr)
library(datasets)

# Create 'data/' directory if it doesn't exist
if (!dir.exists("data")) dir.create("data")

# Save the built-in iris dataset
write_csv(iris, "data/iris_rbase.csv")

# Optional: Save ggplot2's diamonds dataset if available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  write_csv(ggplot2::diamonds, "data/diamonds.csv")
}

cat("Datasets saved successfully.\n")

Datasets saved successfully.

✅ With your cleaned or example dataset saved, you can load it back in future sessions for further analysis or visualization.
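
As a rough illustration of that later loading step in Python, the sketch below writes a small frame to CSV, reads it back with pd.read_csv(), and checks that the round trip preserved the data. The column values, the file name example.csv, and the temporary folder are assumptions made for this sketch; in practice you would point read_csv() at a file such as data/iris_seaborn.csv.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for a dataset saved in an earlier session.
original = pd.DataFrame(
    {
        "petal_length": [1.4, 4.7, 6.0],
        "target": [0, 1, 2],
    }
)

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "example.csv")

    # Save without the row index, as in the example above.
    original.to_csv(csv_path, index=False)

    # Load the file back, as you would in a future session.
    reloaded = pd.read_csv(csv_path)

# Raises an AssertionError if values, column names, or dtypes differ.
pd.testing.assert_frame_equal(original, reloaded)

print(reloaded.shape)
print(list(reloaded.columns))
```

Plain numeric and text columns like these survive the round trip, but CSV stores only text, so richer types (for example pandas categoricals or parsed dates) generally need to be restored after loading.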