Q&A 19 How do you create a simple dataset to test variable type conversion?

19.1 Explanation

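You can build a small dataset by hand to simulate variables of different types, such as character, integer, boolean, date, and categorical. Storing every value as text at first mimics a raw import and gives you columns to practice converting.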

This is useful when:

  • Practicing how to convert between types (e.g., character → factor or object → datetime)
  • Testing how functions behave with different variable classes
  • Debugging type-specific behavior before applying your code to large datasets
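To make the second point concrete, here is a minimal pandas sketch (an illustration, not part of the original workflow) that reuses the `age` values from the dataset in this section. The same `sum()` call concatenates the values while they are text and adds them once they are integers; the names `ages_text` and `ages_int` are illustrative.

```python
import pandas as pd

# The age values as raw text, the way they arrive from an untyped import
ages_text = pd.Series(["30", "25", "28"], dtype=object)

# The same values converted to integers
ages_int = ages_text.astype(int)

# sum() on text joins the strings together; on integers it adds them
text_total = ages_text.sum()
int_total = ages_int.sum()

print(repr(text_total))  # '302528'
print(int_total)         # 83
```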

Working with a controlled sample helps you see how tools like pandas (Python) and the tidyverse (R) handle types by default, for example whether a column of digit strings stays text or is parsed as numbers when a CSV file is read back in.
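As a rough illustration of those defaults on the pandas side, the sketch below writes the same all-text columns to an in-memory CSV and reads them back twice: once with default settings and once with the `true_values`, `false_values`, and `parse_dates` options of `pd.read_csv`. With defaults, pandas typically infers `age` as an integer column but leaves `member` and `joined` as text; the second read is one possible way to request boolean and datetime columns at load time. Exact dtype names may differ between pandas versions, so no printed output is shown.

```python
import io

import pandas as pd

# Same raw values as the test dataset, all stored as text
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol"],
    "age": ["30", "25", "28"],
    "member": ["yes", "no", "yes"],
    "joined": ["2022-01-01", "2021-07-15", "2023-03-20"],
})

# Write to CSV text in memory (no file needed) and read it back with defaults
csv_text = df.to_csv(index=False)
inferred = pd.read_csv(io.StringIO(csv_text))

# Read again, telling pandas how to interpret yes/no and which column is a date
explicit = pd.read_csv(
    io.StringIO(csv_text),
    true_values=["yes"],
    false_values=["no"],
    parse_dates=["joined"],
)

print(inferred.dtypes)
print(explicit.dtypes)
```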

19.2 Python Code

# ✅ Import modern Python libraries
from pathlib import Path

import pandas as pd

# Create a test dataset with various types
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol"],                   # string / object
    "age": ["30", "25", "28"],                           # string (simulate raw import)
    "member": ["yes", "no", "yes"],                      # string (convert to bool/cat later)
    "joined": ["2022-01-01", "2021-07-15", "2023-03-20"] # string (convert to datetime)
})

# Save to CSV for testing workflows (to_csv does not create missing folders)
Path("data").mkdir(exist_ok=True)
df.to_csv("data/test_conversion.csv", index=False)

# Preview structure
print(df.dtypes)
print("\n", df.head())
name      object
age       object
member    object
joined    object
dtype: object

     name age member      joined
0  Alice  30    yes  2022-01-01
1    Bob  25     no  2021-07-15
2  Carol  28    yes  2023-03-20
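The comments in the dataset mark each column for later conversion. One possible way to do that conversion in pandas is sketched below; the `yes`/`no` to `True`/`False` mapping and the extra `member_cat` column are assumptions made for illustration, not part of the original example.

```python
import pandas as pd

# Rebuild the all-text test dataset from this section
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol"],
    "age": ["30", "25", "28"],
    "member": ["yes", "no", "yes"],
    "joined": ["2022-01-01", "2021-07-15", "2023-03-20"],
})

# Convert each column to the type its comment calls for (df itself is unchanged)
converted = df.assign(
    age=df["age"].astype("int64"),                           # text -> integer
    member=df["member"].map({"yes": True, "no": False}),     # text -> boolean
    joined=pd.to_datetime(df["joined"], format="%Y-%m-%d"),  # text -> datetime
)

# Hypothetical extra column: the same yes/no values as a categorical
converted["member_cat"] = df["member"].astype("category")

print(converted.dtypes)
```

Using `map` with an explicit dictionary means any unexpected value (say, `"Y"`) becomes missing (`NaN`) instead of being silently treated as true, which makes bad input easy to spot while testing.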

19.3 R Code

# ✅ Load modern R packages
library(tidyverse)

# Create a test dataset
df <- tibble(
  name = c("Alice", "Bob", "Carol"),             # character
  age = c("30", "25", "28"),                     # character (simulate untyped input)
  member = c("yes", "no", "yes"),                # character (can convert to logical)
  joined = c("2022-01-01", "2021-07-15", "2023-03-20")  # character (convert to Date)
)

# Save to CSV for downstream testing (create the data folder first if it is missing)
dir.create("data", showWarnings = FALSE)
write_csv(df, "data/test_conversion.csv")

# Preview structure
glimpse(df)
Rows: 3
Columns: 4
$ name   <chr> "Alice", "Bob", "Carol"
$ age    <chr> "30", "25", "28"
$ member <chr> "yes", "no", "yes"
$ joined <chr> "2022-01-01", "2021-07-15", "2023-03-20"