Q&A 8 How do you examine the structure and types of variables in Python and R?

8.1 Explanation

Understanding the structure of your dataset — including data types — is a key step in exploratory data analysis. It helps you:

  • Know what transformations are needed
  • Identify categorical vs. numerical variables
  • Prepare your data for modeling or visualization

Each column in your dataset has a specific data type. These types influence how operations behave, how memory is allocated, and how functions treat your data.
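As a small illustration of that point, the sketch below uses a made-up three-row DataFrame. The column names `x` and `label` are hypothetical. It shows that the same `+` operator adds numbers but concatenates strings. It also shows that introducing a missing value quietly turns an integer column into `float64`, because NaN is a floating-point value.

```python
import pandas as pd

# Hypothetical toy data: one integer column and one text column
df = pd.DataFrame({"x": [1, 2, 3], "label": ["a", "b", "c"]})

# The same operator does different things depending on the column's dtype
numeric_sum = df["x"] + df["x"]          # element-wise arithmetic
text_sum = df["label"] + df["label"]     # element-wise string concatenation
print(numeric_sum.tolist())  # [2, 4, 6]
print(text_sum.tolist())     # ['aa', 'bb', 'cc']

# A missing value forces pandas to store these integers as floats
with_missing = pd.Series([1, None, 3])
print(with_missing.dtype)    # float64
```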


8.1.1 ✅ Common Data Types in Python and R

| Concept | Python (pandas) | R (base) | Notes |
|---|---|---|---|
| Integer | int (int64) | integer | Use astype(int) or as.integer() |
| Decimal number | float (float64) | numeric (double) | numeric in R defaults to double |
| Text / string | str, object | character | Use astype(str) or as.character() |
| Logical / Boolean | bool | logical | True/False in Python, TRUE/FALSE in R |
| Date / time | datetime64[ns] | Date, POSIXct | Use pd.to_datetime(), as.Date(), or as.POSIXct() |
| Category | category | factor | Useful for grouping and modeling; create with astype("category") or factor() |
| Missing values | NaN (NumPy), None, pd.NA | NA | Use pd.isna() or is.na() |
| Complex numbers | complex | complex | Rare in typical EDA workflows |
| List | list | list | Both can hold mixed data types |
| Dictionary | dict | named list | A named list can mimic a Python dictionary |
| Tuple | tuple | (no direct equivalent) | Use an atomic vector (c()) or a list in R |
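The pandas conversion functions named in the table can be tried on a small, made-up DataFrame in which numbers and dates arrived as text. The column names and values are hypothetical. This is a sketch of common conversions, not the only way to do them. `pd.to_numeric()` is an alternative when some strings may not parse.

```python
import pandas as pd

# Hypothetical raw data where numbers and dates were read in as strings
raw = pd.DataFrame({
    "count": ["1", "2", "3"],
    "price": ["1.5", "2.0", "2.5"],
    "date": ["2024-01-01", "2024-02-15", "2024-03-31"],
    "group": ["a", "b", "a"],
})

# Convert each column to the type suggested in the table
clean = raw.assign(
    count=raw["count"].astype(int),         # integer
    price=raw["price"].astype(float),       # decimal number
    date=pd.to_datetime(raw["date"]),       # date / time
    group=raw["group"].astype("category"),  # category (like an R factor)
)
print(clean.dtypes)

# Missing values: pd.isna() flags NaN/None element by element
missing_flags = pd.isna(pd.Series([1.0, None, 3.0])).tolist()
print(missing_flags)  # [False, True, False]
```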

8.2 Python Code

import pandas as pd

# Load the standardized dataset
df = pd.read_csv("data/iris.csv")

# View column names
print("Column names:", df.columns.tolist())

# Check data types
print("\nData types:")
print(df.dtypes)

# Optional: Use .info() for a more detailed summary
print("\nStructure info:")
df.info()

Column names: ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']

Data types:
sepal_length    float64
sepal_width     float64
petal_length    float64
petal_width     float64
species          object
dtype: object

Structure info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   sepal_length  150 non-null    float64
 1   sepal_width   150 non-null    float64
 2   petal_length  150 non-null    float64
 3   petal_width   150 non-null    float64
 4   species       150 non-null    object 
dtypes: float64(4), object(1)
memory usage: 6.0+ KB
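To separate numerical from categorical variables, `select_dtypes()` can split the columns by dtype. The sketch below types four real iris rows in by hand so it runs without `data/iris.csv`. With the actual file you would apply the same calls to the `df` loaded above. Converting `species` to `category` is optional. It is shown here because it is the pandas counterpart of an R factor.

```python
import pandas as pd

# Four iris rows typed in by hand, standing in for data/iris.csv
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.3],
    "sepal_width": [3.5, 3.0, 3.2, 3.3],
    "petal_length": [1.4, 1.4, 4.7, 6.0],
    "petal_width": [0.2, 0.2, 1.4, 2.5],
    "species": ["setosa", "setosa", "versicolor", "virginica"],
})

# Split column names by dtype: numeric vs. everything else
numeric_cols = df.select_dtypes(include="number").columns.tolist()
other_cols = df.select_dtypes(exclude="number").columns.tolist()
print("Numeric:", numeric_cols)
print("Non-numeric:", other_cols)

# Store the text labels as a categorical dtype (pandas' analogue of an R factor)
df["species"] = df["species"].astype("category")
print(df["species"].cat.categories.tolist())  # ['setosa', 'versicolor', 'virginica']
```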

8.3 R Code

library(readr)

# Load the standardized dataset
df <- read_csv("data/iris.csv")

# View column names
names(df)

[1] "sepal_length" "sepal_width"  "petal_length" "petal_width"  "species"     

# Check data types (structure)
str(df)

spc_tbl_ [150 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ sepal_length: num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ sepal_width : num [1:150] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ petal_length: num [1:150] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ petal_width : num [1:150] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ species     : chr [1:150] "setosa" "setosa" "setosa" "setosa" ...
 - attr(*, "spec")=
  .. cols(
  ..   sepal_length = col_double(),
  ..   sepal_width = col_double(),
  ..   petal_length = col_double(),
  ..   petal_width = col_double(),
  ..   species = col_character()
  .. )
 - attr(*, "problems")=<externalptr> 

# Optional: print the class of each variable
sapply(df, class)

sepal_length  sepal_width petal_length  petal_width      species 
   "numeric"    "numeric"    "numeric"    "numeric"  "character" 

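pandas has no single function that prints exactly what R's str() shows. str() reports the dimensions, each column's type, and its first few values. The helper below, str_like(), is a hypothetical sketch that assembles a similar summary from df.shape, the column dtypes, and head(). Its name and output format are assumptions, not part of pandas.

```python
import pandas as pd

def str_like(df: pd.DataFrame, n: int = 5) -> list[str]:
    """Hypothetical helper: an R str()-style summary of a DataFrame."""
    rows, cols = df.shape
    lines = [f"DataFrame: {rows} obs. of {cols} variables"]
    for col in df.columns:
        # First n values of the column, joined with spaces as in R's str()
        preview = " ".join(str(v) for v in df[col].head(n).tolist())
        lines.append(f" $ {col}: {df[col].dtype} {preview} ...")
    return lines

# Hypothetical example: the first three iris rows of two columns, typed in by hand
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7],
    "species": ["setosa", "setosa", "setosa"],
})
summary = str_like(df)
for line in summary:
    print(line)
```

In practice, `df.info()` plus `df.head()` covers the same ground. The helper simply puts the dtype and a value preview on one line per column, as str() does.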
✅ Once you’re familiar with variable types, you can decide how to clean, filter, or transform your data — and which variables are ready for plotting or modeling.