Q&A 8 How do you examine the structure and types of variables in Python and R?
8.1 Explanation
Understanding the structure of your dataset — including data types — is a key step in exploratory data analysis. It helps you:
- Know what transformations are needed
- Identify categorical vs. numerical variables
- Prepare your data for modeling or visualization
Each column in your dataset has a specific data type. These types influence how operations behave, how memory is allocated, and how functions treat your data.
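To see why types matter, here is a minimal sketch using made-up toy data (the column names `as_text` and `as_int` and the repeated-label series are illustrative assumptions, not part of the iris dataset). It shows the same `+` operator behaving differently on text and integers, and compares the memory footprint of repeated labels stored as plain text and as `category`.

```python
import pandas as pd

# Hypothetical toy data: the same digits stored as text and as integers
toy = pd.DataFrame({
    "as_text": ["1", "2", "3"],
    "as_int": [1, 2, 3],
})

# The + operator concatenates text but adds integers
text_result = (toy["as_text"] + toy["as_text"]).tolist()
int_result = (toy["as_int"] + toy["as_int"]).tolist()
print(text_result)  # ['11', '22', '33']
print(int_result)   # [2, 4, 6]

# Repeated labels usually need less memory as a category than as text
labels = pd.Series(["setosa", "versicolor", "virginica"] * 1000)
text_bytes = labels.memory_usage(deep=True)
category_bytes = labels.astype("category").memory_usage(deep=True)
print(text_bytes, category_bytes)
```

The exact byte counts depend on your pandas version, so treat them as a rough comparison rather than fixed numbers.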
8.1.1 ✅ Common Data Types in Python and R
| Concept | Python (`pandas`) | R (`base`) | Notes |
|---|---|---|---|
| Integer | `int` | `integer` | Use `astype(int)` or `as.integer()` |
| Decimal Number | `float` | `numeric`, `double` | `numeric` in R defaults to `double` |
| Text / String | `str`, `object` | `character` | Use `astype(str)` or `as.character()` |
| Logical / Boolean | `bool` | `logical` | `True`/`False` in Python, `TRUE`/`FALSE` in R |
| Date / Time | `datetime64[ns]` | `Date`, `POSIXct` | Use `pd.to_datetime()` or `as.Date()` |
| Category | `category` | `factor` | Useful for grouping and modeling |
| Missing Values | `NaN` (`numpy`) | `NA` | Use `pd.isna()` or `is.na()` |
| Complex Numbers | `complex` | `complex` | Rare in typical EDA workflows |
| List | `list` | `list` | R lists allow mixed data types |
| Dictionary | `dict` | named `list` | R lists with names can mimic Python dictionaries |
| Tuple | `tuple` | `c()`, `list()` | No direct equivalent; use vectors or lists in R |
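The conversion functions in the Notes column can be tried on a small example. The sketch below assumes a hypothetical `raw` data frame (its columns `count`, `day`, `group`, and `price` are invented for illustration) and applies the pandas side of the table: `astype(int)`, `pd.to_datetime()`, `astype("category")`, and `pd.isna()`.

```python
import pandas as pd

# Hypothetical raw data: numbers and dates arrived as text,
# and one price is missing
raw = pd.DataFrame({
    "count": ["1", "2", "3"],
    "day": ["2024-01-01", "2024-01-02", "2024-01-03"],
    "group": ["a", "b", "a"],
    "price": [1.5, None, 2.0],
})

# Convert each column to the type that matches its meaning
typed = pd.DataFrame({
    "count": raw["count"].astype(int),          # text -> integer
    "day": pd.to_datetime(raw["day"]),          # text -> datetime
    "group": raw["group"].astype("category"),   # text -> category
    "price": raw["price"],                      # already float, with NaN
})

print(typed.dtypes)

# Flag missing values
missing_price = pd.isna(typed["price"]).tolist()
print(missing_price)  # [False, True, False]
```

The R counterparts from the table (`as.integer()`, `as.Date()`, `factor()`, `is.na()`) follow the same pattern.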
8.2 Python Code
import pandas as pd
# Load the standardized dataset
df = pd.read_csv("data/iris.csv")
# View column names
print("Column names:", df.columns.tolist())
# Check data types
print("\nData types:")
print(df.dtypes)
# Optional: Use .info() for a more detailed summary
print("\nStructure info:")
df.info()
Column names: ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']
Data types:
sepal_length float64
sepal_width float64
petal_length float64
petal_width float64
species object
dtype: object
Structure info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 sepal_length 150 non-null float64
1 sepal_width 150 non-null float64
2 petal_length 150 non-null float64
3 petal_width 150 non-null float64
4 species 150 non-null object
dtypes: float64(4), object(1)
memory usage: 6.0+ KB
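Once you know the dtypes, you can split columns into numerical and non-numerical groups. The following is a minimal sketch using `select_dtypes()` on a three-row stand-in for the iris data (the rows are copied from the first values shown in this section; the variable names `numeric_cols` and `other_cols` are illustrative).

```python
import pandas as pd

# Three-row stand-in for data/iris.csv so the example runs on its own
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7],
    "sepal_width": [3.5, 3.0, 3.2],
    "petal_length": [1.4, 1.4, 1.3],
    "petal_width": [0.2, 0.2, 0.2],
    "species": ["setosa", "setosa", "setosa"],
})

# Numerical columns: candidates for summary statistics and scaling
numeric_cols = df.select_dtypes(include="number").columns.tolist()

# Everything else: candidates for grouping or conversion to category
other_cols = df.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols)  # ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
print(other_cols)    # ['species']
```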
8.3 R Code
library(readr)
# Load the standardized dataset
df <- read_csv("data/iris.csv")
# View column names
names(df)
[1] "sepal_length" "sepal_width" "petal_length" "petal_width" "species"
# Check the structure of the data frame
str(df)
spc_tbl_ [150 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ sepal_length: num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ sepal_width : num [1:150] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ petal_length: num [1:150] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ petal_width : num [1:150] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ species : chr [1:150] "setosa" "setosa" "setosa" "setosa" ...
- attr(*, "spec")=
.. cols(
.. sepal_length = col_double(),
.. sepal_width = col_double(),
.. petal_length = col_double(),
.. petal_width = col_double(),
.. species = col_character()
.. )
- attr(*, "problems")=<externalptr>
# Check the class of each column
sapply(df, class)
sepal_length sepal_width petal_length petal_width species
"numeric" "numeric" "numeric" "numeric" "character"
✅ Once you’re familiar with variable types, you can decide how to clean, filter, or transform your data — and which variables are ready for plotting or modeling.