Q&A 16 What are common data types in Python and R?

16.1 Explanation

Understanding basic data types is crucial when working with data. Both Python and R offer a set of fundamental types to represent different kinds of values, which affect how data is stored, displayed, and processed.

Common Data Types:

| Concept | Python (pandas/base) | R (base) | Notes |
|---|---|---|---|
| Integer | int | integer | Use astype(int) in pandas, as.integer() in R |
| Decimal Number | float | numeric, double | numeric is usually stored as double in R |
| Text / String | str, object (pandas) | character | Use astype(str) in pandas, as.character() in R |
| Logical / Boolean | bool | logical | True/False in Python, TRUE/FALSE in R |
| Date / Time | datetime64[ns] | Date, POSIXct | Use pd.to_datetime() in pandas, as.Date() or as.POSIXct() in R |
| Category | category (pandas) | factor | Good for grouping and modeling |
| Missing Values | NaN (numpy) | NA | Use pd.isna() in Python, is.na() in R |
| Complex Numbers | complex | complex | Less common in typical data science |
| List | list | list | R lists can contain different types, like Python lists |
| Dictionary | dict | named list, list() | R lists with names can mimic Python dictionaries |
| Tuple | tuple | c(), list() | No direct equivalent; use c() for vectors or list() for mixed types |
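To make the base Python column of the table concrete, here is a minimal sketch that checks the type of one literal per concept. The `samples` dictionary and its values are made up for illustration and are not part of the original examples.

```python
import math

# One illustrative literal per base Python type named in the table
samples = {
    "integer": 42,
    "decimal": 3.14,
    "text": "hello",
    "logical": True,
    "complex": 2 + 3j,
    "list": [1, "a", 2.5],                       # lists may mix types
    "dictionary": {"name": "Alice", "age": 30},
    "tuple": (1, "a"),                           # like a list, but immutable
}

# Map each concept to the name of its Python type
type_names = {concept: type(value).__name__ for concept, value in samples.items()}

for concept, name in type_names.items():
    print(f"{concept:<10} -> {name}")

# NaN is itself a float, and it never compares equal to itself,
# which is why a dedicated check such as math.isnan() or pd.isna() is needed
missing = float("nan")
print(type(missing).__name__, math.isnan(missing), missing == missing)
```

The last line shows why missing values get their own helper functions: an equality test against NaN is always false.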

Knowing these types helps with data cleaning, conversion, and model preparation.
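As a rough illustration of conversion during data cleaning, the sketch below takes a small, hypothetical DataFrame (`raw`, invented for this example) in which every column arrives as text and converts each column to a more suitable type. It assumes a reasonably recent pandas version with the nullable `Int64` dtype.

```python
import pandas as pd

# Hypothetical raw data: every column arrives as text, with some gaps
raw = pd.DataFrame({
    "age": ["30", "25", None],
    "group": ["a", "b", "a"],
    "joined": ["2022-01-01", "2021-07-15", None],
})

clean = pd.DataFrame({
    # Nullable Int64 keeps whole numbers even when a value is missing
    "age": pd.to_numeric(raw["age"]).astype("Int64"),
    # category stores each distinct label once, useful for grouping
    "group": raw["group"].astype("category"),
    # Missing dates become NaT, the datetime counterpart of NaN
    "joined": pd.to_datetime(raw["joined"]),
})

print(clean.dtypes)

# Count missing values per column after conversion
print(clean.isna().sum())
```

The nullable `Int64` dtype is used here because a plain `astype(int)` raises an error when the column contains missing values.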

16.2 Python Code

import pandas as pd

# Create a simple DataFrame to examine data types
df = pd.DataFrame({
    "name": ["Alice", "Bob"],                    # object (string)
    "age": [30, 25],                             # int64
    "joined": pd.to_datetime(["2022-01-01", "2021-07-15"])  # datetime64[ns]
})

# Display data types for each column
print(df.dtypes)
name              object
age                int64
joined    datetime64[ns]
dtype: object

16.3 R Code

# Create a simple data frame to examine R data types
df <- data.frame(
  name = c("Alice", "Bob"),                            # character
  age = c(30, 25),                                     # numeric (stored as double)
  joined = as.Date(c("2022-01-01", "2021-07-15"))      # Date
)

# Print structure of the data frame
str(df)
'data.frame':   2 obs. of  3 variables:
 $ name  : chr  "Alice" "Bob"
 $ age   : num  30 25
 $ joined: Date, format: "2022-01-01" "2021-07-15"