Q&A 2 How do you install basic tools and libraries for Python and R?
2.1 Explanation
Before you can analyze data in Python or R, you need to install essential libraries. These libraries provide tools for data manipulation, visualization, statistical analysis, and machine learning — the four core layers in the CDI learning system. Installing them ensures you’re ready to explore datasets and build reproducible workflows.
2.2 Python Code
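In your terminal or command prompt, run a pip command that installs the six libraries imported below, for example:
pip install pandas numpy matplotlib seaborn scikit-learn scipy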
Then, import and check versions to confirm installation:
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
import sklearn
import scipy
from scipy import stats  # imported for later use: distributions and statistical tests

# Print each library's version to confirm the installation
print("pandas:", pd.__version__)
print("numpy:", np.__version__)
print("matplotlib:", matplotlib.__version__)
print("seaborn:", sns.__version__)
print("scikit-learn:", sklearn.__version__)
print("scipy:", scipy.__version__)
Example output (your version numbers may differ):
pandas: 2.2.3
numpy: 2.2.6
matplotlib: 3.10.3
seaborn: 0.13.2
scikit-learn: 1.6.1
scipy: 1.15.3
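If you would rather check the installation without importing every library, the standard library's `importlib.metadata` can look up installed versions by distribution name. The following is a minimal sketch, not part of the original workflow; the names `REQUIRED`, `installed_versions`, and `missing_packages` are hypothetical helpers introduced here for illustration.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as published on PyPI (note "scikit-learn", not "sklearn").
REQUIRED = ["pandas", "numpy", "matplotlib", "seaborn", "scikit-learn", "scipy"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def missing_packages(names):
    """Return the names that are not installed in the current environment."""
    return [name for name, ver in installed_versions(names).items() if ver is None]


if __name__ == "__main__":
    for name, ver in installed_versions(REQUIRED).items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Any name returned by `missing_packages(REQUIRED)` can be passed to `pip install` to complete the setup.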
Installed Python libraries by layer:
- 🧹 EDA:
  - pandas – Tabular data structures and data cleaning tools
  - numpy – Efficient array operations for numerical computing
- 📊 Visualization:
  - matplotlib – Customizable static and interactive plots
  - seaborn – Statistical data visualizations built on matplotlib
- 📐 Statistical Analysis:
  - scipy.stats – Tools for distributions, t-tests, ANOVA, correlation, and more
- 🤖 Machine Learning:
  - scikit-learn – Algorithms and utilities for classification, regression, clustering, model evaluation
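To confirm that each layer works and not just that it imports, you can run one tiny task per layer. The sketch below is an illustration under stated assumptions: the function name `layer_smoke_test` is hypothetical, and the data is synthetic, generated only for this check.

```python
def layer_smoke_test(seed=0):
    """Run one small task per layer and return a few results as a dict."""
    # Imports live inside the function so that a missing library raises an
    # ImportError when the check is run, not when this file is loaded.
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend, no display required
    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    import seaborn as sns
    from scipy import stats
    from sklearn.linear_model import LinearRegression

    # Synthetic data (made up for illustration): y is roughly 2 * x plus noise.
    rng = np.random.default_rng(seed)
    x = rng.normal(size=50)
    df = pd.DataFrame({"x": x, "y": 2.0 * x + rng.normal(scale=0.1, size=50)})

    # EDA layer: summary statistics with pandas.
    summary = df.describe()

    # Visualization layer: a seaborn scatter plot drawn on a matplotlib Axes.
    ax = sns.scatterplot(data=df, x="x", y="y")
    plot_drawn = len(ax.collections) > 0
    plt.close(ax.figure)

    # Statistical analysis layer: Pearson correlation with scipy.stats.
    r, p_value = stats.pearsonr(df["x"], df["y"])

    # Machine learning layer: fit a linear model with scikit-learn.
    model = LinearRegression().fit(df[["x"]], df["y"])

    return {
        "n_rows": int(summary.loc["count", "x"]),
        "plot_drawn": plot_drawn,
        "pearson_r": float(r),
        "slope": float(model.coef_[0]),
    }


if __name__ == "__main__":
    print(layer_smoke_test())
```

If every library is installed correctly, the call returns without error, and the fitted slope should land close to the value of 2 used to generate the data.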
2.3 R Code
# -----------------------------
# 🧹 EDA (Exploratory Data Analysis)
# -----------------------------
if (!require(tidyverse)) install.packages("tidyverse")
library(tidyverse)
# -----------------------------
# 📊 Visualization
# -----------------------------
if (!require(GGally)) install.packages("GGally")
library(GGally)
# -----------------------------
# 📐 Statistical Analysis (STATS)
# -----------------------------
if (!require(broom)) install.packages("broom")
library(broom)
if (!require(car)) install.packages("car")
library(car)
if (!require(emmeans)) install.packages("emmeans")
library(emmeans)
# -----------------------------
# 🤖 Machine Learning
# -----------------------------
if (!require(caret)) install.packages("caret")
library(caret)
Installed R packages by layer:
- 🧹 EDA:
  - tidyverse – A collection of packages for tidy data workflows:
    - dplyr (data manipulation)
    - readr (reading CSV and text files)
    - tidyr (reshaping data)
    - tibble (modern data frames)
    - stringr (string operations)
    - forcats (working with factors)
    - ggplot2 (visualization)
    - purrr (functional programming)
- 📊 Visualization:
  - ggplot2 – Grammar of graphics for elegant visualizations (included in tidyverse)
  - GGally – Enhances ggplot2 with matrix plots, correlation plots, etc.
- 📐 Statistical Analysis:
  - broom – Converts model outputs into tidy data frames
  - car – Tools for regression diagnostics, ANOVA, linear models
  - emmeans – Estimated marginal means for post-hoc testing and comparisons
- 🤖 Machine Learning:
  - caret – A unified framework for training, tuning, and comparing models across many algorithms
✅ Once these tools are installed, you’ll be ready to acquire datasets and begin your analysis.