Data Science Foundations
Welcome
I GETTING STARTED
Setting Up Your Analysis Environment
Explanation
Who This Guide Is For
Install Python
Install R
Install RStudio
Install Visual Studio Code (VSCode)
Installing Extensions in VSCode
Verify Installation
How to Navigate This Guide
Best Practices for Using Python & R Side by Side
✅ Run Each Language Independently
✅ Modify and Experiment
✅ Compare Results
✅ Use the Same Dataset
What’s Next?
1
How do you create a project directory ready for analysis?
1.1
Explanation
1.2
Bash (Terminal)
1.3
Python Code
1.4
R Code
2
How do you install basic tools and libraries for Python and R?
2.1
Explanation
2.2
Python Code
2.3
R Code
3
What are common sources of datasets for Python and R?
3.1
Explanation
4
What are common sources of datasets for Python and R?
4.1
Explanation
4.2
Built-in or Package-Based Datasets
4.3
✅ Python
4.4
R
4.5
Online Public Data Sources
4.6
Python Code
5
How do you save a dataset in Python and R?
5.1
Explanation
5.2
Python Code
5.3
R Code
6
How do you load a pre-cleaned dataset in Python and R?
6.1
Explanation
6.2
Using consistent file paths (like data/*.csv) ensures reproducibility across environments.
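As a minimal sketch of the consistent-path idea, the snippet below builds every dataset path from a single project root plus a data/ folder, so the same code works on any machine where the project layout is the same. The helper names (dataset_path, load_rows) and the file name example.csv are hypothetical, chosen only for illustration, and the standard-library csv module stands in for whichever loader the guide itself uses.

```python
import csv
import tempfile
from pathlib import Path


def dataset_path(project_root: Path, filename: str) -> Path:
    """Return the path to a file inside the project's data/ folder."""
    return project_root / "data" / filename


def load_rows(csv_path: Path) -> list[dict[str, str]]:
    """Read a CSV file into a list of dictionaries, one per row."""
    with csv_path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


# Demonstration in a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "data").mkdir()

    # Hypothetical file name; a real project would already contain its data.
    sample = dataset_path(root, "example.csv")
    sample.write_text("id,value\n1,10\n2,20\n", encoding="utf-8")

    rows = load_rows(sample)

    # The path relative to the project root is identical on every machine.
    relative = sample.relative_to(root).as_posix()
```

Only the project root differs between environments; everything below it is addressed the same way, which is what keeps an analysis reproducible.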
6.3
Python Code
6.4
R Code
7
How do you rename column names in Python and R?
7.1
Explanation
7.2
Python Code
7.3
R Code
8
How do you examine the structure and types of variables in Python and R?
8.1
Explanation
8.1.1
✅ Common Data Types in Python and R
8.2
Python Code
8.3
R Code
9
How do you check for missing values in Python and R?
9.1
Explanation
9.2
Python Code
9.3
R Code
10
How do you get summary statistics for numeric variables in Python and R?
10.1
Explanation
10.2
Python Code
10.3
R Code
11
How do you filter rows based on a condition in Python and R?
11.1
Explanation
11.2
Python Code
11.3
R Code
12
How do you sort rows based on a variable in Python and R?
12.1
Explanation
12.2
Python Code
12.3
R Code
13
How do you create a new variable in Python and R?
13.1
Explanation
13.2
Python Code
13.3
R Code
14
How do you detect and remove duplicate rows in Python and R?
14.1
Explanation
14.2
Python Code
14.3
R Code
15
How do you export a cleaned dataset in Python and R?
15.1
Explanation
15.2
Python Code
15.3
R Code
General Data Science EDA Summary
🧱 What You’ve Accomplished
📈 Up Next: Data Visualization (VIZ)
📚 Your CDI Learning Path
🚀 Continue Learning
II DATA VISUALIZATION
16
What are common data types in Python and R?
16.1
Explanation
16.2
Python Code
16.3
R Code
17
What is the difference between categorical and numerical variables?
17.1
Explanation
17.1.1
🔷 Categorical Variables
17.1.2
🔶 Numerical Variables
18
How do you inspect variable types in a dataset?
18.1
Explanation
18.2
Python Code
18.3
R Code
19
How do you create a simple dataset to test variable type conversion?
19.1
Explanation
19.2
Python Code
19.3
R Code
20
How do you convert variable types in a dataset?
20.1
Explanation
20.2
Python Code
20.3
R Code
21
How do you summarize numerical and categorical variables?
21.1
Explanation
21.2
Python Code
21.3
R Code
22
How do you visualize the frequency of categorical variables?
22.1
Explanation
22.2
Python Code
22.3
R Code
23
How do you visualize distributions of numerical variables?
23.1
Explanation
24
How do you use a histogram to visualize numerical distributions?
24.1
Explanation
24.2
Python Code
24.3
R Code
25
How do you visualize smooth distributions using density plots?
25.1
Explanation
25.2
Python Code
25.3
R Code
26
What are the best plots to compare groups across a categorical variable?
26.1
Explanation
27
How do you use boxplots to compare groups across a categorical variable?
27.1
Explanation
27.2
Python Code
27.3
R Code
28
How do you use violin plots to compare groups across a categorical variable?
28.1
Explanation
28.2
Python Code
28.3
R Code
29
How do you use ridge plots to compare distributions across a categorical variable?
29.1
Explanation
29.2
Python Code
29.3
R Code
30
How do you visualize individual data points by group using a strip plot?
30.1
Explanation
30.2
Python Code
30.3
R Code
31
How do you visualize individual observations using a swarm plot?
31.1
Explanation
31.2
Python Code
31.3
R Code
32
How do you visualize group summaries using a dot plot?
32.1
Explanation
32.2
Python Code
32.3
R Code
33
How do you visualize group summaries using bar plots with error bars?
33.1
Explanation
33.2
Python Code
33.3
R Code
34
How do you visualize relationships between two numerical variables using a scatter plot?
34.1
Explanation
34.2
Python Code
34.3
R Code
35
How do you visualize trends across an ordered variable using a line plot?
35.1
Explanation
35.2
Python Code
35.3
R Code
III PATTERN RECOGNITION AND RELATIONSHIPS
36
How do you visualize patterns and relationships in multivariate data?
36.1
Explanation
37
How do you visualize all pairwise relationships using a pair plot?
37.1
Explanation
37.2
Python Code
37.3
R Code
38
How do you visualize correlation between numerical variables?
38.1
Explanation
38.2
Python Code
38.3
R Code
39
How do you visualize the relationship between two numerical variables using a scatter plot?
39.1
Explanation
39.2
Python Code
39.3
R Code
40
How do you enhance scatter plots by adding group color and trend lines?
40.1
Explanation
40.2
Python Code
40.3
R Code
IV SPECIALTY VISUALS
41
How do you visualize simple proportions using a pie chart?
41.1
Explanation
41.2
Python Code
41.3
R Code
42
How do you create a donut chart to show part-to-whole proportions?
42.1
Explanation
42.2
Python Code
42.3
R Code
43
How do you visualize hierarchical part-to-whole relationships using a treemap?
43.1
Explanation
43.2
Python Code
43.3
Note
44
How do you create a static treemap in Python?
44.1
Explanation
44.2
Python Code
44.3
R Code
45
How do you visualize overlaps using a Venn diagram?
45.1
Explanation
45.2
Python Code
45.3
R Code
How do you choose the right visualization for your data?
Explanation
What domain-specific visualizations should I learn?
Explanation
V REFERENCES
Core References for Further Learning
📘 General & Framework
🐍 Python Tools
🅡 R Ecosystem
📊 Statistical Learning
Full Linked References
Explore More Guides
General Data Science – Free Edition
Last updated: June 10, 2025