Q&A 4 What Are Common Sources of Datasets for Python and R?
4.1 Explanation
Before you can analyze data, you need to get it. Python and R both provide built-in datasets and offer access to many high-quality public data sources online. These datasets are used for practice, learning, benchmarking, and real-world analysis.
In this question, we’ll look at:
- Built-in datasets available through standard libraries
- Trusted online sources for downloading CSV or Excel files
- How to access and load example datasets directly from Python or R
4.2 Built-in or Package-Based Datasets
These datasets are included in common libraries, so you can load them directly without needing to download files.
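On the Python side, the same idea can be sketched as follows. This is a minimal illustration that assumes scikit-learn is installed; the choice of scikit-learn and of the iris data is an assumption made for the example, not something the text above prescribes.

```python
# Assumption: scikit-learn is installed (pip install scikit-learn).
# The iris data ships inside the package, so nothing is downloaded.
from sklearn.datasets import load_iris

iris = load_iris()

# feature_names lists the four measurement columns;
# target_names lists the three species labels.
print(iris.feature_names)
print(list(iris.target_names))

# Print the first six rows with their species, roughly like head(iris) in R.
for row, label in zip(iris.data[:6], iris.target[:6]):
    print(row, iris.target_names[label])
```

If pandas is also installed, `load_iris(as_frame=True)` returns the same data with a DataFrame in its `frame` attribute, which may be more convenient for analysis.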
4.4 R
datasets package (first six rows of iris, as printed by head(iris)):
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
ggplot2 (first six rows of diamonds):
  carat       cut color clarity depth table price    x    y    z
1  0.23     Ideal     E     SI2  61.5    55   326 3.95 3.98 2.43
2  0.21   Premium     E     SI1  59.8    61   326 3.89 3.84 2.31
3  0.23      Good     E     VS1  56.9    65   327 4.05 4.07 2.31
4  0.29   Premium     I     VS2  62.4    58   334 4.20 4.23 2.63
5  0.31      Good     J     SI2  63.3    58   335 4.34 4.35 2.75
6  0.24 Very Good     J    VVS2  62.8    57   336 3.94 3.96 2.48
palmerpenguins (if installed; first six rows of penguins):
# A tibble: 6 × 8
  species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
  <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
1 Adelie  Torgersen           39.1          18.7               181        3750
2 Adelie  Torgersen           39.5          17.4               186        3800
3 Adelie  Torgersen           40.3          18                 195        3250
4 Adelie  Torgersen           NA            NA                  NA          NA
5 Adelie  Torgersen           36.7          19.3               193        3450
6 Adelie  Torgersen           39.3          20.6               190        3650
# ℹ 2 more variables: sex <fct>, year <int>
4.5 Online Public Data Sources
Source | Link |
---|---|
UCI Machine Learning Repo | https://archive.ics.uci.edu/ml/ |
Kaggle Datasets | https://www.kaggle.com/datasets |
data.gov (US Government) | https://www.data.gov |
Awesome Public Datasets | https://github.com/awesomedata/awesome-public-datasets |
World Bank Open Data | https://data.worldbank.org/ |
💡 Tip: Always save downloaded datasets in your data/ folder and reference them using relative paths like data/filename.csv.
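The relative-path habit from the tip can be sketched in Python with the standard library alone. The names `DATA_DIR`, `data_path`, `save_rows`, and `load_rows` are hypothetical helpers invented for this illustration, and the sketch assumes the script runs from the project's root folder.

```python
import csv
from pathlib import Path

# Assumption: the working directory is the project root,
# so "data" resolves to the project's data/ folder.
DATA_DIR = Path("data")


def data_path(filename):
    """Return the relative path of a file inside the data/ folder."""
    return DATA_DIR / filename


def save_rows(filename, header, rows):
    """Write a header and rows to data/<filename> as CSV and return the path."""
    DATA_DIR.mkdir(exist_ok=True)  # create data/ on first use
    path = data_path(filename)
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return path


def load_rows(filename):
    """Read data/<filename> and return one dict per row (values are strings)."""
    with data_path(filename).open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))
```

For example, calling `save_rows("sample.csv", ["species", "bill_length_mm"], [["Adelie", 39.1]])` followed by `load_rows("sample.csv")` should round-trip the row through `data/sample.csv`. Because the path is relative, the same code keeps working when the project folder is moved or shared.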
✅ Now that you know where to find data, let’s learn how to load and preview it in your Python or R environment.