Q&A 3 What are common sources of datasets for Python and R?
3.1 Explanation
Before working with data, it’s important to know where data comes from. In both Python and R, you can use:
- Public datasets from libraries or platforms
- Downloaded datasets from repositories
- Real-world data from research, surveys, APIs, or government sources
These sources help you practice data skills using real, structured information.
Common sources include:
- Built-in datasets:
  - Python: seaborn, sklearn.datasets, statsmodels, pydataset
  - R: datasets package, MASS, ggplot2, palmerpenguins
- Online repositories:
- Research & Surveys:
  - CSV/Excel/JSON files published with academic papers or institutions
  - Survey data from organizations (e.g., Pew Research, Eurostat)
- APIs and live feeds:
  - Weather, financial markets, genomics, social media (e.g., Twitter API)
- Local files:
  - Saved from tools like Excel, Google Sheets, SPSS, or exported from databases
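As a minimal sketch of the built-in route on the Python side, the example below loads the iris dataset that ships with scikit-learn (sklearn.datasets in the list above). It assumes scikit-learn and pandas are installed; the variable names are illustrative only.

```python
from sklearn.datasets import load_iris

# Load the bundled iris dataset; as_frame=True returns pandas objects.
# The data ships with scikit-learn, so no download is needed.
iris = load_iris(as_frame=True)

# iris.frame holds the four feature columns plus a "target" column.
df = iris.frame

print(df.shape)
print(df.head())
```

The other Python libraries listed expose their own loaders, and some of them fetch data over the network, so an offline loader like this one is a convenient starting point for practice.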
Once you acquire a dataset, you can load, clean, explore, and transform it in Python or R.
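The load, clean, explore, and transform steps might look like the following in Python with pandas. This is a sketch under stated assumptions: the inline CSV text is made-up sample data standing in for a downloaded file, and the column names (city, temp_c, humidity) are hypothetical.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a downloaded CSV file.
raw_csv = io.StringIO(
    "city,temp_c,humidity\n"
    "Oslo,4.5,80\n"
    "Cairo,,35\n"
    "Lima,19.0,77\n"
)

# Load: parse the CSV text into a DataFrame.
df = pd.read_csv(raw_csv)

# Clean: drop rows where the temperature is missing.
clean = df.dropna(subset=["temp_c"]).reset_index(drop=True)

# Explore: summary statistics for the numeric columns.
print(clean.describe())

# Transform: derive a Fahrenheit column from the Celsius one.
clean["temp_f"] = clean["temp_c"] * 9 / 5 + 32
print(clean)
```

For a real file, the same pd.read_csv call would take a file path or URL in place of the in-memory text.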