DATASETS


Feel free to browse through this collection of small datasets generated using the Faker Python library and download any dataset that piques your interest. They can serve as a valuable tool to refine your skills.


Height and Weight dataset: this basic dataset provides a good starting point for exploring distributions and correlations.
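As a minimal sketch of that kind of exploration, the snippet below computes a few summary statistics and a Pearson correlation using only the Python standard library. The sample values and the names `heights_cm`, `weights_kg`, and `pearson` are illustrative assumptions, not the contents or column names of the actual file.

```python
import math
import statistics

# Hypothetical sample values; the real dataset's columns and units may differ.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
weights_kg = [50.0, 58.0, 69.0, 77.0, 91.0]


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2 values)")
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


mean_height = statistics.fmean(heights_cm)
stdev_height = statistics.stdev(heights_cm)  # sample standard deviation
r = pearson(heights_cm, weights_kg)
print(f"mean height: {mean_height:.1f}, stdev: {stdev_height:.1f}, r: {r:.3f}")
```

A value of `r` close to 1 suggests a strong positive linear relationship; plotting a histogram of each column is a natural next step for looking at the distributions.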


Business Data Collection: the files below are designed to work together. The customer and product files act as reference datasets, providing additional context and details to enrich the sales data.
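One way to picture how the three files fit together is a lookup join: each sales row is enriched with details from the customer and product tables. The sketch below uses only the standard library. The inline CSV text and every column name (`customer_id`, `product_id`, `unit_price`, and so on) are hypothetical stand-ins, since the actual file layouts are not described here.

```python
import csv
import io

# Hypothetical file contents and column names; the real files may differ.
CUSTOMERS_CSV = """customer_id,name
1,Alice
2,Bob
"""
PRODUCTS_CSV = """product_id,label,unit_price
10,Keyboard,30.0
11,Mouse,12.5
"""
SALES_CSV = """sale_id,customer_id,product_id,quantity
100,1,10,2
101,2,11,1
102,1,11,4
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


# Index the reference tables by their key for constant-time lookups.
customers = {row["customer_id"]: row for row in read_rows(CUSTOMERS_CSV)}
products = {row["product_id"]: row for row in read_rows(PRODUCTS_CSV)}

# Enrich each sale with the customer name, product label, and line total.
enriched_sales = []
for sale in read_rows(SALES_CSV):
    customer = customers[sale["customer_id"]]
    product = products[sale["product_id"]]
    quantity = int(sale["quantity"])
    enriched_sales.append({
        "sale_id": sale["sale_id"],
        "customer": customer["name"],
        "product": product["label"],
        "total": quantity * float(product["unit_price"]),
    })

for row in enriched_sales:
    print(row)
```

With the downloaded files, `io.StringIO(text)` would be replaced by a file opened with `open(path, newline="")`, and the column names adjusted to match the real headers.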


Movie Datasets: these datasets offer a wide range of information, including actors' names, directors, durations, release dates, and public ratings. You can use them to explore various aspects of data analysis within the context of the film industry. They are available in three sizes (1,000, 5,000, and 10,000 rows), so you can choose the one that fits your learning objectives.


Real Estate Datasets: these contain information on addresses, square footage, cities, and prices. Two sizes are available: 1,000 rows and 10,000 rows.

This is another real estate dataset, but with fewer variables, designed specifically for practicing multivariate regression.
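To illustrate what a regression on several predictors involves, here is a small didactic sketch that fits ordinary least squares by solving the normal equations in pure Python. The predictors (`area`, `beds`), the synthetic prices, and the helper names `solve_linear_system` and `fit_linear_regression` are all assumptions made for the example; they do not reflect the dataset's actual variables. In practice a numerical library would normally be used instead.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix so row operations apply to both sides.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular; predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_linear_regression(features, targets):
    """Return [intercept, coef_1, ..., coef_k] from the normal equations."""
    rows = [[1.0] + list(f) for f in features]  # prepend the intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical rows: (area, bedrooms) -> price.
# Prices are generated as 50000 + 2000 * area + 10000 * bedrooms, with no noise.
features = [(50, 1), (80, 2), (120, 3), (65, 3), (100, 2)]
prices = [50000 + 2000 * area + 10000 * beds for area, beds in features]

coefficients = fit_linear_regression(features, prices)
print([round(c, 2) for c in coefficients])
```

Because the synthetic prices contain no noise, the fitted coefficients should recover the generating intercept and slopes up to floating-point error; with real data they would only be estimates.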


Users Datasets: these simulate a collection of user profiles for an e-commerce business, with information on emails, passwords, last connection dates, purchased products, amounts spent, and more. Three sizes are available: 1,000 rows, 10,000 rows, and 100,000 rows.


Superconductor dataset: this dataset comprises a series of measurements of the physical properties of a superconducting material. These measurements provide insight into the behavior of the superconductor under different conditions, and the dataset also captures the relationships between properties.


Shape recognition: explore machine learning using these datasets of triangle, circle, and rectangle images. With this collection, you can train and fine-tune classification models for shape recognition.


Videos: this dataset contains a diverse collection of video data, including titles, filenames, authors, durations, and publication dates. You can analyze which feature has the most significant impact on the number of views.


Collection of Real-World Dataset Resources

  • The official portal for European data
  • OECD data
  • Open Food Facts
  • NOAA Climate Data Online
  • NOAA global time series
  • US Census Bureau datasets
  • United Nations datasets
  • Monthly Bulletin of Statistics
  • World Bank data
  • International Monetary Fund - Download data
  • Ireland's Open Data Portal
  • Government of Spain - Open Data
  • Istituto Nazionale di Statistica
  • Open Data - France
  • Office for National Statistics - UK
  • INEGI - Mexico
  • INDEC - Argentina
  • Sloan Digital Sky Survey
  • Planetary Data System
  • Open Exoplanet Catalogue
  • NASA Exoplanet Archive
  • Wikifact - dataset for training relationship classifiers
  • Open Data Covid 19
  • GoEmotions - 58k comments from Reddit
  • Cartoon Set - 10k and 100k random cartoons