Please look over these notes before you begin. The data sources are listed below the notes.
For further assistance, email libdata@uwindsor.ca or come to the data centre during open hours.
Types of data
There are two main types of data you will be using. Time series data is data that has repeated observations on something such as a country or region over time - for example, Canada's GDP annually. Cross-sectional data is data on multiple units at the same time - for example, a survey where 500 people were asked the same questions, or the population of 100 countries in 2003.
How many observations
You need many observations to conduct an analysis. For a bivariate analysis (an analysis with two variables where you use one variable to predict the other) 30 observations may be enough. 20 is iffy, 10 ridiculous.
While it is mathematically possible to run statistical procedures with fewer observations, you are unlikely to get useful results, and your professor is unlikely to accept your paper.
With annual time series data it can be difficult to get enough observations - many time series don't go back more than 20 years or so. However, some time series are available monthly, giving you 120 observations in 10 years, which is plenty. With cross-sectional data you need enough units - data on Canada's 10 provinces would not be enough, but data on 100 Canadian cities would.
Data, particularly international data, will frequently have missing observations, so check how many usable observations remain before you commit to a dataset.
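As a rough check before you start, you can count how many usable observations a series has once the missing values are dropped. The sketch below is illustrative only: the function names `usable_observations` and `verdict`, the cut-offs (one reading of the rule of thumb above for a bivariate analysis), and the sample values are assumptions, not real data.

```python
def usable_observations(values):
    """Return the observations that are not missing (None)."""
    return [v for v in values if v is not None]


def verdict(n):
    """Apply one reading of the rule of thumb for a bivariate analysis."""
    if n >= 30:
        return "may be enough"
    if n >= 20:
        return "iffy"
    return "too few"


# Made-up monthly series: 10 years x 12 months = 120 slots,
# with every 8th value missing to mimic gaps in real data.
monthly = [None if i % 8 == 0 else float(i) for i in range(120)]

usable = usable_observations(monthly)
print(len(usable), verdict(len(usable)))
```

The same check applies to cross-sectional data: count the units (cities, countries, respondents) that have a value for every variable you plan to use.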
Combining data
Sometimes you can combine data from different sources to get the variables you need. With time series data you need to make sure the time periods match up. With cross-sectional data on countries or regions this can also work, as long as the same countries or regions appear in each source. With survey data this is not possible, as there is no way to make different groups of people match up.
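Matching up time periods can be sketched as follows. This is a minimal illustration, assuming each source has already been loaded as a mapping from year to value; the function name `combine_by_period`, the variable names, and the figures are placeholders, not real data. Only years present in both sources are kept.

```python
def combine_by_period(series_a, series_b):
    """Pair up values from two series, keeping only the shared periods."""
    shared = sorted(set(series_a) & set(series_b))
    return [(period, series_a[period], series_b[period]) for period in shared]


# Placeholder values for two hypothetical annual series from different sources.
source_one = {2000: 1.0, 2001: 1.1, 2002: 1.3, 2003: 1.2}
source_two = {2001: 50.0, 2002: 52.0, 2003: 51.0, 2004: 55.0}

combined = combine_by_period(source_one, source_two)
for year, a, b in combined:
    print(year, a, b)
```

Note that the overlap, not the longer series, determines how many observations you end up with. The same approach can work for cross-sectional data if you use country or region names as the keys, provided both sources spell the names the same way.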
Environmental and scientific data
- International Environmental Performance Index 2010: All the environmental indicators you could possibly want, by country.
- Canadian Energy Use Data from the Office of Energy Efficiency
- U.S. Environmental Protection Agency data: Air and water quality, pollutant emissions, hazardous waste, etc.
- Sunspots - Solar Influence Data Centre
Automotive, transport, related
- Car Fuel Economy Data from the U.S. Dept. of Energy
- Ward's Key Automotive Data
- Statistical Handbook - Canadian Association of Petroleum Producers
- Aerospace Industries Association Data
- Canadian Road Safety Statistics and Reports
Finance and economic data
- World Bank Development Data: Data on countries: economics, living standards, health
- Economagic - Economic Time Series Data
- U.S. Bureau of Economic Analysis Data
- FRED - Federal Reserve Economic Data
- Dow Jones Historic Indexes: Stock market data. Register (free) for access.
Miscellaneous
- Scientists and Engineers Statistical Data System: Employment, educational, and demographic characteristics of scientists and engineers in the United States.
- CANSIM - Statistics Canada's Time Series database: Time series data, on Canada and provinces: economic, elections, social and health, and environmental sources. Be careful when using this data - some series have enough observations for an analysis, others do not.
- Sports Statistics: Links from the American Statistical Association.
- Canada's Best Places to Live: Data comparing 154 communities on a variety of factors.
- UN Office on Drugs and Crime: International statistics on drugs, organized crime, human trafficking, etc.
- Statistical Abstract of the United States: Lots of stuff on the U.S.: tax revenue, crime, sports, business and industry.
- CDC Wonder: Data from the U.S. Centers for Disease Control and Prevention.
- U.N. Databases: Assorted international data. Includes some environmental as well as industrial, economic and population.
- Religion Data Archive: See particularly QuickStats and QuickLists
Archives of datasets specifically prepared for teaching and learning
These datasets are usually ready-to-use but may be limited in scope. Most include some data with an engineering or experimental focus.
- Time Series Data Library topics include electrical usage, manufacturing processes, chemistry, finance, etc.
- UCI Machine Learning Repository includes the Challenger shuttle O-Ring failure data, web usage/tracking, robots, OCR
- ICPSR Instructional Data Modules includes social, political, criminological and other data
- The Data and Story Library from Carnegie Mellon
- Australasian Data and Story Library with a focus on Asian and Australian data
- Journal of Statistics Education Teaching Data Archive including the famous "cars" datasets
- University of Massachusetts Amherst teaching data archive includes several medical experiment datasets
- UCLA Statistics Data Sets has some prepared datasets under "Datasets for teaching"
Still looking? Email libdata@uwindsor.ca or come to the data centre during open hours.