Statistics is the science concerned with the collection and analysis of numerical information to answer questions wisely. The term also refers to the numerical information that has been collected. Statistics has many applications in Canada, from government censuses and surveys, to decision making in industry, to medical research and technological innovation.
Goals and Applications of Statistics
It is common to think of statistics as collections of numbers, i.e., data or facts in numerical form, such as birth rates, death rates, amounts of rainfall, oil reserves and hockey records. However, numbers alone have little significance. To be meaningful, they must be placed in context. Statisticians work with numbers, but their goals are ambitious: insight, discovery, confirmation, explanation, prediction, control and decision.
Did you know?
Humorist Stephen Leacock wrote: “In earlier times, they had no statistics, and so they had to fall back on lies. Hence the huge exaggerations of primitive literature — giants or miracles or wonders! They did it with lies and we do it with statistics; but it is all the same.” Thomas Chandler Haliburton’s picaresque character Sam Slick stated: “Figures are the representatives of numbers, and not things.”
Statistics is an application of science, a tool of commerce and industry, and an instrument of government. Science is characterized by the scientific method, which involves formulating theories, deducing consequences and verifying predictions. Analysis of data leads to the formulation of theories. Verification involves checking facts against predictions. These stages correspond to important subfields of statistics such as exploratory data analysis (which often uses visual or graphical representations to summarize the main characteristics of data sets) and hypothesis testing.
Commerce and industry are concerned with efficient use of resources and with decision making in the face of uncertainty. Statisticians have developed methods for the efficient design of investigations, be they surveys, observational studies or experiments. The subfield of decision theory concerns making effective choices in uncertain situations.
Finally, government has a responsibility to know the status and protect the well-being of its people. This responsibility may be realized, in part, through censuses, continuing surveys, data banks and forecasts, each of which has major statistical components. Statistics Canada is the federal government agency that carries out this work.
The Jean Talon Building is part of Statistics Canada's headquarters in Tunney's Pasture, Ottawa. Photo taken in 2008. (Courtesy Demetri1968/Wikimedia CC)
Statistics depends on certain basic concepts, including sample, stratification, randomization, replication, stochastic modelling and goodness of fit. Using these concepts, statisticians and scientists have often been able to provide provable and replicable answers to important questions. Each of these concepts is a major contribution to human knowledge.
A sample is a collection of objects or individuals meant to represent a larger collection (i.e., the population). The innovation of statisticians was the recognition that, if objects were selected randomly from a population of interest, those selected (the sample) would be representative of that population. Statisticians also recognized that measures of the error inherent in using a sample could be computed.
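As a rough illustration of both ideas, the following Python sketch draws a simple random sample and computes the standard error of its mean, one common measure of sampling error. The population values, the function name and the sample size are made up for the example, and the formula used ignores the small correction that applies when sampling without replacement from a finite population.

```python
import random
import statistics

def sample_mean_with_error(population, n, seed=0):
    """Draw a simple random sample of size n; return (mean, standard error)."""
    rng = random.Random(seed)
    sample = rng.sample(population, n)  # random selection without replacement
    mean = statistics.mean(sample)
    # Standard error of the mean: sample standard deviation / sqrt(n)
    std_error = statistics.stdev(sample) / n ** 0.5
    return mean, std_error

# Hypothetical population of 10,000 invented measurements
_rng = random.Random(42)
population = [_rng.gauss(50, 10) for _ in range(10_000)]

mean, std_error = sample_mean_with_error(population, n=100)
print(f"sample mean = {mean:.2f}, standard error = {std_error:.2f}")
print(f"population mean = {statistics.mean(population):.2f}")
```

With only 100 of the 10,000 values examined, the sample mean should usually land within a few standard errors of the population mean.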
Stratification is the grouping of objects into collections of similar objects before selecting a sample or experimenting on the objects. For example, students at school might be grouped by grade, followed by the selection of separate samples for each grade.
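The school example might be sketched in Python as follows. The student records, the helper name `stratified_sample` and the choice of five students per grade are all assumptions made for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, per_stratum, seed=0):
    """Group units into strata, then draw a separate random sample from each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    # Sample independently within every stratum
    return {
        key: rng.sample(members, min(per_stratum, len(members)))
        for key, members in strata.items()
    }

# Hypothetical school: 120 students spread evenly over grades 1 to 6
students = [(f"student_{i}", 1 + i % 6) for i in range(120)]

by_grade = stratified_sample(students, stratum_of=lambda s: s[1], per_stratum=5)
for grade in sorted(by_grade):
    print(grade, [name for name, _ in by_grade[grade]])
```

Because each grade is sampled separately, no grade can be left out of the sample by chance.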
Randomization was a scientific breakthrough made by 20th-century British statistician R. A. Fisher. For example, to discover which of two methods of language instruction is better, an educator might use a randomized experiment. Starting with a group of similar students, the educator would randomly choose half to experience the first method of instruction. Those remaining would experience the second method. At the end of the experiment, the test scores of the two groups would be compared. The random division of the students makes it very unlikely that the brighter students would all be taught by just one method and thereby bias the results.
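The random division described above can be sketched in a few lines of Python. The student labels, the group size and the function name are hypothetical; the essential step is that chance alone, not the educator, decides who receives which method.

```python
import random

def randomize_two_groups(subjects, seed=None):
    """Randomly split subjects into two groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(subjects)   # copy so the original order is left untouched
    rng.shuffle(shuffled)       # every ordering is equally likely
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical class of 20 similar students
students = [f"student_{i}" for i in range(20)]
method_a, method_b = randomize_two_groups(students, seed=1)

print("Method A:", method_a)
print("Method B:", method_b)
```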
Replication is the act of repeating a measurement of interest. In the instructional experiment just referred to, an example of replication would be repeating the study with several groups of students. Using replication, researchers can make better estimates of quantities of interest and can better compute the error of the estimates.
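The effect of replication on precision can be illustrated with a small simulation. Everything in this Python sketch is invented: the test-score distribution, the true difference of 2 points between methods, and the group sizes. It only shows that averaging over more repetitions of the study tends to shrink the standard error of the estimated difference.

```python
import random
import statistics

def run_study(rng, n_per_group=30, true_effect=2.0):
    """Simulate one study; return mean score of method B minus method A."""
    scores_a = [rng.gauss(70, 10) for _ in range(n_per_group)]
    scores_b = [rng.gauss(70 + true_effect, 10) for _ in range(n_per_group)]
    return statistics.mean(scores_b) - statistics.mean(scores_a)

def replicate(n_replicates, seed=0):
    """Repeat the study; return (average difference, its standard error)."""
    rng = random.Random(seed)
    diffs = [run_study(rng) for _ in range(n_replicates)]
    estimate = statistics.mean(diffs)
    std_error = statistics.stdev(diffs) / n_replicates ** 0.5
    return estimate, std_error

for k in (4, 16, 64):
    estimate, std_error = replicate(k)
    print(f"{k:3d} replicates: difference = {estimate:.2f} +/- {std_error:.2f}")
```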
A stochastic model is a simplified description of a situation in mathematical language (e.g., equations) in which a variable that is being measured has some element of randomness, as opposed to having a deterministic, or fixed, outcome. Stochastic models lead to effective summarization and analysis of complex situations.
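The contrast between a deterministic and a stochastic description can be made concrete with a toy Python example. The growth rule and the noise level below are invented for illustration and do not describe real data.

```python
import random

def deterministic_height(age_years):
    """Fixed outcome: the same age always gives the same height (in cm)."""
    return 80.0 + 6.0 * age_years

def stochastic_height(age_years, rng, noise_sd=4.0):
    """Same rule plus a random term, so repeated measurements differ."""
    return deterministic_height(age_years) + rng.gauss(0.0, noise_sd)

rng = random.Random(7)
print("deterministic:", deterministic_height(5))
print("stochastic:   ", [round(stochastic_height(5, rng), 1) for _ in range(5)])
```

The stochastic version summarizes a scatter of individual measurements with just three numbers: an intercept, a slope and a noise level.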
Goodness of Fit
Goodness of fit refers to the study of procedures for assessing how well a given stochastic model describes a particular collection of data. It is part of the verification of the model’s predictions. British statistician Karl Pearson, whose main work just preceded that of R. A. Fisher, introduced the chi-square test as an automatic procedure for goodness of fit. Researchers use this test to compare expected and observed values. Today, a wealth of formal and informal methods is available. Statistics makes substantial use of the concepts of mathematics and of a broad variety of substantive fields.
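As a minimal sketch of how expected and observed values are compared, the Python code below computes Pearson's chi-square statistic for a hypothetical die rolled 60 times. The counts are invented. The critical value 11.07 is the standard 5 per cent cut-off for 5 degrees of freedom (six categories minus one).

```python
def chi_square_statistic(observed, expected):
    """Pearson's statistic: sum of (observed - expected)^2 / expected."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts for faces 1 to 6 in 60 rolls of a die
observed = [8, 12, 9, 11, 10, 10]
expected = [10] * 6          # a fair die predicts 10 of each face

stat = chi_square_statistic(observed, expected)
critical_value = 11.07       # chi-square, 5 degrees of freedom, 5% level

print(f"chi-square = {stat:.2f}")
if stat > critical_value:
    print("The fair-die model fits poorly.")
else:
    print("No evidence against the fair-die model.")
```

Here the statistic is far below the cut-off, so these counts are consistent with a fair die.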
Statistics in Canada
The statistical history of Canada began more than three centuries ago. The first systematic census (i.e., complete enumeration of a population) was carried out in New France in 1665–66 for King Louis XIV by Intendant Jean Talon. The documents Talon prepared are in the National Archives of Canada in Ottawa. The first nationwide census took place in 1871. The census is now the responsibility of Statistics Canada, formerly the Dominion Bureau of Statistics (established in 1918).
Some Canadian universities have separate statistics departments (e.g., British Columbia, Manitoba, Toronto, Waterloo). Others have joint departments or have kept statistics within mathematics departments. There are departments that include biostatistics (e.g., McGill, McMaster, Toronto, Western). Statistics often forms part of the curriculum of subjects that make use of quantitative techniques (e.g., in economics departments, schools of business and commerce, and various physical, social and biological sciences).
The Canadian statistics profession is represented by the Statistical Society of Canada (established in 1978), which publishes The Canadian Journal of Statistics. The society has annual meetings and elects honorary members who have made important contributions to the field. These include C. H. Goulden, who used experimental design to overcome rust fungus in grain and wrote an important early textbook, Methods of Statistical Analysis (1939).
The federal government supports statistical science through the Natural Sciences and Engineering Research Council. In Quebec, the provincial government’s Fonds de recherche also supports the field. Canadian statistics has a clear presence on the international scene, as shown by the many invitations Canadian researchers receive from abroad and by the hosting of important international meetings in Canada.
Advances in computer technology since the mid-20th century have led to new applications of statistics (see Computer Science). Researchers can now analyze massive amounts of data, find patterns in these data, and make predictions based on them. Climate science, for example, uses large data sets to predict changes in global average temperatures and the impacts of climate change on ice sheets (see also Climate Information). Current applications of artificial intelligence technologies such as machine learning also rely on the collection and analysis of so-called “big data” — for instance, in the development of self-driving cars.
Statistical meta-analysis is increasingly used in scientific research. This method combines the findings of multiple separate studies to help researchers better understand their results. Meta-analyses are common in medicine, where they have been used to make conclusions about the efficacy of medications and other health interventions.
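One common way of combining studies, though not the only one, is fixed-effect inverse-variance weighting, in which more precise studies count for more. The Python sketch below assumes each study reports an effect estimate and its standard error; the three studies and the function name are hypothetical.

```python
def fixed_effect_pool(estimates, std_errors):
    """Combine study estimates, weighting each by 1 / (standard error)^2."""
    if len(estimates) != len(std_errors):
        raise ValueError("need one standard error per estimate")
    if any(se <= 0 for se in std_errors):
        raise ValueError("standard errors must be positive")
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total
    pooled_se = (1.0 / total) ** 0.5
    return pooled, pooled_se

# Three hypothetical studies of the same treatment effect
estimates = [0.30, 0.10, 0.25]
std_errors = [0.20, 0.10, 0.15]

pooled, pooled_se = fixed_effect_pool(estimates, std_errors)
print(f"pooled effect = {pooled:.3f}, standard error = {pooled_se:.3f}")
```

The pooled standard error is smaller than that of any single study, which is the sense in which combining findings sharpens the conclusion. This simple version assumes all studies estimate the same underlying effect.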