Descriptive statistics, which captures the big picture of data, and inferential statistics, which infers the mechanism behind the data

Statistics spans a wide range, from pure mathematics to applied fields that touch daily life, but roughly speaking, it deals with all kinds of data. Its defining characteristic is the assumption that a probabilistic mechanism lies behind the data.

Statistical methods are used even inside a robot vacuum cleaner that automatically cleans a room. The robot obtains data by measuring its surroundings with sensors, and from that data its likely location is expressed as a probability distribution. As it moves around for a while, more data accumulates and the probability distribution narrows. This is one example of an application of statistics.
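The narrowing of a location distribution can be illustrated with a toy example. The sketch below is a minimal one-dimensional histogram filter under assumptions that are not in the text: a hypothetical cyclic corridor of 10 cells, two detectable markers, a sensor that is right 80% of the time, and noise-free movement. It is not the algorithm of any actual product.

```python
# Toy 1-D localization: the belief over cells narrows as readings accumulate.
# The corridor layout, sensor accuracy, and motion model are all hypothetical.

def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

def update(belief, world, reading, p_correct=0.8):
    """Bayes update: weight each cell by how well it explains the reading."""
    likelihood = [p_correct if cell == reading else 1.0 - p_correct
                  for cell in world]
    return normalize([b * l for b, l in zip(belief, likelihood)])

def move(belief, step):
    """Shift the belief by `step` cells (cyclic corridor, noise-free motion)."""
    n = len(belief)
    return [belief[(i - step) % n] for i in range(n)]

world = [0, 0, 1, 0, 0, 1, 0, 0, 0, 0]    # 1 = marker the sensor can detect
belief = [1.0 / len(world)] * len(world)  # no knowledge yet: uniform

# Assumed scenario: the robot starts at cell 2 and moves right one cell
# at a time, so an error-free sensor reports the values at cells 2..7.
readings = [1, 0, 0, 1, 0, 0]

peaks = []  # highest probability in the belief after each measurement
for k, reading in enumerate(readings):
    if k > 0:
        belief = move(belief, 1)
    belief = update(belief, world, reading)
    peaks.append(max(belief))

most_likely_cell = belief.index(max(belief))
print("most likely cell:", most_likely_cell)
print("peak probability per step:", [round(p, 3) for p in peaks])
```

The belief starts flat and ends concentrated on the cell the robot actually occupies, which is the "narrowing" described above.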

Statistics can be roughly divided into two categories: descriptive statistics and inferential statistics.

Descriptive statistics is a form of classical statistics that aims to understand overall trends by summarizing data. Examples include finding, from the national census, the average annual income of each generation and how much annual income varies. However, a census that covers the entire population is a special undertaking, and such complete data is unusual in statistics. More often, we need to understand the tendencies of the whole by extracting and analyzing only a part of it (a sample), and this need leads to the other category, inferential statistics.
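The per-generation summary described above (an average and a measure of spread) can be sketched in a few lines. The income figures and group labels below are hypothetical values chosen only for illustration. The population standard deviation (`pstdev`) is used on the assumption that, as in a census, the data covers everyone in each group.

```python
from statistics import mean, pstdev

# Hypothetical annual incomes (units of 10,000 yen), grouped by generation.
incomes = {
    "20s": [280, 310, 350, 300, 330],
    "40s": [450, 620, 380, 700, 550],
}

# Summarize each group by its average and its spread.
summary = {
    gen: {"mean": mean(vals), "std": pstdev(vals)}
    for gen, vals in incomes.items()
}

for gen, s in summary.items():
    print(f"{gen}: mean={s['mean']:.1f}, std={s['std']:.1f}")
```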

Inferential statistics aims to infer, from data, the mechanisms behind that data and likely future outcomes. Let’s take pharmaceuticals, for example. It is not easy to ascertain how drugs work in the body to produce their effects. However, whether a drug works or not can be verified by having many people actually take it. By collecting and statistically processing data on the drug’s efficacy in different patient groups, it is possible to verify whether the drug works in the first place and to form hypotheses about which factors influence whether it works. Inferential statistics is used in various areas, including the drug approval process, economic analysis, psychology, and weather forecasting.
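As a rough illustration of verifying whether a drug works from outcome data alone, the sketch below runs a two-proportion z-test, one standard technique among many and not necessarily what a real approval process would use. The trial counts (60 of 100 improved on the drug, 40 of 100 on placebo) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for a difference between two proportions."""
    p_a, p_b = success_a / n_a, success_b / n_b
    # Pooled proportion under the null hypothesis of "no difference".
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical trial: improvement counts for the drug group and placebo group.
z, p_value = two_proportion_z_test(60, 100, 40, 100)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

A small p-value says the observed difference would be unlikely if the drug had no effect. It says nothing about how the drug works in the body.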

In this way, statistics has developed as a means of understanding the reality of society and the economy, and it is inseparable from the real world. On the other hand, theoretical and mathematical research is also actively conducted to grasp more complex subjects in detail. My research focuses on statistical estimation for inferring the appropriate model in inferential statistics, and information geometry, which provides a geometric perspective on the probability distribution of data.

Effective for subjects with unknown mechanisms or variability

Statistics is often used to deepen our understanding of various phenomena and to obtain evidence for making decisions, but even so, I believe it is a tool that should be used first to create a starting point for understanding the subject. In that sense, as in the case of pharmaceuticals, its major strength is being able to understand phenomena and predict outcomes without knowing the inner workings of the subject.

Modeling plays an important role here. The data underlying a statistical analysis can be represented, for example, as a scatter plot with many points on a chart. Simply put, if you can draw a line through those points, that is, if you can represent the distribution of the data as the graph of a mathematical expression, you have statistically extracted the characteristics of the data. That expression is a model, and setting up an appropriate model is what statistics calls modeling.
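Drawing a line through a cloud of points can be made concrete with ordinary least squares, the most common way to fit a straight line. The data points below are hypothetical, and a straight line is only one of many possible model forms.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical scatter of points that roughly follow y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
print(f"model: y = {slope:.2f} * x + {intercept:.2f}")
```

The fitted expression is the model: two numbers now stand in for the whole scatter of points.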

To be honest, modeling means imposing (or, at best, proposing) an arbitrary interpretation on data. It is less about finding a single right answer than about comparing multiple models and picking the one suited to a particular purpose. For example, one of the major uses of statistics is predicting the future. In this case, it is theoretically known that a model deliberately shifted slightly in one direction will predict better than the model that fits the observed data most closely.
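The text does not name a specific theorem, so the following is only one classical result of this kind, assumed here for illustration: the James-Stein estimator. When estimating three or more means at once from noisy observations, shrinking the raw observations slightly toward the origin gives a smaller average squared error than using the observations as they are. The true means, the unit noise variance, and the number of trials below are hypothetical simulation settings.

```python
import random

def james_stein(x):
    """Positive-part James-Stein estimate: shrink x toward the origin.

    Assumes each component of x has unit noise variance.
    """
    d = len(x)
    norm_sq = sum(v * v for v in x)
    factor = max(0.0, 1.0 - (d - 2) / norm_sq)
    return [factor * v for v in x]

def squared_error(estimate, truth):
    return sum((e - t) ** 2 for e, t in zip(estimate, truth))

rng = random.Random(0)  # fixed seed so the simulation is repeatable
true_means = [1.0, -0.5, 0.8, 0.3, -1.2, 0.0, 0.6, -0.4, 0.9, -0.7]
trials = 2000

err_raw = 0.0
err_shrunk = 0.0
for _ in range(trials):
    # One noisy observation per true mean.
    x = [rng.gauss(m, 1.0) for m in true_means]
    err_raw += squared_error(x, true_means)
    err_shrunk += squared_error(james_stein(x), true_means)

err_raw /= trials
err_shrunk /= trials
print(f"average squared error, raw observations: {err_raw:.2f}")
print(f"average squared error, shrunk estimate:  {err_shrunk:.2f}")
```

The shrunk estimate is biased toward zero, yet its total error is lower, which is the sense in which a deliberately shifted model can outperform the best-fitting one.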

The idea of using statistics not to know the contents of a black box but to predict and control without knowing the contents seems to have become a standard idea around 1970 when new statistical methods appeared. On the other hand, research in the direction of clarifying the contents of the black box from data is also important and strongly rooted.

Another strength of statistics is its ability to quantitatively assess the uncertainty of things. When creating various systems and products, it is very important to determine the level of accuracy with which the target value can be achieved. For example, suppose you want to create a product that can produce a value of 100. If you have two prototypes, one of which always gives a value between 90 and 110 in tests, and the other varies from 0 to 200, you can statistically compare the data and say that the former has better performance, even if the average value is close to 100 in both prototypes. By treating such value dispersion as probability and evaluating it, we can grasp the uncertainty of things and contribute to safety and efficiency.
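The two-prototype comparison can be written out with made-up numbers. The test outputs below are hypothetical, chosen so that both prototypes average exactly 100 while one stays within 90 to 110 and the other ranges across 0 to 200. The sample standard deviation is used here as one simple measure of dispersion.

```python
from statistics import mean, stdev

# Hypothetical test outputs for two prototypes; the target value is 100.
prototype_a = [95, 104, 98, 108, 92, 103, 100, 100]
prototype_b = [20, 180, 60, 140, 5, 195, 100, 100]

mean_a, mean_b = mean(prototype_a), mean(prototype_b)
spread_a, spread_b = stdev(prototype_a), stdev(prototype_b)

print(f"A: mean={mean_a:.1f}, std={spread_a:.1f}")
print(f"B: mean={mean_b:.1f}, std={spread_b:.1f}")
```

The averages are identical, so only the measure of spread distinguishes the reliable prototype from the erratic one.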

In recent years, improvements in computer performance have made it possible to handle a variety of models and large-scale data that could not be handled before. The robot cleaner mentioned at the beginning is an example of an application supported by such technology. The practical use of computers with very different principles, such as quantum computers, is also likely to advance statistics.

Statistics continues to evolve with big data, artificial intelligence, and technology

Statistics has always evolved alongside the latest technology to meet the need to solve a variety of real-world problems. Methods for handling data are devised by humans, but whether the calculations can actually be carried out depends largely on the performance of the computers of the time. At present, almost any kind of processing can be done on an ordinary PC without a supercomputer, and with the spread of smartphones and wearable devices, the number of data sources itself continues to increase. I believe new problems for statistics to address will keep emerging, one after another, in areas that have so far been beyond its reach.

What is most interesting now is the relationship between artificial intelligence and statistics. Artificial intelligence is bringing innovation to many fields, and in statistics, deep learning using neural networks (artificial circuits modeled on the neurons of the human brain) is proving to be quite promising and is becoming a hot topic.

It is very important to examine how artificial intelligence handles statistical problems and why it can achieve high accuracy. Conventional statistical methods will not suffice for this, and it will be necessary to bring in mathematical theories from other fields or to establish entirely new ones. As we understand more about deep learning, the field of statistics should advance even more dramatically.

In addition, some researchers have begun basic research on quantum statistics, which draws on the characteristics of quantum mechanics. It is said that quantum statistics will also have a huge impact if it is commercialized along with quantum computers.

Along with these advances in technology, there is a growing need for statistics in important social contexts. In fields such as healthcare and policy-making, it is becoming commonplace to incorporate the results of analyses based on causal inference into decision making as evidence. Simultaneously, the most basic descriptive statistics will continue to be needed to make data beyond human comprehension easier to understand.

* The information contained herein is current as of May 2025.
* The contents of articles on Meiji.net are based on the personal ideas and opinions of the author and do not indicate the official opinion of Meiji University.
* I work to achieve SDGs related to the educational and research themes that I am currently engaged in.

Information noted in the articles and videos, such as positions and affiliations, is current at the time of production.