The IEC is developing a PAS, "Performance Requirements for LED Modules for General Lighting", in which the sample size for both the module power and luminous flux test items is 20. In addition to judging each individual LED module against the corresponding quality requirement, the PAS also judges the sample mean of the corresponding quantity, and it gives a confidence interval and confidence level for the sample mean as the basis for a pass/fail judgment on the population. This is rarely seen in previous IEC standards. Why can the sample mean be used to infer the population? The answer involves mathematical statistics and probability theory. To that end, this article introduces the basic knowledge of mathematical statistics needed to interpret the statistical techniques used in the PAS.
1. Several basic concepts of mathematical statistics
a) Population and samples
Generally, we call the whole of the subjects under study the population, and each object studied an individual. For example, a batch of LED modules constitutes a population, with each LED module being an individual in that population. Each LED module has one or more quality indicators, such as module power and luminous flux. To study the population, we extract a number of individuals from it according to certain rules; these individuals together are called a sample. The number of individuals contained in the sample is called the sample size, and each extracted individual is a sample point (an observation).
For the population, what we care about is not the individuals themselves but a characteristic associated with them and the distribution of that characteristic. For example, to study the quality of LED modules, the concern is their power or luminous flux rather than the LED module itself. Since the power or luminous flux of any particular LED module cannot be predicted in advance, yet each LED module does correspond to a definite power or luminous flux, we can regard the power or luminous flux of an LED module as a random variable, and what concerns us is the probability distribution of this random variable. In general, the population in question can be represented by a random variable, which lets us describe it in precise language: the population is a random variable with a definite probability distribution, and an individual is an observation of that random variable. We therefore speak of the population F(x), or the population X, meaning a random variable X with F(x) as its distribution function.
Once the population is precisely defined, the sample can be described accurately. Since the value of each sample point contained in the sample can be regarded as a random variable, a sample of size n can be considered an n-dimensional random vector (X1, X2, ..., Xn). To make the sample representative and the calculations as simple as possible, we naturally impose the following two requirements: first, each Xi (i = 1, ..., n) must have the same distribution F(x) as the population X; second, the Xi must be mutually independent. A sample satisfying these two requirements is called a simple random sample.
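The two requirements above (identical distribution and mutual independence) are exactly what independent draws from one fixed distribution provide. The following sketch draws a simple random sample of size n = 20, matching the PAS sample size; the choice of a normal population with mean 10 W and standard deviation 0.5 W is purely an illustrative assumption, not a value taken from the PAS.

```python
# Sketch: a simple random sample of size n = 20 from a hypothetical
# population of LED module powers. The population is assumed (for
# illustration only) to be normal with mean 10 W and std. dev. 0.5 W.
# Independent calls to random.gauss satisfy both requirements:
# each Xi has the same distribution, and the Xi are independent.
import random

random.seed(42)  # fixed seed so the sketch is reproducible

n = 20  # sample size, matching the PAS test items
sample = [random.gauss(10.0, 0.5) for _ in range(n)]  # X1, ..., Xn

print(len(sample))
```

Each element of `sample` is one observation of the random variable "module power"; the list as a whole is one realization of the n-dimensional random vector (X1, ..., Xn).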
b) Empirical distribution and theoretical distribution
The distribution function F(x) of the population X is called the theoretical distribution or the population distribution. The distribution function Fn(x) of a sample of size n is called the empirical distribution or the sample distribution.
Since the population contains a large number of individuals, it is in practice impossible or prohibitively difficult to measure the indicator of every individual. The population distribution is therefore objective but unknown. For example, it may be known that the population distribution is normal while its expectation μ and variance σ² are unknown; often all that is known is that the parameters characterizing the population distribution belong to some known set.
Glivenko's theorem: the empirical distribution Fn(x) converges to F(x) uniformly in x with probability 1, i.e.:

P( lim(n→∞) sup(−∞<x<+∞) |Fn(x) − F(x)| = 0 ) = 1
The Glivenko theorem tells us that when n is sufficiently large, Fn(x) is close to F(x): the maximum difference between them tends to zero as n increases, and this holds with probability 1 (almost surely). Thus, when n is large enough, Fn(x) can be used to approximate F(x). This is the most basic theoretical justification for using the sample to infer the population.
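The statement above can be illustrated numerically: for samples from a known population, the maximum deviation sup|Fn(x) − F(x)| typically shrinks as n grows. The sketch below assumes a standard normal population (an illustrative choice, not from the source) and evaluates the deviation at the sorted sample points, where the supremum of a step function's distance to a continuous F is attained.

```python
# Sketch of the Glivenko theorem: the distance sup|Fn(x) - F(x)|
# between the empirical and theoretical distributions shrinks as the
# sample size n grows. Population: standard normal (assumed for demo).
import math
import random

def normal_cdf(x):
    """Theoretical distribution F(x) of the standard normal population."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sup_distance(sample):
    """Maximum deviation between the empirical CDF Fn and F.

    Fn is a step function jumping from i/n to (i+1)/n at the i-th
    sorted sample point, so checking both sides of every jump gives
    the supremum over all x.
    """
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = normal_cdf(x)
        d = max(d, abs((i + 1) / n - f), abs(i / n - f))
    return d

random.seed(0)
for n in (50, 500, 5000):
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    print(n, round(sup_distance(sample), 4))
```

The printed deviations will generally decrease down the list; for any single random run the decrease need not be strictly monotone, which is consistent with the theorem being a statement about the limit n → ∞.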
c) Statistics
The sample comes from the population and both represents and reflects it. The sample value obtained after sampling is an n-dimensional vector, and it is inconvenient to use the n observations directly; they should be processed and condensed to extract the relevant information contained in the sample. For different problems we construct different functions of the sample; such a function is called a statistic in mathematical statistics. Typical statistics include the sample mean (see Equation 1.1), which characterizes the location of the sample, and the sample variance (see Equation 1.2), which characterizes the dispersion of the sample.
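The two statistics named above can be sketched directly. Since Equations 1.1 and 1.2 are not reproduced in this excerpt, the formulas below use the standard definitions: the sample mean x̄ = (1/n) Σ xi, and the sample variance with the usual n − 1 divisor, s² = (1/(n − 1)) Σ (xi − x̄)²; the power readings are made-up illustrative data, not PAS measurements.

```python
# Minimal sketch of the two statistics: sample mean (location) and
# sample variance (dispersion). Standard textbook definitions are
# assumed, since Equations 1.1 and 1.2 are not shown in the excerpt.
def sample_mean(xs):
    """Sample mean: x_bar = (1/n) * sum(x_i)."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Sample variance with n - 1 divisor:
    s^2 = (1/(n-1)) * sum((x_i - x_bar)^2)."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Hypothetical LED module power readings in watts (illustrative only).
powers = [9.8, 10.1, 10.0, 9.9, 10.2]
print(sample_mean(powers))      # 10.0
print(sample_variance(powers))  # 0.025
```

The mean condenses the n observations into a single location figure, which is exactly the quantity the PAS compares against a confidence interval to judge the population.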