Characteristics of scattering of a random variable. Characteristics of the position of the center of grouping of random variables

For mathematical and statistical analysis of sample results, knowing only the position characteristics is not enough. The same average value can characterize completely different samples.

Therefore, in addition to them, statistics also considers characteristics of the scattering (variation, or fluctuation) of the results.

1. Range of variation

Definition. The range of variation is the difference between the largest and smallest sample results. It is denoted by R and defined as

$R = X_{\max} - X_{\min}$.

This indicator carries little information, although for small samples it makes it easy to assess the difference between the best and worst results of athletes.

2. Variance

Definition. The variance is the mean squared deviation of the attribute values from the arithmetic mean.

For ungrouped data, the variance is determined by the formula

where $X_i$ is the value of the attribute and $\bar{X}$ is the arithmetic mean.

For data grouped into intervals, the variance is determined by the formula

where $X_i$ is the midpoint of the $i$-th grouping interval and $n_i$ are the interval frequencies.

To simplify calculations and to avoid calculation errors when rounding results (especially when increasing the sample size), other formulas are also used to determine the variance. If the arithmetic mean has already been calculated, then the following formula is used for ungrouped data:

 2 =
,

for grouped data:

These formulas are obtained from the previous ones by expanding the square of the difference under the summation sign.
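The expansion mentioned here can be checked numerically. The sketch below is only an illustration: the sample values and function names are hypothetical, and it compares just the sums of squared deviations (the numerators), leaving the choice of denominator aside.

```python
import math

def sum_sq_dev_direct(xs):
    """Sum of squared deviations from the mean, by definition."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)

def sum_sq_dev_with_mean(xs):
    """Expanded form using an already computed mean: sum(x^2) - n * mean^2."""
    n = len(xs)
    mean = sum(xs) / n
    return sum(x * x for x in xs) - n * mean ** 2

def sum_sq_dev_single_pass(xs):
    """Expanded form without a separate mean: sum(x^2) - (sum(x))^2 / n."""
    n = len(xs)
    return sum(x * x for x in xs) - sum(xs) ** 2 / n

sample = [9.8, 10.1, 10.4, 9.9, 10.3]  # hypothetical measurement results
print(sum_sq_dev_direct(sample))
print(sum_sq_dev_with_mean(sample))
print(sum_sq_dev_single_pass(sample))
```

All three functions agree up to floating-point rounding. In computer arithmetic the expanded forms can lose precision when the mean is large compared with the spread, so the direct form is usually the safer choice in code.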

In cases where the arithmetic mean and variance are calculated simultaneously, the formulas are used:

for ungrouped data:

 2 =
,

for grouped data:

3. Mean square (standard) deviation

Definition. The mean square (standard) deviation characterizes the degree of deviation of the results from the mean in absolute units, since, unlike the variance, it has the same units of measurement as the measurement results. In other words, the standard deviation shows how tightly the results in a group are distributed around the mean, that is, the homogeneity of the group.

For ungrouped data, the standard deviation is the square root of the variance computed by any of the formulas above:

$\sigma = \sqrt{\sigma^2}$.

For data grouped into intervals, the standard deviation is likewise the square root of the corresponding grouped-data variance.

4. Error of the arithmetic mean (average error)

The error of the arithmetic mean characterizes the fluctuation of the mean and is calculated as the standard deviation divided by the square root of the sample size:

$\frac{\sigma}{\sqrt{n}}$.

As the formula shows, as the sample size increases, the error of the mean decreases in inverse proportion to the square root of the sample size.

5. Coefficient of variation

The coefficient of variation is defined as the ratio of the standard deviation to the arithmetic mean, expressed as a percentage:

$\frac{\sigma}{\bar{X}} \cdot 100\%$.

It is believed that if the coefficient of variation does not exceed 10%, then the sample can be considered homogeneous, that is, obtained from one general population.
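The scattering characteristics listed above can be sketched for a small sample as follows. This is a minimal illustration under stated assumptions: the data and the function name `scatter_summary` are hypothetical, and the standard deviation is taken with the n - 1 denominator (`statistics.stdev`), which is an assumption rather than something the text fixes.

```python
import math
import statistics

def scatter_summary(xs):
    """Range, standard deviation, error of the mean and coefficient of variation."""
    n = len(xs)
    mean = statistics.mean(xs)
    sd = statistics.stdev(xs)          # assumption: n - 1 in the denominator
    return {
        "range": max(xs) - min(xs),
        "std": sd,
        "mean_error": sd / math.sqrt(n),
        "cv_percent": sd / mean * 100,
    }

results = [10.0, 12.0, 11.0, 13.0, 14.0]  # hypothetical measurement results
summary = scatter_summary(results)
# The 10% homogeneity rule quoted in the text
is_homogeneous = summary["cv_percent"] <= 10
print(summary, is_homogeneous)
```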

The main characteristics of scatter used to assess the variation of values about the sample mean are the variance, the standard deviation, and the coefficient of variation.

1. The variance (dispersion, from Lat. dispersio, scattering) is the arithmetic mean of the squared deviations of the values $x_i$ from their arithmetic mean.

The variance (D) is a measure of scatter (deviation from the mean). It is computed as follows: the arithmetic mean is subtracted from each value, the difference is squared and multiplied by the corresponding frequency; the products are then summed and the sum is divided by the sample size.

For grouped data, the variance is:

$D = \frac{1}{n} \sum_{i=1}^{k} (x_i - \bar{x})^2 \, n_i$,

where $x_i$ are the class midpoints, $n_i$ are the class frequencies, $k$ is the number of classes, and $n = \sum n_i$ is the sample size.

The units of the variance do not coincide with the units of measurement of the varying characteristic: they are the squares of those units.

When solving practical problems, in addition to the formulas for the sample variance, a quantity called the corrected variance is used. The sample variance underestimates the actual variance, so for small samples ($n < 30$) the corrected variance and the corrected standard deviation must be used:

$s^2 = \frac{n}{n-1} D = \frac{1}{n-1} \sum_{i=1}^{k} (x_i - \bar{x})^2 \, n_i$, $s = \sqrt{s^2}$.

2. The sample and corrected standard deviations (σ, s) are the square roots of the corresponding variances. The units of the standard deviation, unlike those of the variance, coincide with the units of the experimental data, so it is the main quantity used to characterize the scatter of the characteristic being studied.

Let us present the calculation of variance (Table 5) for example 1.

Table 5

Intermediate variance calculations

No.	Class midpoints, x_i	Class frequencies, n_i
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
sum

The variance for the grouped example data is:

The standard deviation is correspondingly equal to:

The corrected standard deviation is:

Note that the formulas for the sample and corrected variances differ only in their denominators. For sufficiently large n the two differ little, so in practice the corrected variance is used when $n < 30$.
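Since the values of Table 5 are not available here, the difference between the two denominators can be illustrated with a small made-up grouped sample. The function name and the data below are hypothetical; the sketch only assumes the sample variance divides by n and the corrected variance by n - 1, as stated in the text.

```python
import math

def grouped_variances(midpoints, freqs):
    """Return (sample variance D, corrected variance s^2) for grouped data."""
    n = sum(freqs)
    mean = sum(x * f for x, f in zip(midpoints, freqs)) / n
    ss = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs))
    return ss / n, ss / (n - 1)

midpoints = [1.0, 2.0, 3.0]   # hypothetical class midpoints
freqs = [2, 4, 2]             # hypothetical class frequencies
d, s2 = grouped_variances(midpoints, freqs)
print(d, s2, math.sqrt(d), math.sqrt(s2))
```

With only eight observations the two variances differ noticeably (0.5 against 4/7); the gap shrinks as n grows, because the ratio of the two is n / (n - 1).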

3. The coefficient of variation (v) is a relative measure of the scatter of a characteristic, used as an indicator of the homogeneity of sample observations (Table 6).

The coefficient of variation is the ratio of the standard deviation to the arithmetic mean, expressed as a percentage. It is also often used to compare the degree of variation of different characteristics expressed in different units of measurement.

To determine the nature of the scatter, the dimensionless coefficient of variation v is calculated using the formula:

$v = \frac{\sigma}{\bar{x}} \cdot 100\%$,

where $\sigma$ is the standard deviation and $\bar{x}$ is the arithmetic mean of the sample data.

Ministry of Education and Science of the Russian Federation

State educational institution of higher professional education

"MATI" - Russian State Technological University named after K. E. Tsiolkovsky

Department of "Aircraft Engine Production Technology"

Laboratory workshop

MATLAB. Lesson 2

STATISTICAL ANALYSIS OF EXPERIMENTAL DATA

Compiled by:

Kuritsyna V.V.

Moscow 2011

INTRODUCTION
CHARACTERISTICS OF RANDOM VARIABLES
Characteristics of the position of the center of grouping of random variables
Characteristics of scattering of a random variable
Characteristics of the sample of observations
Normal distribution (Gaussian distribution)
PRESENTATION OF A SAMPLE OF MEASUREMENT RESULTS IN THE FORM OF A DISTRIBUTION SERIES
DETERMINATION OF STATISTICAL CHARACTERISTICS IN THE MATLAB ENVIRONMENT
Formation of a sample of experimental data
Methods for generating a sample file
Option 1. Formation of a data matrix of measurement results
Option 2. Simulation of measurement results
Construction of distribution graphs
Option 1. Constructing distribution graphs
Option 2. Constructing distribution graphs
VISUAL MODELING
Modeling in Matlab Simulink
Getting started with Simulink
Creating a Simulink model
Formation of a sample for analysis
Calculation of statistical characteristics
Construction of a distribution histogram
Block diagram of the visual model
Modeling a random process
Model experiment
Creating arrays with random elements
Modifying a data source in a model
Approximate view of the model block diagram

INTRODUCTION

In the arsenal of tools that a modern experimenter must possess, statistical methods of data processing and analysis occupy a special place. This is due to the fact that the result of any sufficiently complex experiment cannot be obtained without processing the experimental data.

The apparatus of probability theory and mathematical statistics has been developed and used to describe the patterns inherent in mass random events. Each random event is associated with a corresponding random variable (in this case, the result of the experiment).

The following characteristics are used to describe random variables:

a) numerical characteristics of the random variable (for example, mathematical expectation, variance, ...);

b) the distribution law of the random variable, a function that carries all the information about the random variable.

Numerical characteristics and parameters of the distribution law of a random variable are interconnected by a certain dependence. Often, based on the value of numerical characteristics, one can assume the distribution law of a random variable.

The distribution law of a random variable is usually understood to be the function that gives the probability of the random variable taking particular values. It associates possible intervals of values of the random variable with the probability of its falling into those intervals.

CHARACTERISTICS OF RANDOM VARIABLES

Characteristics of the position of the center of grouping of random variables

As numerical characteristics of the position of the center of grouping of random variables, the mathematical expectation or mean value, mode and median of the random variable are used (Fig. 3.1.).

The mathematical expectation (expected value) of a random variable Y is denoted by $M_Y$ or $a$ and is defined by the formula:

$a = M_Y = \int_{-\infty}^{+\infty} Y \varphi(Y)\, dY$.

The mathematical expectation indicates the position of the center of grouping of random variables, or the position of the center of mass of the area under the curve. The mathematical expectation is a numerical characteristic of a random variable, that is, it is one of the parameters of the distribution function.

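[Figure: probability density curve ϕ(Y) against Y, with the maximum ϕ(Y)max and the points Mo Y and Me Y marked on the horizontal axis.]

Fig. 3.1. Characteristics of the grouping center of a random variable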

The mode of a random variable Y is the value $Mo_Y$ at which the probability density has its maximum.

The median of a random variable Y is the value $Me_Y$ that satisfies the condition:

$P(Y < Me_Y) = P(Y > Me_Y) = 0.5$.

Geometrically, the median is the abscissa of the vertical line that divides the area under the probability density curve in half.

Characteristics of scattering of a random variable

One of the main characteristics of the scattering of a random variable Y around the center of the distribution is the variance, which is denoted $D(Y)$ or $\sigma^2$ and is defined by the formula:

$D(Y) = \sigma^2 = \int_{-\infty}^{+\infty} (Y - a)^2 \varphi(Y)\, dY$.

The variance has the dimension of the square of the random variable, which is not always convenient. Often, instead of the variance, the positive square root of the variance is used as a measure of scatter; it is called the mean square deviation or standard deviation:

$\sigma = \sqrt{D(Y)} = \sqrt{\sigma^2}$.

Like the variance, the standard deviation characterizes the spread of a value around the mathematical expectation.

In practice, a scatter characteristic called the coefficient of variation $\nu$ is also used; it is the ratio of the standard deviation to the mathematical expectation:

$\nu = \frac{\sigma}{a} \cdot 100\%$.

The coefficient of variation shows how much dispersion there is compared to the mean of the random variable.

Characteristics of the observation sample

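The mean value of the observed characteristic can be estimated using the formula

$\bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i$,

where $Y_i$ is the value of the attribute in the $i$-th observation (experiment), $i = 1, \dots, n$, and $n$ is the number of observations.

The sample standard deviation is determined by the formula:

$S = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2}$.

The sample coefficient of variation is

$\nu = \frac{S}{\bar{Y}} \cdot 100\%$.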

Knowing the coefficient of variation $\nu$, you can determine the accuracy indicator $H$ using the formula:

$H = \frac{\nu}{\sqrt{n}}$.

The more accurately the research is carried out, the lower the value of the indicator will be.

Depending on the nature of the phenomenon being studied, the accuracy of the study is considered sufficient if the indicator does not exceed 3 to 5%.
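A short sketch of this check in Python is given below. It assumes that the coefficient of variation is expressed in percent and that the accuracy indicator is H = ν/√n; the data, the 5% threshold chosen from the quoted 3 to 5% band, and the function name are all illustrative.

```python
import math
import statistics

def accuracy_indicator(values):
    """Return (coefficient of variation in %, accuracy indicator H in %)."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)        # sample standard deviation, n - 1
    nu = s / mean * 100                 # assumption: nu expressed in percent
    return nu, nu / math.sqrt(n)        # assumption: H = nu / sqrt(n)

observations = [10.0, 12.0, 11.0, 13.0, 14.0]  # hypothetical measurements
nu, h = accuracy_indicator(observations)
accuracy_sufficient = h <= 5.0          # upper end of the 3 to 5% band
print(nu, h, accuracy_sufficient)
```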

Measurement results often contain gross errors. There are several ways to detect them. The simplest is based on calculating the maximum relative deviation $U$. To do this, the measurement results are arranged in a series of monotonically increasing values. The smallest member $Y_{\min}$ or the largest member $Y_{\max}$ of the series is checked for gross error. The calculation is carried out using the formulas:

$U = \frac{\bar{Y} - Y_{\min}}{S}$ or $U = \frac{Y_{\max} - \bar{Y}}{S}$.

The value of $U$ is compared with the tabulated value $U_\alpha$ for a given confidence probability. If $U \le U_\alpha$, then there is no gross error in this observation. Otherwise, the observation is eliminated and $\bar{Y}$ and $S$ are recalculated. The procedure for assessing and eliminating gross errors is then repeated until the inequality $U \le U_\alpha$ is satisfied for the extreme members of the series.
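The elimination procedure can be sketched roughly as follows. This is an illustration, not the manual's own implementation: the function names are hypothetical, U is assumed to be the deviation of the extreme member from the mean divided by S, and the critical value `u_critical` is passed in as a single made-up number, whereas in practice it is read from a table for the current sample size and confidence probability.

```python
import statistics

def max_relative_deviations(values):
    """Return (U for the smallest member, U for the largest member)."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return (mean - min(values)) / s, (max(values) - mean) / s

def remove_gross_errors(values, u_critical):
    """Drop extreme members one at a time until both pass U <= u_critical."""
    data = sorted(values)
    while len(data) > 2 and statistics.stdev(data) > 0:
        u_min, u_max = max_relative_deviations(data)
        if max(u_min, u_max) <= u_critical:
            break
        # Eliminate the more deviant extreme, then recompute mean and S
        if u_max >= u_min:
            data.pop()
        else:
            data.pop(0)
    return data

measurements = [10.1, 10.2, 9.9, 10.0, 10.3, 10.1, 9.8, 15.0]  # hypothetical
cleaned = remove_gross_errors(measurements, u_critical=2.0)    # made-up table value
print(cleaned)
```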

In many cases, the results of statistical observations can be described by theoretical distribution laws. When interpreting experimental data, the task is to determine the theoretical distribution law of the random variable that best matches the observations. More specifically, this comes down to testing the hypothesis that a random sample follows a certain distribution law.

The processes being analyzed differ in nature, and this determines where different distribution laws apply. Thus, the result of a technological experiment under the same processing conditions is subject to one set of laws, while the result of a coin-tossing experiment (heads or tails) is subject to entirely different laws. The distribution laws of random variables describing reliability and failures also have their own peculiarities.

The basic statistical characteristics of a series of measurements (variational series) include position characteristics (average characteristics, or the central tendency of the sample), scattering characteristics (variation or fluctuation), and characteristics of the shape of the distribution.

Position characteristics include the arithmetic mean (average value), the mode, and the median.

Scattering characteristics (variation or fluctuation) include the range of variation, the variance, the mean square (standard) deviation, the error of the arithmetic mean (error of the mean), the coefficient of variation, and others.

Shape characteristics include the skewness coefficient, the measure of skewness, and the kurtosis.

Position Characteristics

The arithmetic mean is one of the main characteristics of the sample.

It, like other numerical characteristics of the sample, can be calculated both from raw primary data and from the results of grouping this data.

The accuracy of the calculation on raw data is higher, but the calculation process turns out to be labor-intensive with a large sample size.

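For ungrouped data, the arithmetic mean is determined by the formula:

$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$,

where $n$ is the sample size and $X_1, X_2, \dots, X_n$ are the measurement results.

For grouped data:

$\bar{X} = \frac{1}{n} \sum_{i=1}^{k} n_i x_i$,

where $n$ is the sample size, $k$ is the number of grouping intervals, $n_i$ are the interval frequencies, and $x_i$ are the midpoints of the intervals.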

Mode

Definition 1. The mode is the most frequently occurring value in the sample data. It is denoted Mo and, for data grouped into intervals, is determined by the formula:

$Mo = x_{Mo} + h \cdot \frac{n_{Mo} - n_{Mo-1}}{(n_{Mo} - n_{Mo-1}) + (n_{Mo} - n_{Mo+1})}$,

where $x_{Mo}$ is the lower limit of the modal interval, $h$ is the width of the grouping interval, $n_{Mo}$ is the frequency of the modal interval, $n_{Mo-1}$ is the frequency of the interval preceding the modal one, and $n_{Mo+1}$ is the frequency of the interval following the modal one.

Definition 2. The mode Mo of a discrete random variable is its most probable value.

Geometrically, the mode can be interpreted as the abscissa of the maximum point of the distribution curve. There are bimodal and multimodal distributions. There are also distributions that have a minimum but no maximum; such distributions are called antimodal.

Definition. The modal interval is the grouping interval with the highest frequency.

Median

Definition. The median is the measurement result that lies in the middle of the ranked series; in other words, it is the value of the attribute X such that one half of the experimental values is less than it and the other half is greater. It is denoted Me.

When the sample size n is an even number, that is, there is an even number of measurement results, the median is calculated as the mean of the two values located in the middle of the ranked series.

For data grouped into intervals, the median is determined by the formula:

$Me = x_{Me} + h \cdot \frac{0.5\,n - n_{acc}}{n_{Me}}$,

where $x_{Me}$ is the lower limit of the median interval, $h$ is the width of the grouping interval, $0.5\,n$ is half of the sample size, $n_{Me}$ is the frequency of the median interval, and $n_{acc}$ is the accumulated frequency of the interval preceding the median one.

Definition. The median interval is the interval in which the accumulated frequency first exceeds half the sample size ($n/2$), or, equivalently, the accumulated relative frequency first exceeds 0.5.

The numerical values of the mean, mode, and median differ when there is an asymmetrical shape of the empirical distribution.
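The grouped-data position characteristics can be sketched in Python as shown below. The function names and the example frequencies are hypothetical, and the mode and median use the usual interpolation formulas for interval data (lower limit of the modal or median interval plus a fraction of the interval width), which is assumed to be what the text describes.

```python
def grouped_mean(midpoints, freqs):
    """Arithmetic mean from class midpoints and frequencies."""
    return sum(x * f for x, f in zip(midpoints, freqs)) / sum(freqs)

def grouped_mode(lower_bounds, width, freqs):
    """Mode interpolated inside the modal interval (highest frequency)."""
    m = freqs.index(max(freqs))
    prev_f = freqs[m - 1] if m > 0 else 0
    next_f = freqs[m + 1] if m < len(freqs) - 1 else 0
    d1, d2 = freqs[m] - prev_f, freqs[m] - next_f
    if d1 + d2 == 0:                      # flat neighbourhood: use the midpoint
        return lower_bounds[m] + width / 2
    return lower_bounds[m] + width * d1 / (d1 + d2)

def grouped_median(lower_bounds, width, freqs):
    """Median interpolated inside the first interval whose accumulated
    frequency exceeds half the sample size."""
    half = sum(freqs) / 2
    accumulated = 0
    for lower, f in zip(lower_bounds, freqs):
        if accumulated + f > half:
            return lower + width * (half - accumulated) / f
        accumulated += f
    raise ValueError("empty sample")

lower_bounds = [0.0, 10.0, 20.0, 30.0]    # hypothetical interval lower limits
width = 10.0
freqs = [2, 5, 8, 5]
midpoints = [b + width / 2 for b in lower_bounds]
print(grouped_mean(midpoints, freqs),
      grouped_mode(lower_bounds, width, freqs),
      grouped_median(lower_bounds, width, freqs))
```

For this slightly asymmetric example the three values differ (23, 25 and 23.75), in line with the remark about asymmetric empirical distributions.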

Dispersion characteristics of measurement results

For mathematical and statistical analysis of sample results, knowing only the position characteristics is not enough. The same average value can characterize completely different samples.

Therefore, in addition to them, statistics also considers characteristics of the scattering (variation, or fluctuation) of the results.

Range of variation

Definition. The range of variation is the difference between the largest and smallest sample results. It is denoted by R and defined as

$R = X_{\max} - X_{\min}$.

The information value of this indicator is small, although with small sample sizes it is easy to assess the difference between the best and worst results of athletes.

Variance

Definition. The variance is the mean squared deviation of the attribute values from the arithmetic mean.

For ungrouped data, the variance is determined by the formula

s 2 = , (1)

where $X_i$ is the value of the attribute and $\bar{X}$ is the arithmetic mean.

For data grouped into intervals, the variance is determined by the formula

where $x_i$ is the midpoint of the $i$-th grouping interval and $n_i$ are the interval frequencies.

for grouped data:

These formulas are obtained from the previous ones by expanding the square of the difference under the summation sign.

The variance of a random variable characterizes its spread about the mathematical expectation. Since the values of a random variable scatter on both sides of the scattering center, the spread is described either by even-order central moments or by absolute central moments. It is enough to consider the central moment of the second order $\mu_2$ and the absolute central moment of the first order $\tau_1$. The first is called the variance, and the second the mean deviation. Let us study them in more detail.

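The variance of a random variable X has several designations:

$D(X) = \sigma_X^2 = \mu_2 = E\big((X - E(X))^2\big) = \begin{cases} \sum_i (x_i - E(X))^2 \, P(x_i) & \text{for a discrete random variable (DRV),} \\ \int_{-\infty}^{+\infty} (x - E(X))^2 f(x)\, dx & \text{for a continuous random variable (CRV).} \end{cases}$ (59)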

The variance operator D has the following properties:

1) $D(C) = 0$

2) $D(CX) = C^2 \cdot D(X)$ (60)

3) $D(C + X) = D(X)$

The situation with the proof of the properties of the dispersion operator is similar to that noted for the mathematical expectation operator. Let us dwell on the physical meaning of these properties.

The first property says that a constant has no spread. No comment is required.

When the scale along the x-axis is changed (the second property), the new variance is obtained from the old one by multiplying it by the square of the scale factor.

The third property says that when the origin is shifted by an amount C along the abscissa axis, the variance of the random variable does not change, since centering compensates for the shift.

Together, these properties express the response of the variance operator to a linear transformation of a random variable X:

$D(C_1 + C_2 \cdot X) = C_2^2 \cdot D(X)$. (61)

From the definition of the variance it follows that its dimension is the square of the dimension of the random variable it characterizes. This is not always easy to interpret. For example, if we say that some distance is $S = 567.89$ m and its variance is $D(S) = 9 \cdot 10^{-4}$ m$^2$, then comparing these quantities, which have different dimensions, gives no idea of the accuracy of the measurements. This led to the use of another indicator of scatter, the standard.

The standard, or root-mean-square (RMS) deviation, is the positive square root of the variance. It characterizes the spread of a random variable about its center of dispersion in the same units in which the random variable itself is expressed:

$\sigma_X = +\sqrt{D(X)}$. (62)

The properties of the standard follow from the properties of the variance:

1) $\sigma_C = 0$

2) $\sigma_{CX} = |C| \cdot \sigma_X$ (63)

3) $\sigma_{C+X} = \sigma_X$

If we now characterize the previously given distance $S = 567.89$ m by the standard $\sigma_S = 3 \cdot 10^{-2}$ m, then our idea of the accuracy of this distance will be adequate.
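The response of the variance and the standard to a linear transformation can be checked numerically. In the sketch below the values are hypothetical and are treated as equally likely outcomes of a discrete random variable, so that `statistics.pvariance` plays the role of D(X); the last line reproduces the distance example.

```python
import math
import statistics

def check_linear_transform(xs, c1, c2):
    """Return (D(c1 + c2*X), c2^2 * D(X)) for equally likely values xs."""
    ys = [c1 + c2 * x for x in xs]
    return statistics.pvariance(ys), c2 ** 2 * statistics.pvariance(xs)

xs = [2.0, 4.0, 4.0, 5.0, 7.0]            # hypothetical equally likely values
lhs, rhs = check_linear_transform(xs, c1=5.0, c2=-3.0)
print(lhs, rhs)

# For the standard the scale factor enters through its absolute value
std_scaled = statistics.pstdev([-3.0 * x for x in xs])
print(std_scaled, 3.0 * statistics.pstdev(xs))

# Distance example: variance 9e-4 m^2 corresponds to a standard of 0.03 m
sigma_s = math.sqrt(9e-4)
print(sigma_s)
```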

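The mean deviation is the absolute central moment of the first order of the random variable X. It is denoted by the letter $\vartheta_X$ and calculated by definition (58) at $r = 1$:

$\vartheta_X = \tau_1 = E(|X - E(X)|) = \begin{cases} \sum_i |x_i - E(X)| \, P(x_i) & \text{for a discrete random variable (DRV),} \\ \int_{-\infty}^{+\infty} |x - E(X)| \, f(x)\, dx & \text{for a continuous random variable (CRV).} \end{cases}$ (64)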

The properties of the mean deviation are similar to those of the standard (verify this as Exercise 2.1):

1) $\vartheta_C = 0$

2) $\vartheta_{CX} = |C| \cdot \vartheta_X$ (65)

3) $\vartheta_{C+X} = \vartheta_X$

2.2.6 Examples of one-dimensional distributions.

Let us consider the distribution laws of some discrete and continuous random variables that play an important role in theory and practice.

Event indicator.

The event indicator $I_A$ is a special case of Bernoulli trials. It is a discrete random variable taking only two possible values, 0 and 1, with probabilities $(1 - p)$ and $p$ respectively. Here $p = P(A)$ is the probability of occurrence of the event A, defined on some space $\Omega$. Let us consider all the characteristics introduced above for this random variable, both as an example and for later use when studying more complex laws.

Given: $X = I_A = \{x_1 = 0;\ x_2 = 1\}$; $P(x_1) = P(\bar{A}) = 1 - p = q$; $P(x_2) = P(A) = p$.

Find: 1) $F(I_A)$; 2) $E(I_A)$; 3) $D(I_A)$; 4) $\sigma_I$.

Solution:

1) We will place the distribution function in the extended table of the distribution series, as proposed in (44):

X = I_A	0	1	-
P(X = I_A)	q	p	-
F(I_A)	0	q	1

We determine the numerical characteristics using formulas (51), (59) and (62):

2) $E(I_A) = 0 \cdot q + 1 \cdot p = p$;

3) $D(I_A) = \alpha_2 - (E(I_A))^2 = 0^2 \cdot q + 1^2 \cdot p - p^2 = p(1 - p) = pq$;

4) $\sigma_I = \sqrt{pq}$.

The event indicator is used in the study of repeated trials and solving other problems as an auxiliary random variable.
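The numerical characteristics of the indicator can be reproduced directly from its distribution series. The sketch below is illustrative only; the function name and the probability p = 0.3 are made up, and the variance is computed as the second raw moment minus the square of the expectation.

```python
import math

def indicator_moments(p):
    """Expectation, variance and standard of the event indicator I_A."""
    q = 1 - p
    values, probs = (0, 1), (q, p)
    mean = sum(x * pr for x, pr in zip(values, probs))
    second_raw = sum(x * x * pr for x, pr in zip(values, probs))
    variance = second_raw - mean ** 2
    return mean, variance, math.sqrt(variance)

mean, variance, sigma = indicator_moments(0.3)   # hypothetical p = P(A)
print(mean, variance, sigma)
```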

2.2.6.2 Uniform distribution.

To illustrate the material of section 2.2 for continuous random variables, we study the continuous uniform distribution on some segment $[a; b]$. A distribution is called uniform on a segment if its probability density is constant on this segment and equal to zero outside it. Let us present the study of this distribution as the solution of a problem.

Given: $f(x) = c$ for $x \in [a; b]$; $f(x) = 0$ outside this segment.

Find: 1) the constant density $c$; 2) $F(x)$; 3) $E(X)$; 4) $Mo(X)$; 5) $Me(X)$; 6) $D(X)$; 7) $\sigma_X$; 8) $\vartheta_X$; 9) $P(x_1 < X < x_2)$.

Solution: work it out yourself as Exercise 2.2.

Answers: 1) $c = 1 / (b - a)$; 2) $F(x) = (x - a) / (b - a)$ for $x \in [a; b]$; 3) $E(X) = (a + b) / 2$;

4) $Mo(X)$ is not defined; 5) $Me(X) = E(X)$; 6) $D(X) = (b - a)^2 / 12$;

7) $\sigma_X = (b - a) / (2\sqrt{3})$; 8) $\vartheta_X = (b - a) / 4$; 9) $P(x_1 < X < x_2) = (x_2 - x_1) / (b - a)$ when $]x_1; x_2[ \subset [a; b]$.
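The answers for the expectation, variance and mean deviation can be checked numerically without solving the exercise analytically. The sketch below integrates the constant density by the midpoint rule; the function name, the segment [2; 8] and the number of steps are arbitrary choices made for illustration.

```python
import math

def uniform_moments(a, b, steps=10_000):
    """Midpoint-rule estimates of E(X), D(X) and the mean deviation
    for the uniform density c = 1 / (b - a) on [a, b]."""
    c = 1.0 / (b - a)
    h = (b - a) / steps
    xs = [a + (i + 0.5) * h for i in range(steps)]
    mean = sum(x * c * h for x in xs)
    variance = sum((x - mean) ** 2 * c * h for x in xs)
    mean_dev = sum(abs(x - mean) * c * h for x in xs)
    return mean, variance, mean_dev

a, b = 2.0, 8.0                      # hypothetical segment
mean, variance, mean_dev = uniform_moments(a, b)
print(mean, (a + b) / 2)
print(variance, (b - a) ** 2 / 12)
print(math.sqrt(variance), (b - a) / (2 * math.sqrt(3)))
print(mean_dev, (b - a) / 4)
```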

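Graphs of the density and of the distribution function of the uniform distribution are shown in the following figures (Fig. 2.19 and Fig. 2.20).

[Figures: f(x) and F(x) plotted against X, with the points a, E(X) and b marked on the horizontal axis; the area under the density is S = 1 and its height is c = 1/(b - a).]

Fig. 2.19. Density of the uniform distribution. Fig. 2.20. Uniform distribution function.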
