Empirical distribution function: definition, properties and graph


Definition of the empirical distribution function

Let $X$ be a random variable and $F(x)$ its distribution function. Suppose we carry out $n$ independent experiments on $X$ under identical conditions. This yields a sequence of values $x_1, x_2, \dots, x_n$, which is called a sample.

Definition 1

Each value $x_i$ ($i = 1, 2, \dots, n$) is called a variant.

One of the estimates of the theoretical distribution function is the empirical distribution function.

Definition 3

The empirical distribution function $F_n(x)$ is the function that determines, for each value $x$, the relative frequency of the event $X < x$:

\[F_n(x) = \frac{n_x}{n},\]

where $n_x$ is the number of variants less than $x$ and $n$ is the sample size.

The difference between the empirical and the theoretical function is that the theoretical function gives the probability of the event $X < x$, whereas the empirical function gives the relative frequency of the same event.
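The definition $F_n(x) = n_x/n$ translates directly into code. A minimal sketch in Python, assuming the sample is a list of numbers; the name `empirical_cdf` is hypothetical. The strict inequality $X < x$ corresponds to `bisect.bisect_left`, which returns the number of elements of a sorted list that are strictly less than `x`.

```python
from bisect import bisect_left

def empirical_cdf(sample):
    """Return F_n as a function: F_n(x) = n_x / n, where n_x = #{x_i < x}."""
    data = sorted(sample)
    n = len(data)

    def F_n(x):
        # bisect_left gives the count of sorted elements strictly less than x
        return bisect_left(data, x) / n

    return F_n

F = empirical_cdf([3, 1, 2, 2])
print(F(1), F(2), F(2.5), F(4))
```

For this sample, $F_n(2) = 0.25$ because only the variant 1 is strictly less than 2; the two observations equal to 2 are counted only for $x > 2$.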

Properties of the empirical distribution function

Let us now consider several basic properties of the empirical distribution function.

1) The range of the function $F_n(x)$ is the segment $[0, 1]$.

2) $F_n(x)$ is a non-decreasing function.

3) $F_n(x)$ is a left-continuous function.

4) $F_n(x)$ is a piecewise constant function; it increases only at the observed values (variants) of the random variable $X$, jumping at each variant by that variant's relative frequency.

5) Let $X_1$ be the smallest and $X_n$ the largest variant. Then $F_n(x) = 0$ for $x \le X_1$ and $F_n(x) = 1$ for $x > X_n$.
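The five properties can be checked numerically on a concrete sample. The sketch below is only an illustration, assuming a hypothetical sample of 30 simulated die rolls (seeded for reproducibility); the grid of $x$ values and the small offset `eps` are arbitrary choices, not taken from the text.

```python
import random
from bisect import bisect_left

def F_n(data, x):
    """Empirical distribution function of a *sorted* sample: #{x_i < x} / n."""
    return bisect_left(data, x) / len(data)

random.seed(0)
data = sorted(random.randint(1, 6) for _ in range(30))  # hypothetical sample: 30 die rolls
n = len(data)
grid = [k / 10 for k in range(0, 80)]                    # x from 0.0 to 7.9
values = [F_n(data, x) for x in grid]

# Property 1: values lie in [0, 1]
assert all(0.0 <= v <= 1.0 for v in values)
# Property 2: non-decreasing
assert all(a <= b for a, b in zip(values, values[1:]))

eps = 1e-9
for v in set(data):
    # Property 3: left-continuous, the value at v equals the limit from the left
    assert F_n(data, v - eps) == F_n(data, v)
    # Property 4: the jump just to the right of v equals the relative frequency of v
    jump = F_n(data, v + eps) - F_n(data, v)
    assert abs(jump - data.count(v) / n) < 1e-12

# Property 5: 0 at (and below) the smallest variant, 1 beyond the largest
assert F_n(data, data[0]) == 0.0
assert F_n(data, data[-1] + eps) == 1.0
print("all properties hold for this sample")
```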

Let us introduce a theorem that connects the theoretical and empirical functions.

Theorem 1

Let $F_n(x)$ be the empirical distribution function and $F(x)$ the theoretical distribution function of the general population. Then, for every $x$, with probability 1,

\[\lim_{n\to\infty} \left|F_n(x) - F(x)\right| = 0.\]
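The convergence stated in Theorem 1 can be observed in a simulation. A minimal sketch, assuming samples from the uniform distribution on $(0, 1)$, for which $F(x) = x$; the function name `max_deviation`, the grid of 1001 points and the seed are hypothetical choices. It reports the largest deviation over the grid, which is a slightly stronger quantity than the pointwise difference in the theorem.

```python
import random
from bisect import bisect_left

def max_deviation(n, seed=1):
    """Largest |F_n(x) - F(x)| over a grid, for a Uniform(0, 1) sample of size n (F(x) = x)."""
    rng = random.Random(seed)
    data = sorted(rng.random() for _ in range(n))
    grid = [k / 1000 for k in range(1001)]
    # bisect_left(data, x) / n is F_n(x) = #{x_i < x} / n
    return max(abs(bisect_left(data, x) / n - x) for x in grid)

sizes = [10, 100, 1000, 100000]
errors = [max_deviation(n) for n in sizes]
for n, e in zip(sizes, errors):
    print(f"n = {n:6d}: max |F_n(x) - F(x)| = {e:.4f}")
```

The deviations shrink as $n$ grows; the exact numbers depend on the seed.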

Examples of problems for finding the empirical distribution function

Example 1

Let a sample be given by the following frequency table:

Figure 1.

Find the sample size, compose an empirical distribution function and plot it.

Sample size: $n=5+10+15+20=50$.

By property 5, we have that $F_n\left(x\right)=0$ for $x\le 1$, and $F_n\left(x\right)=1$ for $x>4$.

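For the intermediate values of $x$, $F_n(x)$ equals the accumulated relative frequency of the variants less than $x$.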

Thus, we get:

Figure 2.

Figure 3

Example 2

Twenty cities of central Russia were selected at random, and the following public transport fares were recorded: 14, 15, 12, 12, 13, 15, 15, 13, 15, 12, 15, 14, 15, 13, 13, 12, 12, 15, 14, 14.

Compose an empirical distribution function of this sample and build its graph.

We write the sample values in ascending order and calculate the frequency of each value. We get the following table:

Figure 4

Sample size: $n=20$.

By property 5, $F_n(x) = 0$ for $x \le 12$ and $F_n(x) = 1$ for $x > 15$.

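For the intermediate values of $x$ we accumulate the frequencies of the variants 12, 13, 14 and 15, which occur 5, 4, 4 and 7 times, respectively:
$F_n(x) = 5/20 = 0.25$ for $12 < x \le 13$;
$F_n(x) = 9/20 = 0.45$ for $13 < x \le 14$;
$F_n(x) = 13/20 = 0.65$ for $14 < x \le 15$.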

Thus, we get:

Figure 5

Let's plot the empirical distribution function:

Figure 6
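The computations of Example 2 can be reproduced in Python. A minimal sketch: it counts the frequency of each fare with `collections.Counter`, accumulates the frequencies, and prints the piecewise form of $F_n(x)$ using exact fractions. Plotting is left out; the variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction

fares = [14, 15, 12, 12, 13, 15, 15, 13, 15, 12,
         15, 14, 15, 13, 13, 12, 12, 15, 14, 14]

n = len(fares)
freq = dict(sorted(Counter(fares).items()))   # variant -> frequency, ascending

# After variant v, F_n(x) = (count of values <= v) / n on v < x <= next variant
cumulative = 0
pieces = []
for v, m in freq.items():
    cumulative += m
    pieces.append((v, Fraction(cumulative, n)))

print("n =", n)
print("frequencies:", freq)
print(f"F_n(x) = 0 for x <= {pieces[0][0]}")
for (v, value), (nxt, _) in zip(pieces, pieces[1:]):
    print(f"F_n(x) = {value} for {v} < x <= {nxt}")
print(f"F_n(x) = 1 for x > {pieces[-1][0]}")
```

The script finds $n = 20$, frequencies 5, 4, 4 and 7 for the fares 12, 13, 14 and 15, and the values 1/4, 9/20 and 13/20 on the intervals between consecutive variants.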


Lecture 13

Let the statistical distribution of frequencies of a quantitative trait $X$ be known. Denote by $n_x$ the number of observations in which a value of the trait less than $x$ was observed, and by $n$ the total number of observations. Clearly, the relative frequency of the event $X < x$ equals $n_x/n$ and is a function of $x$. Since this function is found empirically (experimentally), it is called empirical.

The empirical distribution function (sample distribution function) is the function that determines, for each value $x$, the relative frequency of the event $X < x$. Thus, by definition,

\[F_n(x) = \frac{n_x}{n},\]

where $n_x$ is the number of variants less than $x$ and $n$ is the sample size.

Unlike the empirical distribution function of the sample, the distribution function of the population is called the theoretical distribution function. The difference between these functions is that the theoretical function defines the probability of the event $X < x$, whereas the empirical function defines the relative frequency of the same event.

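As $n$ grows, the relative frequency of the event $X < x$, i.e. $F_n(x)$, tends in probability to the probability $F(x)$ of this event. In other words, for any $\varepsilon > 0$,

\[\lim_{n\to\infty} P\left(\left|F_n(x) - F(x)\right| < \varepsilon\right) = 1.\]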

Properties of the empirical distribution function:

1) The values of the empirical function belong to the segment $[0, 1]$.

2) $F_n(x)$ is a non-decreasing function.

3) If $x_1$ is the smallest variant, then $F_n(x) = 0$ for $x \le x_1$; if $x_k$ is the largest variant, then $F_n(x) = 1$ for $x > x_k$.

The empirical distribution function of the sample serves to estimate the theoretical distribution function of the population.

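Example. Construct the empirical distribution function for the following sample distribution:

Variants $x_i$: 2, 6, 10
Frequencies $n_i$: 12, 18, 30

Let's find the sample size: $12 + 18 + 30 = 60$. The smallest variant is 2, so $F_n(x) = 0$ for $x \le 2$. The value $X < 6$, i.e. $x_1 = 2$, was observed 12 times; therefore $F_n(x) = 12/60 = 0.2$ for $2 < x \le 6$. Similarly, the values $X < 10$, i.e. $x_1 = 2$ and $x_2 = 6$, were observed $12 + 18 = 30$ times, so $F_n(x) = 30/60 = 0.5$ for $6 < x \le 10$. Since $x = 10$ is the largest variant, $F_n(x) = 1$ for $x > 10$. Thus, the desired empirical function has the form:

\[F_n(x) = \begin{cases} 0, & x \le 2,\\ 0.2, & 2 < x \le 6,\\ 0.5, & 6 < x \le 10,\\ 1, & x > 10. \end{cases}\]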
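When the sample is given as a frequency table rather than as raw values, $F_n(x)$ can be evaluated from cumulative frequencies. A minimal sketch for the table of this example; the function and variable names are illustrative. `bisect_left` counts how many variants are strictly less than `x`, and the cumulative frequency of those variants divided by $n$ is $F_n(x)$.

```python
from bisect import bisect_left
from itertools import accumulate

variants = [2, 6, 10]          # sorted variants x_i
frequencies = [12, 18, 30]     # frequencies n_i
n = sum(frequencies)           # sample size
cum = [0] + list(accumulate(frequencies))  # cum[k] = total frequency of the first k variants

def F_n(x):
    """F_n(x) = (number of observations with value < x) / n, from the frequency table."""
    k = bisect_left(variants, x)   # number of variants strictly less than x
    return cum[k] / n

for x in (2, 4, 6, 8, 10, 11):
    print(x, F_n(x))
```

For $x = 4$ and $x = 8$ this gives 0.2 and 0.5, matching the worked example; at $x = 6$ and $x = 10$ the function still takes the value of the interval to the left, as left-continuity requires.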

The most important properties of statistical estimates

Suppose we need to study some quantitative attribute of the general population. Assume that theoretical considerations have established which distribution the attribute has, and that we need to estimate the parameters that determine this distribution. For example, if the attribute under study is normally distributed in the general population, then the mathematical expectation and the standard deviation must be estimated; if the attribute has a Poisson distribution, then the parameter $\lambda$ must be estimated.

Usually, only sample data are available, for example the values $x_1, x_2, \dots, x_n$ of the attribute obtained from $n$ independent observations. Treating these values as independent random variables $X_1, X_2, \dots, X_n$, we can say that finding a statistical estimate of an unknown parameter of the theoretical distribution means finding a function of the observed random variables that gives an approximate value of the estimated parameter. For example, to estimate the mathematical expectation of a normal distribution, the role of this function is played by the arithmetic mean

\[\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n}.\]

For statistical estimates to give good approximations of the estimated parameters, they must satisfy certain requirements, the most important of which are unbiasedness and consistency.

Let $\theta^*$ be a statistical estimate of an unknown parameter $\theta$ of the theoretical distribution, and let the estimate $\theta_1^*$ be found from a sample of size $n$. Let us repeat the experiment, i.e. extract another sample of the same size from the general population and obtain from its data a different estimate $\theta_2^*$. Repeating the experiment many times, we obtain numbers $\theta_1^*, \theta_2^*, \dots, \theta_k^*$ that, in general, differ from one another. Thus the estimate $\theta^*$ can be regarded as a random variable, and the numbers $\theta_1^*, \theta_2^*, \dots, \theta_k^*$ as its possible values.

If the estimate $\theta^*$ overestimates, i.e. each number $\theta_i^*$ is greater than the true value $\theta$, then the mathematical expectation (mean value) of the random variable $\theta^*$ is also greater than $\theta$: $M(\theta^*) > \theta$. Similarly, if $\theta^*$ underestimates, then $M(\theta^*) < \theta$.

Thus, using a statistical estimate whose mathematical expectation is not equal to the estimated parameter would lead to systematic (same-sign) errors. If, on the contrary, $M(\theta^*) = \theta$, this guarantees against systematic errors.

An estimate $\theta^*$ is called unbiased if its mathematical expectation equals the estimated parameter for any sample size: $M(\theta^*) = \theta$.

An estimate that does not satisfy this condition is called biased.
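As an illustration of bias (the choice of estimates is not taken from the text), one can compare two estimates of the variance $\sigma^2$ of a normal population: the sum of squared deviations from the sample mean divided by $n$, and the same sum divided by $n - 1$. It is a known result that the first has mathematical expectation $\frac{n-1}{n}\sigma^2$, so it is biased, while the second is unbiased. The sketch below averages both estimates over many simulated samples, assuming $n = 5$, $\sigma = 2$ and a fixed seed; these parameters are arbitrary.

```python
import random

def average_estimates(n=5, trials=100_000, sigma=2.0, seed=42):
    """Average of two variance estimates over many samples of size n from N(0, sigma^2)."""
    rng = random.Random(seed)
    total_biased = total_unbiased = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        mean = sum(xs) / n
        ss = sum((x - mean) ** 2 for x in xs)   # sum of squared deviations
        total_biased += ss / n                  # divisor n: biased estimate
        total_unbiased += ss / (n - 1)          # divisor n - 1: unbiased estimate
    return total_biased / trials, total_unbiased / trials

biased, unbiased = average_estimates()
print(f"true variance 4.0, mean of ss/n = {biased:.3f}, mean of ss/(n-1) = {unbiased:.3f}")
```

The average of the divisor-$n$ estimate settles near $0.8 \cdot 4 = 3.2$, a systematic error of one sign, while the divisor-$(n-1)$ estimate averages close to 4.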

Unbiasedness alone does not yet guarantee a good approximation of the estimated parameter, because the possible values of $\theta^*$ may be widely scattered around its mean value, i.e. the variance $D(\theta^*)$ may be large. In that case the estimate found from a single sample, say $\theta_1^*$, may lie far from the mean value $M(\theta^*)$, and hence from the estimated parameter $\theta$ itself.

An estimate is called efficient if, for a given sample size $n$, it has the smallest possible variance.
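Efficiency compares estimates by their variance. The sketch below contrasts two unbiased estimates of the mean of a normal population, the sample mean and the sample median, assuming $n = 25$, samples from $N(0, 1)$ and a fixed seed (all arbitrary choices, and the comparison itself is an illustration not taken from the text). It does not prove that either estimate is efficient; it only shows that, for normal data, the mean has the smaller variance of the two.

```python
import random
import statistics

def variance(values):
    """Variance with divisor equal to the number of values."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def compare_mean_median(n=25, trials=20_000, seed=7):
    """Empirical variances of the sample mean and the sample median for N(0, 1) samples."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(sum(xs) / n)
        medians.append(statistics.median(xs))
    return variance(means), variance(medians)

var_mean, var_median = compare_mean_median()
print(f"D(mean) ~ {var_mean:.4f}, D(median) ~ {var_median:.4f}")
```

The variance of the mean comes out close to $1/25 = 0.04$, and that of the median is noticeably larger.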

When large samples are considered, statistical estimates are required to be consistent.

An estimate is called consistent if it tends in probability to the estimated parameter as $n \to \infty$. For example, if the variance of an unbiased estimate tends to zero as $n \to \infty$, then this estimate is also consistent.
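The sample mean is an unbiased estimate of the mathematical expectation with variance $\sigma^2/n$, which tends to zero, so by the criterion above it is consistent. A minimal simulation sketch, assuming a uniform population on $(0, 1)$ (mean 0.5), a tolerance $\varepsilon = 0.1$ and a fixed seed; `miss_rate` is a hypothetical helper that estimates $P(|\bar{X} - 0.5| \ge \varepsilon)$.

```python
import random

def miss_rate(n, eps=0.1, trials=2_000, seed=11):
    """Fraction of Uniform(0, 1) samples of size n whose mean differs from 0.5 by at least eps."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(trials):
        mean = sum(rng.random() for _ in range(n)) / n
        if abs(mean - 0.5) >= eps:
            misses += 1
    return misses / trials

rates = [miss_rate(n) for n in (10, 100, 1000)]
print(rates)
```

The estimated probability of missing by at least 0.1 drops from roughly a quarter at $n = 10$ to essentially zero at $n = 1000$.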


Learn what an empirical formula is. In chemistry, the empirical formula is the simplest way to describe a compound: essentially, it lists the elements that make up the compound in their simplest whole-number ratio. Note that this simple formula does not describe the order of the atoms in a compound; it only indicates which elements the compound consists of and in what proportion. For example:

A compound consisting of 40.92% carbon, 4.58% hydrogen and 54.5% oxygen has the empirical formula C3H4O3 (an example of how to find the empirical formula of this compound will be discussed in the second part).

Learn the term "percentage composition". "Percentage composition" refers to the percentage by mass of each element in the compound under consideration. To find the empirical formula of a compound, you need to know its percentage composition. If you are finding an empirical formula as a homework exercise, the percentages will most likely be given.

To find the percentage composition of a chemical compound in the laboratory, it is subjected to some physical experiments and then quantitative analysis. If you are not in the lab, you do not need to do these experiments.

Keep in mind that you will have to deal with gram atoms. A gram atom is the amount of an element whose mass in grams is numerically equal to its atomic mass. To find the number of gram atoms of an element, divide its percentage in the compound by its atomic mass.

Let's say, for example, that we have a compound containing 40.92% carbon. The atomic mass of carbon is 12, so our equation would be 40.92 / 12 = 3.41.

Know how to find the atomic ratio. When working with a compound, you will end up with more than one gram-atom value. After finding the gram atoms of all the elements, select the smallest value you have calculated and divide all the gram-atom values by it. For instance:

Suppose you are working with a compound whose gram-atom values are 1.5, 2 and 2.5. The smallest of these numbers is 1.5, so to find the ratio of atoms you divide all the numbers by 1.5 and write the results separated by colons:
1.5 / 1.5 = 1; 2 / 1.5 ≈ 1.33; 2.5 / 1.5 ≈ 1.67. Therefore, the ratio of atoms is 1 : 1.33 : 1.67.

Learn how to convert the atomic ratio to integers. An empirical formula must use whole numbers, so values like 1.33 cannot be used. After finding the ratio of atoms, convert the fractional numbers (like 1.33) to integers (like 3). To do this, find an integer multiplier that turns every number of the atomic ratio into an integer. For instance:

Try 2. Multiply the atomic ratio numbers (1, 1.33 and 1.67) by 2. You get 2, 2.67 and 3.33. They are not integers, so 2 is not appropriate.
Try 3. Multiplying 1, 1.33 and 1.67 by 3 gives 3, 4 and 5, respectively (1.33 and 1.67 stand for 4/3 and 5/3). Therefore, the integer atomic ratio is 3 : 4 : 5.
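The three steps (gram atoms, atomic ratio, integer multiplier) can be sketched in Python as follows, assuming percentages by mass (i.e. a 100 g sample) and rounded atomic masses (C 12.0, H 1.008, O 16.0). The function name `empirical_formula`, the tolerance `tol` for "close to an integer" and the cap `max_multiplier` are hypothetical choices, not part of a standard procedure.

```python
def empirical_formula(percentages, atomic_masses, max_multiplier=10, tol=0.1):
    """Return {element: subscript} from percentage composition (hypothetical helper)."""
    # Step 1: gram atoms = percentage / atomic mass
    gram_atoms = {el: p / atomic_masses[el] for el, p in percentages.items()}
    # Step 2: divide by the smallest value to get the atomic ratio
    smallest = min(gram_atoms.values())
    ratio = {el: g / smallest for el, g in gram_atoms.items()}
    # Step 3: try multipliers 1, 2, 3, ... until every number is close to an integer
    for k in range(1, max_multiplier + 1):
        scaled = {el: r * k for el, r in ratio.items()}
        if all(abs(v - round(v)) <= tol for v in scaled.values()):
            return {el: round(v) for el, v in scaled.items()}
    raise ValueError("no small integer multiplier found")

print(empirical_formula({"C": 40.92, "H": 4.58, "O": 54.5},
                        {"C": 12.0, "H": 1.008, "O": 16.0}))
```

For the compound above, multipliers 1 and 2 fail because of hydrogen (ratio about 1.33), and the multiplier 3 gives C 3, H 4, O 3, i.e. C3H4O3.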

Variation series. Polygon and histogram.

A distribution series is an ordered distribution of the units of the studied population into groups according to some varying attribute.

Depending on the attribute underlying a distribution series, there are attributive and variational distribution series:

§ Distribution series constructed in ascending or descending order of values of a quantitative attribute are called variational.

The variation series of the distribution consists of two columns:

The first column contains the quantitative values of the varying attribute, which are called variants and are denoted $x_i$. A discrete variant is expressed as an integer; an interval variant is given as a range "from ... to ...". Depending on the type of variants, one can construct a discrete or an interval variation series.
The second column contains the number of occurrences of each variant, expressed as frequencies or relative frequencies:

Frequencies are absolute numbers showing how many times a given value of the attribute occurs in the population. The sum of all frequencies equals the number of units in the entire population.

Relative frequencies are the frequencies expressed as a share of the total. The sum of all relative frequencies equals 1 when expressed as fractions of one, or 100% when expressed as percentages.

Graphical representation of distribution series

Distribution series are visualized graphically. They are displayed as:

§ Polygon

§ Histograms

§ Cumulative curves (cumulates)

Polygon

When constructing a polygon, the values of the varying attribute are plotted on the horizontal axis (abscissa), and the frequencies or relative frequencies on the vertical axis (ordinate).

The polygon in Fig. 6.1 was built from the data of the 1994 microcensus of the population of Russia.

Histogram

To construct a histogram, mark the interval boundaries on the abscissa and build on each interval a rectangle whose height is proportional to the frequency (or relative frequency) of that interval.
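The grouping behind an interval variation series and its histogram can be sketched as follows. The ages are hypothetical values, not the microcensus data mentioned in the text, and the half-open intervals $[a, b)$ are an assumption of this sketch; values outside all intervals are ignored, in which case the relative frequencies would no longer sum to 1.

```python
from fractions import Fraction

def interval_series(values, edges):
    """Interval variation series: rows of (interval, frequency, relative frequency).

    Intervals are half-open [edges[i], edges[i+1]); values outside them are ignored.
    """
    n = len(values)
    rows = []
    for lo, hi in zip(edges, edges[1:]):
        m = sum(1 for v in values if lo <= v < hi)   # frequency of the interval
        rows.append(((lo, hi), m, Fraction(m, n)))  # relative frequency m / n
    return rows

# Hypothetical ages, for illustration only
ages = [3, 17, 25, 34, 41, 45, 52, 58, 63, 70, 71, 88]
rows = interval_series(ages, [0, 20, 40, 60, 80, 100])

for (lo, hi), m, w in rows:
    # With equal interval widths, the rectangle height can simply be the frequency m
    print(f"[{lo}, {hi}): frequency {m}, relative frequency {float(w):.1%}, height {'#' * m}")
```

The frequencies sum to the number of values (12) and the relative frequencies sum to 1, as required of a variation series.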

Fig. 6.2 shows the histogram of the distribution of the population of Russia in 1997 by age groups.

Fig.1. Distribution of the population of Russia by age groups

Empirical distribution function, properties.

An empirical distribution function (sample distribution function) is a function that determines, for each value $x$, the relative frequency of the event $X < x$.

Unlike the empirical distribution function of the sample, the distribution function of the population is called the theoretical distribution function. The difference between these functions is that the theoretical function determines the probability of the event $X < x$, whereas the empirical function determines the relative frequency of the same event.

As $n$ grows, the relative frequency of the event $X < x$, i.e. $F_n(x)$, tends in probability to the probability $F(x)$ of this event.

Basic properties

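Let the elementary outcome $\omega$ be fixed. Then $F_n(x) = F_n(x, \omega)$ is the distribution function of the discrete distribution given by the following probability function:

\[f(x) = \begin{cases} \dfrac{m(x_i)}{n}, & x = x_i,\ i = 1, \dots, n,\\ 0, & \text{otherwise}, \end{cases}\]

where $m(x_i)$ is the number of sample elements equal to $x_i$. In particular, if all elements of the sample are distinct, then $f(x_i) = 1/n$.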

The mathematical expectation of this distribution is:

\[\frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}.\]

So the sample mean is the theoretical mean of the sample distribution.

Similarly, the sample variance $\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$ is the theoretical variance of the sample distribution.
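These two identities can be checked with exact arithmetic. A minimal sketch, assuming a hypothetical sample with a repeated value so that $m(x_i) > 1$ for one variant; the probability function $f(v) = m(v)/n$ is built with `collections.Counter`, and the divisor-$n$ variance is compared with `statistics.pvariance`.

```python
from collections import Counter
from fractions import Fraction
import statistics

sample = [2, 2, 5, 7]  # hypothetical sample with a repeated value
n = len(sample)

# Probability function of the sample distribution: f(v) = m(v) / n
f = {v: Fraction(m, n) for v, m in Counter(sample).items()}

dist_mean = sum(v * p for v, p in f.items())
dist_var = sum((v - dist_mean) ** 2 * p for v, p in f.items())

assert sum(f.values()) == 1
assert dist_mean == Fraction(sum(sample), n)                            # the sample mean
assert dist_var == statistics.pvariance([Fraction(x) for x in sample])  # divisor-n sample variance
print(dist_mean, dist_var)
```

Here the sample distribution puts weight 1/2 on the value 2 and 1/4 on each of 5 and 7, giving mean 4 and variance 9/2.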

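For a fixed $x$, the random variable $nF_n(x)$ has a binomial distribution:

\[P\left(nF_n(x) = k\right) = \binom{n}{k} F(x)^k \left(1 - F(x)\right)^{n-k}, \quad k = 0, 1, \dots, n.\]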

The sample distribution function is an unbiased estimate of the distribution function $F(x)$:

\[M\left[F_n(x)\right] = F(x).\]

The variance of the sample distribution function has the form:

\[D\left[F_n(x)\right] = \frac{F(x)\left(1 - F(x)\right)}{n}.\]
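Both formulas follow from the binomial distribution of $nF_n(x)$ and can be checked by simulation. A minimal sketch, assuming a uniform population on $(0, 1)$ so that $F(0.3) = 0.3$, with $n = 50$ and a fixed seed (arbitrary choices); `simulate_ecdf_at` is a hypothetical helper.

```python
import random

def simulate_ecdf_at(x=0.3, n=50, trials=20_000, seed=3):
    """Mean and variance of F_n(x) over many Uniform(0, 1) samples of size n (F(x) = x)."""
    rng = random.Random(seed)
    values = []
    for _ in range(trials):
        n_x = sum(1 for _ in range(n) if rng.random() < x)  # n * F_n(x) ~ Bin(n, F(x))
        values.append(n_x / n)
    mean = sum(values) / trials
    var = sum((v - mean) ** 2 for v in values) / trials
    return mean, var

mean, var = simulate_ecdf_at()
print(f"mean of F_n(0.3) = {mean:.4f} (theory 0.3)")
print(f"variance of F_n(0.3) = {var:.5f} (theory {0.3 * 0.7 / 50:.5f})")
```

The simulated mean is close to $F(0.3) = 0.3$ and the simulated variance close to $0.3 \cdot 0.7 / 50 = 0.0042$.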

According to the strong law of large numbers, the sample distribution function converges almost surely to the theoretical distribution function:

\[F_n(x) \to F(x) \quad \text{almost surely as } n \to \infty.\]

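The sample distribution function is an asymptotically normal estimate of the theoretical distribution function. If $0 < F(x) < 1$, then

\[\frac{\sqrt{n}\left(F_n(x) - F(x)\right)}{\sqrt{F(x)\left(1 - F(x)\right)}} \to N(0, 1) \quad \text{in distribution as } n \to \infty.\]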
