Where does the $\sqrt{n}$ come from in the central limit theorem (CLT)?


36

A very simple version of the central limit theorem, stated below, is the Lindeberg–Lévy CLT:

$$\sqrt{n}\left(\left(\frac{1}{n}\sum_{i=1}^{n}X_i\right)-\mu\right)\ \xrightarrow{d}\ N(0,\sigma^2).$$

I do not understand why there is a $\sqrt{n}$ on the left-hand side. The Lyapunov CLT says

$$\frac{1}{s_n}\sum_{i=1}^{n}(X_i-\mu_i)\ \xrightarrow{d}\ N(0,1),$$

but why $s_n$ and not $\sqrt{s_n}$? Could someone tell me what these factors such as $\sqrt{n}$ and $\frac{1}{s_n}$ are, and how we get them in the theorem?
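
To get a concrete feel for the first factor, here is a minimal simulation sketch (the Exponential distribution, the sample sizes and the replication count are arbitrary illustrative choices, not part of the theorem): without the $\sqrt{n}$, the centred sample mean collapses to zero, while $\sqrt{n}(\bar X-\mu)$ keeps a spread close to $\sigma$.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = sigma = 2.0                 # Exponential(scale=2) has mean 2 and sd 2
reps = 2_000                     # number of simulated sample means per n

for n in (10, 100, 1_000, 10_000):
    X = rng.exponential(scale=2.0, size=(reps, n))
    xbar = X.mean(axis=1)
    # Unscaled, the centred mean shrinks to 0 (law of large numbers);
    # scaled by sqrt(n), its spread stays close to sigma = 2 (CLT).
    print(n, np.std(xbar - mu), np.std(np.sqrt(n) * (xbar - mu)))
```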

3
This is explained at stats.stackexchange.com/questions/3734. That answer is long because it calls for "intuition". It concludes: "This crude approximation nevertheless suggests how de Moivre might originally have suspected that there is a universal limiting distribution, that its logarithm is a quadratic function, and that the proper scale factor $s_n$ must be proportional to $\sqrt{n}$ ..."
whuber

1
Intuitively, if all $\sigma_i = \sigma$, then $s_n = \sqrt{\sum_{i=1}^{n}\sigma_i^2} = \sqrt{n}\,\sigma$ and the second line follows from the first: we divide by $\sigma = s_n/\sqrt{n}$,

$$\sqrt{n}\left(\left(\frac{1}{n}\sum_{i=1}^{n}X_i\right)-\mu\right)=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}(X_i-\mu)\ \xrightarrow{d}\ N(0,\sigma^2),$$

$$\frac{\frac{1}{\sqrt{n}}\sum_{i=1}^{n}(X_i-\mu)}{s_n/\sqrt{n}}=\frac{1}{s_n}\sum_{i=1}^{n}(X_i-\mu_i)\ \xrightarrow{d}\ N(0,1).$$

(Of course, the Lyapunov condition for a combination of different $\sigma_i$ is another question.)
Sextus Empiricus
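
The comment above (and the OP's "$s_n$ versus $\sqrt{s_n}$" point) can be checked numerically. A small sketch with independent but non-identically distributed terms; the uniform distributions, means and standard deviations below are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 2_000, 10_000

mu_i = np.linspace(-1.0, 1.0, n)            # non-identical means (arbitrary)
sigma_i = rng.uniform(0.5, 3.0, size=n)     # non-identical sds (arbitrary)
s_n = np.sqrt(np.sum(sigma_i ** 2))         # s_n^2 = sum of the individual variances

# X_i uniform on [mu_i - a_i, mu_i + a_i], with a_i chosen so that sd(X_i) = sigma_i
a_i = sigma_i * np.sqrt(3.0)
X = rng.uniform(mu_i - a_i, mu_i + a_i, size=(reps, n))

centred_sum = (X - mu_i).sum(axis=1)
print(np.std(centred_sum / s_n))            # close to 1: dividing by s_n standardizes the sum
print(np.std(centred_sum / np.sqrt(s_n)))   # much larger than 1: dividing by sqrt(s_n) does not
```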

Answers:


33

Good question (+1)!!

You remember that for independent random variables $X$ and $Y$, $\mathrm{Var}(X+Y)=\mathrm{Var}(X)+\mathrm{Var}(Y)$ and $\mathrm{Var}(aX)=a^2\,\mathrm{Var}(X)$. So the variance of $\sum_{i=1}^{n}X_i$ is $\sum_{i=1}^{n}\sigma^2=n\sigma^2$, and the variance of $\bar{X}=\frac{1}{n}\sum_{i=1}^{n}X_i$ is $n\sigma^2/n^2=\sigma^2/n$.

This is for the variance. To standardize a random variable, you divide it by its standard deviation. As you know, the expected value of $\bar{X}$ is $\mu$, so the variable

$$\frac{\bar{X}-E(\bar{X})}{\sqrt{\mathrm{Var}(\bar{X})}}=\sqrt{n}\,\frac{\bar{X}-\mu}{\sigma}$$

has expected value 0 and variance 1. So if it tends to a Gaussian at all, it has to be the standard Gaussian $N(0,1)$. Your formulation in the first equation is equivalent: by multiplying the left-hand side by $\sigma$, you set the variance to $\sigma^2$.

Regarding your second point, I believe the equation above shows that you have to divide by $\sigma$ and not $\sqrt{\sigma}$ to standardize it, which explains why you use $s_n$ (an estimator of $\sigma$) and not $\sqrt{s_n}$.
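
A quick numerical check of the argument above (a sketch only; the Gamma distribution and the constants are arbitrary choices): $\mathrm{Var}(\bar X)\approx\sigma^2/n$, the standardized quantity has mean about 0 and variance about 1, and multiplying it by $\sigma$ gives variance about $\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps, mu, sigma = 400, 20_000, 5.0, 3.0

# Gamma variables with mean 5 and sd 3: shape * scale = mu, shape * scale^2 = sigma^2
shape, scale = mu ** 2 / sigma ** 2, sigma ** 2 / mu
X = rng.gamma(shape, scale, size=(reps, n))
xbar = X.mean(axis=1)

print(xbar.var(), sigma ** 2 / n)      # Var(xbar) is about sigma^2 / n
Z = np.sqrt(n) * (xbar - mu) / sigma   # standardized sample mean
print(Z.mean(), Z.var())               # about 0 and 1
print((sigma * Z).var(), sigma ** 2)   # multiplying by sigma gives variance sigma^2
```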

Addendum: @whuber suggests discussing why the scaling is by $\sqrt{n}$. He does so there, but since that answer is very long I will try to capture the gist of his argument (which is a reconstruction of de Moivre's reasoning).

If you add up a large number $n$ of $+1$'s and $-1$'s, you can approximate the probability that the sum will be $j$ by elementary counting. The log of this probability is proportional to $-j^2/n$. So if we want this probability to converge to a constant as $n$ grows large, we have to use a normalizing factor of order $O(\sqrt{n})$.

Using modern (post de Moivre) mathematical tools, you can see the approximation mentioned above by noticing that the sought probability is

$$P(j)=\binom{n}{n/2+j}\,2^{-n}=\frac{n!}{2^n\,(n/2+j)!\,(n/2-j)!},$$

which we approximate by Stirling's formula

$$P(j)\approx\frac{n^n\,e^{n/2+j}\,e^{n/2-j}}{2^n\,e^n\,(n/2+j)^{n/2+j}\,(n/2-j)^{n/2-j}}=\left(\frac{1}{1+2j/n}\right)^{n/2+j}\left(\frac{1}{1-2j/n}\right)^{n/2-j},$$

so that

$$\log(P(j))=-(n/2+j)\log(1+2j/n)-(n/2-j)\log(1-2j/n)\approx-2j\,(n/2+j)/n+2j\,(n/2-j)/n\ \propto\ -\,j^2/n.$$
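
The quadratic behaviour of $\log P(j)$ can also be checked directly with exact binomial coefficients. In the sketch below the comparison value $-2j^2/n$ uses the local-CLT constant 2, which is an extra assumption on my part; the derivation above only claims proportionality to $-j^2/n$.

```python
from math import comb, log

def logP(n, j):
    # P(j) = C(n, n/2 + j) / 2^n: probability that n/2 + j of the n terms equal +1
    return log(comb(n, n // 2 + j)) - n * log(2)

for n in (100, 1_000, 10_000):
    j = int(round(n ** 0.5))          # look at deviations j of order sqrt(n)
    # log P(j) - log P(0) is close to the quadratic -2 j^2 / n
    print(n, j, logP(n, j) - logP(n, 0), -2 * j ** 2 / n)
```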

Please see my comments to previous answers by Michael C. and guy.
whuber

Seems like the first equation (LL CLT) s/b $\sqrt{n}\left(\left(\frac{1}{n}\sum_{i=1}^{n}X_i\right)-\mu\right)\ \xrightarrow{d}\ N(0,1)$? It confused me as well that $\sigma^2$ appeared as the variance.
B_Miner

If you parametrize the Gaussian with mean and variance (not standard deviation) then I believe OP's formula is correct.
gui11aume

1
Ahh.. Given that $\frac{\bar{X}-E(\bar{X})}{\sqrt{\mathrm{Var}(\bar{X})}}=\sqrt{n}\,\frac{\bar{X}-\mu}{\sigma}\ \xrightarrow{d}\ N(0,1)$, if we multiply $\frac{\bar{X}-E(\bar{X})}{\sqrt{\mathrm{Var}(\bar{X})}}$ by $\sigma$ we get what was shown by the OP (the $\sigma$'s cancel): namely $\sqrt{n}\left(\left(\frac{1}{n}\sum_{i=1}^{n}X_i\right)-\mu\right)$. But we know that $\mathrm{Var}(aX)=a^2\,\mathrm{Var}(X)$, where in this case $a=\sigma$ and $\mathrm{Var}(X)$ is 1, so the distribution is $N(0,\sigma^2)$.
B_Miner

Gui, if not too late I wanted to make sure I have this correct. If we assume $\frac{\bar{X}-E(\bar{X})}{\sqrt{\mathrm{Var}(\bar{X})}}=\sqrt{n}\,\frac{\bar{X}-\mu}{\sigma}\ \xrightarrow{d}\ N(0,1)$ and we multiply by a constant ($\sigma$), the expected value of this quantity (i.e. $\sqrt{n}(\bar{X}-\mu)$), which was zero, is still zero, since $E[aX]=a\,E[X]\Rightarrow\sigma\cdot 0=0$. Is this correct?
B_Miner

8

There is a nice theory of what kinds of distributions can be limiting distributions of sums of random variables. A nice resource is the following book by Petrov, which I personally enjoyed immensely.

It turns out that if you are investigating limits of the type

$$\frac{1}{a_n}\sum_{i=1}^{n}X_i-b_n,\qquad(1)$$

where the $X_i$ are independent random variables, only certain distributions can appear as limits.

There is a lot of mathematics involved, which boils down to several theorems that completely characterize what happens in the limit. One such theorem is due to Feller:

Theorem. Let $\{X_n;\,n=1,2,\ldots\}$ be a sequence of independent random variables, let $V_n(x)$ be the distribution function of $X_n$, and let $a_n$ be a sequence of positive constants. In order that

$$\max_{1\le k\le n}P(|X_k|\ge\varepsilon a_n)\to 0\quad\text{for every fixed }\varepsilon>0$$

and

$$\sup_x\left|P\!\left(a_n^{-1}\sum_{k=1}^{n}X_k<x\right)-\Phi(x)\right|\to 0,$$

it is necessary and sufficient that

$$\sum_{k=1}^{n}\int_{|x|\ge\varepsilon a_n}dV_k(x)\to 0\quad\text{for every fixed }\varepsilon>0,$$

$$a_n^{-2}\sum_{k=1}^{n}\left(\int_{|x|<a_n}x^2\,dV_k(x)-\left(\int_{|x|<a_n}x\,dV_k(x)\right)^2\right)\to 1,$$

and

$$a_n^{-1}\sum_{k=1}^{n}\int_{|x|<a_n}x\,dV_k(x)\to 0.$$

This theorem then gives you an idea of what $a_n$ should look like.

The general theory in the book is constructed in such a way that the norming constant is not restricted in any way, but the final theorems, which give necessary and sufficient conditions, do not leave any room for a norming constant other than $\sqrt{n}$.


4

$s_n$ represents the sample standard deviation of the sample mean. $s_n^2$ is the sample variance of the sample mean and it equals $S_n^2/n$, where $S_n^2$ is the sample estimate of the population variance. Since $s_n = S_n/\sqrt{n}$, that explains how $\sqrt{n}$ appears in the first formula. Note there would be a $\sigma$ in the denominator if the limit were $N(0,1)$, but the limit is given as $N(0,\sigma^2)$. Since $S_n$ is a consistent estimate of $\sigma$, it is used in the second equation to take $\sigma$ out of the limit.
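
A small simulation sketch of the relation used here (the Exponential data, sample size and replication count are arbitrary choices): the standard deviation of the sample mean is tracked by $S_n/\sqrt{n}$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 50, 20_000

X = rng.exponential(scale=1.0, size=(reps, n))  # true sigma = 1
xbar = X.mean(axis=1)
S = X.std(axis=1, ddof=1)                       # sample estimate of the population sd

print(np.std(xbar))              # observed sd of the sample mean ...
print(np.mean(S / np.sqrt(n)))   # ... tracked by S_n / sqrt(n)
print(1.0 / np.sqrt(n))          # both close to sigma / sqrt(n)
```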


What about the other (more basic and important) part of the question: why $s_n$ and not some other measure of dispersion?
whuber

@whuber That may be up for discussion, but it was not part of the question. The OP just wanted to know why $s_n$ and $\sqrt{n}$ appear in the formulas for the CLT. Of course $S_n$ is there because it is consistent for $\sigma$, and in that form of the CLT $\sigma$ is removed.
Michael R. Chernick

1
To me it's not at all clear that $s_n$ is present because it is "consistent for $\sigma$". Why wouldn't that also imply, say, that $s_n$ should be used to normalize extreme-value statistics (which would not work)? Am I missing something simple and self-evident? And, to echo the OP, why not use $\sqrt{s_n}$ -- after all, that is consistent for $\sqrt{\sigma}$!
whuber

The theorem as stated has convergence to N(0,1), so to accomplish that you either have to know σ and use it or use a consistent estimate of it which works by Slutsky's theorem I think. Was I that unclear?
Michael R. Chernick

I don't think you were unclear; I just think that an important point may be missing. After all, for many distributions we can obtain a limiting normal distribution by using the IQR instead of $s_n$ -- but then the result is not as neat (the SD of the limiting distribution depends on the distribution we begin with). I'm just suggesting that this deserves to be called out and explained. It will not be quite as obvious to someone who does not have the intuition developed by 40 years of standardizing all the distributions they encounter!
whuber

2

Intuitively, if $Z_n \to N(0,\sigma^2)$ for some $\sigma^2$, we should expect that $\mathrm{Var}(Z_n)$ is roughly equal to $\sigma^2$; it seems like a pretty reasonable expectation, though I don't think it is necessary in general. The reason for the $\sqrt{n}$ in the first expression is that the variance of $\bar{X}_n-\mu$ goes to $0$ like $\frac{1}{n}$, and so the $\sqrt{n}$ is inflating the variance so that the expression just has variance equal to $\sigma^2$. In the second expression, the term $s_n$ is defined to be $\sqrt{\sum_{i=1}^{n}\mathrm{Var}(X_i)}$, while the variance of the numerator grows like $\sum_{i=1}^{n}\mathrm{Var}(X_i)$, so we again have that the variance of the whole expression is a constant ($1$ in this case).

Essentially, we know something "interesting" is happening with the distribution of $\bar{X}_n:=\frac{1}{n}\sum_i X_i$, but if we don't properly center and scale it we won't be able to see it. I've heard this described sometimes as needing to adjust the microscope. If we don't blow up (e.g.) $\bar{X}_n-\mu$ by $\sqrt{n}$ then we just have $\bar{X}_n-\mu\to 0$ in distribution by the weak law; an interesting result in its own right but not as informative as the CLT. If we inflate by any factor $a_n$ which is dominated by $\sqrt{n}$, we still get $a_n(\bar{X}_n-\mu)\to 0$, while any factor $a_n$ which dominates $\sqrt{n}$ makes $a_n(\bar{X}_n-\mu)$ blow up. It turns out $\sqrt{n}$ is just the right magnification to be able to see what is going on in this case (note: all convergence here is in distribution; there is another level of magnification which is interesting for almost sure convergence, which gives rise to the law of the iterated logarithm).
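
Here is a rough "microscope" sketch of that last point (the Bernoulli data and the particular exponents are arbitrary choices for illustration): magnifying $\bar{X}_n-\mu$ by $n^{0.25}$ still collapses to zero, $n^{0.75}$ blows up, and $n^{0.5}$ is the magnification under which the spread stabilizes.

```python
import numpy as np

rng = np.random.default_rng(4)
p, reps = 0.5, 20_000                       # Bernoulli(0.5): mu = 0.5, sigma = 0.5

for n in (100, 1_000, 10_000, 100_000):
    xbar = rng.binomial(n, p, size=reps) / n
    spreads = [np.std(n ** alpha * (xbar - p)) for alpha in (0.25, 0.5, 0.75)]
    # alpha = 0.25: spread -> 0;  alpha = 0.5: stays near sigma = 0.5;  alpha = 0.75: blows up
    print(n, spreads)
```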


4
A more fundamental question, which ought to be addressed first, is why the SD is used to measure dispersion. Why not the absolute central $k$th moment for some other value of $k$? Or why not the IQR or any of its relatives? Once that is answered, simple properties of covariance immediately give the $\sqrt{n}$ dependence (as @Gui11aume has recently explained).
whuber

1
@whuber I agree, which is why I presented this as heuristic. I'm not certain it is amenable to a simple explanation, though I'd love to hear one. For me I'm not sure that I have a simpler, explainable reason past "because the square term is the relevant term in the Taylor expansion of the characteristic function once you subtract off the mean."
guy
Licensed under cc by-sa 3.0 with attribution required.