With the information provided by @Glen_b, I was able to find the answer. Using the same notation as the question,
$$P(Z_k \le x) = \sum_{j=0}^{k+1} \binom{k+1}{j} (-1)^j (1 - jx)_+^k,$$
where $a_+ = a$ if $a > 0$ and $0$ otherwise. I also give the expectation and the asymptotic convergence to the Gumbel (NB: not beta) distribution:
$$E(Z_k) = \frac{1}{k+1} \sum_{i=1}^{k+1} \frac{1}{i} \sim \frac{\log(k+1)}{k+1}, \qquad P(Z_k \le x) \sim \exp\left(-e^{-(k+1)x + \log(k+1)}\right).$$
The material for the proofs is taken from several publications listed in the references. The proofs are somewhat lengthy but straightforward.
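As a quick numerical companion to the closed form above, here is a minimal Python sketch of the exact distribution function. The function name `largest_fragment_cdf` is my own choice, not something from the cited papers. Because the alternating sum cancels badly in floating point for larger $k$, the sketch evaluates it in exact rational arithmetic and converts to a float only at the end.

```python
from fractions import Fraction
from math import comb


def largest_fragment_cdf(x: float, k: int) -> float:
    """P(Z_k <= x): largest of the k+1 pieces of a unit stick broken at k uniform points.

    Hypothetical helper name; implements sum_j C(k+1, j) (-1)^j (1 - j x)_+^k.
    """
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    xq = Fraction(x)  # exact rational value of the input
    total = Fraction(0)
    for j in range(k + 2):
        base = 1 - j * xq
        if base <= 0:
            break  # (1 - j x)_+ is zero for this and every larger j
        total += comb(k + 1, j) * (-1) ** j * base ** k
    return float(total)
```

For example, `largest_fragment_cdf(0.5, 2)` evaluates the probability that none of the three pieces obtained from two breakpoints is longer than one half.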
1. Proof of the exact distribution
Let $(U_1, \ldots, U_k)$ be IID uniform random variables on the interval $(0,1)$. Ordering them gives the $k$ order statistics, denoted $(U_{(1)}, \ldots, U_{(k)})$. The uniform spacings are defined as $\Delta_i = U_{(i)} - U_{(i-1)}$, with $U_{(0)} = 0$ and $U_{(k+1)} = 1$. The ordered spacings are the corresponding order statistics $\Delta_{(1)} \le \ldots \le \Delta_{(k+1)}$. The variable of interest is $\Delta_{(k+1)}$.
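For fixed $x \in (0,1)$, we define the indicator variable $\mathbf{1}_i = \mathbf{1}\{\Delta_i > x\}$. By symmetry, the random vector $(\mathbf{1}_1, \ldots, \mathbf{1}_{k+1})$ is exchangeable, so the joint distribution of a subset of size $j$ is the same as the joint distribution of the first $j$. By expanding the product, we thus obtain
$$P(\Delta_{(k+1)} \le x) = E\left(\prod_{i=1}^{k+1} (1 - \mathbf{1}_i)\right) = 1 + \sum_{j=1}^{k+1} \binom{k+1}{j} (-1)^j E\left(\prod_{i=1}^{j} \mathbf{1}_i\right).$$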
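It remains to show that $E\left(\prod_{i=1}^{j} \mathbf{1}_i\right) = (1 - jx)_+^k$, which will establish the distribution given above. We prove this for $j = 2$, as the general case is proved similarly.
$$E\left(\prod_{i=1}^{2} \mathbf{1}_i\right) = P(\Delta_1 > x \cap \Delta_2 > x) = P(\Delta_1 > x)\, P(\Delta_2 > x \mid \Delta_1 > x).$$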
If $\Delta_1 > x$, the $k$ breakpoints are in the interval $(x,1)$. Conditionally on this event, the breakpoints are still exchangeable, so the probability that the distance between the second and the first breakpoint is greater than $x$ is the same as the probability that the distance between the first breakpoint and the left barrier (at position $x$) is greater than $x$. So
$$P(\Delta_2 > x \mid \Delta_1 > x) = P\big(\text{all points are in } (2x,1) \mid \text{all points are in } (x,1)\big),$$
and therefore
$$P(\Delta_2 > x \cap \Delta_1 > x) = P\big(\text{all points are in } (2x,1)\big) = (1 - 2x)_+^k.$$
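The identity for the first $j$ spacings can be checked by simulation. The sketch below is only an illustration under my own naming (`spacings`, `joint_exceedance_mc`); it draws $k$ uniform breakpoints, forms the $k+1$ spacings, and estimates the probability that the first $j$ of them all exceed $x$, which should be close to $(1 - jx)_+^k$.

```python
import random


def spacings(k: int, rng: random.Random) -> list:
    """Return the k+1 spacings produced by k uniform breakpoints on (0, 1)."""
    points = sorted(rng.random() for _ in range(k))
    edges = [0.0] + points + [1.0]
    return [b - a for a, b in zip(edges, edges[1:])]


def joint_exceedance_mc(k: int, j: int, x: float,
                        n_sims: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(Delta_1 > x, ..., Delta_j > x)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        d = spacings(k, rng)
        if all(s > x for s in d[:j]):
            hits += 1
    return hits / n_sims
```

Comparing `joint_exceedance_mc(5, 2, 0.1)` with `(1 - 2 * 0.1) ** 5` gives a feel for the agreement; the estimate carries the usual Monte Carlo error of order $1/\sqrt{n}$.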
2. Expectation
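For a random variable $X$ with support in $(0,1)$, we have
$$E(X) = \int_0^1 P(X > x)\,dx = 1 - \int_0^1 P(X \le x)\,dx.$$
Integrating the distribution of $\Delta_{(k+1)}$ term by term, with $\int_0^1 (1 - jx)_+^k\,dx = \frac{1}{j(k+1)}$ for $j \ge 1$, we obtain
$$E(\Delta_{(k+1)}) = \frac{1}{k+1} \sum_{j=1}^{k+1} \binom{k+1}{j} \frac{(-1)^{j+1}}{j} = \frac{1}{k+1} \sum_{j=1}^{k+1} \frac{1}{j}.$$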
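The last equality is a classical representation of the harmonic numbers $H_i = 1 + \frac{1}{2} + \ldots + \frac{1}{i}$, which we show as follows.
$$H_{k+1} = \int_0^1 \left(1 + x + \ldots + x^k\right) dx = \int_0^1 \frac{1 - x^{k+1}}{1 - x}\,dx.$$
With the change of variable $u = 1 - x$ and expanding $(1 - u)^{k+1}$ with the binomial theorem, we obtain
$$H_{k+1} = \int_0^1 \sum_{j=1}^{k+1} \binom{k+1}{j} (-1)^{j+1} u^{j-1}\,du = \sum_{j=1}^{k+1} \binom{k+1}{j} \frac{(-1)^{j+1}}{j}.$$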
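The harmonic-number identity can be verified exactly for small $n$ with rational arithmetic. This is a minimal sketch; the function names are mine, and `expected_largest_fragment` simply evaluates $H_{k+1}/(k+1)$ from the formula above.

```python
from fractions import Fraction
from math import comb


def harmonic(n: int) -> Fraction:
    """H_n = 1 + 1/2 + ... + 1/n as an exact fraction."""
    return sum((Fraction(1, i) for i in range(1, n + 1)), Fraction(0))


def alternating_binomial_sum(n: int) -> Fraction:
    """sum_{j=1}^{n} C(n, j) (-1)^(j+1) / j as an exact fraction."""
    return sum(
        (Fraction(comb(n, j) * (-1) ** (j + 1), j) for j in range(1, n + 1)),
        Fraction(0),
    )


def expected_largest_fragment(k: int) -> Fraction:
    """E(Z_k) = H_{k+1} / (k+1) for k breakpoints."""
    return harmonic(k + 1) / (k + 1)
```

With a single breakpoint, `expected_largest_fragment(1)` is the mean of $\max(U, 1-U)$ for a uniform $U$.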
3. Alternative construction of uniform spacings
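In order to obtain the asymptotic distribution of the largest fragment, we need a classical construction of uniform spacings as exponential variables divided by their sum. The probability density of the order statistics $(U_{(1)}, \ldots, U_{(k)})$ is
$$f_{U_{(1)}, \ldots, U_{(k)}}(u_{(1)}, \ldots, u_{(k)}) = k!, \qquad 0 \le u_{(1)} \le \ldots \le u_{(k)} \le 1.$$
If we denote the uniform spacings $\Delta_i = U_{(i)} - U_{(i-1)}$, with $U_{(0)} = 0$, we obtain
$$f_{\Delta_1, \ldots, \Delta_k}(\delta_1, \ldots, \delta_k) = k!, \qquad \delta_i \ge 0, \quad 0 \le \delta_1 + \ldots + \delta_k \le 1.$$
By defining $U_{(k+1)} = 1$, we thus obtain
$$f_{\Delta_1, \ldots, \Delta_{k+1}}(\delta_1, \ldots, \delta_{k+1}) = k!, \qquad \delta_i \ge 0, \quad \delta_1 + \ldots + \delta_{k+1} = 1.$$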
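Now, let $(X_1, \ldots, X_{k+1})$ be IID exponential random variables with mean 1, and let $S = X_1 + \ldots + X_{k+1}$. With a simple change of variable, we can see that
$$f_{X_1, \ldots, X_k, S}(x_1, \ldots, x_k, s) = e^{-s}, \qquad x_i \ge 0, \quad x_1 + \ldots + x_k \le s.$$
Define $Y_i = X_i / S$, so that by a change of variable we obtain
$$f_{Y_1, \ldots, Y_k, S}(y_1, \ldots, y_k, s) = s^k e^{-s}.$$
Integrating this density with respect to $s$, we thus obtain
$$f_{Y_1, \ldots, Y_k}(y_1, \ldots, y_k) = \int_0^\infty s^k e^{-s}\,ds = k!, \qquad 0 \le y_1 + \ldots + y_k \le 1,$$
and thus
$$f_{Y_1, \ldots, Y_{k+1}}(y_1, \ldots, y_{k+1}) = k!, \qquad y_1 + \ldots + y_{k+1} = 1.$$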
So the joint distribution of the $k+1$ uniform spacings on the interval $(0,1)$ is the same as the joint distribution of $k+1$ exponential random variables divided by their sum. This gives the following equivalence in distribution:
$$\Delta_{(k+1)} \equiv \frac{X_{(k+1)}}{X_1 + \ldots + X_{k+1}}.$$
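The two constructions can be compared by simulation. The sketch below (function names are my own) samples the largest fragment once from sorted uniform breakpoints and once from normalized exponentials, then averages each. Both sample means should be close to $H_{k+1}/(k+1)$ from section 2.

```python
import random


def largest_uniform_spacing(k: int, rng: random.Random) -> float:
    """Largest of the k+1 spacings between k sorted uniform breakpoints."""
    points = sorted(rng.random() for _ in range(k))
    edges = [0.0] + points + [1.0]
    return max(b - a for a, b in zip(edges, edges[1:]))


def largest_normalized_exponential(k: int, rng: random.Random) -> float:
    """Largest of k+1 IID unit exponentials divided by their sum."""
    xs = [rng.expovariate(1.0) for _ in range(k + 1)]
    return max(xs) / sum(xs)


def sample_mean(sampler, k: int, n_sims: int = 50_000, seed: int = 0) -> float:
    """Average of n_sims draws from sampler(k, rng) with a fixed seed."""
    rng = random.Random(seed)
    return sum(sampler(k, rng) for _ in range(n_sims)) / n_sims
```

A call such as `sample_mean(largest_uniform_spacing, 9)` next to `sample_mean(largest_normalized_exponential, 9)` shows the agreement. Matching means are only a sanity check, not a proof of equality in distribution.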
4. Asymptotic distribution
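Using the equivalence above, we obtain
$$P\big((k+1)\Delta_{(k+1)} - \log(k+1) \le x\big) = P\left(X_{(k+1)} \le (x + \log(k+1))\, \frac{X_1 + \ldots + X_{k+1}}{k+1}\right) = P\big(X_{(k+1)} - \log(k+1) \le x + (x + \log(k+1))\, T_{k+1}\big),$$
where $T_{k+1} = \frac{X_1 + \ldots + X_{k+1}}{k+1} - 1$. The perturbation $(x + \log(k+1))\, T_{k+1}$ vanishes in probability because $E(T_{k+1}) = 0$ and $\mathrm{Var}\big(\log(k+1)\, T_{k+1}\big) = \frac{(\log(k+1))^2}{k+1} \downarrow 0$. Asymptotically, the distribution is therefore the same as that of $X_{(k+1)} - \log(k+1)$. Because the $X_i$ are IID, we have
$$P\big(X_{(k+1)} - \log(k+1) \le x\big) = P\big(X_1 \le x + \log(k+1)\big)^{k+1} = \left(1 - e^{-x - \log(k+1)}\right)^{k+1} = \left(1 - \frac{e^{-x}}{k+1}\right)^{k+1} \sim \exp\{-e^{-x}\}.$$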
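To see how quickly the limit takes hold, one can compare the exact distribution function with the Gumbel approximation $\exp(-e^{-(k+1)x + \log(k+1)})$ on a grid. The following is a rough sketch with names of my own choosing. The exact sum is evaluated in rational arithmetic on the grid points $i/\text{grid}$ to avoid floating-point cancellation, and `max_gap` reports the largest absolute difference found on that grid only.

```python
from fractions import Fraction
from math import comb, exp, log


def exact_cdf(x: Fraction, k: int) -> float:
    """Exact P(Z_k <= x) for rational x in (0, 1), returned as a float."""
    total = Fraction(0)
    for j in range(k + 2):
        base = 1 - j * x
        if base <= 0:
            break  # remaining (1 - j x)_+ terms are zero
        total += comb(k + 1, j) * (-1) ** j * base ** k
    return float(total)


def gumbel_cdf(x: float, k: int) -> float:
    """Gumbel approximation exp(-exp(-((k+1) x - log(k+1))))."""
    n = k + 1
    return exp(-exp(-(n * x - log(n))))


def max_gap(k: int, grid: int = 200) -> float:
    """Largest |exact - Gumbel| over the grid points i/grid, 0 < i < grid."""
    return max(
        abs(exact_cdf(Fraction(i, grid), k) - gumbel_cdf(i / grid, k))
        for i in range(1, grid)
    )
```

Evaluating `max_gap(10)` and `max_gap(50)` side by side illustrates that the approximation tightens as $k$ grows, although the convergence is slow.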
5. Graphical overview
The plot below shows the distribution of the largest fragment for different values of $k$. For $k = 10, 20, 50$, I have also overlaid the asymptotic Gumbel distribution (thin line). The Gumbel is a very poor approximation for smaller values of $k$, so I omit it there to avoid cluttering the picture. The approximation becomes good from about $k \approx 50$.
6. References
The proofs above are taken from the second and third references below (Holst, 1980, and Pyke, 1965). The cited literature contains many more results, such as the distribution of the ordered spacings of any rank, their limit distribution, and some alternative constructions of the ordered uniform spacings. The key references are not easily accessible, so I also provide links to the full text.
- Bairamov et al. (2010) Limit results for ordered uniform spacings, Stat. Papers, 51:1, pp. 227-240
- Holst (1980) On the lengths of the pieces of a stick broken at random, J. Appl. Prob., 17, pp. 623-634
- Pyke (1965) Spacings, JRSS(B), 27:3, pp. 395-449
- Rényi (1953) On the theory of order statistics, Acta Math. Hung., 4, pp. 191-231