A problem on estimability of parameters

Let $Y_1,Y_2,Y_3$ and $Y_4$ be four random variables such that $E(Y_1)=\theta_1-\theta_3;\space\space E(Y_2)=\theta_1+\theta_2-\theta_3;\space\space E(Y_3)=\theta_1-\theta_3;\space\space E(Y_4)=\theta_1-\theta_2-\theta_3$, where $\theta_1,\theta_2,\theta_3$ are unknown parameters. Also assume that $Var(Y_i)=\sigma^2$, $i=1,2,3,4.$ Then which one of the following is true?

A. $\theta_1,\theta_2,\theta_3$ are estimable.

B. $\theta_1+\theta_3$ is estimable.

C. $\theta_1-\theta_3$ is estimable, and $\dfrac{1}{2}(Y_1+Y_3)$ is the best linear unbiased estimate of $\theta_1-\theta_3$.

D. $\theta_2$ is estimable.

The answer given is C, which looks strange to me (because I got D).

Why did I get D? Because $E(Y_2-Y_4)=2\theta_2$.

Why don't I understand that C could be the answer? Well, I can see that $\dfrac{Y_1+Y_2+Y_3+Y_4}{4}$ is an unbiased estimator of $\theta_1-\theta_3$, and its variance is less than that of $\dfrac{Y_1+Y_3}{2}$.

Please tell me where I am going wrong.

Also posted here: /math/2568894/a-problem-on-estimability-of-parameters

self-study estimation inference

— Stat_prob_001

Add the self-study tag or someone will come along and close your question.

— Carl

@Carl, done, but why?

— Stat_prob_001

These are the site's rules, not mine.

— Carl

Is $Y_1\neq Y_3$?

— Carl

@Carl, you can think of it this way: $Y_1=\theta_1-\theta_3+\epsilon_1$, where $\epsilon_1$ is a random variable with mean $0$ and variance $\sigma^2$. And $Y_3=\theta_1-\theta_3+\epsilon_3$, where $\epsilon_3$ is a random variable with mean $0$ and variance $\sigma^2$.

— Stat_prob_001

Answers:

This answer emphasizes the verification of estimability. The minimum variance property is a secondary consideration.

To begin with, summarize the information in matrix form as a linear model:

$\begin{align} Y := \begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \\ Y_4 \end{bmatrix} = \begin{bmatrix} 1 & 0 & -1 \\ 1 & 1 & -1 \\ 1 & 0 & -1 \\ 1 & -1 & -1 \\ \end{bmatrix} \begin{bmatrix} \theta_1 \\ \theta_2 \\ \theta_3 \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \end{bmatrix}:= X\beta + \varepsilon, \tag{1} \end{align}$

where $E(\varepsilon) = 0, \text{Var}(\varepsilon) = \sigma^2 I$ (to discuss estimability, the sphericity assumption is not necessary; but to discuss the Gauss-Markov property, we do need to assume the sphericity of $\varepsilon$).

If the design matrix $X$ is of full rank, then the original parameter $\beta$ admits a unique least-squares estimate $\hat{\beta} = (X'X)^{-1}X'Y$. Consequently, any parameter $\phi$ defined as a linear function $\phi(\beta) = p'\beta$ of $\beta$ is estimable, in the sense that it can be unambiguously estimated from the data via the least-squares estimate $\hat{\beta}$ as $\hat{\phi} = p'\hat{\beta}$.

The subtlety arises when $X$ is not of full rank. To have a thorough discussion, we first fix some notation and terms (I follow the conventions of The Coordinate-free Approach to Linear Models, Section 4.8. Some of the terms sound unnecessarily technical). In addition, the discussion applies to the general linear model $Y = X\beta + \varepsilon$ with $X \in \mathbb{R}^{n \times k}$ and $\beta \in \mathbb{R}^k$.

A regression manifold is the collection of mean vectors as $\beta$ varies over $\mathbb{R}^k$: $M = \{X\beta: \beta \in \mathbb{R}^k\}.$

A parametric functional $\phi = \phi(\beta)$ is a linear functional of $\beta$, $\phi(\beta) = p'\beta = p_1\beta_1 + \cdots + p_k\beta_k.$

As mentioned above, when $\text{rank}(X) < k$, not every parametric functional $\phi(\beta)$ is estimable. But wait, what is the technical definition of the term estimable? It seems difficult to give a clear definition without a little linear algebra. One definition, which I find the most intuitive, is as follows (from the same reference):

Definition 1. A parametric functional $\phi(\beta)$ is estimable if it is uniquely determined by $X\beta$, in the sense that $\phi(\beta_1) = \phi(\beta_2)$ whenever $\beta_1,\beta_2 \in \mathbb{R}^k$ satisfy $X\beta_1 = X\beta_2$.

Interpretation. The above definition stipulates that the mapping from the regression manifold $M$ to the parameter space of $\phi$ must be one-to-one, which is guaranteed when $\text{rank}(X) = k$ (i.e., when $X$ itself is one-to-one). When $\text{rank}(X) < k$, we know that there exist $\beta_1 \neq \beta_2$ such that $X\beta_1 = X\beta_2$. The definition of estimability in effect rules out those structurally deficient parametric functionals that take different values even at the same point of $M$, which naturally make no sense. On the other hand, an estimable parametric functional $\phi(\cdot)$ does allow the case $\phi(\beta_1) = \phi(\beta_2)$ with $\beta_1 \neq \beta_2$, as long as the condition $X\beta_1 = X\beta_2$ is fulfilled.

There are other equivalent conditions for checking the estimability of a parametric functional, given in the same reference, Proposition 8.4.

After such a verbose introduction, let's come back to your question.

A. $\beta$ itself is non-estimable, because $\text{rank}(X) < 3$, which entails $X\beta_1 = X\beta_2$ for some $\beta_1 \neq \beta_2$. Although the above definition is given for scalar functionals, it is easily generalized to vector-valued functionals.

B. $\phi_1(\beta) = \theta_1 + \theta_3 = (1, 0, 1)'\beta$ is non-estimable. To see this, consider $\beta_1 = (0, 1, 0)'$ and $\beta_2 = (1, 1, 1)'$, which give $X\beta_1 = X\beta_2$ but $\phi_1(\beta_1) = 0 + 0 = 0 \neq \phi_1(\beta_2) = 1 + 1 = 2$.

C. $\phi_2(\beta) = \theta_1 - \theta_3 = (1, 0, -1)'\beta$ is estimable, because $X\beta_1 = X\beta_2$ trivially implies $\theta_1^{(1)} - \theta_3^{(1)} = \theta_1^{(2)} - \theta_3^{(2)}$, i.e., $\phi_2(\beta_1) = \phi_2(\beta_2)$.

D. $\phi_3(\beta) = \theta_2 = (0, 1, 0)'\beta$ is also estimable. The derivation from $X\beta_1 = X\beta_2$ to $\phi_3(\beta_1) = \phi_3(\beta_2)$ is also trivial: subtracting the fourth row of $X\beta$ from the second gives $2\theta_2$.
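As a numerical sanity check on A through D, here is a minimal sketch in Python with NumPy. It relies on an equivalent condition that is standard linear algebra but is not quoted from the reference above: $p'\beta$ is estimable exactly when $p$ lies in the row space of $X$, i.e. when appending $p'$ as an extra row does not raise the rank. The helper name `is_estimable` is hypothetical.

```python
import numpy as np

# Design matrix from model (1); columns correspond to theta_1, theta_2, theta_3.
X = np.array([[1, 0, -1],
              [1, 1, -1],
              [1, 0, -1],
              [1, -1, -1]], dtype=float)

def is_estimable(p, X):
    """Hypothetical helper: p'beta is estimable iff p is in the row space of X,
    i.e. stacking p under X leaves the rank unchanged."""
    return bool(np.linalg.matrix_rank(np.vstack([X, p])) == np.linalg.matrix_rank(X))

# Coefficient vectors p for the functionals discussed in options A-D.
candidates = {
    "theta_1": [1, 0, 0],
    "theta_3": [0, 0, 1],
    "theta_1 + theta_3": [1, 0, 1],
    "theta_1 - theta_3": [1, 0, -1],
    "theta_2": [0, 1, 0],
}

results = {name: is_estimable(np.array(p, dtype=float), X)
           for name, p in candidates.items()}
for name, ok in results.items():
    print(f"{name}: {'estimable' if ok else 'not estimable'}")
```

Only $\theta_1 - \theta_3$ and $\theta_2$ should come out as estimable, in agreement with the argument above.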

Once estimability is verified, a theorem (Proposition 8.16, same reference) gives the Gauss-Markov estimate of $\phi(\beta)$. Based on that theorem, the second part of option C is incorrect. The best linear unbiased estimate of $\theta_1 - \theta_3$ is $\bar{Y} = (Y_1 + Y_2 + Y_3 + Y_4)/4$, by the theorem below.

Theorem. Let $\phi(\beta) = p'\beta$ be an estimable parametric functional, then its best linear unbiased estimate (aka, Gauss-Markov estimate) is $\phi(\hat{\beta})$ for any solution $\hat{\beta}$ to the normal equations $X'X\hat{\beta} = X'Y$ .

The proof goes as follows:

Proof. Straightforward calculation shows that the normal equations are

$\begin{equation} \begin{bmatrix} 4 & 0 & -4 \\ 0 & 2 & 0 \\ -4 & 0 & 4 \end{bmatrix} \hat{\beta} = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 0 & -1 \\ -1 & -1 & -1 & -1 \end{bmatrix} Y, \end{equation}$

which, after dividing through by 4 and writing $\phi(\hat{\beta}) = \hat{\theta}_1 - \hat{\theta}_3$, become

$\begin{equation} \begin{bmatrix} \phi(\hat{\beta}) \\ \hat{\theta}_2/2 \\ -\phi(\hat{\beta}) \end{bmatrix} = \begin{bmatrix} \bar{Y} \\ (Y_2 - Y_4)/4 \\ -\bar{Y} \end{bmatrix}, \end{equation}$

i.e., $\phi(\hat{\beta}) = \bar{Y}$.

Therefore, option D is the only correct answer.
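To illustrate the theorem's phrase "for any solution $\hat{\beta}$", here is a small sketch in Python with NumPy. The data vector `Y` is arbitrary simulated noise (an assumption for illustration, not data from the question). Two different solutions of the normal equations are built, and both should give the same value $\bar{Y}$ for $\hat{\theta}_1 - \hat{\theta}_3$.

```python
import numpy as np

# Design matrix from model (1).
X = np.array([[1, 0, -1],
              [1, 1, -1],
              [1, 0, -1],
              [1, -1, -1]], dtype=float)

# Arbitrary illustrative responses (assumption: any Y works for this check).
rng = np.random.default_rng(0)
Y = rng.normal(size=4)

# One solution of X'X b = X'Y: the minimum-norm least-squares solution.
b0 = np.linalg.pinv(X) @ Y

# X has null space spanned by (1, 0, 1)', so adding any multiple of it
# yields another solution of the normal equations.
b1 = b0 + 3.7 * np.array([1.0, 0.0, 1.0])

p = np.array([1.0, 0.0, -1.0])  # phi(beta) = theta_1 - theta_3

for b in (b0, b1):
    solves_normal_eqs = np.allclose(X.T @ X @ b, X.T @ Y)
    print(solves_normal_eqs, p @ b, Y.mean())
```

The two solutions differ in $\hat{\theta}_1$ and $\hat{\theta}_3$ individually, which is another way of seeing that those parameters are not estimable on their own.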

Addendum: The connection of estimability and identifiability

When I was at school, a professor briefly mentioned that the estimability of the parametric functional $\phi$ corresponds to model identifiability. I took this claim for granted then. However, the equivalence needs to be spelled out more explicitly.

According to A.C. Davison's monograph Statistical Models p.144,

Definition 2. A parametric model in which each parameter $\theta$ generates a different distribution is called identifiable.

For the linear model $(1)$, regardless of the sphericity condition $\text{Var}(\varepsilon) = \sigma^2 I$, it can be reformulated as

$\begin{equation} E[Y] = X\beta, \quad \beta \in \mathbb{R}^k. \tag{2} \end{equation}$

It is such a simple model that we only specify the first moment of the response vector $Y$. When $\text{rank}(X) = k$, model $(2)$ is identifiable since $\beta_1 \neq \beta_2$ implies $X\beta_1 \neq X\beta_2$ (the word "distribution" in the original definition naturally reduces to "mean" under model $(2)$).

Now suppose that $\text{rank}(X) < k$ and we are given a parametric functional $\phi(\beta) = p'\beta$. How do we reconcile Definition 1 and Definition 2?

Well, by manipulating notation and words, we can show (the "proof" is rather trivial) that the estimability of $\phi(\beta)$ is equivalent to model $(2)$ being identifiable when it is parametrized with the parameter $\phi = \phi(\beta) = p'\beta$ (the design matrix $X$ is likely to change accordingly). To prove this, write $\phi_i = \phi(\beta_i)$ and suppose $\phi(\beta)$ is estimable, so that $X\beta_1 = X\beta_2$ implies $p'\beta_1 = p'\beta_2$; by definition, this is $\phi_1 = \phi_2$, hence the reparametrized model is identifiable when indexed by $\phi$. Conversely, suppose the reparametrized model is identifiable, so that $X\beta_1 = X\beta_2$ implies $\phi_1 = \phi_2$, which is trivially $\phi(\beta_1) = \phi(\beta_2)$.

Intuitively, when $X$ is rank-deficient, the model with $\beta$ is parameter redundant (too many parameters), hence a non-redundant lower-dimensional reparametrization (which could consist of a collection of linear functionals) is possible. When is such a new representation possible? The key is estimability.

To illustrate the above statements, let's reconsider your example. We have verified that the parametric functionals $\phi_2(\beta) = \theta_1 - \theta_3$ and $\phi_3(\beta) = \theta_2$ are estimable. Therefore, we can rewrite the model $(1)$ in terms of the reparametrized parameter $\gamma = (\phi_2, \phi_3)'$ as follows

$\begin{equation} E[Y] = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 0 \\ 1 & - 1 \end{bmatrix} \begin{bmatrix} \phi_2 \\ \phi_3 \end{bmatrix} = \tilde{X}\gamma. \end{equation}$

Clearly, since $\tilde{X}$ is full-ranked, the model with the new parameter $\gamma$ is identifiable.
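A quick check of this reparametrization, sketched in Python with NumPy. The matrix `A` below is a name introduced here for illustration: its rows are the coefficient vectors of $\phi_2$ and $\phi_3$, so that $\gamma = A\beta$ and the two models have the same mean vector whenever $X = \tilde{X}A$.

```python
import numpy as np

# Original (rank-deficient) design matrix from model (1).
X = np.array([[1, 0, -1],
              [1, 1, -1],
              [1, 0, -1],
              [1, -1, -1]], dtype=float)

# Reparametrized design matrix for gamma = (phi_2, phi_3)'.
X_tilde = np.array([[1, 0],
                    [1, 1],
                    [1, 0],
                    [1, -1]], dtype=float)

# Rows are the coefficient vectors of phi_2 = theta_1 - theta_3 and phi_3 = theta_2.
A = np.array([[1, 0, -1],
              [0, 1, 0]], dtype=float)

rank_X = np.linalg.matrix_rank(X)              # less than 3 columns: redundant
rank_X_tilde = np.linalg.matrix_rank(X_tilde)  # equals its 2 columns: full rank
same_mean = np.allclose(X, X_tilde @ A)        # X beta == X_tilde (A beta) for all beta

print(rank_X, rank_X_tilde, same_mean)
```

Both matrices have rank 2, but only $\tilde{X}$ has as many columns as its rank, which is what makes $\gamma$ identifiable while $\beta$ is not.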

— Zhanxiong

If you need a proof for the second part of option C, I will supplement my answer.

— Zhanxiong

Thanks for such a detailed answer! Now, about the second part of C: I know that "best" relates to minimum variance. So why is $\dfrac{1}{4}(Y_1+Y_2+Y_3+Y_4)$ not "best"?

— Stat_prob_001

Oh, I don't know why I thought it was the estimator in C. Actually $(Y_1 + Y_2 + Y_3 + Y_4)/4$ is the best estimator. Will edit my answer.

— Zhanxiong

Apply the definitions.

I will provide details to demonstrate how you can use elementary techniques: you don't need to know any special theorems about estimation, nor will it be necessary to assume anything about the (marginal) distributions of the $Y_i$ . We will need to supply one missing assumption about the moments of their joint distribution.

Definitions

All linear estimates are of the form $t_\lambda(Y) = \sum_{i=1}^4 \lambda_i Y_i$ for constants $\lambda = (\lambda_i)$.

An estimator of $\theta_1-\theta_3$ is unbiased if and only if its expectation is $\theta_1-\theta_3$. By linearity of expectation,

$\eqalign{ \theta_1 - \theta_3 &= E[t_\lambda(Y)] = \sum_{i=1}^4 \lambda_i E[Y_i]\\ & = \lambda_1(\theta_1-\theta_3) + \lambda_2(\theta_1+\theta_2-\theta_3) + \lambda_3(\theta_1-\theta_3) + \lambda_4(\theta_1-\theta_2-\theta_3) \\ &=(\lambda_1+\lambda_2+\lambda_3+\lambda_4)(\theta_1-\theta_3) + (\lambda_2-\lambda_4)\theta_2. }$

Comparing coefficients of the unknown quantities $\theta_i$ reveals

$\lambda_2-\lambda_4=0\text{ and }\lambda_1+\lambda_2+\lambda_3+\lambda_4=1.\tag{1}$

In the context of linear unbiased estimation, "best" always means with least variance. The variance of $t_\lambda$ is

$\operatorname{Var}(t_\lambda) = \sum_{i=1}^4 \lambda_i^2 \operatorname{Var}(Y_i) + \sum_{i\ne j}^4 \lambda_i\lambda_j \operatorname{Cov}(Y_i,Y_j).$

The only way to make progress is to add an assumption about the covariances: most likely, the question intended to stipulate they are all zero. (This does not imply the $Y_i$ are independent. Furthermore, the problem can be solved by making any assumption that stipulates those covariances up to a common multiplicative constant. The solution depends on the covariance structure.)

Since $\operatorname{Var}(Y_i)=\sigma^2,$ we obtain

$\operatorname{Var}(t_\lambda) =\sigma^2(\lambda_1^2 + \lambda_2^2 + \lambda_3^2 + \lambda_4^2).\tag{2}$

The problem therefore is to minimize $(2)$ subject to constraints $(1)$ .

Solution

The constraints $(1)$ permit us to express all the $\lambda_i$ in terms of just two linear combinations of them. Let $u=\lambda_1-\lambda_3$ and $v=\lambda_1+\lambda_3$ (which are linearly independent). These determine $\lambda_1$ and $\lambda_3$ while the constraints determine $\lambda_2$ and $\lambda_4$. All we have to do is minimize $(2)$, which can be written

$\sigma^2(\lambda_1^2 + \lambda_2^2 + \lambda_3^2 + \lambda_4^2) = \frac{\sigma^2}{4}\left(2u^2 + (2v-1)^2 + 1\right).$

No constraints apply to $(u,v)$. Assume $\sigma^2 \ne 0$ (so that the variables aren't just constants). Since $u^2$ and $(2v-1)^2$ are smallest only when $u=2v-1=0$, it is now obvious that the unique solution is

$\lambda = (\lambda_1,\lambda_2,\lambda_3,\lambda_4) = (1/4,1/4,1/4,1/4).$
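The same minimizer can be checked numerically. The sketch below (Python with NumPy) writes the constraints $(1)$ as a linear system $A\lambda = b$, where `A` and `b` are names introduced here for illustration. It uses the fact that the Moore-Penrose pseudoinverse returns the solution of smallest Euclidean norm, which is exactly what minimizing $(2)$ asks for under the zero-covariance assumption. It also compares the resulting variance with that of the estimator in option (C), both in units of $\sigma^2$.

```python
import numpy as np

# Unbiasedness constraints (1) written as A @ lam = b.
A = np.array([[0, 1, 0, -1],   # lambda_2 - lambda_4 = 0
              [1, 1, 1, 1]],   # lambda_1 + lambda_2 + lambda_3 + lambda_4 = 1
             dtype=float)
b = np.array([0.0, 1.0])

# Minimum-norm solution of A lam = b, i.e. the minimizer of sum(lam**2).
lam_best = np.linalg.pinv(A) @ b

# Weights of the estimator (Y_1 + Y_3)/2 from option (C); also satisfies (1).
lam_C = np.array([0.5, 0.0, 0.5, 0.0])

# Variances from (2) in units of sigma^2 (assumes zero covariances).
var_best = lam_best @ lam_best
var_C = lam_C @ lam_C

print(lam_best, var_best, var_C)
```

Under these assumptions the sample mean has variance $\sigma^2/4$, against $\sigma^2/2$ for $(Y_1+Y_3)/2$.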

Option (C) is false because it does not give the best unbiased linear estimator. Option (D), although it doesn't give full information, nevertheless is correct, because

$\theta_2 = E[t_{(0,1/2,0,-1/2)}(Y)]$

is the expectation of a linear estimator.

It is easy to see that neither (A) nor (B) can be correct, because the space of expectations of linear estimators is generated by $\{\theta_2, \theta_1-\theta_3\}$ and none of $\theta_1,\theta_3,$ or $\theta_1+\theta_3$ are in that space.

Consequently (D) is the unique correct answer.

— whuber