6. The Sample Variance II

We continue our discussion of the sample variance from the last section, but now we assume that the variables are random. Thus, suppose that we have a basic random experiment, and that \(X\) is a real-valued random variable for the experiment with mean \(\mu\) and standard deviation \(\sigma\). We will need some higher order moments as well. Let \(\sigma_3 = \mathbb{E}[(X - \mu)^3]\) and \(\sigma_4 = \mathbb{E}[(X - \mu)^4]\) denote the 3rd and 4th moments about the mean. Recall that \(\sigma_3 / \sigma^3 = \operatorname{skew}(X)\), the skewness of \(X\), and \(\sigma_4 / \sigma^4 = \operatorname{kurt}(X)\), the kurtosis of \(X\). We assume that \(\sigma_4 < \infty\).

We repeat the basic experiment \(n\) times to form a new, compound experiment, with a sequence of independent random variables \(\boldsymbol{X} = (X_1, X_2, \ldots, X_n)\), each with the same distribution as \(X\). In statistical terms, \(\boldsymbol{X}\) is a random sample of size \(n\) from the distribution of \(X\). All of the statistics in the previous section make sense for \(\boldsymbol{X}\), of course, but now these statistics are random variables. We will use the notation established in that section, except for the usual convention of denoting random variables by capital letters. Finally, note that the deterministic properties and relations established in the last section still hold.

In addition to being a measure of the center of the data \(\boldsymbol{X}\), the sample mean
\[ M = \frac{1}{n} \sum_{i=1}^n X_i \]
is a natural estimator of the distribution mean \(\mu\). In this section, we will derive statistics that are natural estimators of the distribution variance \(\sigma^2\). The statistics that we will derive are different, depending on whether \(\mu\) is known or unknown; for this reason, \(\mu\) is referred to as a nuisance parameter for the problem of estimating \(\sigma^2\).

A Special Sample Variance

First we will assume that \(\mu\) is known. Although this is almost always an artificial assumption, it is a nice place to start because the analysis is relatively easy and will give us insight for the standard case.
A natural estimator of \(\sigma^2\) is the following statistic, which we will refer to as the special sample variance.
\[ W^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \mu)^2 \]

1. \(W^2\) is the sample mean for a random sample of size \(n\) from the distribution of \((X - \mu)^2\), and satisfies the following properties:
a. \(\mathbb{E}(W^2) = \sigma^2\)
b. \(\operatorname{var}(W^2) = \frac{1}{n}(\sigma_4 - \sigma^4)\)
c. \(W^2 \to \sigma^2\) as \(n \to \infty\) with probability 1
d. The distribution of \(\sqrt{n}\,(W^2 - \sigma^2) / \sqrt{\sigma_4 - \sigma^4}\) converges to the standard normal distribution as \(n \to \infty\).

Proof: These results follow immediately from standard results in the section on the Law of Large Numbers and the section on the Central Limit Theorem. For part (b), note that
\[ \operatorname{var}[(X - \mu)^2] = \mathbb{E}[(X - \mu)^4] - \left(\mathbb{E}[(X - \mu)^2]\right)^2 = \sigma_4 - \sigma^4 \]

In particular, part (a) means that \(W^2\) is an unbiased estimator of \(\sigma^2\). From part (b), note that \(\operatorname{var}(W^2) \to 0\) as \(n \to \infty\); this means that \(W^2\) is a consistent estimator of \(\sigma^2\). The square root of the special sample variance is a special version of the sample standard deviation, denoted \(W\).

2. \(\mathbb{E}(W) \le \sigma\). Thus, \(W\) is a negatively biased estimator that tends to underestimate \(\sigma\).

Proof: This follows from Theorem 1(a) and Jensen's inequality. Since \(w \mapsto \sqrt{w}\) is concave downward on \([0, \infty)\), we have
\[ \mathbb{E}(W) = \mathbb{E}\left(\sqrt{W^2}\right) \le \sqrt{\mathbb{E}(W^2)} = \sqrt{\sigma^2} = \sigma \]

Next we compute the covariance and correlation between the sample mean and the special sample variance.

3. The covariance and correlation of \(M\) and \(W^2\) are
a. \(\operatorname{cov}(M, W^2) = \sigma_3 / n\)
b. \(\operatorname{cor}(M, W^2) = \sigma_3 / \sqrt{\sigma^2 (\sigma_4 - \sigma^4)}\)

Proof: From the bilinearity of the covariance operator and by independence,
\[ \operatorname{cov}(M, W^2) = \operatorname{cov}\left[\frac{1}{n} \sum_{i=1}^n X_i, \frac{1}{n} \sum_{j=1}^n (X_j - \mu)^2\right] = \frac{1}{n^2} \sum_{i=1}^n \operatorname{cov}[X_i, (X_i - \mu)^2] \]
But
\[ \operatorname{cov}[X_i, (X_i - \mu)^2] = \operatorname{cov}[X_i - \mu, (X_i - \mu)^2] = \mathbb{E}[(X_i - \mu)^3] - \mathbb{E}(X_i - \mu)\,\mathbb{E}[(X_i - \mu)^2] = \sigma_3 \]
Substituting gives part (a). Part (b) follows from part (a), Theorem 1(b), and our previous result that \(\operatorname{var}(M) = \sigma^2 / n\).
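The properties in Theorems 1 to 3 can be checked empirically. The following is a minimal simulation sketch, not part of the original text: it assumes the exponential distribution with rate 1 (for which \(\mu = 1\), \(\sigma^2 = 1\), \(\sigma_3 = 2\), and \(\sigma_4 = 9\)), and the function names `special_sample_variance`, `mean`, `covariance`, and `simulate` are hypothetical helpers introduced only for this illustration.

```python
import math
import random


def special_sample_variance(xs, mu):
    """W^2: mean squared deviation of the sample from the known mean mu."""
    return sum((x - mu) ** 2 for x in xs) / len(xs)


def mean(vals):
    """Arithmetic mean of a list of numbers."""
    return sum(vals) / len(vals)


def covariance(us, vs):
    """Empirical covariance of two equally long lists (divisor is the length)."""
    mu_u, mu_v = mean(us), mean(vs)
    return mean([(u - mu_u) * (v - mu_v) for u, v in zip(us, vs)])


def simulate(n=5, runs=100_000, rate=1.0, seed=1):
    """Replicate the compound experiment `runs` times for an exponential
    distribution (an assumption of this sketch) and return the observed
    values of M, W^2, and W."""
    rng = random.Random(seed)
    mu = 1.0 / rate
    ms, w2s, ws = [], [], []
    for _ in range(runs):
        xs = [rng.expovariate(rate) for _ in range(n)]
        w2 = special_sample_variance(xs, mu)
        ms.append(mean(xs))
        w2s.append(w2)
        ws.append(math.sqrt(w2))
    return ms, w2s, ws


if __name__ == "__main__":
    n = 5
    ms, w2s, ws = simulate(n=n)
    # Theoretical values for rate 1: sigma^2 = 1, sigma_3 = 2, sigma_4 = 9.
    print("mean of W^2 :", mean(w2s), "| theory:", 1.0)
    print("var of W^2  :", covariance(w2s, w2s), "| theory:", (9 - 1) / n)
    print("cov(M, W^2) :", covariance(ms, w2s), "| theory:", 2 / n)
    print("mean of W   :", mean(ws), "| theory: at most", 1.0)
```

The empirical values should be close to the theoretical ones but will not match exactly; the fourth moment makes the variance estimate in particular fairly noisy, so a large number of runs is used.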
Note that the correlation does not depend on the sample size, and that the sample mean and the special sample variance are uncorrelated if \(\sigma_3 = 0\) (equivalently \(\operatorname{skew}(X) = 0\)).

The Standard Sample Variance

Consider now the more realistic case in which \(\mu\) is unknown. In this case, a natural approach is to average, in some sense, the squared deviations \((X_i - M)^2\) over \(i \in \{1, 2, \ldots, n\}\). It might seem that we should average by dividing by \(n\). However, another approach is to divide by whatever constant would give us an unbiased estimator of \(\sigma^2\). This constant turns out to be \(n - 1\), leading to the standard sample variance:
\[ S^2 = \frac{1}{n - 1} \sum_{i=1}^n (X_i - M)^2 \]

4. \(\mathbb{E}(S^2) = \sigma^2\).

Proof: By expanding (as was shown in the last section),
\[ \sum_{i=1}^n (X_i - M)^2 = \sum_{i=1}^n X_i^2 - n M^2 \]
Recall that \(\mathbb{E}(M) = \mu\) and \(\operatorname{var}(M) = \sigma^2 / n\), so that \(\mathbb{E}(M^2) = \sigma^2 / n + \mu^2\). Taking expected values in the displayed equation gives
\[ \mathbb{E}\left(\sum_{i=1}^n (X_i - M)^2\right) = \sum_{i=1}^n (\sigma^2 + \mu^2) - n \left(\frac{\sigma^2}{n} + \mu^2\right) = n(\sigma^2 + \mu^2) - n \left(\frac{\sigma^2}{n} + \mu^2\right) = (n - 1) \sigma^2 \]

Of course, the square root of the sample variance is the sample standard deviation, denoted \(S\).

5. \(\mathbb{E}(S) \le \sigma\). Thus, \(S\) is a negatively biased estimator that tends to underestimate \(\sigma\).

Proof: The proof is exactly the same as in Theorem 2, using Theorem 4 in place of Theorem 1(a).

6. \(S^2 \to \sigma^2\) as \(n \to \infty\) with probability 1.

Proof: This follows from the strong law of large numbers. Recall again that
\[ S^2 = \frac{1}{n - 1} \sum_{i=1}^n X_i^2 - \frac{n}{n - 1} M^2 = \frac{n}{n - 1} \left[M(\boldsymbol{X}^2) - M^2(\boldsymbol{X})\right] \]
But with probability 1, \(M(\boldsymbol{X}^2) \to \sigma^2 + \mu^2\) as \(n \to \infty\) and \(M^2(\boldsymbol{X}) \to \mu^2\) as \(n \to \infty\), while \(n / (n - 1) \to 1\).

Since \(S^2\) is an unbiased estimator of \(\sigma^2\), the variance of \(S^2\) is the mean square error, a measure of the quality of the estimator.

7. \(\operatorname{var}(S^2) = \frac{1}{n} \left(\sigma_4 - \frac{n - 3}{n - 1} \sigma^4\right)\).

Proof:
Recall from the last section that
\[ S^2 = \frac{1}{2 n (n - 1)} \sum_{i=1}^n \sum_{j=1}^n (X_i - X_j)^2 \]
Hence, using the bilinear property of covariance we have
\[ \operatorname{var}(S^2) = \operatorname{cov}(S^2, S^2) = \frac{1}{4 n^2 (n - 1)^2} \sum_{i=1}^n \sum_{j=1}^n \sum_{k=1}^n \sum_{l=1}^n \operatorname{cov}[(X_i - X_j)^2, (X_k - X_l)^2] \]
We compute the covariances in this sum by considering disjoint cases:

\(\operatorname{cov}[(X_i - X_j)^2, (X_k - X_l)^2] = 0\) if \(i = j\) or \(k = l\), and there are \(2 n^3 - n^2\) such terms.
\(\operatorname{cov}[(X_i - X_j)^2, (X_k - X_l)^2] = 0\) if \(i, j, k, l\) are distinct, and there are \(n (n - 1)(n - 2)(n - 3)\) such terms.
\(\operatorname{cov}[(X_i - X_j)^2, (X_k - X_l)^2] = 2 \sigma_4 + 2 \sigma^4\) if \(i \ne j\) and \(\{k, l\} = \{i, j\}\), and there are \(2 n (n - 1)\) such terms.
\(\operatorname{cov}[(X_i - X_j)^2, (X_k - X_l)^2] = \sigma_4 - \sigma^4\) if \(i \ne j\), \(k \ne l\) and \(\#(\{i, j\} \cap \{k, l\}) = 1\), and there are \(4 n (n - 1)(n - 2)\) such terms.

Substituting gives the result.

Note that \(\operatorname{var}(S^2) \to 0\) as \(n \to \infty\), and hence \(S^2\) is a consistent estimator of \(\sigma^2\). On the other hand, it's not surprising that the variance of the standard sample variance (where we assume that \(\mu\) is unknown) is greater than the variance of the special sample variance (in which we assume \(\mu\) is known).

8. \(\operatorname{var}(S^2) > \operatorname{var}(W^2)\).

Proof: From Theorem 1, Theorem 7, and simple algebra,
\[ \operatorname{var}(S^2) - \operatorname{var}(W^2) = \frac{2}{n (n - 1)} \sigma^4 \]

Note however that the difference goes to 0 as \(n \to \infty\). Next we compute the covariance between the sample mean and the sample variance.

9. The covariance and correlation between the sample mean and sample variance are
a. \(\operatorname{cov}(M, S^2) = \sigma_3 / n\)
b. \(\operatorname{cor}(M, S^2) = \dfrac{\sigma_3}{\sigma \sqrt{\sigma_4 - \sigma^4 (n - 3) / (n - 1)}}\)

Proof: Recall again that
\[ M = \frac{1}{n} \sum_{i=1}^n X_i, \quad S^2 = \frac{1}{2 n (n - 1)} \sum_{j=1}^n \sum_{k=1}^n (X_j - X_k)^2 \]
Hence, using the bilinear property of covariance we have
\[ \operatorname{cov}(M, S^2) = \frac{1}{2 n^2 (n - 1)} \sum_{i=1}^n \sum_{j=1}^n \sum_{k=1}^n \operatorname{cov}[X_i, (X_j - X_k)^2] \]
We compute the covariances in this sum by considering disjoint cases:

\(\operatorname{cov}[X_i, (X_j - X_k)^2] = 0\) if \(j = k\), and there are \(n^2\) such terms.
\(\operatorname{cov}[X_i, (X_j - X_k)^2] = 0\) if \(i, j, k\) are distinct, and there are \(n (n - 1)(n - 2)\) such terms.
\(\operatorname{cov}[X_i, (X_j - X_k)^2] = \sigma_3\) if \(j \ne k\) and \(i \in \{j, k\}\), and there are \(2 n (n - 1)\) such terms.

Substituting gives part (a). Part (b) follows from part (a), Theorem 7, and \(\operatorname{var}(M) = \sigma^2 / n\).

In particular, note that \(\operatorname{cov}(M, S^2) = \operatorname{cov}(M, W^2)\). Again, the sample mean and variance are uncorrelated if \(\sigma_3 = 0\) so that \(\operatorname{skew}(X) = 0\). Our last result gives the covariance and correlation between the special sample variance and the standard one. Curiously, the covariance is the same as the variance of the special sample variance.

10. The covariance and correlation between \(W^2\) and \(S^2\) are
a. \(\operatorname{cov}(W^2, S^2) = (\sigma_4 - \sigma^4) / n\)
b. \(\operatorname{cor}(W^2, S^2) = \sqrt{\dfrac{\sigma_4 - \sigma^4}{\sigma_4 - \sigma^4 (n - 3) / (n - 1)}}\)

Proof: Recall again that
\[ W^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \mu)^2, \quad S^2 = \frac{1}{2 n (n - 1)} \sum_{j=1}^n \sum_{k=1}^n (X_j - X_k)^2 \]
so by the bilinear property of covariance we have
\[ \operatorname{cov}(W^2, S^2) = \frac{1}{2 n^2 (n - 1)} \sum_{i=1}^n \sum_{j=1}^n \sum_{k=1}^n \operatorname{cov}[(X_i - \mu)^2, (X_j - X_k)^2] \]
Once again, we compute the covariances in this sum by considering disjoint cases:

\(\operatorname{cov}[(X_i - \mu)^2, (X_j - X_k)^2] = 0\) if \(j = k\), and there are \(n^2\) such terms.
\(\operatorname{cov}[(X_i - \mu)^2, (X_j - X_k)^2] = 0\) if \(i, j, k\) are distinct, and there are \(n (n - 1)(n - 2)\) such terms.
\(\operatorname{cov}[(X_i - \mu)^2, (X_j - X_k)^2] = \sigma_4 - \sigma^4\) if \(j \ne k\) and \(i \in \{j, k\}\), and there are \(2 n (n - 1)\) such terms.

Substituting gives part (a). Part (b) follows from part (a) and Theorems 1 and 7.

Note that \(\operatorname{cor}(W^2, S^2) \to 1\) as \(n \to \infty\), not surprising since with probability 1, \(S^2 \to \sigma^2\) and \(W^2 \to \sigma^2\) as \(n \to \infty\).

Exercises

Simulation Exercises

Many of the applets in this project are simulations of experiments with a basic random variable of interest. When you run the simulation, you are performing independent replications of the experiment. In most cases, the applet displays the standard deviation of the distribution, both numerically in a table and graphically as the radius of the blue, horizontal bar in the graph box.
When you run the simulation, the sample standard deviation is also displayed numerically in the table and graphically as the radius of the red horizontal bar in the graph box.

11. In the binomial coin experiment, the random variable is the number of heads. For various values of the parameters \(n\) (the number of coins) and \(p\) (the probability of heads), run the simulation 1000 times and note the apparent agreement between the sample standard deviation and the distribution standard deviation.

12. In the simulation of the matching experiment, the random variable is the number of matches. For selected values of \(n\) (the number of balls), run the simulation 1000 times and note the apparent agreement between the sample standard deviation and the distribution standard deviation.

13. Run the simulation of the gamma experiment 1000 times for various values of the rate parameter \(r\) and the shape parameter \(k\). Note the apparent agreement between the sample standard deviation and the distribution standard deviation.

Computational Exercises

14. Suppose that \(X\) has probability density function \(f(x) = 12 x^2 (1 - x)\) for \(0 \le x \le 1\). The distribution of \(X\) is a member of the beta family. Compute each of the following:
a. \(\mu = \mathbb{E}(X)\)
b. \(\sigma^2 = \operatorname{var}(X)\)
c. \(d_3 = \mathbb{E}[(X - \mu)^3]\)
d. \(d_4 = \mathbb{E}[(X - \mu)^4]\)

Answer:
a. \(3/5\)
b. \(1/25\)
c. \(-2/875\)
d. \(33/8750\)

15. Suppose now that \((X_1, X_2, \ldots, X_{10})\) is a random sample of size 10 from the beta distribution in the previous problem. Find each of the following:
a. \(\mathbb{E}(M)\)
b. \(\operatorname{var}(M)\)
c. \(\mathbb{E}(W^2)\)
d. \(\operatorname{var}(W^2)\)
e. \(\mathbb{E}(S^2)\)
f. \(\operatorname{var}(S^2)\)
g. \(\operatorname{cov}(M, W^2)\)
h. \(\operatorname{cov}(M, S^2)\)
i. \(\operatorname{cov}(W^2, S^2)\)

Answer:
a. \(3/5\)
b. \(1/250\)
c. \(1/25\)
d. \(19/87500\)
e. \(1/25\)
f. \(199/787500\)
g. \(-2/8750\)
h. \(-2/8750\)
i. \(19/87500\)

16. Suppose that \(X\) has probability density function \(f(x) = \lambda e^{-\lambda x}\) for \(0 \le x < \infty\), where \(\lambda > 0\) is a parameter. Thus \(X\) has the exponential distribution with rate parameter \(\lambda\). Compute each of the following:
a. \(\mu = \mathbb{E}(X)\)
b. \(\sigma^2 = \operatorname{var}(X)\)
c. \(d_3 = \mathbb{E}[(X - \mu)^3]\)
d. \(d_4 = \mathbb{E}[(X - \mu)^4]\)

Answer:
a. \(1/\lambda\)
b. \(1/\lambda^2\)
c. \(2/\lambda^3\)
d.
\(9/\lambda^4\)

17. Suppose now that \((X_1, X_2, \ldots, X_5)\) is a random sample of size 5 from the exponential distribution in the previous problem. Find each of the following:
a. \(\mathbb{E}(M)\)
b. \(\operatorname{var}(M)\)
c. \(\mathbb{E}(W^2)\)
d. \(\operatorname{var}(W^2)\)
e. \(\mathbb{E}(S^2)\)
f. \(\operatorname{var}(S^2)\)
g. \(\operatorname{cov}(M, W^2)\)
h. \(\operatorname{cov}(M, S^2)\)
i. \(\operatorname{cov}(W^2, S^2)\)

Answer:
a. \(1/\lambda\)
b. \(1/(5 \lambda^2)\)
c. \(1/\lambda^2\)
d. \(8/(5 \lambda^4)\)
e. \(1/\lambda^2\)
f. \(17/(10 \lambda^4)\)
g. \(2/(5 \lambda^3)\)
h. \(2/(5 \lambda^3)\)
i. \(8/(5 \lambda^4)\)

18. Recall that for an ace-six flat die, faces 1 and 6 have probability \(1/4\) each, while faces 2, 3, 4, and 5 have probability \(1/8\) each. Let \(X\) denote the score when an ace-six flat die is thrown. Compute each of the following:
a. \(\mu = \mathbb{E}(X)\)
b. \(\sigma^2 = \operatorname{var}(X)\)
c. \(d_3 = \mathbb{E}[(X - \mu)^3]\)
d. \(d_4 = \mathbb{E}[(X - \mu)^4]\)

Answer:
a. \(7/2\)
b. \(15/4\)
c. \(0\)
d. \(333/16\)

19. Suppose now that an ace-six flat die is tossed 8 times. Find each of the following:
a. \(\mathbb{E}(M)\)
b. \(\operatorname{var}(M)\)
c. \(\mathbb{E}(W^2)\)
d. \(\operatorname{var}(W^2)\)
e. \(\mathbb{E}(S^2)\)
f. \(\operatorname{var}(S^2)\)
g. \(\operatorname{cov}(M, W^2)\)
h. \(\operatorname{cov}(M, S^2)\)
i. \(\operatorname{cov}(W^2, S^2)\)

Answer:
a. \(7/2\)
b. \(15/32\)
c. \(15/4\)
d. \(27/32\)
e. \(15/4\)
f. \(603/448\)
g. \(0\)
h. \(0\)
i. \(27/32\)

A particularly important special case occurs when the sampling distribution is normal. This case is explored in the section on Special Properties of Normal Samples.
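The computational exercises reduce to plugging \(\sigma^2\), \(\sigma_3\), and \(\sigma_4\) into the formulas of Theorems 1, 3, 7, 9, and 10. As a rough sketch of that arithmetic, the following evaluates the formulas exactly with rational numbers for the ace-six flat die tossed 8 times and for the beta sample of size 10. The helper names `central_moments` and `sample_moments` are hypothetical, introduced only for this illustration.

```python
from fractions import Fraction as F


def central_moments(pmf):
    """Return (mu, sigma^2, sigma_3, sigma_4) for a finite distribution
    given as a dict {value: probability}."""
    mu = sum(x * p for x, p in pmf.items())

    def moment(k):
        return sum((x - mu) ** k * p for x, p in pmf.items())

    return mu, moment(2), moment(3), moment(4)


def sample_moments(n, var, s3, s4):
    """Evaluate the formulas of Theorems 1, 3, 7, 9 and 10 for sample size n,
    where var = sigma^2, s3 = sigma_3 and s4 = sigma_4."""
    return {
        "var(M)": var / n,
        "var(W^2)": (s4 - var ** 2) / n,
        "var(S^2)": (s4 - F(n - 3, n - 1) * var ** 2) / n,
        "cov(M, W^2)": s3 / n,
        "cov(M, S^2)": s3 / n,
        "cov(W^2, S^2)": (s4 - var ** 2) / n,
    }


# Ace-six flat die: faces 1 and 6 have probability 1/4, the others 1/8.
ace_six = {1: F(1, 4), 2: F(1, 8), 3: F(1, 8), 4: F(1, 8), 5: F(1, 8), 6: F(1, 4)}

if __name__ == "__main__":
    mu, var, s3, s4 = central_moments(ace_six)
    print("die moments:", mu, var, s3, s4)
    die = sample_moments(8, var, s3, s4)
    for name, value in die.items():
        print(name, "=", value)
    # Theorem 8: the two variances differ by 2 sigma^4 / (n (n - 1)).
    print("Theorem 8 holds:", die["var(S^2)"] - die["var(W^2)"] == 2 * var ** 2 / (8 * 7))

    # Beta distribution of Exercise 14, using the moments stated in its answer.
    beta = sample_moments(10, F(1, 25), F(-2, 875), F(33, 8750))
    for name, value in beta.items():
        print(name, "=", value)
```

Using `Fraction` rather than floating point keeps every result exact, so the output can be compared directly with the fractional answers in the exercises.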

© Copyright 2020