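MATH 38061/MATH48061/MATH68061: MULTIVARIATE STATISTICS
Solutions to Problems on Two-Sample Inference

1. Suppose: $n_1 = 45$, $n_2 = 55$, $\bar{\mathbf{x}}_1^T = [204.4, 556.6]$, $\bar{\mathbf{x}}_2^T = [130.0, 355.0]$,
\[
\mathbf{S}_1 = \begin{bmatrix} 13825.3 & 23823.4 \\ 23823.4 & 73107.4 \end{bmatrix}
\quad\text{and}\quad
\mathbf{S}_2 = \begin{bmatrix} 8632.0 & 19616.7 \\ 19616.7 & 55964.5 \end{bmatrix}.
\]
Then
\[
\mathbf{S}_{\text{pooled}} = \frac{(n_1 - 1)\mathbf{S}_1 + (n_2 - 1)\mathbf{S}_2}{n_1 + n_2 - 2}
= \begin{bmatrix} 10963.69 & 21505.42 \\ 21505.42 & 63661.31 \end{bmatrix}.
\]
So,
\[
(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)^T \left[ \left( \frac{1}{n_1} + \frac{1}{n_2} \right) \mathbf{S}_{\text{pooled}} \right]^{-1} (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)
\]
\[
= ([204.4, 556.6] - [130.0, 355.0]) \left[ \left( \frac{1}{45} + \frac{1}{55} \right) \begin{bmatrix} 10963.69 & 21505.42 \\ 21505.42 & 63661.31 \end{bmatrix} \right]^{-1} ([204.4, 556.6] - [130.0, 355.0])^T = 16.06622.
\]
But
\[
\frac{(n_1 + n_2 - 2)p}{n_1 + n_2 - p - 1} F_{p,\,n_1+n_2-p-1}(\alpha) = \frac{98 \times 2}{97} F_{2,97}(0.05) = 6.244089.
\]
Since $16.06622 > 6.244089$, there is evidence against the hypothesis $\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2 = \mathbf{0}$. The 95% simultaneous confidence intervals for the differences in the mean components are
\[
(\bar{x}_{11} - \bar{x}_{21}) \pm \sqrt{\frac{(n_1 + n_2 - 2)p}{n_1 + n_2 - p - 1} F_{p,\,n_1+n_2-p-1}(\alpha)} \sqrt{\left( \frac{1}{n_1} + \frac{1}{n_2} \right) S_{11,\text{pooled}}}
\equiv 74.4 \pm \sqrt{6.244089} \times \sqrt{\left( \frac{1}{45} + \frac{1}{55} \right) \times 10963.69}
\]
and
\[
(\bar{x}_{12} - \bar{x}_{22}) \pm \sqrt{\frac{(n_1 + n_2 - 2)p}{n_1 + n_2 - p - 1} F_{p,\,n_1+n_2-p-1}(\alpha)} \sqrt{\left( \frac{1}{n_1} + \frac{1}{n_2} \right) S_{22,\text{pooled}}}
\equiv 201.6 \pm \sqrt{6.244089} \times \sqrt{\left( \frac{1}{45} + \frac{1}{55} \right) \times 63661.31}.
\]

2. Consider the data in problem 1. Without pooling the covariance matrices, we have
\[
[\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2]^T \left[ \frac{1}{45} \mathbf{S}_1 + \frac{1}{55} \mathbf{S}_2 \right]^{-1} [\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2] = 15.65853.
\]
But $\chi^2_p(\alpha) = \chi^2_2(0.05) = 5.991$. Since $15.65853 > 5.991$, there is evidence against the hypothesis $\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2 = \mathbf{0}$.

3. Municipal wastewater treatment plants are required by law to monitor their discharges into rivers and streams on a regular basis. Concern about the reliability of data from one of these self-monitoring programs led to a study in which samples of effluent were divided and sent to two laboratories for testing. One half of each sample was sent to the Wisconsin State Laboratory of Hygiene and one half was sent to a private commercial laboratory routinely used in the monitoring program. Measurements of biochemical oxygen demand (BOD) and suspended solids (SS) were obtained, for $n = 11$ sample splits, from the two laboratories. The data are displayed below.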
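            Commercial lab                   State lab
Sample j    x_{11j} (BOD)   x_{12j} (SS)     x_{21j} (BOD)   x_{22j} (SS)
 1           6               27              25              15
 2           6               23              28              13
 3          18               64              36              22
 4           8               44              35              29
 5          11               30              15              31
 6          34               75              44              64
 7          28               26              42              30
 8          71              124              54              64
 9          43               54              34              56
10          33               30              29              20
11          20               14              39              21

For this data, with differences $\mathbf{d}_j = \mathbf{x}_{1j} - \mathbf{x}_{2j}$ (commercial minus state), $\bar{\mathbf{d}}^T = [-9.363636, 13.272727]$,
\[
\mathbf{S}_d = \begin{bmatrix} 199.2545 & 88.30909 \\ 88.30909 & 418.6182 \end{bmatrix}
\quad\text{and}\quad
\mathbf{S}_d^{-1} = \begin{bmatrix} 0.005536320 & -0.001167908 \\ -0.001167908 & 0.002635186 \end{bmatrix}.
\]
So,
\[
T^2 = n\, \bar{\mathbf{d}}^T \mathbf{S}_d^{-1} \bar{\mathbf{d}} = 13.63931.
\]
But
\[
\frac{(n - 1)p}{n - p} F_{p,\,n-p}(\alpha) = \frac{20}{9} F_{2,9}(0.05) = 9.458877.
\]
Hence, since $13.63931 > 9.458877$, we reject $H_0: \boldsymbol{\delta} = \mathbf{0}$: there is evidence that the two laboratories' chemical analyses do not agree.

4. A 95% joint confidence region for the mean difference vector $\boldsymbol{\delta} = (\delta_1, \delta_2)^T$ using the effluent data is the set of $\boldsymbol{\delta}$ satisfying $n (\bar{\mathbf{d}} - \boldsymbol{\delta})^T \mathbf{S}_d^{-1} (\bar{\mathbf{d}} - \boldsymbol{\delta}) \le 9.458877$. With $n \mathbf{S}_d^{-1} = 11\, \mathbf{S}_d^{-1}$ written out, this is
\[
\left( \begin{bmatrix} -9.363636 \\ 13.272727 \end{bmatrix} - \boldsymbol{\delta} \right)^T
\begin{bmatrix} 0.06089952 & -0.01284698 \\ -0.01284698 & 0.02898705 \end{bmatrix}
\left( \begin{bmatrix} -9.363636 \\ 13.272727 \end{bmatrix} - \boldsymbol{\delta} \right) \le 9.458877,
\]
which can be rewritten as
\[
0.06089952(-9.363636 - \delta_1)^2 + 0.02898705(13.272727 - \delta_2)^2 - 2 \times 0.01284698(-9.363636 - \delta_1)(13.272727 - \delta_2) \le 9.458877.
\]
The factor $n = 11$ is already absorbed into the matrix entries, so it does not appear again in the expanded form.

5. Fifty bars of soap are manufactured in each of two ways. Two characteristics $X_1$ = lather and $X_2$ = mildness are measured. The summary statistics for bars produced by methods 1 and 2 are
\[
\bar{\mathbf{x}}_1 = \begin{bmatrix} 8.3 \\ 4.1 \end{bmatrix}, \quad
\bar{\mathbf{x}}_2 = \begin{bmatrix} 10.2 \\ 3.9 \end{bmatrix}, \quad
\mathbf{S}_1 = \begin{bmatrix} 2 & 1 \\ 1 & 6 \end{bmatrix}
\quad\text{and}\quad
\mathbf{S}_2 = \begin{bmatrix} 2 & 1 \\ 1 & 4 \end{bmatrix}.
\]
Then
\[
\mathbf{S}_{\text{pooled}} = \begin{bmatrix} 2 & 1 \\ 1 & 5 \end{bmatrix}.
\]
So, a 95% confidence region for $\boldsymbol{\delta} = \boldsymbol{\mu}_1 - \boldsymbol{\mu}_2$ is
\[
(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2 - \boldsymbol{\delta})^T \left[ \left( \frac{1}{n_1} + \frac{1}{n_2} \right) \mathbf{S}_{\text{pooled}} \right]^{-1} (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2 - \boldsymbol{\delta}) \le \frac{(n_1 + n_2 - 2)p}{n_1 + n_2 - p - 1} F_{p,\,n_1+n_2-p-1}(\alpha)
\]
\[
\Leftrightarrow
\left( \begin{bmatrix} 8.3 \\ 4.1 \end{bmatrix} - \begin{bmatrix} 10.2 \\ 3.9 \end{bmatrix} - \boldsymbol{\delta} \right)^T
\left[ \left( \frac{1}{50} + \frac{1}{50} \right) \begin{bmatrix} 2 & 1 \\ 1 & 5 \end{bmatrix} \right]^{-1}
\left( \begin{bmatrix} 8.3 \\ 4.1 \end{bmatrix} - \begin{bmatrix} 10.2 \\ 3.9 \end{bmatrix} - \boldsymbol{\delta} \right)
\le \frac{98 \times 2}{97} F_{2,97}(0.05) = 6.244089
\]
\[
\Leftrightarrow
\left( \begin{bmatrix} -1.9 \\ 0.2 \end{bmatrix} - \boldsymbol{\delta} \right)^T
\begin{bmatrix} 13.888889 & -2.777778 \\ -2.777778 & 5.555556 \end{bmatrix}
\left( \begin{bmatrix} -1.9 \\ 0.2 \end{bmatrix} - \boldsymbol{\delta} \right) \le 6.244089,
\]
which can be rewritten as
\[
13.888889(-1.9 - \delta_1)^2 + 5.555556(0.2 - \delta_2)^2 - 2 \times 2.777778(-1.9 - \delta_1)(0.2 - \delta_2) \le 6.244089.
\]

6. A researcher considered three indices measuring severity of heart attacks.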
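The values of these indices for $n = 40$ heart-attack patients arriving at a hospital emergency room produced the summary statistics:
\[
\bar{\mathbf{x}}^T = [46.1, 57.3, 50.4]
\quad\text{and}\quad
\mathbf{S} = \begin{bmatrix} 101.3 & 63.0 & 71.0 \\ 63.0 & 80.2 & 55.6 \\ 71.0 & 55.6 & 97.4 \end{bmatrix}.
\]
All three indices are evaluated for each patient. For this data, with $q = 3$ indices and the contrast matrix
\[
\mathbf{C} = \begin{bmatrix} 1 & -1 & 0 \\ 1 & 0 & -1 \end{bmatrix},
\]
we have $(\mathbf{C}\bar{\mathbf{x}})^T = [-11.2, -4.3]$,
\[
\mathbf{C}\mathbf{S}\mathbf{C}^T = \begin{bmatrix} 55.5 & 22.9 \\ 22.9 & 56.7 \end{bmatrix}
\quad\text{and}\quad
\left( \mathbf{C}\mathbf{S}\mathbf{C}^T \right)^{-1} = \begin{bmatrix} 0.021621086 & -0.008732326 \\ -0.008732326 & 0.021163497 \end{bmatrix}.
\]
So,
\[
T^2 = n (\mathbf{C}\bar{\mathbf{x}})^T \left( \mathbf{C}\mathbf{S}\mathbf{C}^T \right)^{-1} (\mathbf{C}\bar{\mathbf{x}}) = 90.49458.
\]
But, with $q - 1 = 2$ and $n - q + 1 = 38$,
\[
\frac{(n - 1)(q - 1)}{n - q + 1} F_{q-1,\,n-q+1}(\alpha) = \frac{39 \times 2}{38} F_{2,38}(0.05) \approx 6.6604.
\]
(The value $F_{1,39}(0.05) = 4.091279$ would correspond to $q = 2$, not to the three indices compared here.) Since $90.49458 > 6.6604$, there is evidence against the equality of mean indices. Simultaneous 95% confidence intervals for the differences in pairs of mean indices are
\[
(1, -1, 0)\,\bar{\mathbf{x}} \pm \sqrt{6.6604} \times \sqrt{\frac{(1, -1, 0)\,\mathbf{S}\,(1, -1, 0)^T}{40}},
\]
\[
(1, 0, -1)\,\bar{\mathbf{x}} \pm \sqrt{6.6604} \times \sqrt{\frac{(1, 0, -1)\,\mathbf{S}\,(1, 0, -1)^T}{40}}
\]
and
\[
(0, 1, -1)\,\bar{\mathbf{x}} \pm \sqrt{6.6604} \times \sqrt{\frac{(0, 1, -1)\,\mathbf{S}\,(0, 1, -1)^T}{40}}.
\]

7. Observations on two responses are collected for two treatments. The observation vectors $(x_1, x_2)^T$ are:
\[
\begin{bmatrix} 3 & 1 & 2 \\ 3 & 6 & 3 \end{bmatrix}
\]
for treatment 1, and
\[
\begin{bmatrix} 2 & 5 & 3 & 2 \\ 3 & 1 & 1 & 3 \end{bmatrix}
\]
for treatment 2. For this data, $n_1 = 3$, $n_2 = 4$,
\[
\bar{\mathbf{x}}_1 = \begin{bmatrix} 2 \\ 4 \end{bmatrix}, \quad
\bar{\mathbf{x}}_2 = \begin{bmatrix} 3 \\ 2 \end{bmatrix}, \quad
\mathbf{S}_1 = \begin{bmatrix} 1 & -1.5 \\ -1.5 & 3 \end{bmatrix}
\quad\text{and}\quad
\mathbf{S}_2 = \begin{bmatrix} 2 & -1.33333 \\ -1.33333 & 1.33333 \end{bmatrix}.
\]
Then
\[
\mathbf{S}_{\text{pooled}} = \frac{2\,\mathbf{S}_1 + 3\,\mathbf{S}_2}{5} = \begin{bmatrix} 1.6 & -1.4 \\ -1.4 & 2 \end{bmatrix},
\]
where the first diagonal entry is $(2 \times 1 + 3 \times 2)/5 = 1.6$. So,
\[
(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)^T \left[ \left( \frac{1}{n_1} + \frac{1}{n_2} \right) \mathbf{S}_{\text{pooled}} \right]^{-1} (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2) = 3.870968.
\]
But
\[
\frac{(n_1 + n_2 - 2)p}{n_1 + n_2 - p - 1} F_{p,\,n_1+n_2-p-1}(\alpha) = \frac{10}{4} F_{2,4}(0.01) = 45.
\]
Since $3.870968 < 45$, there is no evidence against the hypothesis $H_0: \boldsymbol{\mu}_1 - \boldsymbol{\mu}_2 = \mathbf{0}$. The 99% simultaneous confidence intervals for the differences $\mu_{1i} - \mu_{2i}$ for $i = 1, 2$ are
\[
(\bar{x}_{11} - \bar{x}_{21}) \pm \sqrt{\frac{(n_1 + n_2 - 2)p}{n_1 + n_2 - p - 1} F_{p,\,n_1+n_2-p-1}(\alpha)} \sqrt{\left( \frac{1}{n_1} + \frac{1}{n_2} \right) S_{11,\text{pooled}}}
\equiv -1 \pm \sqrt{45} \times \sqrt{\left( \frac{1}{3} + \frac{1}{4} \right) \times 1.6}
\]
and
\[
(\bar{x}_{12} - \bar{x}_{22}) \pm \sqrt{\frac{(n_1 + n_2 - 2)p}{n_1 + n_2 - p - 1} F_{p,\,n_1+n_2-p-1}(\alpha)} \sqrt{\left( \frac{1}{n_1} + \frac{1}{n_2} \right) S_{22,\text{pooled}}}
\equiv 2 \pm \sqrt{45} \times \sqrt{\left( \frac{1}{3} + \frac{1}{4} \right) \times 2}.
\]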
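
Because problem 7 supplies the raw observations, its pooled covariance matrix and test statistic can be recomputed directly as a check on the arithmetic. The following is a minimal sketch in plain Python, not part of the original solutions: the function names (`mean_vector`, `sample_cov`, `two_sample_t2`, `critical_value_p2`) are hypothetical, the code assumes $p = 2$ throughout, and the critical value relies on the closed form $F_{2,\nu}(\alpha) = (\nu/2)(\alpha^{-2/\nu} - 1)$, which holds only when the numerator degrees of freedom equal 2.

```python
def mean_vector(rows):
    """Componentwise sample mean of a list of observation tuples."""
    n = len(rows)
    return [sum(r[k] for r in rows) / n for k in range(len(rows[0]))]


def sample_cov(rows):
    """Unbiased sample covariance matrix (divisor n - 1)."""
    n, p = len(rows), len(rows[0])
    m = mean_vector(rows)
    return [[sum((r[i] - m[i]) * (r[j] - m[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]


def two_sample_t2(sample1, sample2):
    """Pooled two-sample Hotelling T^2 for bivariate data (assumes p = 2)."""
    n1, n2 = len(sample1), len(sample2)
    s1, s2 = sample_cov(sample1), sample_cov(sample2)
    pooled = [[((n1 - 1) * s1[i][j] + (n2 - 1) * s2[i][j]) / (n1 + n2 - 2)
               for j in range(2)] for i in range(2)]
    m1, m2 = mean_vector(sample1), mean_vector(sample2)
    d = [m1[k] - m2[k] for k in range(2)]
    scale = 1 / n1 + 1 / n2
    # M = scale * pooled is symmetric 2x2: [[a, b], [b, c]]
    a, b, c = scale * pooled[0][0], scale * pooled[0][1], scale * pooled[1][1]
    det = a * c - b * b
    # d^T M^{-1} d written out via the 2x2 adjugate
    t2 = (c * d[0] ** 2 - 2 * b * d[0] * d[1] + a * d[1] ** 2) / det
    return t2, pooled, d


def critical_value_p2(n1, n2, alpha):
    """(n1+n2-2)p/(n1+n2-p-1) * F_{p, n1+n2-p-1}(alpha), for p = 2 only."""
    v = n1 + n2 - 3                           # denominator degrees of freedom
    f = (v / 2) * (alpha ** (-2 / v) - 1)     # closed-form F_{2,v} upper quantile
    return (n1 + n2 - 2) * 2 / v * f


# Raw data of problem 7, one (x1, x2) tuple per observation.
treatment1 = [(3, 3), (1, 6), (2, 3)]
treatment2 = [(2, 3), (5, 1), (3, 1), (2, 3)]

t2, pooled, d = two_sample_t2(treatment1, treatment2)
crit = critical_value_p2(len(treatment1), len(treatment2), 0.01)
print("S_pooled =", pooled)
print("mean difference =", d)
print("T^2 =", round(t2, 6), " critical value =", round(crit, 6))
```

Printing the pooled matrix alongside the statistic makes it easy to compare each intermediate quantity with the values quoted in the solution, entry by entry. For $p > 2$ or a general $F$ quantile, a numerical library would be the more natural choice than the closed form used here.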

© Copyright 2020