Statistical Inference
• drawing conclusions about a population, based on a sample.
• uses properties of the sampling distribution and random sampling.

Example:
Population: GRE results for a new exam format on the quantitative section
Sample: n = 300 test scores

         Population                 Sampling Dist. for $\bar{X}$
shape    normal?                    approx. normal
mean     μ (unknown)                μ
SD       σ = 100 (assume known)     $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 100/\sqrt{300} = 5.8$

~95% of the Sampling Distribution is within $\pm 2 \cdot \sigma_{\bar{X}}$ of μ.
1) In ~95% of the samples of n=300, $\bar{X}$ is within +/- 11.6 pts of μ.
2) In ~95% of the samples of n=300, μ is within +/- 11.6 pts of $\bar{X}$.
3) In ~95% of the samples of n=300, μ lies between $\bar{X} - 11.6$ and $\bar{X} + 11.6$.
4) We are ~95% confident that we have one of the samples that gives an interval containing μ.

Hypothesis Testing
• Null Hypothesis (Ho): The population mean is 54 cm.
• Alternative Hypothesis (HA): The population mean is greater than 54 cm.

p-value
• Measures the strength of the sample evidence against Ho
• A small p-value gives strong evidence against Ho
• Definition: The probability, computed assuming that Ho is true, of a sample result ($\bar{X}$) as extreme or more extreme than the one from our sample.

Rule of Thumb for the significance of p-values
• If the p-value is less than .05, then our results are statistically significant at the .05 level

4 Steps for finding the Power in a test of hypotheses
Ho: µ = 54 cm    Ha: µ > 54 (µa = 58)    $\sigma_{\bar{X}} = 4.5\text{ cm}/\sqrt{9} = 1.5\text{ cm}$    α = .05
1) Write the RR for Ho in terms of z-scores: zs ≥ 1.645
2) Write the RR for Ho in terms of $\bar{X}$: $(\bar{X} - 54)/1.5 \ge 1.645 \rightarrow \bar{X} \ge 56.47$
3) Find the probability of a Type II error if µ=58: β(58) = P(Accept Ho | Ha is true [µ=58])
4) Power = 1 – P(Type II Error): PWR(58) = 1 – β(58)
In general: $\beta(\mu_a) = P\left(Z < z_{\alpha} - |\mu_0 - \mu_a| / \sigma_{\bar{X}}\right)$

Ex:
Ho: µ = 40 mpg    HA: µ < 40
Population Standard Deviation: σ = 6 mpg
Significance Level: α = .01
Sample Results: A SRS of n = 16 gives $\bar{X}$ = 36.7
a) Find the sample z-score (zs).
b) State a conclusion for the test at the α = .01 level.
c) Find the p-value.
1) Write the rejection rule (RR) for Ho in terms of z-scores.
2) Write the rejection rule (RR) for Ho in terms of $\bar{X}$.
3) Find the probability of a Type II error if µ=38 [i.e., β(38)]

Study Design
The number of observations needed to detect a true difference ∆ = µA - µo at the α level of significance with power = 1-β is
• 1-sided alternative: $n = \dfrac{\sigma^2 (z_{\alpha} + z_{\beta})^2}{\Delta^2}$
• 2-sided alternative: $n = \dfrac{\sigma^2 (z_{\alpha/2} + z_{\beta})^2}{\Delta^2}$

Ex: Find the sample size needed to detect a 2 mpg difference at the α=.01 level with 80% power.
Ho: µ = 40 mpg    HA: µ < 40
Population Standard Deviation: σ = 6 mpg

• So far we have assumed that σ was known for the population
  * CLT → Sampling Distribution for $\bar{X}$ is approximately $N\left(\mu, \sigma/\sqrt{n}\right)$
  * $z_s = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ has a N(0,1) distribution
• In reality, we will rarely (if ever) know σ
  * we estimate σ with $s = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n-1}}$
  * Standard Deviation (of the mean) vs. Standard Error (of the mean): $SD_{\bar{X}} = \sigma/\sqrt{n}$, $SE_{\bar{X}} = s/\sqrt{n}$
  * $t_s = \dfrac{\bar{X} - \mu}{s/\sqrt{n}}$ has a Student's T distribution with n-1 degrees of freedom

Properties of the T-distribution
• Symmetric & bell shaped with mean = 0
• Larger spread than the N(0,1) distribution
• As the d.f. increase, the T-distribution approaches the standard normal curve
• Table 3/C gives upper tail probabilities
• Developed by William S. Gosset http://www.uvm.edu/~rsingle/stats/Gosset.html

[Figure: Normal Distribution in blue; T Distribution df = 5 in green; T Distribution df = 10 in red]

Ex: (d.f. = 6)
1. Find t* such that there is 2% area to the right
2. Find t* such that there is 10% area to the left

100(1-α)% Confidence Interval for µ
$\bar{x} \pm t_{\alpha/2} \dfrac{s}{\sqrt{n}}$
where $t_{\alpha/2}$ is the upper tail critical value with n-1 d.f.

Ex: A study on cholesterol levels in adult males eating fast food >3 times/week
A SRS of n = 30 gives $\bar{X}$ = 180.52 mg/dL, s = 41.23 mg/dL
1. Find a 95% CI for µ and state an interpretation of the interval
2. Based on this CI, test …
   Ho: µ=170 vs. Ha: µ≠170 at the α=.05 l.o.s.

Suppose we want to find a p-value for the following set of hypotheses
Ho: µ=170 mg/dL    Ha: µ>170

Ho: µ = µ0, with test statistic $t_s = \dfrac{\bar{X} - \mu_0}{s/\sqrt{n}}$

   Alternative      Rejection Region              p-value
1) Ha: µ > µ0       $t_s \ge t_{\alpha}$          p-value = $P(T \ge t_s)$
2) Ha: µ < µ0       $t_s \le -t_{\alpha}$         p-value = $P(T \le t_s)$
3) Ha: µ ≠ µ0       $|t_s| \ge t_{\alpha/2}$      p-value = $2 \cdot P(T \ge |t_s|)$

Ex:
Ho: µ=165 mg/dL    Ha: µ>165
Sample Results: A SRS of n = 30 gives $\bar{X}$ = 180.52, s = 41.23
a) Find the sample t-score (ts).
b) Bracket the p-value.
c) State a conclusion for the test at the α = .05 level.
1) Write the rejection rule (RR) for Ho in terms of t-scores.
2) Write the rejection rule (RR) for Ho in terms of $\bar{X}$.

Assumptions, Robustness, and Conditions for Valid CIs & T-Tests
• T-tests and CIs are based on the assumption that the population values being studied have a Normal distribution.
• In reality, populations may be anywhere from slightly non-normal to very non-normal.

Robustness of the T-procedures
• The T-test and CI are called robust to the assumption of normality because p-values and confidence levels are not greatly affected by violations of this assumption of normally distributed populations, especially if sample sizes are large enough.

Conditions for a Valid 1-Sample CI and T-Test
• The data are a SRS from the population.
• The population must be large enough (at least 10 times larger than the sample size).
• Conditions for the sample:
  o If n is small (n < 15), the data should not be grossly non-normal or contain outliers.
  o If n is "medium" (15 ≤ n < 40), the data should not have strong skewness or outliers.
  o If n is large (n ≥ 40), the T-procedures are robust to non-normality.

Checking if the conditions are met in your sample
• Always make a plot of the data to check for skewness and outliers before relying on T-procedures in small samples.
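The power and study-design calculations for the known-σ case can be checked numerically. The following is a minimal sketch rather than part of the original notes: it uses Python's standard-library `statistics.NormalDist` in place of the normal table, and the function names `power_upper_tail` and `sample_size` are illustrative assumptions. It follows the four power steps for Ho: µ = 54 vs. Ha: µ > 54 (µa = 58) and applies the 1-sided sample-size formula to the 2 mpg example.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal, N(0, 1)


def power_upper_tail(mu0, mu_a, se, alpha):
    """Power of the z-test of Ho: mu = mu0 vs. Ha: mu > mu0 when mu = mu_a.

    Hypothetical helper; `se` is the standard deviation of the sample mean.
    """
    z_alpha = Z.inv_cdf(1 - alpha)      # step 1: reject when z_s >= z_alpha
    cutoff = mu0 + z_alpha * se         # step 2: reject when x-bar >= cutoff
    beta = Z.cdf((cutoff - mu_a) / se)  # step 3: P(accept Ho | mu = mu_a)
    return cutoff, beta, 1 - beta       # step 4: power = 1 - beta


def sample_size(sigma, delta, alpha, power, two_sided=False):
    """Smallest whole n from n = sigma^2 (z_alpha + z_beta)^2 / delta^2."""
    tail = alpha / 2 if two_sided else alpha
    z_alpha = Z.inv_cdf(1 - tail)
    z_beta = Z.inv_cdf(power)           # upper-tail point for beta = 1 - power
    return ceil((sigma / delta) ** 2 * (z_alpha + z_beta) ** 2)


# Power example from the notes: sigma of the mean = 4.5 / sqrt(9) = 1.5 cm
cutoff, beta, power = power_upper_tail(mu0=54, mu_a=58, se=4.5 / sqrt(9), alpha=0.05)
print(f"reject Ho when x-bar >= {cutoff:.2f}; beta(58) = {beta:.3f}; power = {power:.3f}")

# Study design example: detect a 2 mpg difference, sigma = 6, alpha = .01, 80% power
n = sample_size(sigma=6, delta=2, alpha=0.01, power=0.80)
print(f"observations needed: {n}")
```

The cutoff reproduces the 56.47 cm rejection rule from step 2, and the sample size is rounded up because n must be a whole number of observations. Results may differ slightly from hand calculations that use table values rounded to two or three decimals.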

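The T-procedures for the cholesterol example (n = 30, sample mean 180.52, s = 41.23) can also be worked through in code. This is a rough sketch under stated assumptions, not the method used in the notes, which rely on Table 3/C. To stay within the standard library, it integrates the Student's t density numerically with Simpson's rule and finds critical values by bisection. The helper names `t_pdf`, `t_upper_tail`, and `t_critical` are hypothetical, and a statistics package would normally replace them.

```python
from math import gamma, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """P(T >= t), using Simpson's rule on the density between 0 and |t|."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                 # P(0 <= T <= |t|)
    return 0.5 - area if t >= 0 else 0.5 + area


def t_critical(tail_area, df):
    """t* with `tail_area` (< 0.5) to its right, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > tail_area:
            lo = mid                     # too much area to the right: move up
        else:
            hi = mid
    return (lo + hi) / 2


# Cholesterol example from the notes
n, xbar, s = 30, 180.52, 41.23
df = n - 1
se = s / sqrt(n)                         # standard error of the mean

# 95% CI: x-bar +/- t_{alpha/2} * s / sqrt(n)
t_star = t_critical(0.025, df)
ci = (xbar - t_star * se, xbar + t_star * se)
print(f"95% CI for mu: ({ci[0]:.2f}, {ci[1]:.2f})")

# Test of Ho: mu = 165 vs. Ha: mu > 165
t_s = (xbar - 165) / se
p_value = t_upper_tail(t_s, df)
print(f"t_s = {t_s:.3f}, p-value = {p_value:.4f}")

# d.f. = 6 exercise: 2% area to the right, 10% area to the left
print(f"t* (2% right) = {t_critical(0.02, 6):.3f}")
print(f"t* (10% left) = {-t_critical(0.10, 6):.3f}")
```

By symmetry of the T-distribution, the value with 10% area to the left is the negative of the value with 10% area to the right. The computed p-value should fall inside the bracket obtained from the table in part b).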
© Copyright 2020