STA 101 Chapter Notes - Chapter Unit 4: Statistical Hypothesis Testing, Percentile, Stellar Population

Unit 4: Inference for Numerical Variables
Part 1: (1) t-distribution
What purpose does a large sample size serve?
o As long as observations are independent, and the population distribution is not extremely
skewed, a large sample would ensure that
The sampling distribution of the mean is nearly normal
The estimate of the standard error, s/√n, is reliable
What if we have a small sample size?
o Okay, then how do we address the uncertainty associated with the standard error?
T-distribution!
T distribution
o When σ is unknown (almost always), use the t-distribution to address the uncertainty of
the standard error estimate
o Bell shaped but thicker tails than the normal
Observations are more likely to fall beyond 2 SDs from the mean
Confidence intervals constructed under the t-distribution will be wider
Extra thick tails helpful for mitigating the effect of a less reliable estimate for the
standard error of the sampling distribution
o Always centered at 0 (like the normal dist.)
o Has one parameter, degrees of freedom (df), which determines the thickness of the tails
Compare to the normal dist., which has two parameters: mean and SD
What happens to the shape of the t-distribution as degrees of freedom increases?
As df increases, the shape of the t-distribution approaches the normal
distribution
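To see the convergence numerically, here is a minimal sketch using only the Python standard library. The helper name t_pdf and the choice of df values and grid are illustrative assumptions, not part of the course material; it simply measures the largest gap between the t density and the standard normal density on a grid from -4 to 4.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


normal = NormalDist()  # standard normal: mean 0, SD 1
grid = [i / 100 for i in range(-400, 401)]  # x from -4 to 4 in steps of 0.01

# Largest gap between the t density and the normal density for each df
# (the df values below are arbitrary illustrative choices)
max_gap = {
    df: max(abs(t_pdf(x, df) - normal.pdf(x)) for x in grid)
    for df in (1, 5, 30, 1000)
}

for df, gap in max_gap.items():
    print(f"df = {df:>4}: max |t pdf - normal pdf| = {gap:.4f}")
```

The printed gap shrinks as df grows, which is the "approaches the normal" behavior described above.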
T statistic is for inference of a mean when σ is unknown, which is almost always
o Calculated the same way as the Z statistic
T = (obs. − null)/SE
o P-value (same definition)
Find the following probabilities. Suppose you have a two-sided hypothesis test and your test
statistic is 2. Under which of these scenarios would you be able to reject the null hypothesis at the
5% significance level?
o P (|Z| > 2) = 0.0455 REJECT
o P (|t_df=50| > 2) = 0.0509 FAIL TO REJECT
o P (|t_df=10| > 2) = 0.0734 FAIL TO REJECT
o As the df decreases (t-distribution becomes more conservative), we become less likely to
reject the null hypothesis
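The three tail probabilities for a test statistic of 2 can be checked with a short sketch. It assumes nothing beyond the standard library: the t tail area is approximated by numerically integrating the density with Simpson's rule, as a stand-in for a t-table or statistical software. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_two_sided_tail(t, df, steps=2000):
    """P(|T| > t) for T ~ t(df), via composite Simpson's rule on [0, |t|]."""
    t = abs(t)
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (h / 3) * total


alpha = 0.05
p_z = 2 * (1 - NormalDist().cdf(2))  # normal tails beyond +/- 2
p_t50 = t_two_sided_tail(2, 50)
p_t10 = t_two_sided_tail(2, 10)

for label, p in (("Z", p_z), ("t, df=50", p_t50), ("t, df=10", p_t10)):
    decision = "REJECT" if p < alpha else "FAIL TO REJECT"
    print(f"{label:>9}: p = {p:.4f} -> {decision}")
```

Only the normal-based p-value falls below 0.05; both t-based p-values are slightly above it.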
Part 1: (2) Inference for a Mean
Ex: Playing a computer game during lunch affects memory for lunch, and later snack intake
o Researchers assessed the relationship between distraction, recall of food consumed, and
snacking
o Sample: 44 patients (22 men and 22 women)
o Randomized into two groups: (1) play solitaire while eating and (2) eat lunch without
distractions; both groups were offered biscuits to snack on after lunch
Biscuit intake     x-bar     s         n
Solitaire          52.1 g    45.1 g    22
No distraction     27.1 g    26.4 g    22
Estimating the mean
o Confidence interval of the form: point estimate ± margin of error
o x-bar ± t*_df × SE = x-bar ± t*_df × s/√n = x-bar ± t*_(n−1) × s/√n
o Degrees of freedom for the t-statistic for inference on one sample mean: df = n − 1
Estimating the mean for the above example
o x-bar ± t*_df × SE = 52.1 ± 2.08 × 45.1/√22 ≈ 52.1 ± 20.0 = (32.1, 72.1)
o We are 95% confident that distracted eaters consume, on average, between 32.1 and 72.1
grams of snacks post-meal.
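The interval above can be reproduced with a minimal standard-library sketch. The critical value t* is found here by bisection on a numerically integrated t density, which is only a stand-in for reading t* off a t-table. All function names are hypothetical, and the summary statistics come from the solitaire row of the table.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_two_sided_tail(t, df, steps=2000):
    """P(|T| > t) for T ~ t(df), via composite Simpson's rule on [0, |t|]."""
    t = abs(t)
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (h / 3) * total


def t_critical(confidence, df):
    """t* with P(|T| <= t*) = confidence, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_two_sided_tail(mid, df) > 1 - confidence:
            lo = mid  # tail area still too large: t* is further out
        else:
            hi = mid
    return (lo + hi) / 2


# Summary statistics for the solitaire (distracted) group
x_bar, s, n = 52.1, 45.1, 22

se = s / math.sqrt(n)            # standard error of the mean
t_star = t_critical(0.95, n - 1)  # df = n - 1 = 21
margin = t_star * se
ci = (x_bar - margin, x_bar + margin)

print(f"SE = {se:.2f}, t* = {t_star:.3f}, margin = {margin:.1f}")
print(f"95% CI: ({ci[0]:.1f}, {ci[1]:.1f})")
```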
Ex: Suppose the suggested serving size of these biscuits is 30 g. Do these data provide convincing
evidence that the amount of snacks consumed by distracted eaters post-lunch is different from the
suggested serving size?
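o H0: μ = 30, HA: μ ≠ 30
o T = (52.1 − 30)/(45.1/√22) ≈ 2.30, with df = 21
o 0.02 < p-value < 0.05
o We reject the null hypothesis at the 5% significance level (which agrees with the
confidence interval, since 30 g lies outside (32.1, 72.1))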
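As a check on the test above, the following sketch computes the T statistic and a two-sided p-value for the null value of 30 g. It assumes the solitaire group's summary statistics from the table and approximates the t tail area by Simpson's rule rather than a t-table. The helper names are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_two_sided_tail(t, df, steps=2000):
    """P(|T| > t) for T ~ t(df), via composite Simpson's rule on [0, |t|]."""
    t = abs(t)
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (h / 3) * total


# Solitaire group summary statistics and the null value (serving size)
x_bar, s, n = 52.1, 45.1, 22
mu_0 = 30.0

se = s / math.sqrt(n)
t_stat = (x_bar - mu_0) / se          # T = (obs. - null) / SE
p_value = t_two_sided_tail(t_stat, n - 1)

print(f"T = {t_stat:.2f}, df = {n - 1}, two-sided p-value = {p_value:.3f}")
print("Reject H0" if p_value < 0.05 else "Fail to reject H0")
```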
Conditions
o Independent observations
Random assignment
22 < 10% of all distracted eaters
o Sample size/skew
We don’t have a visualization of the population distribution, so we will look at the
sample distribution
There is a natural boundary at 0 grams
Data is right-skewed
Part 1: (3) Inference for comparing two independent means
DF for the t statistic for inference on the difference of two means: df = min(n1 − 1, n2 − 1)
Refer to the distracted eaters study
Confidence interval for difference between independent means
o Correct interpretation: We are 95% confident that those who eat with distractions
consume between 1.83 g and 48.17 g more snacks than those who eat without distractions, on
average
o Incorrect (the direction of the difference is not stated): We are 95% confident that the
difference between the average snack consumption of those who eat with and without
distractions is between 1.83 g and 48.17 g.
Using hypothesis testing, the p-value is between 0.02 and 0.05
We reject the null hypothesis at the 5% significance level
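The two-sample calculation can be sketched as follows, using the summary statistics from the distracted eaters table and the conservative df = min(n1 − 1, n2 − 1) from these notes. The standard error uses the usual unpooled form, sqrt(s1²/n1 + s2²/n2), which is an assumption here since the notes do not write it out. Tail areas and t* come from numerical integration and bisection rather than a t-table, and all helper names are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_two_sided_tail(t, df, steps=2000):
    """P(|T| > t) for T ~ t(df), via composite Simpson's rule on [0, |t|]."""
    t = abs(t)
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (h / 3) * total


def t_critical(confidence, df):
    """t* with P(|T| <= t*) = confidence, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_two_sided_tail(mid, df) > 1 - confidence:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Group 1: solitaire; group 2: no distraction
x1, s1, n1 = 52.1, 45.1, 22
x2, s2, n2 = 27.1, 26.4, 22

diff = x1 - x2
se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)  # unpooled standard error
df = min(n1 - 1, n2 - 1)                     # conservative df from the notes

# Hypothesis test of H0: mu1 - mu2 = 0
t_stat = diff / se
p_value = t_two_sided_tail(t_stat, df)

# 95% confidence interval for mu1 - mu2
margin = t_critical(0.95, df) * se
ci = (diff - margin, diff + margin)

print(f"diff = {diff:.1f}, SE = {se:.2f}, df = {df}")
print(f"T = {t_stat:.2f}, two-sided p-value = {p_value:.3f}")
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
```

The interval excludes 0 and the p-value is below 0.05, so the confidence interval and the test lead to the same conclusion.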
Conditions for inference for comparing two independent means:
o Independence:
Within groups: sampled observations must be independent
Random assignment
If sampling without replacement, n < 10% of population
Between groups: the two groups must be independent of each other (non-paired)
o Sample size/skew: The more skew in the population distributions, the higher the sample
size needed
Part 1: (4) Inference for comparing two paired means
Analyzing paired data
o When two sets of observations have this special correspondence (not independent), they
are said to be paired