10-400-13 Lecture Notes - Lecture 8: Central Limit Theorem, Standard Deviation, Cluster Sampling

Critical values z for common confidence levels:
90%: 1.645
95%: 1.960
99%: 2.576
- Variable: a property we wish to study that belongs to every member of the population (possible value of the variable: 0): length
- Parameter: unknown to us, what we search for, subject of interest (fixed value): mean income of individuals in a school
- Point estimate: an estimate (informed guess) of an unknown parameter obtained from a particular sample
- Statistical inference (survey): drawing conclusions about a population from sample data
- Controlled experiment (people don't walk on a treadmill in real life): TechLab, A/B testing. Treatment group: receives the treatment to be
studied / Control group: group used as a baseline measure (people who walk without texting): ethical considerations
- Observational study (passive observation): A study where the researcher observes what happens to people under exposure (no
treatment assigned, no control over the experiment)
- Longitudinal: over a period of time/ Cross-sectional: at a single point in time (omnibus studies: data collected at same interview)
- Bias created by nonresponse (people who did not respond may not have the same characteristics as those who did): the sample is not
representative of the population, the study is not well designed
> Selection bias: the method used to select the sample may create a sample that is not representative (a landline survey misses a large
fraction of people who have cellphones and no fixed line); data collected from only 2 companies are not representative of all companies.
If clients can check a box to indicate that they do not wish to be contacted, those clients could differ from the others
> Measurement bias: the method used does not fit what we really wish to measure (some could answer incorrectly to hide their
ignorance; wording of the question, e.g. "How many times do you make love per month"). Anonymous questions can help
> Nonresponse bias: think how it could be prevented (summer period is less favorable, income between 40K to 60K)
- Probabilistic sampling methods: > Simple random sampling with replacement (SRSWR): select at random the units from the
population until we have the required size, possibility to select a person twice in a sample
> Simple random sampling without replacement (SRSWOR): select at random the units from the population until we have the
required size, where units can only be chosen once
> Systematic sampling: systematically select the units from a randomly sorted list (ex: every 12th unit)
> Stratified sampling: the population is divided into separate groups called strata (e.g. by age, sex, etc.). A simple random sample is
then chosen from each group: they plan to randomly sample 20 employees in each department and make them fill out a survey
> Cluster sampling (predetermined): if the population is already presented in groups of units, named clusters (e.g. houses), take a
simple random sample of clusters. All individuals in the selected clusters are part of the final sample
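The five probabilistic methods above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the employee IDs, the three department names and the function names are hypothetical, and the systematic sampler shuffles the list first to mimic the "randomly sorted list" mentioned in the notes.

```python
import random

def srswr(population, n, rng):
    """Simple random sampling with replacement: a unit can be drawn twice."""
    return [rng.choice(population) for _ in range(n)]

def srswor(population, n, rng):
    """Simple random sampling without replacement: each unit at most once."""
    return rng.sample(population, n)

def systematic(population, step, rng):
    """Shuffle a copy of the list, then keep every `step`-th unit."""
    shuffled = population[:]
    rng.shuffle(shuffled)
    return shuffled[step - 1::step]

def stratified(strata, n_per_stratum, rng):
    """Draw a simple random sample (without replacement) inside each stratum."""
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}

def cluster(clusters, n_clusters, rng):
    """Draw whole clusters at random and keep every unit they contain."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
employees = list(range(1, 121))  # hypothetical employee IDs 1..120
departments = {  # hypothetical strata / clusters of 40 employees each
    "sales": employees[:40],
    "it": employees[40:80],
    "hr": employees[80:],
}

with_repl = srswr(employees, 10, rng)             # duplicates are possible
without_repl = srswor(employees, 10, rng)         # 10 distinct employees
every_12th = systematic(employees, 12, rng)       # 120 / 12 = 10 units
per_department = stratified(departments, 20, rng) # 20 employees per department
two_departments = cluster(departments, 2, rng)    # everyone in 2 departments
```

Note the difference between the last two calls: stratified sampling takes a few units from every group, while cluster sampling takes every unit from a few groups.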
- Avoid: > Convenience sample: measure movie success at a theater exit, friends > Volunteer samples: surveys (TVA poll) ->
Measurement bias because question is not well phrased
- Lack of memory leads to measurement bias (women on a diet who don’t remember events
of 3 years ago)
- Expected value of $\bar{x}$ (x-bar, the sample mean) is: $E(\bar{x}) = \mu$
- When you compute a proportion, you compute an average of 0’s and 1’s (sample
proportions are means)
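A tiny check of the statement that a sample proportion is a mean of 0's and 1's. The ten yes/no answers are made up for the illustration.

```python
from statistics import mean

# Hypothetical survey answers: 1 = "yes", 0 = "no"
answers = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]

proportion_yes = sum(answers) / len(answers)  # count of 1's divided by n
mean_of_answers = mean(answers)               # ordinary average of the 0/1 values
```

Both quantities are the same number, which is why the results for sample means carry over to sample proportions.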
- The confidence interval: estimator (sample mean) ± margin of error
- $\sigma/\sqrt{n}$ (standard error of the sample mean, estimated by $s/\sqrt{n}$): average estimation error -> the larger n, the less
the sample mean varies from one sample to the other; the less volatile the population, the closer the sample mean gets to the
population mean
- Example ($s/\sqrt{n} = 3.22$, n = 10): We estimate that the sample mean duration deviates on average by 3.22
minutes from the population mean duration, when the sample size is 10
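As a rough sketch of the standard error s/√n, the snippet below computes it for a sample of 10 durations. The durations are hypothetical (the data behind the 3.22 minutes quoted in the notes are not given), and `standard_error` is just an illustrative helper name.

```python
import math
import statistics

def standard_error(sd, n):
    """Standard error of the sample mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)

# Hypothetical durations in minutes (not the data from the notes)
durations = [12.0, 30.5, 18.0, 25.0, 9.5, 41.0, 22.0, 15.5, 33.0, 27.5]

s = statistics.stdev(durations)          # sample standard deviation
se = standard_error(s, len(durations))   # average estimation error of the mean

# For a fixed standard deviation, a sample 4 times larger halves the error
se_n25 = standard_error(10, 25)    # 10 / 5
se_n100 = standard_error(10, 100)  # 10 / 10
```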
- When the sample size increases, the distribution of the sample mean gets closer to the normal distribution (defined
shape that fits). The Central Limit Theorem: if the sample size n is sufficiently large, the normal distribution describes the variability of
the sample mean through all possible simple random samples
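A small simulation can make the Central Limit Theorem concrete. The sketch below assumes a strongly skewed population (exponential with mean 1 and standard deviation 1, chosen only for illustration), draws many samples of size 40, and compares the spread of the sample means with σ/√n.

```python
import math
import random
import statistics

rng = random.Random(42)      # fixed seed for reproducibility
n = 40                       # size of each sample
n_samples = 5000             # number of simulated samples
pop_mean, pop_sd = 1.0, 1.0  # exponential(rate=1): mean 1, standard deviation 1

# One sample mean per simulated sample
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(n_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
theoretical_se = pop_sd / math.sqrt(n)

# Share of sample means within 1.96 standard errors of the population mean
share_within = sum(
    abs(m - pop_mean) < 1.96 * theoretical_se for m in sample_means
) / n_samples
```

Even though the population is far from normal, the sample means should centre on the population mean, have a spread close to σ/√n, and fall within 1.96 standard errors roughly 95% of the time.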
- Peaked (narrow) curve -> small dispersion -> small standard deviation / Flat curve -> large dispersion -> large standard deviation
- No matter the values of its parameters, 95% of observations from a normal variable are located at less than 1.96 standard
deviations from the mean value
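The 1.96 rule can be checked with Python's standard library. The three (mean, standard deviation) pairs are arbitrary choices for the illustration; the second part recomputes the critical values 1.645, 1.960 and 2.576 listed at the top of these notes.

```python
from statistics import NormalDist

def share_within(mu, sigma, k=1.96):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

# Arbitrary parameter values: the share does not depend on them
shares = [share_within(0, 1), share_within(100, 15), share_within(-3, 0.2)]

# Two-sided critical values for the 90%, 95% and 99% confidence levels
critical = {
    level: round(NormalDist().inv_cdf(1 - (1 - level) / 2), 3)
    for level in (0.90, 0.95, 0.99)
}
```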
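- The interval at a $1-\alpha$ confidence level is: $\bar{x} \pm t_{\alpha/2,\,n-1} \cdot s/\sqrt{n}$
- So as not to distort the confidence level, take the critical value from the Student t distribution: $t_{\alpha/2,\,n-1}$ = TINV($\alpha$, $n-1$). Valid if the variable of interest X in the population has a normal distribution,
or if the sample size is sufficiently large (n ≥ 30)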
Interpretation: At the 99% confidence level, the average is between x and y
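A minimal sketch of a confidence interval for a mean, built as estimator ± margin of error. It uses the normal critical values from the top of these notes (1.645, 1.960, 2.576) rather than the Student t value returned by TINV, so it is only a reasonable approximation for large samples. The summary statistics (mean 50, standard deviation 12, n = 36) and the function name are hypothetical.

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean, sample_sd, n, level=0.95):
    """Large-sample interval: sample mean +/- z * s / sqrt(n)."""
    alpha = 1 - level
    z = NormalDist().inv_cdf(1 - alpha / 2)  # e.g. about 1.96 for level=0.95
    margin = z * sample_sd / math.sqrt(n)    # margin of error
    return sample_mean - margin, sample_mean + margin

# Hypothetical summary statistics
low95, high95 = z_confidence_interval(50.0, 12.0, 36, level=0.95)
low99, high99 = z_confidence_interval(50.0, 12.0, 36, level=0.99)
```

With these inputs the standard error is 12/6 = 2, so the 95% interval is roughly 50 ± 3.92, and the 99% interval is wider because a higher confidence level means a larger margin of error.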
- Increasing the level of confidence increases the margin of error. To increase precision, decrease the level of
confidence or take a bigger sample. A larger sample size -> smaller margin of error; a larger standard deviation -> larger margin of error
(use it when you don’t have the data) NB.SI (COUNTIF in English-language Excel) to calculate the proportion
- $\hat{p}(1-\hat{p})$: maximized when $\hat{p}$ = 0.5 (the highest, worst case scenario)
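To see why 0.5 is the worst case, the sketch below scans proportions from 0 to 1 and evaluates the usual large-sample margin of error for a proportion, z·√(p(1−p)/n). The sample size of 1000 and the helper name are assumptions made for the illustration.

```python
import math

def proportion_margin(p_hat, n, z=1.96):
    """Large-sample margin of error for a proportion at the given critical value."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Scan p = 0.00, 0.01, ..., 1.00 and find where p * (1 - p) peaks
grid = [i / 100 for i in range(101)]
worst_p = max(grid, key=lambda p: p * (1 - p))

worst_margin = proportion_margin(0.5, 1000)   # worst case, about 0.031
milder_margin = proportion_margin(0.2, 1000)  # smaller than the worst case
```

Planning a survey with a proportion of 0.5 therefore gives a margin of error that cannot be exceeded, whatever the true proportion turns out to be.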
- Given a confidence interval (8.56 – 8.92), add the two bounds and divide by 2 to recover the mean (8.74) / To calculate the standard
Null hypothesis Example:
find more resources at oneclass.com