10-400-13 Lecture Notes - Lecture 7: Point Estimation, Statistical Inference, Longitudinal Study

Session 1:
- A/B testing: test design A against design B and compare which one performs best
(e.g. Obama’s campaign)
- Large populations: we work with samples and, based on the sample, draw a conclusion
about the whole population
- Population: the entire group of subjects (or units) we wish to study
- Sample (échantillon): a group of units drawn from the population
- Variable: a property we wish to study that belongs to every member of the population
- Parameter: a fixed value, unknown to us, that describes the population; it is the
quantity we are looking for
- Point estimate: an estimate (informed guess) of an unknown parameter computed from a
particular sample; it varies from sample to sample (sampling variability)
Example:
Parameter: the mean income of individuals in a municipality
Point estimate: the mean income of the sample
Possible values of the variable: >= 0
- Statistical inference (survey/sample): drawing conclusions about a population from
sample data (clearly identify the population, determine how to reach it, make sure
that the sample is representative of the population, do not ignore a segment, do not
include units that do not belong to the population), as opposed to descriptive
statistics, which only summarize the data at hand
> Example of inference: in September 2013, an analysis of Instagram data found that 40%
of the 1000 most watched videos were brand videos, and concluded that sharing online
videos must be an interesting vehicle for brand promotion
- Experiments at TechLab: the results are not fully realistic because we don’t walk on a
treadmill in real life
- Controlled experiment (TechLab, A/B testing, study on distractions during class,
clinical trials): tests the validity of a hypothesis about the effectiveness of a
treatment. Building the groups in a similar way is crucial to isolate the effect of the
treatment; this is what allows cause-and-effect relationships to be established
- Treatment group: receives the treatment being studied / Control group: used as a
baseline measure (e.g. people who walk without texting)
- There are ethical considerations (we can’t force participants to text while crossing the
street)
- Observational study (passive observation): the researcher observes what happens to
people under exposure, with no treatment assigned and no control over the conditions:
achievable, simpler, inexpensive
- Longitudinal study: several observations of the same individuals over a period of time
(to draw conclusions about the evolution over time): time-consuming and expensive, but
best for tracking change
- Cross-sectional study: data collected at a single point in time (e.g. a single
evaluation at 4.5 months). Which of the two is better depends on the study
> Omnibus studies are cross-sectional (a research method where all the data are collected
during the same interview)
- Even a well-made, credible study can be wrong -> conclusions need a solid foundation
(MMR and autism)
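The difference between a parameter and a point estimate from this session can be illustrated with a small simulation. This is only a sketch: the population of incomes, its size, and the sample size of 200 are invented for illustration and do not come from the course.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100,000 incomes (all values >= 0), invented for illustration.
population = [random.expovariate(1 / 40_000) for _ in range(100_000)]

# Parameter: the population mean. It is fixed, and in practice unknown to us.
parameter = statistics.mean(population)


def point_estimate(sample_size):
    """Draw one simple random sample (without replacement) and return its mean income."""
    sample = random.sample(population, sample_size)
    return statistics.mean(sample)


# Five different samples give five different point estimates (sampling variability).
estimates = [point_estimate(200) for _ in range(5)]

print(f"parameter (population mean): {parameter:,.0f}")
for i, est in enumerate(estimates, start=1):
    print(f"sample {i} point estimate: {est:,.0f}")
```

The parameter is printed once because it never changes; each point estimate comes from a different sample and lands somewhere near it.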
Session 2: Data Collection
To generalize by inference, the sample must be representative of the studied population:
its composition must be similar to that of the population, it must be sufficiently large
(the more units, the better), and the feature studied must actually be measured
- Understand the methodology of the study (Literary Digest poll): non-response can
create a bias, and the sample was not representative of the population; the study was
not well designed. Even a huge sample is useless if the study is poorly designed.
- TVA poll: the sample (8100 voters) is not representative. It does not cover all the
visitors of the website, and the people who responded may share political preferences.
In a volunteer-type poll, check that the people who answered have the same
characteristics as those who didn’t (non-response bias)
- Bias: a systematic difference between the parameter of interest and its estimate
> Selection bias: the method used to select the sample may produce a sample that is not
representative (e.g. a landline phone survey misses the large fraction of people who have
only cellphones, those who keep their number private, and those who have no phone)
- To avoid it: identify the population correctly, make a list of units for the population that
will allow us to reach them if needed, use a probabilistic sampling method that uses
chance in order to select a sample that is likely to be representative of the population
> Measurement bias: the method used to measure the feature we wish to study does not
capture what we really wish to measure (e.g. some respondents answer incorrectly to hide
their ignorance). Questions that are too personal ("How many times do you make love per
month?") also lead to non-response bias
- To minimize measurement bias: think carefully about what you want to measure and
how to do so, give careful thought to the wording of questions, and make sure the process
of collecting and recording the data is reliable (if measurement tools are used, check
that they work properly)
> Nonresponse bias: If we cannot measure the feature of interest for some of the units of
the selected sample, a bias may occur if the individuals who answered are different from
those who didn’t. The non-response rate should always be reported (guarantee of
reliability)
- To avoid non-response, think about how it could be prevented: timing (the summer
period is less favorable), mode of contact (phone, e-mail), ranges for sensitive items
(income up to 20K, 40K to 60K), no vague choices such as "often" or "occasionally",
questionnaire length, contacting the individuals multiple times
- Seek relevant information about the non-respondents. The more we have, the better
we will be able to evaluate how they are different from the respondents and predict
their responses
- Probabilistic sampling methods:
> Simple random sampling with replacement (SRSWR): select at random the units from
the population until we have the required size, possibility to select a person twice in a
sample
> Simple random sampling without replacement (SRSWOR): select at random the units
from the population until we have the required size, where units can only be chosen
once
> Systematic sampling: systematically select the units from a randomly sorted list (ex:
every 12th unit)
> Stratified sampling: the population is divided into separate groups called strata (e.g. by
age, sex, etc.). A simple random sample is then chosen from each group (used by
StatCan)
> Cluster sampling (predetermined groups): if the population already comes in groups of
units, called clusters (e.g. houses), we can take a simple random sample of clusters. All
individuals from the sampled clusters are part of the final sample. This method reduces
costs and complexity, but units within a cluster are usually correlated (they share
similar characteristics)
Avoid > Convenience samples: measuring a movie’s success at the theater exit (people who
did not like the movie left earlier), asking your friends, asking all students in class
today to tell their grades
Avoid > Volunteer samples: surveys where respondents select themselves (TVA poll)
- Measurement bias also arises when a question is not well phrased (racism
investigations, Holocaust denial)
- Lack of memory leads to measurement bias (diet of women with breast cancer over the
last three years: too long to remember); a longitudinal study is preferable here
- Preparing questions: check that they are clear (no double negatives), use impartial
vocabulary that does not suggest an answer (a leading example: "Do you plan to fulfill
your obligations as a citizen and vote?"), use anonymous questions for sensitive topics,
repeat measurements at different points in time, test the questions before launching
the study
- The quality of a study depends heavily on its design (garbage in -> garbage out)
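The probabilistic sampling methods listed in this session can be sketched with Python's standard `random` module. Everything concrete here is an assumption made for illustration: the 144-unit sampling frame, the two strata, the clusters of 12 units, the sample size of 12, and the use of proportional allocation for the stratified sample (the notes do not say how many units to take per stratum).

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical sampling frame: 144 units, each tagged with a stratum and a cluster.
population = [
    {"id": i, "stratum": "under_35" if i % 3 else "35_plus", "cluster": i // 12}
    for i in range(144)
]
n = 12  # required sample size (illustrative)

# SRSWR: each draw is made from the full population, so a unit can appear twice.
srswr = random.choices(population, k=n)

# SRSWOR: each unit can be chosen at most once.
srswor = random.sample(population, n)

# Systematic sampling: randomly sort the list, pick a random start,
# then take every 12th unit (144 / 12 = 12).
shuffled = random.sample(population, len(population))
step = len(shuffled) // n
start = random.randrange(step)
systematic = shuffled[start::step]

# Stratified sampling: split into strata, then take an SRSWOR inside each stratum.
# Proportional allocation is an assumption of this sketch.
strata = {}
for unit in population:
    strata.setdefault(unit["stratum"], []).append(unit)
stratified = []
for units in strata.values():
    share = round(n * len(units) / len(population))
    stratified.extend(random.sample(units, share))

# Cluster sampling: take an SRSWOR of clusters and keep every unit inside them.
cluster_ids = sorted({unit["cluster"] for unit in population})
chosen_clusters = random.sample(cluster_ids, 2)
cluster_sample = [u for u in population if u["cluster"] in chosen_clusters]

print("SRSWR ids:     ", sorted(u["id"] for u in srswr))
print("SRSWOR ids:    ", sorted(u["id"] for u in srswor))
print("Systematic ids:", sorted(u["id"] for u in systematic))
print("Stratified ids:", sorted(u["id"] for u in stratified))
print("Cluster ids:   ", sorted(u["id"] for u in cluster_sample))
```

Only the SRSWR line can show a repeated id. The cluster sample is larger than the others because every unit of the two selected clusters is kept.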
Session 3: Confidence intervals
- It is more informative to report an interval. Reporting a margin of error is equivalent to
reporting an interval of plausible values
- μ: the fixed number we want to estimate, a parameter that describes the population
- X = variable to be studied / n = size of the sample / the size of the population is
treated as infinite or unknown
- Estimator: a rule for calculating an estimate of an unknown parameter of the population
from the observations X1, X2, ..., Xn of a sample of size n. It is a random variable
with its own distribution, mean and standard deviation, and it varies from sample to
sample
- Since we are limited to a single sample -> loss of information
- Proportion of Quebecers who vote for PQ: X1, ..., Xn is a collection of 0s and 1s
("Would you vote for PQ?": 1 if yes, 0 if no)
- The expected value of X̄ (X-bar, the sample mean) is μ: E(X̄) = μ
- When you compute a proportion, you compute an average of 0’s and 1’s (sample
proportions are means)
- Evaluating the risk of error means understanding the sampling variability
- The more precise sample is the one with less variability (a single very small or very
large value can change the estimate a lot) and the one with more values
- The confidence interval: estimator (sample mean) +/- margin of error
- Ex: individual grades are more variable than their average / μ and σ² describe the
population (fixed)
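The two ideas above (a sample proportion is a mean of 0s and 1s, and a confidence interval is the estimator +/- a margin of error) can be sketched as follows. The true proportion of 0.30 and the sample size of 1000 are invented. The margin of error uses the usual large-sample 95% formula 1.96 * s / sqrt(n), which is an assumption here since these pages do not state how the margin is computed.

```python
import math
import random
import statistics

random.seed(2)  # fixed seed so the sketch is reproducible

# Hypothetical poll: 1 = "yes", 0 = "no". The true proportion is invented.
true_p = 0.30
n = 1000
answers = [1 if random.random() < true_p else 0 for _ in range(n)]

# A sample proportion is simply the mean of the 0/1 answers.
p_hat = sum(answers) / n

# Sample standard deviation of the answers (measures the variability in the sample).
s = statistics.stdev(answers)

# Assumed large-sample 95% margin of error: 1.96 * s / sqrt(n).
margin = 1.96 * s / math.sqrt(n)
low, high = p_hat - margin, p_hat + margin

print(f"point estimate: {p_hat:.3f}")
print(f"margin of error: {margin:.3f}")
print(f"interval of plausible values: [{low:.3f}, {high:.3f}]")
```

Because the margin divides by sqrt(n), a larger sample gives a narrower interval, and a less variable sample (smaller s) does the same.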