STA 101 Chapter Notes - Unit 1: USA Today, Unimodality, Box Plot

Unit 1: Introduction to Data Analysis
Video Notes
Introduction
Anecdotal evidence: evidence based on an unrepresentative sample (such as a few personal stories)
What is the population of interest? What is the sample?
Part 1: (1) Data Basics
Data Matrix
Country   | Category 1 | Category 2 | Category 3
Argentina | 50%        | 89         | low
Each column represents a variable, while each row represents an observation (case).
Type of variables:
Numerical (quantitative): take on numerical values; can be used for arithmetic operations
o Continuous numerical variables: can take on any of infinitely many values within a given range
Ex: height, percentages
o Discrete Numerical variables: take on one of a specific set of numeric values
Ex: Count data
Categorical (qualitative): take on a limited number of distinct categories; the categories may be
coded as numbers, but arithmetic operations on them are not sensible
o Regular categorical: levels have no inherent ordering
o Ordinal: levels have inherent ordering
Ex: How satisfied are you? very unsatisfied, unsatisfied, satisfied, very satisfied
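A minimal Python sketch of a data matrix, assuming the rows are stored as a list of dictionaries (one dictionary per observation, one key per variable). The column names and values are hypothetical and chosen so that each variable type listed above appears once.

```python
# Hypothetical ordered levels for an ordinal variable (list order = inherent ranking).
SATISFACTION_LEVELS = ["very unsatisfied", "unsatisfied", "satisfied", "very satisfied"]

def satisfaction_rank(level: str) -> int:
    """Return the position of an ordinal level; only the ordering is meaningful."""
    return SATISFACTION_LEVELS.index(level)

# Data matrix: each dict is one row (observation), each key is one column (variable).
data = [
    # continuous numerical | discrete numerical | regular categorical | ordinal
    {"height_cm": 172.4, "num_siblings": 2, "eye_color": "brown", "satisfaction": "satisfied"},
    {"height_cm": 160.1, "num_siblings": 0, "eye_color": "blue", "satisfaction": "very unsatisfied"},
    {"height_cm": 181.0, "num_siblings": 1, "eye_color": "green", "satisfaction": "very satisfied"},
]

n_observations = len(data)        # number of rows
variables = list(data[0].keys())  # column names

# Numerical variables support arithmetic, e.g. an average height.
mean_height = sum(row["height_cm"] for row in data) / n_observations

# Ordinal variables can be sorted by level, but averaging them is not sensible.
ordered = sorted(data, key=lambda row: satisfaction_rank(row["satisfaction"]))

print(n_observations, variables)
print(round(mean_height, 1), ordered[0]["satisfaction"])
```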
Relationships between variables
Associated/dependent variables: Two variables that show some connection with one another
o Association can be positive or negative
Independent variables: Two variables that are not associated
Part 1: (2) Observational Studies and Experiments
Types of studies:
Observational: collect data in a way that does not directly interfere with how the data arise
o Can only establish an association, not causation
o Retrospective: uses past data
o Prospective: data are collected throughout the study
Experiment: randomly assign subjects to treatments
o Establish causal connections
o Random assignment reduces the influence of confounding variables
Example: a USA Today article claimed "Eating Breakfast Cereal Keeps Girls Slim," citing an
observational study (a survey)
o Three possible explanations:
Eating breakfast causes girls to be slimmer
Being slim causes girls to eat breakfast
A third variable is responsible for the girls being slim and for girls eating breakfast
o Aforementioned third variable = confounding variable
Confounding variable: an extraneous variable that affects both the explanatory and the
response variable, making it seem like there is a relationship between them
CONCLUSION: The type of study (observational or experimental) determines whether we can infer
causation or only correlation from the results.
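The confounding explanation can be made concrete with a small simulation. In this sketch all numbers are hypothetical: an assumed third variable, being physically active, raises the chance of eating breakfast and lowers weight, while breakfast itself has no effect on weight by construction. Breakfast eaters still come out lighter overall, but the gap largely disappears when comparing within each activity level.

```python
import random

random.seed(1)  # fixed seed so the run is reproducible

people = []
for _ in range(10_000):
    active = random.random() < 0.5  # hypothetical confounding variable
    eats_breakfast = random.random() < (0.8 if active else 0.3)
    # By construction, weight depends only on activity, NOT on breakfast.
    weight_kg = random.gauss(55 if active else 65, 5)
    people.append({"active": active, "breakfast": eats_breakfast, "weight": weight_kg})

def mean_weight(group):
    return sum(p["weight"] for p in group) / len(group)

def breakfast_gap(group):
    """Mean weight of breakfast skippers minus mean weight of breakfast eaters."""
    eaters = [p for p in group if p["breakfast"]]
    skippers = [p for p in group if not p["breakfast"]]
    return mean_weight(skippers) - mean_weight(eaters)

overall_gap = breakfast_gap(people)
# Compare eaters and skippers separately within each level of the confounder.
within_gaps = {
    level: breakfast_gap([p for p in people if p["active"] == level])
    for level in (True, False)
}
print(f"overall gap: {overall_gap:.2f} kg")
print({k: round(v, 2) for k, v in within_gaps.items()})
```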
Part 1: (3) Sampling and Sources of Bias
Types of sampling:
Census: collect data from the entire population
o Requires a lot of resources
o Some individuals may be hard to locate/measure (people who are different from the rest of
the population)
US Census may miss undocumented immigrants
o Populations rarely stand still: births and deaths change them constantly
Exploratory analysis to inference: study a sample that is representative of the entire population
o Exploring the sample lets us generalize (make an inference) about the whole population
o The inference is valid only if the sample is representative
Sampling bias:
o Convenience sample: individuals who are easily accessible are more likely to be included
in the sample
o Non-response: if only a (non-random) fraction of the randomly sampled people respond to
a survey, the sample may no longer be representative of the population
o Voluntary response: occurs when the sample consists of people who volunteer to respond
because they have strong opinions on the issue
o Non-response = initial sample is random, but the people who choose to answer are not
random; voluntary = initial sample is not random
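A short simulation shows why a self-selected group of respondents can mislead. This sketch assumes, hypothetically, that 30% of a population supports a policy and that supporters, holding stronger opinions, respond with probability 0.6 while everyone else responds with probability 0.1, as in a voluntary-response poll.

```python
import random

random.seed(42)  # reproducible run

# Hypothetical population of 10,000 people; True = supports the policy (about 30%).
population = [random.random() < 0.30 for _ in range(10_000)]

# Assumed response behavior: supporters (strong opinions) respond with
# probability 0.6, everyone else with probability 0.1.
respondents = [supports for supports in population
               if random.random() < (0.6 if supports else 0.1)]

true_share = sum(population) / len(population)
respondent_share = sum(respondents) / len(respondents)
print(f"support in population:     {true_share:.2f}")
print(f"support among respondents: {respondent_share:.2f}")
```

With these assumed response rates, support among respondents comes out far above the true share, even though the group of respondents is large.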
Sampling Methods
Simple random sample (SRS): each case is equally likely to be selected
Stratified sample: divide the population into homogeneous strata, then randomly sample from
within each stratum
Cluster sample: divide the population into clusters, randomly sample a few clusters and then
randomly sample from within these clusters
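The three sampling methods above can be sketched with Python's `random.sample`. The student population, the strata (class year), and the clusters (dorms) are hypothetical; the cluster version follows these notes by sampling within each chosen cluster rather than taking every member of it.

```python
import random

random.seed(0)  # reproducible draws

# Hypothetical population: 1,000 students with a class year (stratum)
# and a dorm (cluster). Sizes and labels are made up.
YEARS = ["first", "second", "third", "fourth"]
population = [
    {"id": i, "year": YEARS[i % 4], "dorm": f"dorm_{i // 50}"}  # 20 dorms of 50
    for i in range(1000)
]

def group_by(pop, key):
    """Map each value of `key` to the list of cases having that value."""
    groups = {}
    for case in pop:
        groups.setdefault(case[key], []).append(case)
    return groups

def simple_random_sample(pop, n):
    """SRS: every case is equally likely to be selected."""
    return random.sample(pop, n)

def stratified_sample(pop, key, n_per_stratum):
    """Split into homogeneous strata, then take an SRS within every stratum."""
    sample = []
    for members in group_by(pop, key).values():
        sample.extend(random.sample(members, n_per_stratum))
    return sample

def cluster_sample(pop, key, n_clusters, n_per_cluster):
    """Randomly choose a few clusters, then take an SRS within each chosen cluster."""
    clusters = group_by(pop, key)
    chosen = random.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen:
        sample.extend(random.sample(clusters[name], n_per_cluster))
    return sample

srs = simple_random_sample(population, 40)
stratified = stratified_sample(population, "year", 10)
clustered = cluster_sample(population, "dorm", 4, 10)
print(len(srs), len(stratified), len(clustered))
```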
Part 1: (4) Experimental Design
Principles of experimental design
(1) Control: compare treatment of interest to a control group
(2) Randomize: randomly assign subjects to treatments
(3) Replicate: collect a sufficiently large sample, or replicate the entire study
(4) Block: block for variables known or suspected to affect the outcome
Blocking vs. explanatory variables
o Explanatory variables (factors): conditions we can impose on experimental units
o Blocking variables: characteristics that the experimental units come with, that we would
like to control for
o Blocking is similar to stratifying, except that it is used in experimental settings
Blocking pairs with random assignment (assigning treatments)
Stratifying pairs with random sampling (selecting subjects)
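Blocking and random assignment can be combined as in this sketch: units are first grouped by a blocking variable, then treatments are assigned at random within each block. The subjects, the choice of "sex" as the blocking variable, and the two treatment labels are hypothetical.

```python
import random

random.seed(7)  # reproducible assignment

# Hypothetical subjects; "sex" is a blocking variable the units come with.
subjects = [{"id": i, "sex": "F" if i % 2 == 0 else "M"} for i in range(20)]

def blocked_assignment(units, block_key, treatments=("treatment", "control")):
    """Split units into blocks by `block_key`, then randomly assign treatments
    within each block so each block is divided as evenly as possible."""
    blocks = {}
    for unit in units:
        blocks.setdefault(unit[block_key], []).append(unit)
    assignment = {}
    for members in blocks.values():
        shuffled = members[:]     # copy so the input list is not reordered
        random.shuffle(shuffled)  # random order within the block
        for j, unit in enumerate(shuffled):
            # Alternating labels over a random order gives a random, balanced split.
            assignment[unit["id"]] = treatments[j % len(treatments)]
    return assignment

groups = blocked_assignment(subjects, "sex")
for sex in ("F", "M"):
    n_treated = sum(1 for s in subjects if s["sex"] == sex and groups[s["id"]] == "treatment")
    print(sex, "treated:", n_treated)
```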
Experimental Terminology
Placebo: fake treatment, often used as the control group for medical studies
o Placebo effect: experimental units show change despite being on placebo
Blinding: experimental units don’t know which group they’re in
Double-blinding: both experimental units and researchers don’t know the group assignment
Part 1: (Spotlight) Random Sampling vs. Random Assignment
Random sampling: subjects are randomly selected from the population for a study
o Makes the sample representative of the population, so results are generalizable to the
population
Random assignment: subjects are randomly assigned treatments in an experiment
o Allows causal conclusions
o Reduces influence of confounding variables
Example
o Random sampling: gathering a random sample from the population
o Random assignment: randomly assigning the sampled subjects to treatments
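The two steps in the example can be sketched in a few lines: random sampling decides who enters the study, and random assignment decides which treatment each sampled subject receives. The population size, sample size, and even 50/50 split are hypothetical.

```python
import random

random.seed(3)  # reproducible run

population_ids = list(range(10_000))  # hypothetical population

# Step 1, random sampling: who gets into the study (supports generalizing).
sample = random.sample(population_ids, 100)

# Step 2, random assignment: which treatment each sampled subject gets
# (supports causal conclusions).
shuffled = sample[:]
random.shuffle(shuffled)
treatment_group = shuffled[:50]
control_group = shuffled[50:]
print(len(treatment_group), len(control_group))
```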
Part 2: (1) Visualizing Numerical Data
Scatterplots
Common for visualizing the relationship between two numerical variables
Which of the two is the explanatory variable?
o Which of the two is the one affecting the other?
Explanatory variable: x-axis
Response variable: y-axis
Evaluating relationships on scatterplots
o Direction: positive or negative?
o Shape: linear or curved?
o Strength: strong or weak?
o Outliers?
Histogram
Looking at one numerical variable individually
Provides a view of data density
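A histogram groups a numerical variable into equal-width bins and counts the observations in each bin. The sketch below builds those counts and prints a rough text version; the ages, bin width, and starting point are hypothetical, and bins that contain no observations are simply not printed.

```python
def histogram_counts(values, bin_width, start):
    """Count values per bin [start + k*w, start + (k+1)*w), keyed by left edge."""
    counts = {}
    for v in values:
        k = int((v - start) // bin_width)  # index of the bin containing v
        left = start + k * bin_width
        counts[left] = counts.get(left, 0) + 1
    return dict(sorted(counts.items()))  # bins in increasing order

def text_histogram(counts, bin_width):
    """Render bin counts as rows of '*' characters (one '*' per observation)."""
    lines = []
    for left, n in counts.items():
        lines.append(f"[{left:>3}, {left + bin_width:>3}) {'*' * n}")
    return "\n".join(lines)

ages = [18, 19, 19, 20, 21, 21, 21, 22, 23, 25, 27, 31]
counts = histogram_counts(ages, bin_width=5, start=15)
print(text_histogram(counts, 5))
```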