STATS 10 Study Guide - Midterm Guide: Bar Chart, Pareto Chart, Random Number Table

29 Apr 2018
MIDTERM STUDY GUIDE
Chapter 1: Introduction to Data
WHAT IS DATA?
● data: collections of numbers, measurements, or any type of observation that someone
records (“the building blocks of statistics”)
Examples of Data Collection:
- election polls
- surveys
- Google analytics (browser history)
- smartphone apps
- sales transactions
- hospital & school records
- sports
- Twitter / Facebook posts
- satellites
variable: a characteristic, number, or quantity of a unit being observed that can be
measured or counted (a data item)
Types of Variables:
■ numerical: the values of the variable are numbers (ex. weight, height,
temperature, GPA)
■ categorical: categories or classifications (ex. eye color, year in school,
class subject)
indicator variables: just indicate which observation we are looking at (ex.
full name, jersey number, student ID)
○ observation: data from an individual study subject or sampled unit
POPULATIONS AND SAMPLES
○ population: collection of observations of interest
often very large → nearly impossible to obtain measurements from every member
○ sample: portion of the population of interest
usually taken to estimate a characteristic of the population
size of sample (usually denoted by n)
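One common way to pick a sample is a simple random sample, where every unit has the same chance of being chosen. The guide does not say how the sample is selected at this point, so the sketch below is only an illustration: the function name `draw_sample`, the population of ID numbers, and the seed are all hypothetical.

```python
import random

def draw_sample(population, n, seed=None):
    """Return a simple random sample of size n, drawn without replacement."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(population, n)

# Hypothetical population: ID numbers for 10,000 students (made-up data).
population = list(range(1, 10001))

# n is the sample size, as in the definition above.
sample = draw_sample(population, n=50, seed=1)
print(len(sample))  # 50
```

Because the draw is without replacement, no unit can appear in the sample twice.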
ORGANIZING AND REPORTING CATEGORICAL DATA
two-way table (a.k.a. contingency table): displays the counts for each combination of
values of 2 categorical variables
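A two-way table can be built by counting how often each pair of categories occurs together. The following is a minimal sketch using only the Python standard library; the helper name `two_way_table` and the five student records are made up for illustration.

```python
from collections import Counter

def two_way_table(rows, cols):
    """Count each (row category, column category) pair and return a nested dict."""
    counts = Counter(zip(rows, cols))
    row_levels = sorted(set(rows))
    col_levels = sorted(set(cols))
    # Pairs that never occur get a count of 0 (Counter returns 0 for missing keys).
    return {r: {c: counts[(r, c)] for c in col_levels} for r in row_levels}

# Hypothetical data: two categorical variables recorded for five students.
year = ["freshman", "senior", "freshman", "senior", "freshman"]
eye = ["brown", "blue", "brown", "brown", "blue"]

table = two_way_table(year, eye)
for row_label, row_counts in table.items():
    print(row_label, row_counts)
# freshman {'blue': 1, 'brown': 2}
# senior {'blue': 1, 'brown': 1}
```

Each row of the output corresponds to one category of the first variable, and each entry is the count for one category of the second variable.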
TYPES OF STUDIES
observational study: researchers do not assign choices, but rather simply observe
them (no treatment is applied to any individual or subject)
valuable for discovering trends and possible associations
NOT possible to demonstrate a causal relationship with an observational study
(cannot conclude causation)
controlled experiment: researcher / experimenter deliberately manipulates the
treatment variable and assigns the subjects to those treatments (usually at random)
must have at least:
one treatment variable to manipulate
one outcome variable to measure
the outcome variable is observed & compared for the different groups of subjects
who have been treated differently
establishing causality → means showing that an outcome is affected by some
treatment
treatment group: individuals who receive the treatment of interest in an
experiment
control group: individuals who do NOT receive the treatment of interest
Association is NOT Causation
unless the individuals of the study are identical in every way, except for treatment → we
cannot conclude causation
(that the treatment caused the outcome)
if a certain type of outcome occurs more frequently in one group → we can
conclude that the treatment and outcome are associated
confounding variable: a characteristic other than the treatment that is related to both
the treatment and the outcome, so it can explain an association between them
Ex. People with gray hair are observed to have more wrinkles. Does this mean
that gray hair causes wrinkles?
Gray hair is associated with wrinkles, but old age causes both gray hair and
wrinkles. → So, gray hair isn’t the cause of wrinkles
PRINCIPLES OF EXPERIMENTAL DESIGN
large sample size: ensures that the study captures the full range of variability amongst
the population (and allows small differences to be noticed)
controlled and randomized: random assignment of subjects to treatment or control
groups → to minimize bias
○ bias = tendency to overestimate / underestimate a population parameter (due to
the measurement process) → Examples:
polling only conservatives to estimate who will win an election
surveying people at the Wooden Center to estimate the average time a
student spends working out a week
researcher putting heaviest people in the same group for a research study
random assignment helps minimize bias → Examples:
use a computer or random number generator to randomly assign the
people being studied to the control and treatment groups
randomly pull a number out of a bag to assign individuals or subjects to
groups
double-blinding: neither subjects nor researchers know who is assigned to which group
blinding - helps prevent bias from being introduced into a study
because participants do not know who is assigned to which study group
Who can influence the outcome of an experiment?
if the researcher knows a participant is in a certain group → they might
interact with them differently depending on the group they are in
if the participant knows which treatment they are receiving → they might
behave differently than they would if they knew nothing about their
treatment
placebo (if appropriate): controls for possible differences between groups that occur
simply because subjects think their treatment is effective
○ placebo: a “fake” treatment that looks just like the treatment being tested
sometimes, merely applying some form of treatment is enough to induce
an improvement in the study
one of the best methods to blind a subject
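Random assignment, described under the controlled and randomized principle above, can be done with a random number generator. The sketch below shuffles the subjects and splits them in half; the function name `assign_groups`, the subject labels, and the equal-sized split are assumptions made for illustration, not something the guide prescribes.

```python
import random

def assign_groups(subjects, seed=None):
    """Randomly split subjects into a treatment group and a control group."""
    rng = random.Random(seed)   # a seed makes the assignment repeatable
    shuffled = list(subjects)   # copy so the original order is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

# Hypothetical subjects labeled S1 through S20.
subjects = [f"S{i}" for i in range(1, 21)]
groups = assign_groups(subjects, seed=0)
print(len(groups["treatment"]), len(groups["control"]))  # 10 10
```

Since chance alone decides the groups, the researcher cannot, for example, put the heaviest people in the same group.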