STATS 10 Study Guide - Midterm Guide: Bar Chart, Pareto Chart, Random Number Table
MIDTERM STUDY GUIDE
Chapter 1: Introduction to Data
WHAT IS DATA?
● data: collections of numbers, measurements, or any type of observation that someone
records (“the building blocks of statistics”)
○ Examples of Data Collection:
- election polls
- surveys
- Google analytics (browser history)
- smartphone apps
- sales transactions
- hospital & school records
- sports
- Twitter / Facebook posts
- satellites
● variable: a characteristic, number, or quantity of a unit being observed that can be
measured or counted (a data item)
○ Types of Variables:
■ numerical: the values of the variable are numbers (ex. weight, height,
temperature, GPA)
■ categorical: the values are categories or classifications (ex. eye color, year in school,
class subject)
■ indicator variables: just indicate which observation we are looking at (ex.
full name, jersey number, student ID)
○ observation: data from an individual study subject or sampled unit
POPULATIONS AND SAMPLES
○ population: the entire collection of observations of interest
■ often very large → nearly impossible to obtain
measurements from every member
○ sample: a portion of the population of interest
■ usually taken to measure a characteristic
of the population
■ size of sample (usually denoted by n)
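To make the population / sample distinction concrete, here is a minimal Python sketch. The population (10,000 made-up heights in cm), the seed, and the sample size n = 100 are all assumptions invented for illustration; they are not from the course.

```python
import random
from statistics import mean

# Hypothetical population: heights (cm) of 10,000 people, generated for illustration.
rng = random.Random(10)  # fixed seed so the run is repeatable
population = [rng.gauss(170, 10) for _ in range(10_000)]

n = 100  # sample size
sample = rng.sample(population, n)  # simple random sample, without replacement

population_mean = mean(population)  # usually unknown in practice
sample_mean = mean(sample)          # what we can actually measure

print(f"population mean: {population_mean:.1f}")
print(f"sample mean (n={n}): {sample_mean:.1f}")
```

The sample mean is computed from only n = 100 values, yet it should land close to the population mean, which is the reason samples are taken in the first place.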
ORGANIZING AND REPORTING CATEGORICAL DATA
● two-way table (a.k.a. two-way frequency table or contingency table): displays the counts
for each combination of values of 2 categorical variables
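As a rough illustration, a two-way table can be built by counting how often each pair of categories occurs. The eight observations below (year in school, eye color) are hypothetical values chosen only for this sketch.

```python
from collections import Counter

# Hypothetical observations: (year in school, eye color) for 8 students.
observations = [
    ("freshman", "brown"),
    ("freshman", "blue"),
    ("sophomore", "brown"),
    ("freshman", "brown"),
    ("sophomore", "green"),
    ("sophomore", "brown"),
    ("freshman", "green"),
    ("sophomore", "blue"),
]

counts = Counter(observations)  # count each (row, column) pair
rows = sorted({year for year, _ in observations})
cols = sorted({eye for _, eye in observations})

# Print the table: one row per year, one column per eye color.
print(f"{'':<10}" + "".join(f"{c:>7}" for c in cols))
for r in rows:
    print(f"{r:<10}" + "".join(f"{counts[(r, c)]:>7}" for c in cols))
```

Each cell is the number of observations that fall into that row category and that column category at the same time.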
TYPES OF STUDIES
● observational study: researchers do not assign subjects to treatments; they simply observe
the choices subjects make on their own (no treatment is applied to any individual or subject)
○ valuable for discovering trends and possible associations
○ NOT possible to demonstrate a causal relationship with an observational study
(cannot conclude causation)
● controlled experiment: the researcher / experimenter deliberately manipulates the
treatment variable and assigns the subjects to those treatments (usually at random)
○ there must be at least:
■ one treatment variable to manipulate
■ one outcome variable to measure
○ the outcome variable is observed & compared for the different groups of subjects
who have been treated differently
○ establishing causality → means showing that an outcome is affected by some
treatment
■ treatment group: individuals who receive the treatment of interest in an
experiment
■ control group: individuals who do NOT receive the treatment
Association is NOT Causation
● unless the individuals in the study are identical in every way except for the treatment → we
cannot conclude causation (that the treatment caused the outcome)
○ if a certain type of outcome occurs more frequently in one group → we can
conclude only that the treatment and outcome are associated
● confounding variable: a characteristic other than the treatment that affects both the
apparent treatment and the outcome
○ Ex. People with gray hair are observed to have more wrinkles. Does this mean
that gray hair causes wrinkles?
○ Gray hair is associated with wrinkles, but old age causes both gray hair and
wrinkles. → So, gray hair isn’t the cause of wrinkles
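The gray hair example can be checked with a little arithmetic. The counts below are invented so the numbers are easy to follow (they are not real data): overall, gray-haired people show more wrinkles, but once people are compared within the same age group the difference disappears.

```python
# Hypothetical counts, invented for illustration only.
# (age group, has gray hair) -> (number of people, number with wrinkles)
data = {
    ("young", True): (4, 1),
    ("young", False): (16, 4),
    ("old", True): (16, 12),
    ("old", False): (4, 3),
}

def wrinkle_rate(gray, age=None):
    """Share of people with wrinkles among those with the given hair status,
    optionally restricted to one age group."""
    people = wrinkled = 0
    for (group, has_gray), (n, w) in data.items():
        if has_gray == gray and (age is None or group == age):
            people += n
            wrinkled += w
    return wrinkled / people

# Ignoring age: gray hair looks strongly associated with wrinkles.
print("overall:", wrinkle_rate(True), "vs", wrinkle_rate(False))

# Holding age fixed: the rates for gray and non-gray hair match.
for age in ("young", "old"):
    print(age + ":", wrinkle_rate(True, age), "vs", wrinkle_rate(False, age))
```

In this made-up data, age drives both variables, so the overall association between gray hair and wrinkles says nothing about one causing the other.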
PRINCIPLES OF EXPERIMENTAL DESIGN
● large sample size: ensures that the study captures the full range of variability in
the population (and allows small differences to be noticed)
● controlled and randomized: random assignment of subjects to treatment or control
groups → to minimize bias
○ bias = tendency to overestimate / underestimate a population parameter (due to
the measurement process) → Examples:
■ polling only conservatives to estimate who will win an election
■ surveying people at the Wooden Center to estimate the average time a
student spends working out per week
■ a researcher putting the heaviest people in the same group for a research study
○ random assignment helps minimize bias → Examples:
■ use a computer or random number generator to randomly assign the
people being studied to the control and treatment groups
■ randomly pull a number out of a bag to assign individuals or subjects to
groups
● double-blinding: neither subjects nor researchers know who is assigned to which group
○ blinding helps prevent bias from being introduced into a study
■ because participants do not know who is assigned to which study group
○ Who can influence the outcome of an experiment?
■ if the researcher knows a participant is in a certain group → they might
interact with that participant differently depending on the group they are in
■ if the participant knows which treatment they are receiving → they might
behave differently than they would if they knew nothing about their
treatment
● placebo (if appropriate): controls for possible differences between groups that occur
simply because subjects think their treatment is effective
○ placebo: a “fake” treatment that looks just like the treatment being tested
■ sometimes, merely receiving some form of treatment is enough to induce
an improvement in a subject
■ one of the best methods to blind a subject
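The random-assignment principle listed above (using a computer or random number generator to place subjects into groups) might be sketched as follows. The function name `assign_groups`, the subject IDs, and the seed are hypothetical, chosen only for this example.

```python
import random

def assign_groups(subjects, seed=None):
    """Randomly split subjects into treatment and control groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(subjects)  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)      # every ordering is equally likely
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

subjects = [f"subject_{i}" for i in range(1, 21)]  # hypothetical subject IDs
groups = assign_groups(subjects, seed=42)

print(len(groups["treatment"]), "in treatment,", len(groups["control"]), "in control")
```

Because the shuffle ignores everything about the subjects, the researcher cannot (even unintentionally) steer, say, the heaviest people into one group.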