STAT 100 Study Guide - Midterm Guide: Smoothness
Document Summary
Self-selected sample: the subset is whoever chooses to answer.
Convenience sample: the subset is whoever is convenient for the researcher.
Judgment sample: the subset is whoever the researcher deliberately selects.
Cluster sample: divide the population into clusters, then use an SRS (simple random sample) to choose whole clusters instead of individuals (ex: everyone in a town). Compared to SRS: cheaper, but risk of bias if members of a cluster are similar.
Stratified sample: divide the population into strata (ex: female, male, other), then take one SRS per stratum. Compared to SRS: requires being able to break the population into reasonable strata; can help ensure representation.
Administrative dataset: a dataset collected as part of administrative work (e.g. Social Security names, restaurant safety ratings, etc.).
Quantitative: numbers with meaning. Categorical: nominal (categories with no specific ordering, ex: male/female/other) or ordinal (categories with an order but no magnitude or intervals, ex: level of education).
Quantitative data: visualize with histograms, box plots, rug plots, and smoothed interpolations (KDE, kernel density estimators); look for spread, shape, modes, outliers, and unreasonable values.
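The difference between SRS, cluster, and stratified sampling can be sketched in a few lines of Python. This is only an illustration, not part of the course material: the population of 120 people, the town and group labels, and the function names are all made up for the example, and only the standard library `random` module is assumed.

```python
import random
from collections import defaultdict

def simple_random_sample(population, n, rng):
    """SRS: draw n individuals, each equally likely to be chosen."""
    return rng.sample(population, n)

def cluster_sample(population, cluster_of, n_clusters, rng):
    """Use an SRS to pick whole clusters, then keep everyone in them."""
    clusters = defaultdict(list)
    for person in population:
        clusters[cluster_of(person)].append(person)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [person for name in chosen for person in clusters[name]]

def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Split into strata, then take one SRS inside every stratum."""
    strata = defaultdict(list)
    for person in population:
        strata[stratum_of(person)].append(person)
    sample = []
    for name in sorted(strata):
        sample.extend(rng.sample(strata[name], n_per_stratum))
    return sample

# Hypothetical population: 120 people spread over 4 towns and 3 groups.
rng = random.Random(0)  # fixed seed so the run is repeatable
towns = ["A", "B", "C", "D"]
groups = ["female", "male", "other"]
population = [
    {"id": i, "town": towns[i % 4], "group": groups[i % 3]}
    for i in range(120)
]

srs = simple_random_sample(population, 12, rng)
clu = cluster_sample(population, lambda p: p["town"], 2, rng)
strat = stratified_sample(population, lambda p: p["group"], 4, rng)

print(len(srs), "people in the SRS")
print(len(clu), "people from towns", sorted({p["town"] for p in clu}))
print(len(strat), "people in the stratified sample, 4 per group")
```

Note how the cluster sample only ever contains people from the chosen towns, while the stratified sample is guaranteed to include every group; the SRS guarantees neither.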