CAS MA 113 Lecture Notes - Lecture 1: Cumulative Frequency Analysis, Continuous Or Discrete Variable, Statistical Inference

raspberryserval680

30 Apr 2018

School

Boston University

Department

Mathematics & Statistics

Course

CAS MA 113

Professor

Dan Weiner

Chapter 1:

General Definitions

•Statistics - the science of collecting, organizing, summarizing, and analyzing information to draw conclusions/answer questions

•also about providing a measure of confidence in any conclusion.

•ex: given a set of data, how can we use graphical & numerical methods to analyze it and draw broader inferences from it

•Data - a “fact or proposition used to draw a conclusion or make a decision”

•also can describe the characteristics of an individual

•‘information’ = data

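•Margin of Error - a measure of confidence in an estimate; it indicates how far a sample result may be from the true population value

•*a good sample result should be reported with a margin of error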

•Population - the entire group of individuals who are being studied

•Sample - a subset of the population that is being studied

•statistics usually uses a population sample to make an inference about the general population

•Individual - a person or object that is a member of the population being studied

•Descriptive Statistics - the process of organizing and summarizing data

•describe data through numerical summaries/tables/graphs

•Inferential Statistics - taking results from a sample, extending them to the population, and measuring the reliability of the result

•Parameter - a numerical summary of a population

•ex: the population mean, the population proportion

•ex: the percentage of all students on campus who have a job is 84.9%

•Statistic - a numerical summary based on a sample

•can help approximate the parameter

•ex: a sample of 250 students is obtained and from that sample, 84.9% have a job
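
The difference between a parameter and a statistic can be sketched in a few lines of Python. This is only an illustration: the population of 10,000 students (8,490 of them with a job) and the fixed random seed are assumptions made up for the example, not figures from the lecture.

```python
import random

# Hypothetical population: 10,000 students, 8,490 of whom have a job.
population = [True] * 8490 + [False] * 1510

# Parameter: a numerical summary computed from the entire population.
parameter = sum(population) / len(population)

# Statistic: a numerical summary computed from a sample of 250 students.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample = rng.sample(population, 250)
statistic = sum(sample) / len(sample)

print(f"parameter (population proportion): {parameter:.3f}")
print(f"statistic (sample proportion):     {statistic:.3f}")
```

The statistic will usually land near the parameter without matching it exactly, which is why a margin of error is reported alongside it.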

•Step 1: Identify the research objective

•Step 2: Collect the information needed to answer the question

•Step 3: Describe the data

•organize/summarize the information

•Step 4: Draw conclusions from the data

Variables

•Variables - the characteristics of the individuals within the population

•attributes that vary between individuals within a population

•ex: age, height, weight, gender

•Qualitative/Categorical Variables - allow for the classification of individuals based on some

attribute or characteristic

•will be a preference/description/characteristic

•Quantitative Variables - provide a numerical measure of individuals

•the values of a quantitative variable can be added or subtracted to provide meaningful results

•will be a number

•Discrete Variable - a quantitative variable that has a finite/countable number of possible values

•cannot take on every possible value between two given values

•Countable - values result from counting such as 0, 1, 2, 3 and so on

•Continuous Variable - a quantitative variable that has an infinite number of possible values that are not countable

•can take on every possible value between two given values

Data

•Data - the list of observations a variable assumes

find more resources at oneclass.com

•ex: gender = variable, male/female (observations) = data

•Qualitative Data - observations corresponding to a qualitative variable

•Quantitative Data - observations corresponding to a quantitative variable

•Discrete Data - observations corresponding to a discrete variable

•Continuous Data - observations corresponding to a continuous variable

•Bias - the tendency to over-estimate or under-estimate the value of a parameter

•if the results of a sample are not representative of the population, then the sample has a bias

•Raw Data - data that is not organized

•When data is collected from a survey or designed experiment, it must be organized into a manageable

form.

•Ways to organize data

•Tables

•Graphs

•Numerical Summaries (chapter 3)

Three Sources of Bias

•Sampling Bias - the technique used to obtain the individuals used in the sample tends to favor one

part of the population over another

•Under-coverage (a type of sampling bias) - when the proportion of one segment of the population is

lower in a sample than it is in the population

•Nonresponse Bias - exists when individuals selected for the sample who do not respond to the survey have different opinions from those who do respond

•can be reduced through the use of callbacks/rewards/incentives

•Response Bias - exists when the answers on a survey do not reflect the true feelings of the respondent

•Types of Response Bias

•Interviewer Error

•Misrepresented Answers

•Wording of Questions

•Order of Questions/Words (can lead people on)
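
The effect of under-coverage, described above as a type of sampling bias, can be illustrated with a small deterministic sketch. The two segments, their sizes, and their job rates are invented for the example; the point is only that a sample which under-represents one segment can over-estimate (or under-estimate) the parameter.

```python
# Hypothetical population of 1,000 students in two segments.
on_campus = [True] * 300 + [False] * 200    # 60% have a job
off_campus = [True] * 450 + [False] * 50    # 90% have a job
population = on_campus + off_campus


def job_rate(individuals):
    """Proportion of individuals who have a job."""
    return sum(individuals) / len(individuals)


# A survey method that reaches 1 in 10 on-campus students
# but 1 in 2 off-campus students (under-coverage of on-campus).
sample = on_campus[::10] + off_campus[::2]

print(f"on-campus share of population: {len(on_campus) / len(population):.2f}")
print(f"on-campus share of sample:     {len(on_campus[::10]) / len(sample):.2f}")
print(f"parameter (population job rate): {job_rate(population):.2f}")
print(f"biased statistic (sample rate):  {job_rate(sample):.2f}")
```

Here the on-campus segment makes up half of the population but only one sixth of the sample, so the sample job rate is pulled toward the off-campus rate.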

Chapter 2:

Frequency

•Frequency Distribution - lists each category of data and the number of occurrences for each category of data (counting how often each category occurs & listing the counts)

•Relative Frequency - the proportion (or percent) of observations within a category, found using the formula

•Relative Frequency = (frequency)/(sum of all frequencies)

•Relative Frequency Distribution - lists each category of data with the relative frequency

•Cumulative Frequency Distribution - displays the running total of frequencies: for discrete data, the total number of observations less than or equal to the category; for continuous data, the total number of observations less than or equal to the upper class limit of a class.

•ex (picture not shown): cumulative relative frequency graph of class grades

•answer: B.

•why: the vertical axis (y-axis) already gives the cumulative relative frequency for each grade; at a grade of 70 it reads 37%, meaning that 37% of the class got a 70 or lower

•Cumulative Relative Frequency Distribution - displays the proportion/

percentage of observations less than or equal to the category for discrete data

and the proportion/percentage of observations less than or equal to the upper

class limit for continuous data.
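
The four distributions defined above can be built from the same raw data. The following is a minimal sketch using only the Python standard library; the data set (number of siblings reported by 10 students) is hypothetical.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical discrete data: number of siblings reported by 10 students.
data = [0, 1, 1, 2, 2, 2, 3, 1, 0, 2]

freq = Counter(data)                      # frequency of each value
categories = sorted(freq)                 # values in increasing order
counts = [freq[c] for c in categories]
total = sum(counts)                       # sum of all frequencies

rel_freq = [n / total for n in counts]          # frequency / sum of all frequencies
cum_freq = list(accumulate(counts))             # observations <= each value
cum_rel_freq = [c / total for c in cum_freq]    # proportion <= each value

print("value  freq   rel  cum  cum_rel")
for row in zip(categories, counts, rel_freq, cum_freq, cum_rel_freq):
    print("{:>5}  {:>4}  {:.2f}  {:>3}  {:>7.2f}".format(*row))
```

The last cumulative frequency always equals the number of observations, and the last cumulative relative frequency always equals 1, which is a quick check on the arithmetic.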

Charts

•Pareto Chart - a bar graph where the bars are drawn in decreasing order of frequency or relative

frequency

•Pie Chart - a circle divided into sectors, where each sector represents a category of data

•the area of each sector is proportional to the frequency of the category
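
As a rough sketch of how both charts follow from a frequency distribution: a Pareto chart needs the categories sorted by decreasing frequency, and a pie chart needs each sector's central angle, which is the relative frequency times 360 degrees. The favorite-color data below is made up for illustration, and the bars are drawn as text rather than with a plotting library.

```python
from collections import Counter

# Hypothetical qualitative data: favorite color of 10 students.
data = ["blue", "red", "blue", "green", "blue",
        "red", "blue", "green", "red", "blue"]

freq = Counter(data)
total = sum(freq.values())

# Pareto order: categories sorted by decreasing frequency.
pareto_order = freq.most_common()

# Pie chart: central angle of each sector = relative frequency * 360 degrees.
angles = {cat: 360 * n / total for cat, n in pareto_order}

for cat, n in pareto_order:
    print(f"{cat:<6} {'#' * n} ({angles[cat]:.0f} degrees)")
```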

•For large discrete data sets, or continuous variables, it is impractical to treat each value as its own category

•solution = classes

•Classes - categories into which data is grouped

•When a data set consists of a large number of different discrete data values OR when a data set consists of continuous data, we must create classes by using intervals of numbers

•^classes are used with quantitative data

•*the values of the variable (the classes) go on the horizontal axis

•When organizing data, it is easy to manipulate how we present it (ex: through certain

charts)
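
Grouping continuous data into classes can be sketched as follows. The heights, the lower class limit of the first class (150), and the class width (10) are all assumptions chosen for the example; a different width would produce a different-looking distribution from the same data, which is one way presentation can be manipulated.

```python
from collections import Counter

# Hypothetical continuous data: heights in centimeters.
heights = [150.2, 154.8, 158.1, 161.5, 163.0,
           166.7, 168.4, 171.9, 175.3, 182.6]

lower_limit = 150   # lower class limit of the first class (assumed)
class_width = 10    # width of every class (assumed)


def class_of(value):
    """Return (lower, upper) for the class interval containing value."""
    index = int((value - lower_limit) // class_width)
    lower = lower_limit + index * class_width
    return (lower, lower + class_width)


# Frequency distribution over classes instead of individual values.
freq = Counter(class_of(h) for h in heights)

for (lower, upper), n in sorted(freq.items()):
    print(f"{lower} to <{upper}: {n}")
```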

•Stem-and-Leaf Plot - uses the digits to the left of the rightmost digit to form the stem while each rightmost digit forms a leaf

•Rather than using classes, you list the leading digits (ex: the tens digit) once as the stem and write each final digit next to it

•ex: a data value of 147 would have 14 as the stem and 7 as the leaf

•ex: a data value of 4.7 would have 4 as the stem and 7 as the leaf

•ex (picture): how you display the data

•2.8, 3.8, 3.8, 3.8, 3.3, 3.9…etc.

•you must specify alongside the data table what the vertical line represents (ex:

if a number is 2.8 or 28)
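
A stem-and-leaf plot for values with one decimal place can be sketched as below. Only the six values that appear in the example above are used (the original list is truncated after them), and the text layout with a "|" separator and a key line is one possible way to print it.

```python
from collections import defaultdict

# The values listed in the example above (the source list is truncated).
data = [2.8, 3.8, 3.8, 3.8, 3.3, 3.9]

stems = defaultdict(list)
for value in data:
    tenths = round(value * 10)        # 2.8 -> 28, rounding away float error
    stem, leaf = divmod(tenths, 10)   # 28 -> stem 2, leaf 8
    stems[stem].append(leaf)

for stem in sorted(stems):
    leaves = " ".join(str(leaf) for leaf in sorted(stems[stem]))
    print(f"{stem} | {leaves}")

# The key tells the reader how to interpret the vertical line.
print("Key: 2 | 8 represents 2.8")
```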

•Common Distribution Shapes

•Uniform/Symmetric - a flat frequency distribution (the frequency is spread evenly across the values)

•Bell-Shaped - the highest frequency is in the middle and the data is symmetric about that point

•can fold vertically and have the left side data be equal to the right side data

•Skewed Right - more data is clustered on the left side of the chart, with a long tail to the right

•usually results from having outlying/large values on the right side of the chart

•Skewed Left - more data is clustered on the right side of the chart, with a long tail to the left

•usually results from having outlying/small values on the left side of the chart
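
One common numerical check on shape is to compare the mean with the median: the mean is typically pulled toward the long tail, so it tends to be larger than the median for right-skewed data and smaller for left-skewed data. The three small data sets below are hypothetical and chosen only to make the pattern visible.

```python
from statistics import mean, median

# Hypothetical data sets, one per shape.
skewed_right = [1, 2, 2, 3, 3, 3, 4, 20]         # one large value in the right tail
symmetric = [1, 2, 3, 4, 5, 6, 7]
skewed_left = [1, 17, 18, 18, 18, 19, 19, 20]    # one small value in the left tail

for name, values in [("skewed right", skewed_right),
                     ("symmetric", symmetric),
                     ("skewed left", skewed_left)]:
    print(f"{name:<12} mean = {mean(values):.2f}, median = {median(values):.2f}")
```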
