CAS MA 113 Lecture Notes - Lecture 1: Cumulative Frequency Analysis, Continuous Or Discrete Variable, Statistical Inference

Chapter 1:
General Definitions
Statistics - the science of collecting, organizing, summarizing, analyzing information to draw
conclusions/answer questions
also about providing a measure of confidence in any conclusion.
ex: given a set of data, how can we use digital & numeric methods to analyze it and then make
broader inferences from it?
Data - a “fact or proposition used to draw a conclusion or make a decision”
also can describe the characteristics of an individual
‘information’ = data
Margin of Error - a measure of how much confidence we can place in a result from a sample
*results from a good sample should be reported with a margin of error
Population - the entire group of individuals who are being studied
Sample - a subset of the population that is being studied
statistics usually uses a population sample to make an inference about the general population
Individual - a person or object who is a member of the population being studied
Descriptive Statistics - the process of organizing and summarizing data
describe data through numerical summaries/tables/graphs
Statistic - a numerical summary based on a sample
Inferential Statistics - taking results from a sample, extending them to the population, and
measuring the reliability of the result
Parameter - a numerical summary of a population
ex: the population mean, the population proportion
ex: the percentage of all students on campus who have a job is 84.9%
Statistic - a numerical summary based on a sample
can help approximate the parameter
ex: a sample of 250 students is obtained and from that sample, 84.9% have a job
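The parameter/statistic distinction above can be sketched in Python. This is only an illustration: the population list, the fixed seed, and the variable names are hypothetical; only the 84.9% figure and the sample size of 250 come from the examples.

```python
import random

# Hypothetical population of 1000 students: 1 = has a job, 0 = does not.
population = [1] * 849 + [0] * 151

# Parameter: a numerical summary of the whole population.
population_proportion = sum(population) / len(population)

# Statistic: the same summary computed from a sample of 250 students.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample = rng.sample(population, 250)
sample_proportion = sum(sample) / len(sample)

print(f"parameter (population proportion): {population_proportion:.3f}")
print(f"statistic (sample proportion):     {sample_proportion:.3f}")
```

The statistic will usually land near the parameter but not exactly on it, which is why it only approximates the parameter.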
Step 1: Identify the research objective
Step 2: Collect the information needed to answer the question
Step 3: Describe the data
organize/summarize the information
Step 4: Draw conclusions from the data
Variables
Variables - the characteristics of the individuals within the population
attributes that vary between individuals within a population
ex: age, height, weight, gender
Qualitative/Categorical Variables - allow for the classification of individuals based on some
attribute or characteristic
will be a preference/description/characteristic
Quantitative Variables - provide a numerical measure of individuals
the values of a quantitative variable can be added or subtracted to provide meaningful results
will be a number
Discrete Variable - a quantitative variable that has a finite/countable number of possible values
cannot take on every possible value between two given values
Countable - values result from counting such as 0, 1, 2, 3 and so on
Continuous Variable - a quantitative variable that has an infinite number of possible values that
are not countable
can take on every possible value between two given values
Data
Data - the list of observations a variable assumes
ex: gender = variable, male/female (observations) = data
Qualitative Data - observations corresponding to a qualitative variable
Quantitative Data - observations corresponding to a quantitative variable
Discrete Data - observations corresponding to a discrete variable
Continuous Data - observations corresponding to a continuous variable
Bias - the tendency to over-estimate or under-estimate the value of a parameter
if the results of a sample are not representative of the population, then the sample has a bias
Raw Data - data that is not organized
When data is collected from a survey or designed experiment, it must be organized into a manageable
form.
Ways to organize data
Tables
Graphs
Numerical Summaries (chapter 3)
Three Sources of Bias
Sampling Bias - the technique used to obtain the individuals used in the sample tends to favor one
part of the population over another
Under-coverage (a type of sampling bias) - when the proportion of one segment of the population is
lower in a sample than it is in the population
Nonresponse Bias - when individuals selected for the sample who do not respond to the survey have
different opinions from those who do respond
can be reduced through the use of callbacks/rewards/incentives
Response Bias - exists when the answers on a survey do not reflect the true feelings of the respondent
Types of Response Bias
Interviewer Error
Misrepresented Answers
Wording of Questions
Order of Questions/Words (can lead people on)
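Under-coverage, as defined above, can be checked by comparing a segment's share of the sample with its share of the population. A minimal sketch, assuming made-up urban/rural counts and a hypothetical helper named `share`:

```python
def share(group, segment):
    """Proportion of `group` that belongs to `segment`."""
    return group.count(segment) / len(group)

# Hypothetical population and sample (made-up counts).
population = ["urban"] * 600 + ["rural"] * 400
sample = ["urban"] * 85 + ["rural"] * 15

population_share = share(population, "rural")
sample_share = share(sample, "rural")

# Under-coverage: the segment's proportion is lower in the sample
# than it is in the population.
under_covered = sample_share < population_share

print(f"rural share of population: {population_share:.2f}")
print(f"rural share of sample:     {sample_share:.2f}")
print(f"under-covered: {under_covered}")
```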
Chapter 2:
Frequency
Frequency Distribution - lists each category of data and the number of occurrences for each
category of data (counting how frequently each category was observed & listing the counts)
Relative Frequency - the proportion (or percent) of observations within a category, found
using the formula
Relative Frequency = (frequency)/(sum of all frequencies)
Relative Frequency Distribution - lists each category of data with the relative frequency
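A frequency distribution and a relative frequency distribution can be built as in the sketch below. The color observations are made up for illustration, and `collections.Counter` is just one convenient way to do the counting.

```python
from collections import Counter

# Hypothetical qualitative data: each observation is one category.
observations = ["red", "blue", "red", "green", "blue", "red", "red", "green"]

# Frequency distribution: category -> number of occurrences.
frequency = Counter(observations)

# Relative frequency = (frequency) / (sum of all frequencies).
total = sum(frequency.values())
relative_frequency = {cat: count / total for cat, count in frequency.items()}

for cat, count in frequency.items():
    print(f"{cat:<6} {count:>2}  {relative_frequency[cat]:.3f}")
```

The relative frequencies always add up to 1 (or 100%), which is a quick check on the arithmetic.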
Cumulative Frequency Distribution - displays the aggregate/total frequency up to and including each
category. For discrete data, it displays the total number of observations less than or equal to the
category; for continuous data, it displays the total number of observations
less than or equal to the upper class limit of a class.
ex: picture to the right (not reproduced here)
answer: B.
why: the vertical axis (y-axis) already lists the cumulative relative frequency for each
grade, and for a 70 it reads 37%, meaning
that 37% of the class got a 70 or lower
Cumulative Relative Frequency Distribution - displays the proportion/
percentage of observations less than or equal to the category for discrete data
and the proportion/percentage of observations less than or equal to the upper
class limit for continuous data.
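Both cumulative distributions are running totals, so they can be sketched with `itertools.accumulate`. The values and frequencies below are hypothetical (for example, number of siblings reported by 30 students).

```python
from itertools import accumulate

# Hypothetical discrete data: value -> frequency.
values = [0, 1, 2, 3, 4]
frequencies = [5, 12, 8, 3, 2]

total = sum(frequencies)

# Cumulative frequency: number of observations <= each value.
cumulative = list(accumulate(frequencies))

# Cumulative relative frequency: proportion of observations <= each value.
cumulative_relative = [c / total for c in cumulative]

for v, c, cr in zip(values, cumulative, cumulative_relative):
    print(f"<= {v}: {c:>2}  {cr:.3f}")
```

The last cumulative frequency equals the total number of observations, and the last cumulative relative frequency equals 1.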
Charts
Pareto Chart - a bar graph where the bars are drawn in decreasing order of frequency or relative
frequency
Pie Chart - a circle divided into sectors, where each sector represents a category of data
the area of each sector is proportional to the frequency of the category
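The two chart definitions above come down to simple arithmetic, sketched here without any plotting library. The commute categories and counts are made up. For a pie chart, a sector's area is proportional to its central angle, so the sketch computes angles in degrees.

```python
# Hypothetical frequency distribution (made-up counts).
frequency = {"bus": 6, "car": 18, "walk": 9, "bike": 3}

# Pareto chart: bars drawn in decreasing order of frequency.
pareto_order = sorted(frequency.items(), key=lambda item: item[1], reverse=True)

# Pie chart: each sector's angle is proportional to the category's frequency.
total = sum(frequency.values())
sector_degrees = {cat: 360 * count / total for cat, count in frequency.items()}

for cat, count in pareto_order:
    print(f"{cat:<5} {count:>2}  {sector_degrees[cat]:.1f} degrees")
```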
For large discrete sets, or continuous variables, it is harder to group things individually
solution = classes
Classes - categories into which data is grouped
When a data set consists of a large number of different discrete data values OR when a data set
consists of continuous data, we must create classes by using intervals of numbers
^most similar to quantitative data
*random variable numbers go on the horizontal axis
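Grouping continuous data into classes can be sketched as follows. The scores, the first lower class limit (50), the class width (10), and the number of classes are all assumptions chosen for illustration; each class here includes its lower limit and excludes the next class's lower limit.

```python
# Hypothetical continuous data (e.g., exam scores).
data = [52.5, 67.0, 71.2, 88.9, 93.4, 75.5, 61.0, 79.9, 84.3, 70.0]

lower_limit = 50   # lower class limit of the first class (assumed)
class_width = 10   # assumed class width
num_classes = 5    # enough classes to cover 50 up to (not including) 100

# Count how many observations fall into each class.
counts = [0] * num_classes
for x in data:
    index = int((x - lower_limit) // class_width)
    counts[index] += 1

for i, count in enumerate(counts):
    low = lower_limit + i * class_width
    print(f"[{low}, {low + class_width}): {count}")
```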
When organizing data, it is easy to manipulate how we present it (ex: through certain
charts)
Stem-and-Leaf Plot - uses the digits to the left of the rightmost digit to form the stem
while each rightmost digit forms a leaf
Rather than using classes, you just list the tens digit separately
ex: a data value of 147 would have 14 as the stem and 7 as the leaf
ex: a data value of 4.7 would have 4 be the stem and 7 be the leaf
ex (picture not reproduced here): how you display the data
2.8, 3.8, 3.8, 3.8, 3.3, 3.9…etc.
you must include a key alongside the plot stating what the vertical line represents (ex:
whether 2 | 8 means 2.8 or 28)
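A stem-and-leaf plot for the values listed above can be sketched in a few lines. Only the six values actually listed in the example are used (the rest of the data set is not given), and the printed layout is one possible convention.

```python
from collections import defaultdict

# The six values listed in the example; the remaining data is not given.
data = [2.8, 3.8, 3.8, 3.8, 3.3, 3.9]

stems = defaultdict(list)
for x in data:
    tenths = round(x * 10)            # 2.8 -> 28, avoids float artifacts
    stem, leaf = divmod(tenths, 10)   # 28 -> stem 2, leaf 8
    stems[stem].append(leaf)

for stem in sorted(stems):
    leaves = "".join(str(leaf) for leaf in sorted(stems[stem]))
    print(f"{stem} | {leaves}")

# The key tells the reader how to interpret the vertical line.
print("Key: 2 | 8 represents 2.8")
```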
Common Distribution Shapes
Uniform/Symmetric - a flat frequency distribution
Bell-Shaped - data is symmetric about the highest point, which is in the middle
can fold vertically and have the left side data be equal to the right side data
Skewed Right - more data is clustered on the left side of the chart
usually results from having outlying/large values on the right side of the chart
Skewed Left - more data is clustered on the right side of the chart
usually results from having outlying/small values on the left side of the chart
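A common rule of thumb is that outlying values pull the mean toward the long tail more than they pull the median. The sketch below checks this on three small made-up data sets using Python's `statistics` module; it illustrates the tendency and is not a formal test of shape.

```python
from statistics import mean, median

# Hypothetical data sets (made up for illustration).
skewed_right = [1, 2, 2, 3, 3, 3, 4, 20]         # one large value on the right
skewed_left = [1, 17, 18, 18, 18, 19, 19, 20]    # one small value on the left
symmetric = [1, 2, 3, 4, 5]

for name, data in [("skewed right", skewed_right),
                   ("skewed left", skewed_left),
                   ("symmetric", symmetric)]:
    print(f"{name:<12} mean = {mean(data):.2f}  median = {median(data):.2f}")
```

In the skewed-right set the mean is larger than the median, in the skewed-left set it is smaller, and in the symmetric set the two are equal.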

Document Summary

Distribution shape - mean v. median
skewed left - mean is substantially smaller than the median
symmetric - mean is roughly equal to the median
skewed right - mean is substantially larger than the median
Computational formula - an equivalent formula for determining the population standard deviation
square root of [(sum of the squares) - (square of the sum) divided by (number of observations)] all over the (number of observations)
It must be whatever value forces the sum of the deviations about the mean to be zero.
Variance - the variance of a variable is the square of the standard deviation.
14. 99 with one individual of a value 14. 13. which observation is closer to its population mean? answer: population 1.
