PSY248 Study Guide - Final Guide: Semen Analysis, Variance Inflation Factor, Linear Combination
CORRELATION/REGRESSION
Before statistics:
• Step 1: understand the research question
• Step 2: how are DV and IV measured?
• Step 3: choose method of analysis (correlation/regression)
Conduct correlation/regression analysis:
• Part 1: univariate
• Part 2: bivariate
• Part 3: regression and check assumptions
→ Correlation asks whether there is a RELATIONSHIP between two variables
→ Regression refers to PREDICTION
UNIVARIATE
Step 1: understand research question
Example:
• IV = age of carer
• DV = carer distress
• 1. What is the level of carer distress? Does carer distress vary? (univariate)
o Produce descriptive statistics
• 2. Is carer distress (linearly) related to age of carer? (bivariate)
o Produce correlational statistics
• 3. Does knowledge about age help predict level of carer distress? (regression)
o Produce regression
• This is NON-EXPERIMENTAL research
Step 2: how are the DV/IV measured?
• We look at
o How we intended to measure (questionnaire)
o What we actually ended up with (results – data)
• Decide level of measurement – categorical, ordinal or interval
• The level of measurement determines which statistical analysis to use
• Correlation/regression is appropriate when both IV and DV are continuous (interval/numeric) and approximately normal
• For instance:
o Age: categorical (young, middle, old), distress: categorical (low, medium, high) = chi-square appropriate
o Age: categorical (young, middle, old), distress: numerical = one-way ANOVA
o Age: numerical, distress: numerical = correlation/regression
• Specific Health Questionnaire; consisted of 15 items
o Study had age categories (one of 9) – but we are going to treat them as continuous
variables (numeric)
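The decision rule in Step 2 can be sketched as a small lookup. This is only an illustration of the three cases listed above; the function name and the level labels ("categorical", "numerical") are hypothetical, and any other combination is simply reported as not covered.

```python
def choose_analysis(iv_level: str, dv_level: str) -> str:
    """Return the analysis the notes suggest for a given IV/DV measurement level."""
    # Only the three combinations listed in the notes are included here.
    rules = {
        ("categorical", "categorical"): "chi-square",
        ("categorical", "numerical"): "one-way ANOVA",
        ("numerical", "numerical"): "correlation/regression",
    }
    return rules.get((iv_level, dv_level), "not covered by these notes")


# Age treated as numeric, distress as numeric.
print(choose_analysis("numerical", "numerical"))
```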
Step 3: choose method of analysis
• Produce histograms and describe them using 5 characteristics
Characteristics used to assess normality:
• Central tendency
o Typical or average score, centre of distribution, peak in the distribution – does it exist?
• Variability (SD and range)
o Do all the cases tend to score at about the same point or are they widely scattered –
width of distribution
• Skewness
o Symmetry vs. lopsidedness of distribution
o Positive (right) skew, negative (left) skew, symmetric distributions have no skew
o Skewness ratio = skewness/standard error of skewness (between -2 and +2 = unskewed, i.e. symmetrical)
• Kurtosis
o Flatness or peakedness of a distribution. Platykurtic (flat), leptokurtic (very peaked) and
mesokurtic (a normal distribution)
o Kurtosis ratio = kurtosis/standard error of kurtosis
▪ Between -2 and +2 = mesokurtic
• Modal characteristics (modality)
o Number of peaks: unimodal (one), bimodal (two) or multimodal (several)
o A distribution with no mode is a uniform or rectangular distribution
o In general, the presence of more than one frequency peak (mode) in a distribution
means that the data represent several relatively homogeneous subgroups within the
larger sample being studied
o You want your distribution to be unimodal – indicates homogeneity
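The skewness and kurtosis ratios above can be checked by hand. The sketch below uses the usual small-sample formulas for skewness, excess kurtosis and their standard errors; it is an assumption that these match the values SPSS prints, and the function name and example scores are made up for illustration.

```python
import math


def shape_ratios(scores):
    """Return (skewness / SE of skewness, kurtosis / SE of kurtosis)."""
    n = len(scores)
    if n < 4:
        raise ValueError("need at least 4 scores")
    mean = sum(scores) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in scores) / (n - 1))
    z = [(x - mean) / sd for x in scores]

    # Adjusted sample skewness and excess kurtosis.
    skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))) * sum(v ** 4 for v in z) \
        - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))

    # Standard errors depend only on the sample size.
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2 * se_skew * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))
    return skew / se_skew, kurt / se_kurt


symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]        # hypothetical scores
right_skewed = [1, 1, 1, 2, 2, 3, 4, 9, 20]    # hypothetical scores
print(shape_ratios(symmetric))     # skewness ratio is 0 for symmetric data
print(shape_ratios(right_skewed))  # skewness ratio above +2 suggests positive skew
```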
USING SPSS
• Use the frequencies command to produce graphical and numeric summaries of all five
possible DVs
• Analyse → descriptive stats → frequencies
o Request all five histogram characteristics
• Pasted syntax is generated by filling out the point-and-click dialogue boxes
• Ask questions based on graphical summary
o Central tendency – yes
o Variability – yes
o Kurtosis – mesokurtic
o Skewness – no skew
o Modality – unimodal
BIVARIATE
• Create scatterplot and Pearson’s r in SPSS
Use SPSS point and click to produce scatterplot
• Graphs (legacy dialogs) → scatter → simple → define
• Simple scatterplot → put DV on Y axis, IV on X axis
Scatterplot (7 Assumptions)
1. Linear Relationship (straight line)
• Increase or decrease in the same direction, and at the same rate
2. Monotonic Relationship
• Increase or decrease in the same relative direction, but not at the same rate
3. Outliers
4. Gaps
5. Direction of the function that describes relationship between 2 variables
• Positive, negative, or no relationship
6. Effect of X on Y (slope): the steeper the slope, the greater the effect
• Do the points cluster around the imaginary straight line?
7. Correlation (strength of relationship)
• ±0.3 – weak
• ±0.5 – moderate
• ±0.7 – strong
→ Provided we are happy that all 7 steps are okay, we can appropriately summarise the relationship
numerically by calculating a (Pearson) correlation
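To see what the Pearson correlation summarises, here is a minimal hand calculation. It is a sketch, not SPSS output: the function name and the data are hypothetical. The second example shows a monotonic but non-linear relationship, where r falls short of 1 even though Y always increases with X.

```python
import math


def pearson_r(x, y):
    """Pearson correlation computed from deviations around the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
linear_negative = [10, 8, 6, 4, 2]       # perfect straight line, downward
monotonic_curve = [v ** 3 for v in x]    # always increasing, but not at the same rate

print(pearson_r(x, linear_negative))   # exactly -1: every point on the line
print(pearson_r(x, monotonic_curve))   # positive, but less than 1
```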
Analyse → Correlate → Bivariate
• Pearson correlation
• Numeric summary table appears of Pearson’s correlation coefficient between the two
variables
• r = -.5 confirms the negative relationship seen in the scatterplot (read from the Pearson Correlation row for age and SHTOT);
if r were -1, all points would fall exactly on the line
o The size of r also tells us the strength: here, a moderate correlation
• As a rule of thumb, say:
o Correlations between 0 and 0.29 (pos and neg) are ‘weak’
o Correlations between 0.30 and 0.59 are ‘moderate’
o Correlations between 0.60 and 1.00 are ‘strong’
• Significance depends on two things
o The size of the relationship AND
o The sample size
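The rule of thumb and the two ingredients of significance can be sketched as follows. The t formula for testing a correlation, t = r√(n − 2)/√(1 − r²) with n − 2 degrees of freedom, is standard; the function names and the sample sizes are hypothetical.

```python
import math


def strength_label(r):
    """Label a correlation using the rule of thumb in the notes."""
    size = abs(r)
    if size < 0.30:
        return "weak"
    if size < 0.60:
        return "moderate"
    return "strong"


def t_for_r(r, n):
    """t statistic for testing whether the population correlation is zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


print(strength_label(-0.5))    # sign is ignored when judging strength
# Same size of relationship, different sample sizes:
print(t_for_r(0.3, 10))        # small sample gives a small t
print(t_for_r(0.3, 100))       # larger sample gives a larger t
```

The same r produces a larger t as n grows, which is why a weak correlation can still be significant in a large sample.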
Five points about correlations:
1. Note that SPSS reports that our correlation is significant (p < 0.0005)
• Significance = we conclude that the population correlation is not equal to zero
• If Sig. (2-tailed) shows .000, you cannot conclude that p is exactly 0; the value is rounded and
continues beyond the displayed decimals
2. If asked, SPSS will always calculate a linear correlation, even when it is NOT appropriate to do so.
Always inspect the bivariate scatterplot first to determine that a linear correlation is appropriate.
3. r = 0.00 does not always mean NO relationship; it means no LINEAR relationship
4. Always report ranges of X and Y – we do not know what happens to relationship beyond
range of our data
5. Correlation does not imply causation
REGRESSION
• Regression = prediction
• The F or t statistic tests whether the prediction is significant (in simple regression, F = t²)
o If it is significant:
▪ Write and interpret regression equation (of line going through scatterplot)
▪ Comment on and interpret R² (the % of variance in the DV explained by the IV)
▪ Check 4 assumptions of regression
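For a regression with a single predictor, the percentage of variance explained and the test statistics all follow from r and n. The sketch below assumes simple (one-IV) regression, where R² = r² and F equals t squared; the function name and the values r = -0.5, n = 50 are hypothetical.

```python
import math


def simple_regression_summary(r, n):
    """Return (percent of variance explained, t, F) for a one-predictor regression."""
    r_squared = r ** 2
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r_squared)
    f = r_squared / (1 - r_squared) * (n - 2)   # F on (1, n - 2) degrees of freedom
    return 100 * r_squared, t, f


percent, t, f = simple_regression_summary(-0.5, 50)
print(percent)     # percentage of variance in the DV explained by the IV
print(t, f)        # F equals t squared when there is one predictor
```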
Syntax SPSS:
REGRESSION /MISSING LISTWISE /STATISTICS COEFF OUTS R ANOVA
/CRITERIA=PIN(.05) POUT(.10) /NOORIGIN /DEPENDENT shtot /METHOD=ENTER age.
1. Equation of the straight line
• Unstandardised B coefficients → conform to general equation of a line
• Regression equation
o Predicted distress = y-axis intercept + slope × age (the independent variable)
▪ Y-axis intercept = the constant (identified from coefficient table); this number
will be the point where the line would cross the y-axis
▪ Always make sure you write PREDICTED (as there can still be error)
▪ Use info under B to write down those values into the equation
▪ E.g. predicted distress = 38.25 + (-2.62)(age)
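The regression equation can be applied directly. This sketch uses the intercept (38.25) and slope (-2.62) quoted in the notes; the age value of 3 is a hypothetical age category, and `fit_line` is an illustrative least-squares helper run on made-up data, not the study's data.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) \
        / sum((a - mean_x) ** 2 for a in x)
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, value):
    """Predicted DV = intercept + slope * IV."""
    return intercept + slope * value


# Coefficients from the B column quoted in the notes.
print(predict(38.25, -2.62, 3))   # about 30.39 for a hypothetical age category of 3

# Made-up data lying exactly on the line y = 3 + 2x.
print(fit_line([1, 2, 3], [5, 7, 9]))
```

The negative slope means each one-step increase in age lowers predicted distress by 2.62 points; the word "predicted" matters because actual scores scatter around the line.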