BIOL 4150 Lecture Notes - Lecture 4: Horse Length
Textbook: Chapter 15
There has been a conceptual revolution in recent years
in how to evaluate the acceptability of alternate models,
with rapidly evolving acceptance of information theory
as the basis for assessment
•
Very often in biology we want to decide which of the
numerous mathematical expressions best represents the data
we have at hand.
'r' vs. population size of wildebeest
○
Break down equation in r
!
Sigma1, sumsq1 = single values (not vectors)
!
N = a vector
!
Sum = a function
◊
(…) = an argument
◊
Sum(…)
!
Computing vectors (e.g. order <-
order; resid1<-rpredict)
□
*see slide for formula and r coding
!
Residual = deviation between an observation and the
regression line (sigma is the s.d. of these residuals)
□
Residual variation around the density-
dependent regression (sigma) = 0.055
!
We now proceed to calculate the residuals around
the density-dependent Ricker formula:
○
Λ = 7.927*10^10
!
L= -ln(Λ) = -25.096
!
Note: p1= 3 because we estimated
rmax, K, and sigma
□
AIC = -42.35
!
Likelihood:
○
Model 1: Ricker logistic model
•
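The Model 1 numbers above can be checked with a short sketch. The lecture works in R; Python is used here only for illustration. The slide's AIC equation is not reproduced in these notes, so the code assumes the standard form AIC = 2L + 2p (L = negative log-likelihood, p = number of parameters) together with the usual small-sample correction. The function names and the sample size n are assumptions, not taken from the lecture.

```python
import math

def neg_log_likelihood(likelihood):
    """Negative log-likelihood L = -ln(Lambda)."""
    return -math.log(likelihood)

def aic(L, p):
    """Standard AIC from negative log-likelihood L and p parameters (assumed form)."""
    return 2.0 * L + 2.0 * p

def aicc(L, p, n):
    """AIC with the usual small-sample correction; n = number of observations."""
    return aic(L, p) + (2.0 * p * (p + 1)) / (n - p - 1)

L1 = neg_log_likelihood(7.927e10)  # likelihood reported in the notes
p1 = 3                             # rmax, K and sigma were estimated
print(round(L1, 3))                # agrees with L = -25.096 in the notes
print(round(aic(L1, p1), 2))       # uncorrected AIC
```

The uncorrected score comes out near -44.19 rather than the -42.35 recorded in the notes. The gap would be consistent with a small-sample correction: aicc(L1, p1, 17) gives about -42.35, but n = 17 is only a guess that happens to reproduce the reported score, since the notes do not state the sample size.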
We can calculate the AIC and compare it
!
An alternate, and simpler, hypothesis is that the
wildebeest population is just growing at a
constant rate
○
*see slide
!
rmax= mean(r) = 0.036
!
MSE = ….. (residual variance or mean
squared error)
!
Sigma = sqrt(MSE) = 0.085 *there is more
variation
!
AIC = -30 (model 1 is more
parsimonious than model 2)
□
*note: only 2 parameters
!
Calculations:
○
Model 2: Geometric growth model
•
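The Model 2 calculations (rmax as the mean of r, then MSE and sigma from the residuals) can be sketched as follows. The r values are made up for illustration and are not the wildebeest data, the function name is hypothetical, and dividing the sum of squares by n rather than by n minus the number of parameters is an assumption, because the notes leave the MSE formula elided.

```python
import math

def fit_constant_growth(r_values):
    """Fit the constant-growth model: r is the same at every population size."""
    n = len(r_values)
    rmax = sum(r_values) / n                   # rmax = mean(r)
    residuals = [r - rmax for r in r_values]   # observed minus predicted r
    mse = sum(e * e for e in residuals) / n    # ASSUMPTION: divisor is n
    sigma = math.sqrt(mse)                     # sigma = sqrt(MSE)
    return rmax, mse, sigma

# Hypothetical growth rates, for illustration only
r_obs = [0.12, -0.04, 0.07, 0.01, 0.04]
rmax, mse, sigma = fit_constant_growth(r_obs)
print(rmax, mse, sigma)
```

Because this model ignores population size, any density dependence in the data ends up in the residuals, which is why sigma is larger here than under the Ricker model.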
Note: theta = steepness of curvature
!
'r' curves with population size
○
Sigma=0.04 *less deviation
○
4 parameters
!
*see r coding
○
AIC = -49.573 **most parsimonious model
○
Model 3: Theta-logistic model
•
Ex. Serengeti wildebeest
i=3
○
w= (exp(-7))/ (exp(-7) + exp(-19) + exp(-0))
○
*see slide for equation
•
Model 1 - w = 7.266*10^-4 (weights lie between 0 and 1, and the three weights sum to 1)
•
Model 2 -w=6.307*10^-9
•
Model 3 -w=0.999 (99.9% probability that this is the
most parsimonious)
•
*Akaike weights give the probability that the model is the
most parsimonious in a given set of models
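A minimal sketch of the weight calculation, using the three AIC scores recorded in these notes. The worked example above uses exp(-delta) directly (e.g. exp(-7)), whereas the form usually given in the literature is exp(-delta/2). Since the slide equation is not reproduced here, the `scale` argument is a hypothetical knob that covers both, and the function name is likewise an assumption.

```python
import math

def akaike_weights(aic_scores, scale=0.5):
    """Relative support for each model within a candidate set.

    scale=0.5 is the commonly published form exp(-delta/2);
    scale=1.0 mirrors the exp(-delta) arithmetic in the worked example.
    """
    best = min(aic_scores)
    deltas = [a - best for a in aic_scores]          # difference from best model
    rel = [math.exp(-scale * d) for d in deltas]     # relative likelihoods
    total = sum(rel)
    return [x / total for x in rel]                  # normalise to sum to 1

# AIC scores from the notes: Ricker, geometric, theta-logistic
aic_scores = [-42.35, -30.0, -49.573]
w_notes = akaike_weights(aic_scores, scale=1.0)
w_std = akaike_weights(aic_scores)
print(w_notes)
print(w_std)
```

Either way the theta-logistic model takes nearly all of the weight. The Model 2 weight will not match the notes exactly because its AIC is only recorded as the rounded value -30.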
Rmax = intercept on y axis
•
K = intercept on the N (x) axis, i.e. the population size at which r = 0
•
Curvature is used to reduce residual variation
•
NOTE: r vs. N graph
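The intercepts described above can be illustrated with the standard textbook forms of the two density-dependent curves, r = rmax(1 - N/K) for the Ricker model and r = rmax(1 - (N/K)^theta) for the theta-logistic. These are assumed to match the slide equations, which are not reproduced in the notes; the function names and parameter values are hypothetical.

```python
def ricker_r(N, rmax, K):
    """Ricker logistic: r declines linearly with population size N."""
    return rmax * (1.0 - N / K)

def theta_logistic_r(N, rmax, K, theta):
    """Theta-logistic: theta bends the r vs. N line into a curve."""
    return rmax * (1.0 - (N / K) ** theta)

# Hypothetical parameter values, for illustration only
rmax, K, theta = 0.2, 1000.0, 2.0

print(ricker_r(0.0, rmax, K))                  # y-intercept: r equals rmax at N = 0
print(ricker_r(K, rmax, K))                    # x-intercept: r equals 0 at N = K
print(theta_logistic_r(500.0, rmax, K, theta)) # curved model at half of K
print(ricker_r(500.0, rmax, K))                # straight-line model at half of K
```

With theta = 1 the theta-logistic reduces to the Ricker line; the extra parameter is what lets the curve track the data more closely, at the cost of one more parameter in the AIC.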
Use concept of likelihood (proportional to the probability of our observations given the model)
•
The likelihood of the entire data set is calculated by multiplying the probabilities of each of the
separate datum values
•
Note: the capital pi symbol (Π) = multiply all the terms together (a product)
□
It is the probability of the full set of observations given the model (and normal
distribution)
□
*see equation on slide (right side of equation --> normal distribution)
!
For example, if the sample error for each observation is normally distributed:
○
The probability of each observation depends on the distribution of the sample error
•
*see slide
!
Λ = 7.927*10^10
!
L = -25.096 (a smaller L means a better fit; L doesn't have to be negative)
!
For example, if our first model is the density-dependent Ricker formula with residuals that
are normally distributed with mean=0 and s.d=0.055, the likelihood and negative log-
likelihood would be calculated as:
○
Because likelihoods are often quite large numbers, researchers sometimes work with the negative
log-likelihood (L)
•
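The likelihood steps above (one normal probability density per observation, multiplied together, then converted to a negative log-likelihood) can be sketched as follows. The residuals are made up for illustration and are not the wildebeest data; only s.d. = 0.055 comes from the notes, and the function names are hypothetical.

```python
import math

def normal_pdf(x, mean, sd):
    """Normal probability density at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def likelihood(residuals, sd):
    """Product of the densities of all residuals (mean assumed to be 0)."""
    lam = 1.0
    for e in residuals:
        lam *= normal_pdf(e, 0.0, sd)
    return lam

def neg_log_likelihood(residuals, sd):
    """Same quantity on the log scale: L = -sum(ln(density))."""
    return -sum(math.log(normal_pdf(e, 0.0, sd)) for e in residuals)

# Hypothetical residuals, for illustration only
residuals = [0.03, -0.05, 0.08, -0.02, 0.01]
sd = 0.055  # residual s.d. of the Ricker fit, from the notes

lam = likelihood(residuals, sd)
L = neg_log_likelihood(residuals, sd)
print(lam, L)
```

With a small s.d. the density near zero is greater than 1, so the product grows with every observation. That is why the likelihood in the notes is such a large number and why L comes out negative.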
In other words, AIC scores reflect the degree of fit and model complexity
○
This allows us to estimate which model offers the most parsimonious explanation of the data
○
AIC = an information criterion
○
Note: the AIC score (see slide for equation) gets smaller as the likelihood gets larger, but gets larger
with increasing number of parameters
•
*In order to formally evaluate alternate models, we need to score their predictive power
Model Evaluation
Tuesday, October 3, 2017
11:26 AM