Survival Analysis

If we start with 1000 people and 10% of the population dies each year, how many people will be left in 10 years?  One could figure this out manually.  However, for more complicated models that include covariate predictors of survival, survival analysis is helpful.
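The simple case can be worked out directly. The sketch below assumes that exactly 10% of the remaining population dies at the end of each year; the function name survivors is made up for illustration.

```python
def survivors(initial, death_rate, years):
    """Expected number still alive after `years` periods,
    assuming a constant share `death_rate` of the remaining
    population dies in every period."""
    return initial * (1.0 - death_rate) ** years


remaining = survivors(1000, 0.10, 10)
print(round(remaining, 1))  # 348.7
```

Note that this treats time as discrete. If the 10% were instead read as a continuous-time rate, the answer would be 1000*exp{-0.10*10}, or roughly 368 people.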

Survival analysis starts with a hazard function, λ(t), which gives the rate of failure at time t among those who have survived up to t. In our simple example, λ(t)=.10 ∀ t.

Formally, in continuous time the hazard function is defined as:

  • λ(t) = lim_{h→0} P(t≤T<t+h | T≥t)/h

The variable T is the number of periods the person survives. Its cdf, F(t)=P(T≤t), gives the cumulative probability of failure by period t. From the cdf we get the survival function, S(t)=1-F(t), which gives the probability that a person survives beyond period t.  The pdf is the derivative of the cdf, and it can also be written as f(t)=S(t)*λ(t).

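Because λ(t)=f(t)/S(t)=-d ln S(t)/dt, the survival function can be recovered from the hazard as S(t)=exp{-∫0 to t λ(s) ds}. Knowing the hazard function therefore lets us calculate many probabilities of interest.  For instance, if a2>a1: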

  • P(T≥a2|T≥a1)=exp{-∫a1 to a2 λ(s) ds}
  • P(a1≤T≤a2|T≥a1)=1-exp{-∫a1 to a2 λ(s) ds}
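The two formulas above can be checked numerically. The following is a minimal sketch, assuming a simple trapezoid rule for the integral; the helper names (integrate, prob_survive, prob_fail_between) are made up for illustration, and the constant hazard of .10 is the one from the opening example.

```python
import math


def integrate(f, a, b, n=10_000):
    """Trapezoid-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def prob_survive(hazard, a1, a2):
    """P(T >= a2 | T >= a1) = exp(-integral of the hazard from a1 to a2)."""
    return math.exp(-integrate(hazard, a1, a2))


def prob_fail_between(hazard, a1, a2):
    """P(a1 <= T <= a2 | T >= a1), the complement of prob_survive."""
    return 1.0 - prob_survive(hazard, a1, a2)


def constant_hazard(t):
    """Hazard from the opening example: .10 at every t."""
    return 0.10


p_survive = prob_survive(constant_hazard, 2.0, 5.0)
p_fail = prob_fail_between(constant_hazard, 2.0, 5.0)
print(round(p_survive, 4), round(p_fail, 4))
```

With a constant hazard the integral is just .10*(a2-a1), so p_survive should agree with exp{-0.3}. Any other hazard function of t can be passed in the same way.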

Wooldridge (2001) uses the example of recidivism.  Let λ(t) equal the hazard rate at which criminals freed from jail commit another crime, with t measured in months.  The term λ(12) is then approximately the probability that a person is arrested during the 13th month after release, conditional on not having been arrested during the first year.

Weibull Example

One example of a common hazard function is the Weibull function.  In the Weibull, we have:

  • f(t)=αγt^(α-1) exp{-γt^α}
  • λ(t)=γαt^(α-1)
  • S(t)=exp{-γt^α}

The Weibull is an attractive model because the hazard rate need not be constant over time: it rises with t if α>1 and falls with t if α<1. Also, the Weibull distribution is simple to understand. The parameter α determines its shape and the parameter γ determines its scale. Further, if the hazard rate depends on individual characteristics, we can condition the value of γ on a vector of covariates.  If α=1, then the Weibull simplifies to the exponential distribution. This is a “memoryless” distribution where the hazard rate is constant over time (i.e., λ(t)=γ).
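A short sketch of the three Weibull functions may help, reading tα in the formulas above as t raised to the power α. The function names and the parameter values are illustrative assumptions only.

```python
import math


def weibull_hazard(t, alpha, gamma):
    """Hazard: gamma * alpha * t^(alpha - 1), for t > 0."""
    return gamma * alpha * t ** (alpha - 1)


def weibull_survival(t, alpha, gamma):
    """Survival function: exp(-gamma * t^alpha)."""
    return math.exp(-gamma * t ** alpha)


def weibull_pdf(t, alpha, gamma):
    """Density: alpha * gamma * t^(alpha - 1) * exp(-gamma * t^alpha)."""
    return alpha * gamma * t ** (alpha - 1) * math.exp(-gamma * t ** alpha)


times = [1.0, 2.0, 5.0, 10.0]

# With alpha = 1 the hazard is the same at every t (exponential case).
flat = [weibull_hazard(t, alpha=1.0, gamma=0.10) for t in times]

# With alpha > 1 the hazard increases with t.
rising = [weibull_hazard(t, alpha=1.5, gamma=0.10) for t in times]

# The identity f(t) = S(t) * hazard(t) holds at every t.
gaps = [
    abs(weibull_pdf(t, 1.5, 0.10)
        - weibull_survival(t, 1.5, 0.10) * weibull_hazard(t, 1.5, 0.10))
    for t in times
]
print(flat, rising, max(gaps))
```

Setting alpha below 1 in the same way gives a hazard that declines with t.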

Proportional Hazard Models

Often, you will want to see how different covariates affect the hazard rate.  A very simple model to use is the proportional hazard model.  Here, the baseline hazard λ0(t) is common to all individuals and covariates have a multiplicative effect on this baseline hazard.  For instance:

  • λ(t;x) = κ(x) λ0(t)
    • κ(x) = exp{βx}
  • ln λ(t;x) = βx + ln{λ0(t)}
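The "proportional" part of the name can be seen in a small sketch: the ratio of hazards for two individuals depends only on their covariates, not on t. The Weibull baseline, the coefficient values, and the covariate vectors below are all hypothetical and chosen purely for illustration.

```python
import math


def baseline_hazard(t, alpha=1.5, gamma=0.05):
    """Hypothetical Weibull baseline hazard, shared by everyone."""
    return gamma * alpha * t ** (alpha - 1)


def ph_hazard(t, x, beta):
    """Proportional hazard: exp(beta . x) times the baseline hazard at t."""
    linear_index = sum(b * xi for b, xi in zip(beta, x))
    return math.exp(linear_index) * baseline_hazard(t)


beta = [0.4, -0.2]   # hypothetical coefficients
x_a = [1.0, 3.0]     # hypothetical individual A
x_b = [0.0, 3.0]     # individual B differs from A only in the first covariate

# Hazard ratio of A to B at several points in time.
ratios = [ph_hazard(t, x_a, beta) / ph_hazard(t, x_b, beta)
          for t in (1.0, 5.0, 20.0)]
print([round(r, 4) for r in ratios])
```

Each ratio equals exp{0.4}, the exponentiated coefficient on the covariate that differs, regardless of t. This is why coefficients in these models are often reported as hazard ratios.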
