Improving fitting and predictions for flexible parametric survival models

2022 UK Stata Conference, London

Paul Lambert

University of Leicester / Karolinska Institutet

September 9, 2022

Introduction

  • The first article in The Stata Journal introduced stpm
  • I wrote stpm2 in 2007.
  • Here I will introduce stpm3

Flexible Parametric Survival Models

  • Flexible parametric survival models are used with time-to-event (survival) outcomes

  • They are “flexible” in that they use spline functions to model the effect of time

  • Easy to relax proportionality assumptions through interactions between covariates and the effect of time.

Choice of Scale

Log Cumulative Hazard (stpm2)

\[ \ln[H(t|\mathbf{x}_i)] = \ln[-\ln(S(t|\mathbf{x}_i))] = \eta_i(t) = s\left(\ln(t)|\boldsymbol{\gamma}, \mathbf{k}_{0}\right) + \mathbf{x}_i \boldsymbol{\beta} \]

Log odds scale (stpm2)

\[ \ln\left[\frac{1-S(t|\mathbf{x}_i)}{S(t|\mathbf{x}_i)}\right] = \eta_i(t) = s\left(\ln(t)|\boldsymbol{\gamma}, \mathbf{k}_{0}\right) + \mathbf{x}_i \boldsymbol{\beta} \]

Log hazard scale (strcs) ln(time)

\[ \ln\left[h(t|\mathbf{x}_i)\right] = \eta_i(t) = s\left(\ln(t)|\boldsymbol{\gamma}, \mathbf{k}_{0}\right) + \mathbf{x}_i \boldsymbol{\beta} \]

stcox vs stpm2

stcox and stpm2
. stcox hormon, nolog noshow 
------------------------------------------------------------------------------
          _t | Haz. ratio   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
      hormon |   1.540262    .132659     5.02   0.000     1.301016    1.823503
------------------------------------------------------------------------------

. stpm2 hormon, scale(hazard) df(4) eform nolog
------------------------------------------------------------------------------
             |     exp(b)   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
xb           |
      hormon |   1.540779   .1326967     5.02   0.000     1.301464    1.824099
       _rcs1 |    2.50398    .069006    33.31   0.000     2.372319    2.642949
       _rcs2 |   1.198509   .0330973     6.56   0.000     1.135364    1.265166
       _rcs3 |   1.018274   .0145595     1.27   0.205     .9901346    1.047214
       _rcs4 |   .9961938   .0067963    -0.56   0.576     .9829618    1.009604
       _cons |   .2935573   .0097629   -36.85   0.000     .2750327    .3133296
------------------------------------------------------------------------------

Predictions

Why a new command?

  • stpm2 started before Stata factor variables

  • Use better basis functions for splines

  • Make predictions easier (conditional/marginal)

  • Use frames for predictions

  • Include splines on log hazard scale

  • Include functional forms of covariates in linear predictor

Change of spline basis functions

  • stpm2 uses restricted cubic splines to model the effect of follow-up time.

    • data dependent - problems with out of sample prediction.
  • New gensplines command used by stpm3

    • Natural splines basis functions by default.

    • Also possible to use B-splines

Restricted Cubic Splines in stpm2

Natural Splines in stpm3

Factor Variables

Main effects
stpm3 i.hormon i.size age,           /// main effects
      scale(lncumhazard) df(3)       //  model scale & df


Interactions
stpm3 i.hormon##ib2.size age         /// Interactions
      scale(lncumhazard) df(3)       //  model scale & df


...with time-dependent effects
stpm3 i.hormon##i.size age,          /// Interactions
      tvc(i.hormon i.size) dftvc(2)  /// Interactions with time    
      scale(lncumhazard) df(3)       ///  model scale & df
      baselevels

Factor Variables Example


. stpm3 i.hormon##i.size age,          /// Interactions
>       tvc(i.hormon i.size) dftvc(2)  /// Interactions with time    
>       scale(lncumhazard) df(3)       ///  model scale & df
>       baselevels

Iteration 0:   log likelihood = -2909.4852  
Iteration 1:   log likelihood = -2902.0087  
Iteration 2:   log likelihood =  -2901.908  
Iteration 3:   log likelihood =  -2901.908  

                                                        Number of obs =  2,982
                                                        Wald chi2(6)  = 111.41
Log likelihood = -2901.908                              Prob > chi2   = 0.0000

-----------------------------------------------------------------------------------
                  | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
------------------+----------------------------------------------------------------
xb                |
           hormon |
              no  |          0  (base)
             yes  |    .405057    .239317     1.69   0.091    -.0639957    .8741098
                  |
             size |
         <=20 mm  |          0  (base)
       >20-50mmm  |   .4471447   .0961388     4.65   0.000     .2587162    .6355733
          >50 mm  |    1.02456   .1419684     7.22   0.000     .7463075    1.302813
                  |
      hormon#size |
   yes#>20-50mmm  |   -.097008   .2231739    -0.43   0.664    -.5344208    .3404048
      yes#>50 mm  |  -.0525507   .2586502    -0.20   0.839    -.5594958    .4543944
                  |
              age |   .0139055   .0022484     6.18   0.000     .0094988    .0183123
------------------+----------------------------------------------------------------
time              |
             _ns1 |  -28.61436   2.485529   -11.51   0.000    -33.48591   -23.74282
             _ns2 |    9.07063   1.222754     7.42   0.000     6.674077    11.46718
             _ns3 |  -1.746533   .1441763   -12.11   0.000    -2.029113   -1.463953
                  |
hormon#c._ns_tvc1 |
             yes  |   .7800628   1.573629     0.50   0.620    -2.304194     3.86432
                  |
hormon#c._ns_tvc2 |
             yes  |  -.8465712   .6738948    -1.26   0.209    -2.167381    .4742383
                  |
  size#c._ns_tvc1 |
       >20-50mmm  |  -1.370193   1.381062    -0.99   0.321    -4.077025     1.33664
          >50 mm  |   .8596402   1.567863     0.55   0.583    -2.213316    3.932596
                  |
  size#c._ns_tvc2 |
       >20-50mmm  |   1.190279    .420111     2.83   0.005     .3668762    2.013681
          >50 mm  |   1.040614   .5661812     1.84   0.066    -.0690804    2.150309
                  |
            _cons |  -.9955528   .1441542    -6.91   0.000     -1.27809   -.7130158
-----------------------------------------------------------------------------------

Extended Functions

  • We rarely assume the effect of a covariate is linear.

  • Use splines, fractional polynomials etc.

  • May have interactions with these non-linear effects

This makes predictions complicated as you need to use the appropriate values of the spline basis functions.

  • extended functions - nonlinear function in a varlist.

Extended Functions Example 1

Natural splines - @ns()
stpm3 i.hormon i.size @ns(age, df(3)), /// natural spline for age
      scale(lncumhazard) df(3)         //  model scale & df

. stpm3 i.hormon i.size @ns(age, df(3)), /// natural spline for age
>       scale(lncumhazard) df(3)         //  model scale & df

Iteration 0:   log likelihood = -2895.6086  
Iteration 1:   log likelihood = -2890.8231  
Iteration 2:   log likelihood = -2890.7951  
Iteration 3:   log likelihood = -2890.7951  

                                                        Number of obs =  2,982
                                                        Wald chi2(6)  = 355.97
Log likelihood = -2890.7951                             Prob > chi2   = 0.0000

------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
xb           |
      hormon |
        yes  |   .1882895    .087913     2.14   0.032     .0159832    .3605957
             |
        size |
  >20-50mmm  |   .6183297   .0634388     9.75   0.000      .493992    .7426674
     >50 mm  |   1.169518   .0871101    13.43   0.000     .9987854    1.340251
             |
 _ns_f1_age1 |  -2.334033   .8595762    -2.72   0.007    -4.018772   -.6492949
 _ns_f1_age2 |   -.926103   .3621613    -2.56   0.011    -1.635926   -.2162799
 _ns_f1_age3 |  -1.546971    .382083    -4.05   0.000     -2.29584   -.7981023
-------------+----------------------------------------------------------------
time         |
        _ns1 |  -29.15713    1.63076   -17.88   0.000    -32.35336    -25.9609
        _ns2 |   9.752766   .8468891    11.52   0.000     8.092894    11.41264
        _ns3 |  -1.494942   .0987252   -15.14   0.000     -1.68844   -1.301444
       _cons |   .7863726   .1988728     3.95   0.000      .396589    1.176156
------------------------------------------------------------------------------
Extended functions
 (1) @ns(age, df(3))

Extended Functions Example 2

Use with interactions
stpm3 i.size                               ///
      i.hormon##@bs(age, df(3) degree(2)), /// interaction with bs function
      scale(lncumhazard) df(3)             //  model scale & df 

. stpm3 i.size                               ///
>       i.hormon##@bs(age, df(3) degree(2)), /// interaction with bs function
>       scale(lncumhazard) df(3)             //  model scale & df 

Iteration 0:   log likelihood = -2892.5789  
Iteration 1:   log likelihood = -2887.8413  
Iteration 2:   log likelihood = -2887.8145  
Iteration 3:   log likelihood = -2887.8145  

                                                        Number of obs =  2,982
                                                        Wald chi2(9)  = 362.88
Log likelihood = -2887.8145                             Prob > chi2   = 0.0000

--------------------------------------------------------------------------------------
                     | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
---------------------+----------------------------------------------------------------
xb                   |
                size |
          >20-50mmm  |   .6178289   .0634708     9.73   0.000     .4934283    .7422294
             >50 mm  |   1.166116   .0872509    13.37   0.000     .9951074    1.337125
                     |
              hormon |
                yes  |  -2.043709   1.342542    -1.52   0.128    -4.675042    .5876238
         _bs_f1_age1 |  -.9448818   .2968562    -3.18   0.001    -1.526709   -.3630544
         _bs_f1_age2 |  -.4385819    .198683    -2.21   0.027    -.8279935   -.0491703
         _bs_f1_age3 |   .6777717   .3473155     1.95   0.051    -.0029542    1.358498
                     |
hormon#c._bs_f1_age1 |
                yes  |   3.077341   1.590321     1.94   0.053    -.0396321    6.194313
                     |
hormon#c._bs_f1_age2 |
                yes  |   1.723866    1.26789     1.36   0.174     -.761153    4.208885
                     |
hormon#c._bs_f1_age3 |
                yes  |   2.533026   1.539937     1.64   0.100    -.4851953    5.551248
---------------------+----------------------------------------------------------------
time                 |
                _ns1 |  -29.19083   1.632273   -17.88   0.000    -32.39003   -25.99164
                _ns2 |   9.761409   .8475986    11.52   0.000     8.100146    11.42267
                _ns3 |  -1.502955   .0992601   -15.14   0.000    -1.697501   -1.308409
               _cons |   .2208277   .2201609     1.00   0.316    -.2106796    .6523351
--------------------------------------------------------------------------------------
Extended functions
 (1) @bs(age, df(3) degree(2))

Extended Functions Example 3

stpm3 i.size                                 /// factor variable for size
      i.hormon##@bs(age, df(3) degree(2))    /// interaction bs function
      @poly(pr,degree(2)),                   /// quadratic for pr
      tvc(i.hormon @ns(age, df(3))) dftvc(2) /// different function of age    
      scale(lncumhazard) df(3)               //  model scale & df 
. stpm3 i.size                                 /// factor variable for size
>       i.hormon##@bs(age, df(3) degree(2))    /// interaction with bs function
>       @poly(pr,degree(2)),                   /// quadratic for pr
>       tvc(i.hormon @ns(age, df(3))) dftvc(2) /// different function of age    
>       scale(lncumhazard) df(3)               //  model scale & df 

Iteration 0:   log likelihood = -2872.1029  
Iteration 1:   log likelihood = -2851.8014  
Iteration 2:   log likelihood = -2851.0334  
Iteration 3:   log likelihood = -2851.0303  
Iteration 4:   log likelihood = -2851.0303  

                                                        Number of obs =  2,982
                                                        Wald chi2(11) = 348.55
Log likelihood = -2851.0303                             Prob > chi2   = 0.0000

------------------------------------------------------------------------------------------
                         | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------------------+----------------------------------------------------------------
xb                       |
                    size |
              >20-50mmm  |   .5998474    .063596     9.43   0.000     .4752015    .7244933
                 >50 mm  |   1.137436   .0872282    13.04   0.000     .9664716      1.3084
                         |
                  hormon |
                    yes  |  -1.780011   1.359924    -1.31   0.191    -4.445414    .8853911
             _bs_f1_age1 |  -.5463187   .4319832    -1.26   0.206     -1.39299    .3003528
             _bs_f1_age2 |    .071475   .2897872     0.25   0.805    -.4964974    .6394474
             _bs_f1_age3 |   2.317803   .5631477     4.12   0.000     1.214054    3.421552
                         |
    hormon#c._bs_f1_age1 |
                    yes  |   2.792389    1.62146     1.72   0.085    -.3856152    5.970393
                         |
    hormon#c._bs_f1_age2 |
                    yes  |   1.520423    1.29026     1.18   0.239    -1.008441    4.049286
                         |
    hormon#c._bs_f1_age3 |
                    yes  |    2.47379   1.580429     1.57   0.118    -.6237934    5.571374
                         |
            _poly_f1_pr1 |  -.0007745   .0001618    -4.79   0.000    -.0010916   -.0004573
            _poly_f1_pr2 |   1.88e-07   6.10e-08     3.09   0.002     6.88e-08    3.08e-07
-------------------------+----------------------------------------------------------------
time                     |
                    _ns1 |  -20.10947   4.954894    -4.06   0.000    -29.82088   -10.39805
                    _ns2 |   3.116146   2.486199     1.25   0.210    -1.756716    7.989007
                    _ns3 |  -3.280082   .5339863    -6.14   0.000    -4.326675   -2.233488
                         |
       hormon#c._ns_tvc1 |
                    yes  |  -.9786369   1.556293    -0.63   0.529    -4.028916    2.071642
                         |
       hormon#c._ns_tvc2 |
                    yes  |  -.1167935   .6836879    -0.17   0.864    -1.456797     1.22321
                         |
c._ns_f2_age1#c._ns_tvc1 |  -35.74128   16.40774    -2.18   0.029    -67.89986   -3.582687
                         |
c._ns_f2_age1#c._ns_tvc2 |   20.27338   5.665916     3.58   0.000     9.168391    31.37837
                         |
c._ns_f2_age2#c._ns_tvc1 |   3.949467   8.286761     0.48   0.634    -12.29229    20.19122
                         |
c._ns_f2_age2#c._ns_tvc2 |  -2.029115   2.174977    -0.93   0.351    -6.291992    2.233761
                         |
c._ns_f2_age3#c._ns_tvc1 |  -4.486575   5.784408    -0.78   0.438    -15.82381    6.850655
                         |
c._ns_f2_age3#c._ns_tvc2 |   5.747662   2.657394     2.16   0.031     .5392661    10.95606
                         |
                   _cons |   -.151862   .3097985    -0.49   0.624    -.7590558    .4553319
------------------------------------------------------------------------------------------
Extended functions
 (1) @bs(age, df(3) degree(2))
 (2) @ns(age, df(3))
 (3) @poly(pr, degree(2))
 

Current Extended Functions

@bs() B-splines
@fp() fractional polynomials
@ns() natural cubic splines
@poly() polynomials
@rcs() restricted cubic splines

Other functions?

  • Makes predictions much easier

Predict to Frames

  • Save predictions in current frame

  • Save predictions to a new frame

  • Merge predictions with existing frame

Standard Predictions

  • Predict at observed values of covariates (and _t)
predict   xb, xb ci
predict    S, survival ci
predict    h, hazard ci
  • By default this will be saved in current frame.

Type of predictions

  • Different types of predictions
    • Predict at observed values of covariates
    • Predict at specific values of covariates
    • Take average of predictions (marginal/standardized effects)
  • Difference choices for time
    • Predict at _t
    • Predict at single time point for all subjects
    • Predict at user specified time values

Predictions

Model with factor variables and extended function
stpm3 i.hormon i.size @ns(age, df(3)), /// natural spline for age
      scale(lncumhazard) df(3)         //  model scale & df
Predictions
predict S_age50_h0_s2 S_age60_h0_s2 S_age70_h0_s2,  ///
        at1(age 50 hormon 0 size 2)                 ///
        at2(age 60 hormon 0 size 2)                 ///
        at3(age 70 hormon 0 size 2)                 ///
        survival  ci                                ///
        timevalues(0 10, step(0.1))                 ///
        frame(f1)                                   
frame f1: list S_age50_h0_s2 S_age60_h0_s2 S_age70_h0_s2 in 1/10

List Predictions


. frame f1: list tt S_age50_h0_s2 S_age60_h0_s2 S_age70_h0_s2 in 1/10,  ab(13)

     +----------------------------------------------------+
     | tt   S_age50_h0_s2   S_age60_h0_s2   S_age70_h0_s2 |
     |----------------------------------------------------|
  1. |  0               1               1               1 |
  2. | .1       .99996383       .99995863       .99994405 |
  3. | .2       .99973446       .99969626       .99958923 |
  4. | .3       .99916874        .9990492       .99871432 |
  5. | .4       .99817701       .99791498        .9971812 |
     |----------------------------------------------------|
  6. | .5        .9967111       .99623878       .99491659 |
  7. | .6       .99475256       .99399981       .99189377 |
  8. | .7       .99230356       .99120106       .98811855 |
  9. | .8        .9893804       .98786173       .98361906 |
 10. | .9       .98600871       .98401177        .9784382 |
     +----------------------------------------------------+

Plot Predictions

Predictions
frame f1: {
  twoway (line S_age50_h0_s2 S_age60_h0_s2 S_age70_h0_s2 tt), ///
         ytitle("Survival function")                          ///
         xtitle("Years since surgery")                        ///
         legend(order(1 "Age 50" 2 "Age 60" 3 "age 70")       ///
                ring(0) pos(1) cols(1))
}

Merge to Existing Frame

Predictions
predict S_age50_h1_s2 S_age60_h1_s2 S_age70_h1_s2,  ///
        at1(age 50 hormon 1 size 2)                 ///
        at2(age 60 hormon 1 size 2)                 ///
        at3(age 70 hormon 1 size 2)                 ///
        survival ci                                 ///
        frame(f1, merge)   

. frame f1 {
.   list tt S_age50_h0_s2 S_age50_h1_s2 in 1/5, abbrev(13)

     +------------------------------------+
     | tt   S_age50_h0_s2   S_age50_h1_s2 |
     |------------------------------------|
  1. |  0               1               1 |
  2. | .1       .99996383       .99995634 |
  3. | .2       .99973446       .99967945 |
  4. | .3       .99916874       .99899661 |
  5. | .4       .99817701       .99779973 |
     +------------------------------------+
. }

Predict at single timepoint

Predictions
predict S10,           ///
        at1(.)         /// observed values
        survival       ///
        timevalue(10)  /// single time value
        merge          //  merge in current frame 

Type of Predictions

cumhazard - cumulative hazard function
failure - failure function
hazard - hazard function
lnhazard - ln(hazard) function
rmft - restricted mean failure time
rmst - restricted mean survival time
survival - survival function
xb - the linear predictor
  • Use ci or se for confidence intervals/standard errors.

  • use contrast(difference) or contrast(ratio) for contrasts.

Contrasts

  • You can have as many at options as you want.

  • Obtain differences or ratios between covariate patterns.

stpm3 i.hormon @ns(age, df(3)), scale(lncumhazard) df(5) 
predict S70h0 S70h1, survival ci                 ///
                     at1(age 70 hormon 0)        ///
                     at2(age 70 hormon 1)        ///
                     timevalues(0 5, step(0.1))  ///
                     contrast(difference)        ///
                     contrastvar(Sdiff)          ///
                     frame(f1)
  • Set which at option is the reference with atreference()

Plot Contrasts

frame f1 {
  twoway (rarea Sdiff_lci Sdiff_uci tt, color(red%30)) ///
         (line Sdiff tt, color(red))                   ///
         , xtitle("Years since surgery")               ///
         ytitle("Difference in S(t)")                  ///
         ylabel(,format(%3.2f))                        ///
         legend(off)
}                     

Standardization

  • This was the main motivation for introducing factor variables and extended functions.

  • standsurv is margins for survival data.

  • Interactions with exposures made code complicated and prone to error when using standsurv

Regression Standardization

  • Fit regression model with exposure, \(X\), and confounders, \(Z\)

  • Predict outcome if all individuals were exposed (\(X=0\)) and unexposed (\(X=1\))

  • Perform contrast \[ \frac{1}{N} \sum_{i=1}^N {\widehat{S}(t|X=1,Z=z_i)} - \frac{1}{N} \sum_{i=1}^N {\widehat{S}(t|X=0,Z=z_i)} \]

  • Similar ideas in competing risks, relative survival, multistate models

Competing Risks Example

// Load Data
use https://www.pclambert.net/data/rott2b

// Cancer Model
stset os, f(cause==1) scale(12) exit(time 365.24*10)
stpm3 i.hormon i.size enodes @ns(age,df(3))  ///
      i.hormon#i.size i.hormon#c.enodes      ///
      i.hormon#@ns(age,df(3)),               ///
      scale(logcumhazard) df(4) 
estimates store cancer      

Competing Risks Example

// Load Data
use https://www.pclambert.net/data/rott2b

// Cancer Model
stset os, f(cause==1) scale(12) exit(time 365.24*10)
stpm3 i.hormon i.size enodes @ns(age,df(3))  ///
      i.hormon#i.size i.hormon#c.enodes      ///
      i.hormon#@ns(age,df(3)),               ///
      scale(logcumhazard) df(4) 
estimates store cancer           
      
// Other cause Model
stset os, f(cause==2) scale(12) exit(time 365.24*10)
stpm3 i.hormon i.size enodes @ns(age,df(3))  ///
      i.hormon#i.size i.hormon#c.enodes      ///
      i.hormon#@ns(age,df(3)),               ///
      scale(logcumhazard) df(4) 
estimates store other       

Using standsurv

range tt 0 10 101
standsurv, cif crmodels(cancer other) ci    ///
           timevar(tt)                      ///
           at1(hormon 0)                    ///
           at2(hormon 1)                    ///
           atvar(CIF0 CIF1)                 ///
           contrast(difference)             ///
           contrastvar(CIFdiff)  

Installation

  • Still running various checks etc.

  • A test version is available

Installation

net install stpm3_test, from (https://www.pclambert.net/downloads/stpm3_test)

net install standsurv, from (https://www.pclambert.net/downloads/standsurv_test)

net install gensplines, from (https://www.pclambert.net/downloads/gensplines)

References

Lambert, P. C., and P. Royston. 2009. “Further Development of Flexible Parametric Models for Survival Analysis.” The Stata Journal 9: 265–90.
Royston, P. 2001. “Flexible Parametric Alternatives to the Cox Model, and More.” The Stata Journal 1: 1–28.
Royston, P., and P. C. Lambert. 2011. Flexible Parametric Survival Analysis in Stata: Beyond the Cox Model. Stata Press.
Syriopoulou, Elisavet, Sarwar I. Mozumder, Mark J. Rutherford, and Paul C. Lambert. 2022. “Estimating Causal Effects in the Presence of Competing Events Using Regression Standardisation with the Stata Command standsurv.” BMC Medical Research Methodology 22 (August): 226. https://doi.org/10.1186/s12874-022-01666-x.
Wang, Wenjie, and Jun Yan. 2021. “Shape-Restricted Regression Splines with R Package splines2.” Journal of Data Science 19 (3): 498–517. https://doi.org/10.6339/21-JDS1020.