# Building prognostic survival models

The purpose of any regression model is prediction, and the more predictive the model, the better. In the mean squared error sense, the best prediction of an outcome Y given covariates X1, ..., Xp is the conditional expectation of Y given X1, ..., Xp. Data rarely provide several replicates of Y at each combination of X1, ..., Xp, so we typically impose some structure, i.e., a model, so that "everything helps with everything else." Survival problems are harder still, since censoring further limits what can be observed empirically.
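The claim that the conditional expectation is the best predictor in mean squared error can be checked exactly on a small discrete example. The toy distribution `JOINT` and the helper names below are hypothetical, chosen only for illustration. Each covariate value is equally likely, and so is each listed outcome given that value. The sketch compares three predictors: the conditional mean, the conditional median, and the marginal mean, which ignores X.

```python
from fractions import Fraction

# Hypothetical toy distribution: each covariate value x is equally likely, and
# given x, each listed outcome y is equally likely (same count per x).
JOINT = {0: [1, 2, 6], 1: [4, 5, 9]}
ALL_Y = [y for ys in JOINT.values() for y in ys]


def mse(predict):
    """Exact E[(Y - predict(X))^2] under JOINT, using rational arithmetic."""
    errors = [(y - predict(x)) ** 2 for x, ys in JOINT.items() for y in ys]
    return Fraction(sum(errors), len(errors))


def conditional_mean(x):
    ys = JOINT[x]
    return Fraction(sum(ys), len(ys))


def conditional_median(x):
    ys = sorted(JOINT[x])
    return ys[len(ys) // 2]  # odd number of outcomes per x, so this is the median


def marginal_mean(x):
    return Fraction(sum(ALL_Y), len(ALL_Y))  # ignores x entirely


for name, f in [("conditional mean", conditional_mean),
                ("conditional median", conditional_median),
                ("marginal mean", marginal_mean)]:
    print(name, mse(f))
# conditional mean 14/3
# conditional median 17/3
# marginal mean 83/12
```

Shifting the conditional mean by any constant d adds exactly d² to the error, which is why no other predictor can beat it. In practice we never have `JOINT`, only a few (often censored) draws from it, and that is what a model's structure has to compensate for.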

Our model (structure) may fail in a number of ways: an incorrect functional form for a covariate, effects assumed to be time-independent that are in fact time-dependent, or a combination of these. Note that functional form and time dependence are not, in some sense, orthogonal phenomena, so an incorrect representation of one can distort the other. Finally, of course, if our data set lacks information on potentially useful covariates, their effects cannot be accounted for.

The whole field is far too big to cover in a single talk. Here we limit our observations to certain aspects of predictive power and lack of fit. Some large-sample results can help. For instance, for any given class of time-dependent specifications of the regression effect, the true (unknown) time-dependent effect is the one that maximizes an R² measure. This provides an immediate practical guide for building more involved, and more accurate, models from simpler ones.
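The following is a minimal sketch of using an R² measure to choose between a simpler and a richer class of effects. It rests on several assumptions that the talk does not state. The R² used is an unweighted residual-based form, R²(β) = 1 − Σ r_i(β)² / Σ r_i(0)², where r_i(β) = z_i − E_β(Z | risk set at t_i) is a Schoenfeld-type residual at each observed failure. The covariate is a single binary Z. The richer class is a step function with one change point at a known time `TAU`. The data are simulated with an effect that reverses sign at `TAU`. All names (`simulate`, `event_risk_sets`, `r_squared`, `TAU`) are hypothetical.

```python
import math
import random


def simulate(n, tau, b_early, b_late, seed=1):
    """Hypothetical data: binary Z, baseline hazard 1, log hazard ratio b_early
    before tau and b_late after tau, uniform censoring on (0, 3)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z = rng.randint(0, 1)
        a, c = math.exp(b_early * z), math.exp(b_late * z)  # hazards before/after tau
        e = rng.expovariate(1.0)  # invert the piecewise-linear cumulative hazard
        t = e / a if e < a * tau else tau + (e - a * tau) / c
        cens = rng.uniform(0.0, 3.0)
        data.append((min(t, cens), t <= cens, z))
    return data


def event_risk_sets(data):
    """For each observed failure: (time, its z, number at risk with z=0, with z=1)."""
    out = []
    for t_i, d_i, z_i in data:
        if d_i:
            at_risk = [z for (t, _, z) in data if t >= t_i]
            n1 = sum(at_risk)
            out.append((t_i, z_i, len(at_risk) - n1, n1))
    return out


def r_squared(events, beta_of_t):
    """Unweighted R^2 = 1 - sum r_i(beta)^2 / sum r_i(0)^2 over observed failures."""
    num = den = 0.0
    for t_i, z_i, n0, n1 in events:
        w1 = n1 * math.exp(beta_of_t(t_i))
        num += (z_i - w1 / (n0 + w1)) ** 2    # residual under beta(t)
        den += (z_i - n1 / (n0 + n1)) ** 2    # residual under beta = 0
    return 1.0 - num / den


TAU = 0.5
events = event_risk_sets(simulate(400, TAU, b_early=1.0, b_late=-1.0))
grid = [k / 2 for k in range(-4, 5)]  # candidate coefficients -2.0, ..., 2.0


def step(b1, b2):
    return lambda t: b1 if t < TAU else b2


# Simpler class: constant effect (proportional hazards).
best_const = max(grid, key=lambda b: r_squared(events, lambda t: b))
r2_const = r_squared(events, lambda t: best_const)
# Richer class: one change point at TAU; it contains every constant (b1 == b2).
best_step = max(((b1, b2) for b1 in grid for b2 in grid),
                key=lambda p: r_squared(events, step(*p)))
r2_step = r_squared(events, step(*best_step))
print("constant:", best_const, round(r2_const, 3))
print("step:    ", best_step, round(r2_step, 3))
```

Because the step class contains every constant effect, its best R² can never be lower. The interesting question is how much higher it is. In a finite sample the maximizing step function is only an estimate of the true β(t), and the large-sample result is what justifies reading it that way.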