# 13 Multivariable Analysis

# II Assumptions Underlying Multivariable Methods

## A Conceptual Understanding of Equations for Multivariable Analysis

The four independent variables on the right side of the equation are almost certainly not of exactly equal importance. Equation 13-2 can be improved by giving each independent variable a coefficient, which is a weighting factor measuring its relative importance in predicting prognosis. The equation becomes:

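$$\text{Prognosis} \approx (\text{Weight}_1)\,\text{Age} + (\text{Weight}_2)\,\text{Stage} + \cdots + (\text{Weight}_4)\,\text{Comorbidity} \tag{13-3}$$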

Before equation 13-3 can become useful for estimating survival for an individual patient, two other factors are required: (1) a measure to quantify the starting point for the calculation and (2) a measure of the error in the predicted value of y for each observation (because statistical prediction is almost never perfect for a single individual). By inserting a starting point and an error term, the ≈ symbol (meaning “varies with”) can be replaced by an equal sign. Abbreviating the weights with a W, the equation now becomes:

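$$\text{Prognosis} = \text{Starting point} + W_1\,\text{Age} + W_2\,\text{Stage} + \cdots + W_4\,\text{Comorbidity} + \text{Error} \tag{13-4}$$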

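This equation can be rewritten in standard statistical symbols, with $a$ for the starting point, $b_i$ for the weights, $x_i$ for the independent variables, and $e$ for the error term:

$$y = a + b_1x_1 + b_2x_2 + b_3x_3 + b_4x_4 + e \tag{13-5}$$

Although equation 13-5 looks complex, it really means the same thing as equations 13-1 through 13-4.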

What is this equation really saying? It states that the dependent variable (y) can be predicted for each person at diagnosis by beginning with a standard starting point (a), then making an adjustment for the new information supplied by the first variable (age), plus a further adjustment for the information provided by the second variable (anatomic stage), and so on, until an adjustment is made for the last independent variable (comorbidity) and for the almost inevitable error in the resulting prediction of the prognosis for any given study participant.
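The step-by-step reading above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `predict`, the starting point, the weights, and the patient values are all hypothetical and are not taken from any study. The third variable is left unnamed because it is not specified in this section.

```python
def predict(a, weights, values):
    """Return a + b1*x1 + ... + bk*xk: the starting point plus one
    adjustment per independent variable (error term not included)."""
    if len(weights) != len(values):
        raise ValueError("need exactly one weight per independent variable")
    y_hat = a                      # begin at the standard starting point
    for b, x in zip(weights, values):
        y_hat += b * x             # adjust for each variable in turn
    return y_hat

# Hypothetical numbers, for illustration only.
a = 10.0                           # starting point
b = [-0.05, -1.5, -1.0, -0.8]      # weights: age, stage, third variable, comorbidity
x = [60, 2, 1, 1]                  # one patient's values, in the same order

predicted = predict(a, b, x)

# The error term is whatever separates the prediction from the
# value actually observed for this individual.
observed = 3.0                     # hypothetical observed outcome
error = observed - predicted
```

By construction, `predicted + error` reproduces the observed value, which is what allows the equal sign to replace the ≈ symbol once an error term is included.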

## C General Linear Model

The multivariable equation shown in equation 13-6 is usually called the general linear model. The model is general because many variations are possible in the types of variables used for $y$ and $x_i$ and in the number of $x$ variables included. The model is linear because it is a linear combination of the $x_i$ terms. A variety of transformations of the $x_i$ variables might be used to improve the model’s “fit” (e.g., the square, square root, or logarithm of $x_i$). The combination of terms would still be linear, however, as long as all the coefficients (the $b_i$ terms) are to the first power. The model does not remain linear if any of the coefficients is raised to a power other than 1 (e.g., $b^2$). Such equations are much more complex and are beyond the scope of this discussion.
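To illustrate why a transformed independent variable leaves the model linear, the following minimal sketch fits y = a + b·log(x) with ordinary least squares after substituting z = log(x). The data are synthetic and noise-free, and the helper `fit_simple_linear` is a hypothetical name introduced here. It uses the textbook closed-form formulas for a single predictor and is not a general multivariable routine.

```python
import math

def fit_simple_linear(z, y):
    """Least-squares intercept and slope for y = a + b*z (one predictor)."""
    n = len(z)
    mean_z = sum(z) / n
    mean_y = sum(y) / n
    sxy = sum((zi - mean_z) * (yi - mean_y) for zi, yi in zip(z, y))
    sxx = sum((zi - mean_z) ** 2 for zi in z)
    b = sxy / sxx                  # slope
    a = mean_y - b * mean_z        # intercept (starting point)
    return a, b

# Synthetic data generated from y = 1.5 + 2.0 * log(x), with no error term.
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [1.5 + 2.0 * math.log(xi) for xi in x]

# Transform x first; the coefficients a and b still enter to the first power.
z = [math.log(xi) for xi in x]
a_hat, b_hat = fit_simple_linear(z, y)
```

Because the coefficients enter only to the first power, the same linear fitting procedure recovers them from the transformed variable. A model containing a term such as $b^2x$ could not be handled this way.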

Numerous procedures for multivariable analysis are based on the general linear model. These include methods with such imposing designations as analysis of variance (ANOVA), analysis of covariance (ANCOVA), multiple linear regression, multiple logistic regression, the log-linear model, and discriminant function analysis. As discussed subsequently and outlined in Table 13-1, the choice of which procedure to use depends primarily on whether the dependent and independent variables are continuous, dichotomous, nominal, or ordinal. Knowing that the procedures listed in Table 13-1 are all variations on the same theme (the general linear model) helps to make them less confusing. A detailed treatment of these methods is beyond the scope of this text, but such treatments are readily available both online* and in print.4