Body fat is hard to measure, but the predictor variables are easy to obtain.
Model (X1) Fit and ANOVA
Model (X2) Fit and ANOVA
Model (X1, X2) Fit and ANOVA
Model (X1, X2, X3) Fit and ANOVA
Source of Variation | SS | df | MS
Regression | SSR(X1, X2, X3) = 396.98 | 3 | MSR(X1, X2, X3) = 132.33
  X1 | SSR(X1) = 352.27 | 1 | MSR(X1) = 352.27
  X2 | X1 | SSR(X2 | X1) = 33.17 | 1 | MSR(X2 | X1) = 33.17
  X3 | X1, X2 | SSR(X3 | X1, X2) = 11.55 | 1 | MSR(X3 | X1, X2) = 11.55
Error | SSE(X1, X2, X3) = 98.40 | 16 | MSE(X1, X2, X3) = 6.15
Total | SSTO = 495.39 | 19 |
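The decomposition in the table can be checked with a little arithmetic: the sequential (Type I) extra sums of squares add up to SSR(X1, X2, X3), and SSR + SSE recovers SSTO. The sketch below copies the full-precision values from the SAS output that follows; n = 20 is read off the Corrected Total df of 19, and the variable names are illustrative only.

```python
# Sequential (Type I) sums of squares for the body fat fit, copied from the SAS output.
ssr_x1 = 352.2697968           # SSR(X1)
ssr_x2_given_x1 = 33.1689128   # SSR(X2 | X1)
ssr_x3_given_x12 = 11.5459022  # SSR(X3 | X1, X2)
sse_full = 98.4048882          # SSE(X1, X2, X3)
ssto = 495.3895000             # SSTO
n = 20                         # Corrected Total df is n - 1 = 19

# The sequential extra sums of squares add up to the regression sum of squares.
ssr_full = ssr_x1 + ssr_x2_given_x1 + ssr_x3_given_x12

df_reg = 3
df_err = n - 4                 # n minus the 4 parameters beta_0 .. beta_3
msr = ssr_full / df_reg
mse = sse_full / df_err
f_overall = msr / mse          # overall F* = MSR / MSE
r_squared = ssr_full / ssto

print(f"SSR(X1,X2,X3) = {ssr_full:.4f}")
print(f"SSR + SSE     = {ssr_full + sse_full:.4f}  (SSTO = {ssto:.4f})")
print(f"MSR = {msr:.4f}, MSE = {mse:.4f}, F* = {f_overall:.2f}, R^2 = {r_squared:.4f}")
```

The printed F* and R² should agree with the Model line of the SAS output (F Value 21.52, R-Square 0.8014).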
The GLM Procedure
Dependent Variable: y
Source DF Sum of Squares Mean Square F Value Pr > F
Model 3 396.9846118 132.3282039 21.52 <.0001
Error 16 98.4048882 6.1503055
Corrected Total 19 495.3895000
R-Square Coeff Var Root MSE y Mean
0.801359 12.28017 2.479981 20.19500
Source DF Type I SS Mean Square F Value Pr > F
x1 1 352.2697968 352.2697968 57.28 <.0001
x2 1 33.1689128 33.1689128 5.39 0.0337
x3 1 11.5459022 11.5459022 1.88 0.1896
Source DF Type III SS Mean Square F Value Pr > F
x1 1 12.70489278 12.70489278 2.07 0.1699
x2 1 7.52927788 7.52927788 1.22 0.2849
x3 1 11.54590217 11.54590217 1.88 0.1896
Parameter Estimate Standard Error t Value Pr > |t|
Intercept 117.0846948 99.78240295 1.17 0.2578
x1 4.3340920 3.01551136 1.44 0.1699
x2 -2.8568479 2.58201527 -1.11 0.2849
x3 -2.1860603 1.59549900 -1.37 0.1896
Mean Squares
Note that each extra sum of squares involving a single extra X variable has associated with it one degree of freedom.
Extra Sum of Squares from Several Variables
Extra sums of squares involving two extra X variables, such as SSR(X2, X3 | X1), have two degrees of freedom associated with them. This follows because we can express such an extra sum of squares as a sum of two extra sums of squares, each associated with one degree of freedom.
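For the body fat fit, the two-degree-of-freedom extra sum of squares SSR(X2, X3 | X1) can be built either as the sum of the two one-df pieces or as the drop in SSE from the X1-only model to the full model. A short sketch with values copied from the SAS Type I output (names are illustrative):

```python
ssto = 495.3895000
ssr_x1 = 352.2697968           # SSR(X1)
ssr_x2_given_x1 = 33.1689128   # SSR(X2 | X1), 1 df
ssr_x3_given_x12 = 11.5459022  # SSR(X3 | X1, X2), 1 df
sse_full = 98.4048882          # SSE(X1, X2, X3)

# Route 1: sum of two one-df extra sums of squares.
ssr_x23_given_x1 = ssr_x2_given_x1 + ssr_x3_given_x12

# Route 2: reduction in SSE when X2 and X3 are added to the model containing X1.
sse_x1 = ssto - ssr_x1
ssr_x23_given_x1_alt = sse_x1 - sse_full

df = 2                         # one df per extra X variable
msr_x23_given_x1 = ssr_x23_given_x1 / df

print(f"SSR(X2,X3|X1) = {ssr_x23_given_x1:.4f} (alt: {ssr_x23_given_x1_alt:.4f})")
print(f"MSR(X2,X3|X1) = {msr_x23_given_x1:.4f}")
```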
t-test (6.51b) discussed in chapter 6
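General Linear Test Approach
Full Model
$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \varepsilon_i$
Hypotheses
$H_0: \beta_3 = 0 \qquad H_a: \beta_3 \neq 0$
Reduced Model when H0 holds
$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i$
General Form of Test Statistic
$F^* = \dfrac{SSE(R) - SSE(F)}{df_R - df_F} \div \dfrac{SSE(F)}{df_F}$
Form of Test Statistic for Testing a Single Beta Coefficient Equal Zero
$F^* = \dfrac{SSR(X_3 \mid X_1, X_2)}{1} \div \dfrac{SSE(X_1, X_2, X_3)}{n-4} = \dfrac{MSR(X_3 \mid X_1, X_2)}{MSE(X_1, X_2, X_3)}$
We do not need to fit both the full model and the reduced model: fitting only the full model in SAS provides MSR(X3 | X1, X2) and MSE(X1, X2, X3). See the SAS output.
Note: (1) here the t-test and the F-test are equivalent tests, since F* = (t*)².
(2) the F test of whether or not β3 = 0 is called a partial F test.
(3) the F test of whether or not all βk = 0 is called the overall F test.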
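The equivalence F* = (t*)² can be checked directly on the SAS output: squaring each t statistic (estimate divided by its standard error) should reproduce the Type III partial F value, which is the Type III SS over MSE. The numbers below are copied from the output; the dictionary layout is just an illustrative choice.

```python
mse = 6.1503055  # MSE(X1, X2, X3) from the SAS output

# name: (estimate, standard error, Type III SS), copied from the SAS output
coefficients = {
    "x1": (4.3340920, 3.01551136, 12.70489278),
    "x2": (-2.8568479, 2.58201527, 7.52927788),
    "x3": (-2.1860603, 1.59549900, 11.54590217),
}

results = {}
for name, (estimate, std_error, type3_ss) in coefficients.items():
    t_star = estimate / std_error
    f_partial = (type3_ss / 1) / mse  # one df per single coefficient
    results[name] = (t_star ** 2, f_partial)
    print(f"{name}: t* = {t_star:.3f}, (t*)^2 = {t_star ** 2:.3f}, partial F* = {f_partial:.3f}")
```

Each (t*)² matches the corresponding Type III F Value (2.07, 1.22, 1.88) up to the rounding in the printed output.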
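Full Model
$Y_i = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_{p-1} X_{i,p-1} + \varepsilon_i$
Hypotheses
$H_0: \beta_q = \beta_{q+1} = \cdots = \beta_{p-1} = 0 \qquad H_a:$ not all of $\beta_q, \ldots, \beta_{p-1}$ equal zero
Reduced Model when H0 holds
$Y_i = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_{q-1} X_{i,q-1} + \varepsilon_i$
General Form of Test Statistic
$F^* = \dfrac{SSE(R) - SSE(F)}{df_R - df_F} \div \dfrac{SSE(F)}{df_F}$
Form of Test Statistic for Testing Several Beta Coefficients Equal Zero
$F^* = \dfrac{SSR(X_q, \ldots, X_{p-1} \mid X_1, \ldots, X_{q-1})}{p-q} \div \dfrac{SSE(X_1, \ldots, X_{p-1})}{n-p}$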
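The general form of the test statistic translates into a small function; `general_linear_test` is a hypothetical helper name, not a library routine. As a check, it is applied to the single-coefficient test of β3 = 0 for the body fat data, with SSE(X1, X2) = SSE(X1, X2, X3) + SSR(X3 | X1, X2) taken from the SAS Type I output and n = 20.

```python
def general_linear_test(sse_reduced, df_reduced, sse_full, df_full):
    """F* = [(SSE(R) - SSE(F)) / (df_R - df_F)] / [SSE(F) / df_F]."""
    if df_reduced <= df_full:
        raise ValueError("the reduced model must have more error df than the full model")
    numerator = (sse_reduced - sse_full) / (df_reduced - df_full)
    denominator = sse_full / df_full
    return numerator / denominator

# Body fat values from the SAS output (n = 20).
sse_full = 98.4048882                  # SSE(X1, X2, X3), df = n - 4 = 16
sse_reduced = sse_full + 11.5459022    # SSE(X1, X2) = SSE(F) + SSR(X3 | X1, X2), df = 17

f_star = general_linear_test(sse_reduced, 17, sse_full, 16)
print(f"F* for H0: beta_3 = 0 -> {f_star:.4f}")
```

The result matches the partial F value of 1.88 reported for x3, illustrating that only the full-model fit is needed.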
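Example: Body Fat
Full Model
$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \varepsilon_i$
Hypotheses
$H_0: \beta_2 = \beta_3 = 0 \qquad H_a:$ not both $\beta_2$ and $\beta_3$ equal zero
Reduced Model when H0 holds
$Y_i = \beta_0 + \beta_1 X_{i1} + \varepsilon_i$
General Test Statistic
$F^* = \dfrac{SSR(X_2, X_3 \mid X_1)}{2} \div \dfrac{SSE(X_1, X_2, X_3)}{n-4} = \dfrac{MSR(X_2, X_3 \mid X_1)}{MSE(X_1, X_2, X_3)}$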
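A natural instance of the several-coefficient test for the body fat data is H0: β2 = β3 = 0, whose reduced model contains X1 only. The sketch below computes F* from the SAS sums of squares, with SSE(X1) = SSTO - SSR(X1) and n = 20. It stops at F* because a p-value or critical value would need an F-distribution routine that is outside the standard library.

```python
n = 20
ssto = 495.3895000
ssr_x1 = 352.2697968         # SSR(X1)
sse_full = 98.4048882        # SSE(X1, X2, X3), df = n - 4

sse_reduced = ssto - ssr_x1  # SSE(X1), df = n - 2
df_full, df_reduced = n - 4, n - 2

# (SSE(R) - SSE(F)) equals SSR(X2, X3 | X1), which carries 2 df.
f_star = ((sse_reduced - sse_full) / (df_reduced - df_full)) / (sse_full / df_full)
print(f"F* for H0: beta_2 = beta_3 = 0 -> {f_star:.3f}")
```

F* would then be compared with F(1 - α; 2, 16).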
Two Predictor Variables
The coefficient of multiple determination measures the proportionate reduction in the variation of Y achieved by the introduction of the entire set of X variables. The coefficient of partial determination uses Y and X1, both “adjusted for X2,” and measures the proportionate reduction in the variation of the “adjusted Y” achieved by including the “adjusted X1” (Comment 2 on page 270).
$R^2_{Y1|2} = \dfrac{SSR(X_1 \mid X_2)}{SSE(X_2)} \qquad R^2_{Y2|1} = \dfrac{SSR(X_2 \mid X_1)}{SSE(X_1)}$
General Case
$R^2_{Y3|12} = \dfrac{SSR(X_3 \mid X_1, X_2)}{SSE(X_1, X_2)}$, and similarly for any other predictor added to a given set.
Example
When X2 is added to the model containing X1, SSE is reduced by 23.2%.
When X3 is added to the model containing X1 and X2, SSE is reduced by 10.5%.
When X1 is added to the model containing X2, SSE is reduced by only 3.1%.
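The first two percentages in the example can be reproduced from the SAS Type I sums of squares: each coefficient of partial determination is an extra sum of squares divided by the SSE of the model it is added to. The third (3.1%) needs SSR(X2) from the X2-only fit, which the SAS output here does not show, so it is not recomputed.

```python
# Values from the SAS Type I output for the body fat data.
ssto = 495.3895000
ssr_x1 = 352.2697968           # SSR(X1)
ssr_x2_given_x1 = 33.1689128   # SSR(X2 | X1)
ssr_x3_given_x12 = 11.5459022  # SSR(X3 | X1, X2)
sse_x123 = 98.4048882          # SSE(X1, X2, X3)

sse_x1 = ssto - ssr_x1                   # SSE(X1)
sse_x12 = sse_x123 + ssr_x3_given_x12    # SSE(X1, X2)

r2_y2_given_1 = ssr_x2_given_x1 / sse_x1     # R^2_{Y2|1}
r2_y3_given_12 = ssr_x3_given_x12 / sse_x12  # R^2_{Y3|12}

print(f"R^2_Y2|1  = {r2_y2_given_1:.3f}")   # about 0.232
print(f"R^2_Y3|12 = {r2_y3_given_12:.3f}")  # about 0.105
```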
Some questions frequently asked are:
What is the relative importance of the effects of the different predictor variables?
What is the magnitude of the effect of a given predictor variable on the response variable?
Can any predictor variable be dropped from the model because it has little or no effect on the response variable?
Should any predictor variable not yet included in the model be considered for possible inclusion?
If the predictor variables included in the model are
uncorrelated among themselves and
uncorrelated with any other predictor variables that are related to the response variable but are omitted from the model
then relatively simple answers can be given. Unfortunately, in many nonexperimental situations in business, economics, and the social and biological sciences, the predictor variables are correlated.
For example:
Family food expenditures (Y).
Correlated predictors in model: Family income (X1), Family savings (X2), Age of head of household (X3).
Correlated with predictors outside model: Family size (X4).
Models:
X1 and X2 are uncorrelated.
The regression coefficient for X1 is the same for both model (1) and model (2); the same holds for the regression coefficient for X2.
Uncorrelated predictors arise naturally in controlled experiments, since the levels of the predictor variables can be chosen to ensure they are uncorrelated.
SSR(X1 | X2) = SSR(X1) and SSR(X2 | X1) = SSR(X2).
(1)
(2)
(3)
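The data behind models (1) to (3) are not reproduced here, so the sketch below uses a small hypothetical balanced 2 x 2 design (X1 at two levels crossed with X2 at two levels; all values invented for illustration). Because the design makes X1 and X2 uncorrelated, SSR(X1 | X2) should equal SSR(X1) and the coefficient of each predictor should be the same whether or not the other is in the model. The `ols` helper is a hypothetical plain-Python least-squares routine (normal equations solved by Gauss-Jordan elimination), not a library API.

```python
def ols(columns, y):
    """Least squares of y on an intercept plus `columns`; returns (coefficients, SSE)."""
    n = len(y)
    X = [[1.0] + [float(col[i]) for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))]
         for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for k in range(p):
        piv = max(range(k, p), key=lambda row: abs(A[row][k]))
        A[k], A[piv] = A[piv], A[k]
        for r in range(p):
            if r != k:
                factor = A[r][k] / A[k][k]
                A[r] = [a_r - factor * a_k for a_r, a_k in zip(A[r], A[k])]
    b = [A[r][p] / A[r][r] for r in range(p)]
    fitted = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return b, sse

# Hypothetical balanced design: X1 and X2 have zero sample correlation.
x1 = [10, 10, 20, 20, 10, 10, 20, 20]
x2 = [1, 2, 1, 2, 1, 2, 1, 2]
y = [12, 15, 20, 24, 11, 16, 21, 23]

_, ssto = ols([], y)              # intercept-only fit, so SSE = SSTO
b_x1, sse_x1 = ols([x1], y)
b_x2, sse_x2 = ols([x2], y)
b_x12, sse_x12 = ols([x1, x2], y)

ssr_x1, ssr_x1_given_x2 = ssto - sse_x1, sse_x2 - sse_x12
ssr_x2, ssr_x2_given_x1 = ssto - sse_x2, sse_x1 - sse_x12

print(f"SSR(X1) = {ssr_x1:.4f}, SSR(X1|X2) = {ssr_x1_given_x2:.4f}")
print(f"SSR(X2) = {ssr_x2:.4f}, SSR(X2|X1) = {ssr_x2_given_x1:.4f}")
print(f"b1 alone = {b_x1[1]:.4f}, b1 with X2 = {b_x12[1]:.4f}")
```

With correlated predictors the same script would show SSR(X1 | X2) and SSR(X1) drifting apart, which is the situation discussed next.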
Case | X1 | X2 | Y | Fitted Y, model (1) | Fitted Y, model (2)
1 | 2 | 6 | 23 | 23 | 23
2 | 8 | 9 | 83 | 83 | 83
3 | 6 | 8 | 63 | 63 | 63
4 | 10 | 10 | 103 | 103 | 103
Models:
(1)
Perfect relation between predictors: X2 = 5 + 0.5 X1
(2)
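The table's data satisfy X2 = 5 + 0.5 X1 and Y = 3 + 10 X1 exactly, so any coefficients with b1 + 0.5 b2 = 10 and b0 + 5 b2 = 3 fit all four cases perfectly. The sketch checks two such coefficient sets, (-87, 1, 18) and (-7, 9, 2), chosen here because they satisfy those constraints, and confirms that X'X for the columns (1, X1, X2) is singular, which is why no unique least squares solution exists.

```python
x1 = [2, 8, 6, 10]
x2 = [6, 9, 8, 10]
y = [23, 83, 63103 // 1001 * 0 + 63, 103]

# Every case satisfies the perfect relation X2 = 5 + 0.5 * X1.
perfect = all(v2 == 5 + 0.5 * v1 for v1, v2 in zip(x1, x2))

def fitted(b0, b1, b2):
    """Fitted values of b0 + b1*X1 + b2*X2 for the four cases."""
    return [b0 + b1 * v1 + b2 * v2 for v1, v2 in zip(x1, x2)]

fit_a = fitted(-87, 1, 18)   # illustrative member of the family of perfect fits
fit_b = fitted(-7, 9, 2)     # another member with very different coefficients

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

cols = [[1] * len(y), x1, x2]
xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]

print("X2 = 5 + 0.5 X1 for all cases:", perfect)
print("fit (-87, 1, 18):", fit_a)
print("fit (-7, 9, 2):  ", fit_b)
print("det(X'X) =", det3(xtx))
```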
The perfect relation between X1 and X2 does not inhibit our ability to obtain a good fit to the data.
Since many different response functions provide the same good fit, we cannot interpret any one set of regression coefficients as reflecting the effects of the different predictor variables.
We seldom find variables that are perfectly correlated. However, the implications just noted for our idealized example still have relevance.
The fact that some or all predictor variables are correlated among themselves does not, in general, inhibit our ability to obtain a good fit.
The counterpart in real life to many different regression functions providing equally good fits to the data in our idealized example is that the estimated regression coefficients tend to have large sampling variability when the predictor variables are highly correlated.
The common interpretation of a regression coefficient as measuring the change in the expected value of the response variable when the given predictor variable is increased by one unit while all the other predictors are held constant is not fully applicable when multicollinearity exists.
Effects on Regression Coefficients
Estimates of coefficients change a lot as each variable is entered in the model.
In Model (3), although the F-test is significant, none of the t-tests for individual coefficients is significant.
In Model (3), the variances of the coefficients are inflated.
The standard error of estimate is not substantially improved as more variables are entered in the model. Thus fitted values and predictions are neither more nor less precise.
Theoretical reason for inflated variance: as the correlation between the predictors approaches one, the variances of the estimated coefficients increase without bound.
The primed variables Y’, X1’, X2’ are obtained by the “correlation transformation.”
The X’X matrix of the primed variables is the correlation matrix r_XX, so with two predictors
$\sigma^2\{\mathbf{b}'\} = (\sigma')^2 \, \mathbf{r}_{XX}^{-1} = \dfrac{(\sigma')^2}{1 - r_{12}^2} \begin{bmatrix} 1 & -r_{12} \\ -r_{12} & 1 \end{bmatrix}$
As (r12)² approaches 1, the variances march off to infinity.
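For two predictors the inverse of the correlation matrix has the closed form shown by the correlation-transformation argument, with diagonal 1 / (1 - r12²). The sketch below evaluates that factor, which multiplies (σ')² in the variance of each standardized coefficient, for a few illustrative values of r12; `inverse_corr_2x2` is a hypothetical helper name.

```python
def inverse_corr_2x2(r12):
    """Inverse of the 2x2 correlation matrix [[1, r12], [r12, 1]]."""
    if abs(r12) >= 1.0:
        raise ValueError("|r12| must be < 1; the correlation matrix is singular otherwise")
    scale = 1.0 / (1.0 - r12 ** 2)
    return [[scale, -r12 * scale], [-r12 * scale, scale]]

for r12 in (0.0, 0.5, 0.9, 0.99, 0.999):
    factor = inverse_corr_2x2(r12)[0][0]  # multiplies (sigma')^2 in Var(b'_k)
    print(f"r12 = {r12:<5}  1/(1 - r12^2) = {factor:9.2f}")
```

The factor grows from 1 at r12 = 0 to roughly 500 at r12 = 0.999, and the helper refuses |r12| = 1 because the matrix then has no inverse.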
For more details, please read pages 272-278 of ALSM.