Multiple Linear Regression Analysis

From ReliaWiki
{{Template:Doebook|4}}
This chapter expands on the analysis of simple linear regression models and discusses the analysis of multiple linear regression models. A major portion of the results displayed in [https://koi-3QN72QORVC.marketingautomation.services/net/m?md=Rw01CJDOxn%2FabhkPlZsy6DwBQ%2BaCXsGR Weibull++] DOE folios are explained in this chapter because these results are associated with multiple linear regression. One of the applications of multiple linear regression models is Response Surface Methodology (RSM). RSM is a method used to locate the optimum value of the response and is one of the final stages of experimentation. It is discussed in [[Response_Surface_Methods_for_Optimization| Response Surface Methods]]. Towards the end of this chapter, the concept of using indicator variables in regression models is explained. Indicator variables are used to represent qualitative factors in regression models. The concept of using indicator variables is important to gain an understanding of ANOVA models, which are the models used to analyze data obtained from experiments. These models can be thought of as first order multiple linear regression models where all the factors are treated as qualitative factors. ANOVA models are discussed in the [[One Factor Designs]] and [[General Full Factorial Designs]] chapters.


==Multiple Linear Regression Model==
A linear regression model that contains more than one predictor variable is called a ''multiple linear regression model''. The following model is a multiple linear regression model with two predictor variables, <math>{{x}_{1}}\,\!</math> and <math>{{x}_{2}}\,\!</math>.  




::<math>Y={{\beta }_{0}}+{{\beta }_{1}}{{x}_{1}}+{{\beta }_{2}}{{x}_{2}}+\epsilon\,\!</math>




The model is linear because it is linear in the parameters <math>{{\beta }_{0}}\,\!</math>, <math>{{\beta }_{1}}\,\!</math> and <math>{{\beta }_{2}}\,\!</math>. The model describes a plane in the three-dimensional space of <math>Y\,\!</math>, <math>{{x}_{1}}\,\!</math> and <math>{{x}_{2}}\,\!</math>. The parameter <math>{{\beta }_{0}}\,\!</math> is the intercept of this plane. Parameters <math>{{\beta }_{1}}\,\!</math> and <math>{{\beta }_{2}}\,\!</math> are referred to as ''partial regression coefficients''. Parameter <math>{{\beta }_{1}}\,\!</math> represents the change in the mean response corresponding to a unit change in <math>{{x}_{1}}\,\!</math> when <math>{{x}_{2}}\,\!</math> is held constant. Parameter <math>{{\beta }_{2}}\,\!</math> represents the change in the mean response corresponding to a unit change in <math>{{x}_{2}}\,\!</math>  when <math>{{x}_{1}}\,\!</math>  is held constant. 
Consider the following example of a multiple linear regression model with two predictor variables, <math>{{x}_{1}}\,\!</math> and <math>{{x}_{2}}\,\!</math> :




::<math>Y=30+5{{x}_{1}}+7{{x}_{2}}+\epsilon \,\!</math>




This regression model is a first order multiple linear regression model because the maximum power of the variables in the model is 1. The regression plane corresponding to this model is shown in the figure below, along with an observed data point and the corresponding random error, <math>\epsilon\,\!</math>. The true regression model is usually unknown (and therefore the values of the random error terms corresponding to observed data points remain unknown). However, the regression model can be estimated by calculating the parameters of the model for an observed data set. This is explained in [[Multiple_Linear_Regression_Analysis#Estimating_Regression_Models_Using_Least_Squares| Estimating Regression Models Using Least Squares]].
   
   
 
One of the following figures shows the contour plot for the regression model of the above equation. The contour plot shows lines of constant mean response values as a function of <math>{{x}_{1}}\,\!</math> and <math>{{x}_{2}}\,\!</math>. The contour lines for the given regression model are straight lines, as seen on the plot. Straight contour lines result for first order regression models with no interaction terms.
   
   
A linear regression model may also take the following form:


 


::<math>Y={{\beta }_{0}}+{{\beta }_{1}}{{x}_{1}}+{{\beta }_{2}}{{x}_{2}}+{{\beta }_{12}}{{x}_{1}}{{x}_{2}}+\epsilon\,\!</math>




A cross-product term, <math>{{x}_{1}}{{x}_{2}}\,\!</math>, is included in the model. This term represents an interaction effect between the two variables <math>{{x}_{1}}\,\!</math> and <math>{{x}_{2}}\,\!</math>. Interaction means that the effect produced by a change in the predictor variable on the response depends on the level of the other predictor variable(s). As an example of a linear regression model with interaction, consider the model given by the equation <math>Y=30+5{{x}_{1}}+7{{x}_{2}}+3{{x}_{1}}{{x}_{2}}+\epsilon\,\!</math>. The regression plane and contour plot for this model are shown in the following two figures, respectively.
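As a rough illustration of what interaction means, the following Python sketch evaluates the mean response of the example model <math>Y=30+5{{x}_{1}}+7{{x}_{2}}+3{{x}_{1}}{{x}_{2}}+\epsilon\,\!</math> (with the error term omitted) and compares the effect of a unit change in <math>{{x}_{1}}\,\!</math> at two levels of <math>{{x}_{2}}\,\!</math>. The function names <code>mean_response</code> and <code>unit_effect_x1</code> are illustrative only and are not part of any library.

```python
def mean_response(x1, x2):
    """Mean response of the example model with interaction (error term omitted)."""
    return 30 + 5 * x1 + 7 * x2 + 3 * x1 * x2


def unit_effect_x1(x1, x2):
    """Change in the mean response when x1 increases by one unit at a fixed x2."""
    return mean_response(x1 + 1, x2) - mean_response(x1, x2)


# With an interaction term, the effect of x1 depends on the level of x2:
effect_at_x2_0 = unit_effect_x1(1, 0)  # equals 5 + 3*0
effect_at_x2_2 = unit_effect_x1(1, 2)  # equals 5 + 3*2
print(effect_at_x2_0, effect_at_x2_2)
```

Without the cross-product term, both effects would equal the coefficient of <math>{{x}_{1}}\,\!</math> regardless of the level of <math>{{x}_{2}}\,\!</math>.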


 


[[Image:doe5.1.png|center|437px|Regression plane for the model <math>Y=30+5 x_1+7 x_2+\epsilon\,\!</math>]]




[[Image:doe5.2.png|center|337px|Contour plot for the model <math>Y=30+5 x_1+7 x_2+\epsilon\,\!</math>]]




Now consider the regression model shown next:




 
::<math>Y={{\beta }_{0}}+{{\beta }_{1}}{{x}_{1}}+{{\beta }_{2}}x_{1}^{2}+{{\beta }_{3}}x_{1}^{3}+\epsilon\,\!</math>




 
This model is also a linear regression model and is referred to as a ''polynomial regression model''. Polynomial regression models contain squared and higher order terms of the predictor variables, making the response surface curvilinear. As an example of a polynomial regression model with an interaction term, consider the following equation:




::<math>Y=500+5{{x}_{1}}+7{{x}_{2}}-3x_{1}^{2}-5x_{2}^{2}+3{{x}_{1}}{{x}_{2}}+\epsilon\,\!</math>




This model is a ''second order'' model because the maximum power of the terms in the model is two. The regression surface for this model is shown in the following figure. Such regression models are used in RSM to find the optimum value of the response, <math>Y\,\!</math> (for details see [[Response_Surface_Methods_for_Optimization| Response Surface Methods for Optimization]]). Notice that, although the shape of the regression surface is curvilinear, the regression model is still linear because the model is linear in the parameters. The contour plot for this model is shown in the second of the following two figures.




[[Image:doe5.3.png|center|437px|Regression plane for the model <math>Y=30+5 x_1+7 x_2+3 x_1 x_2+\epsilon\,\!</math>]]


[[Image:doe5.4.png|center|337px|Contour plot for the model <math>Y=30+5 x_1+7 x_2+3 x_1 x_2+\epsilon\,\!</math>]]


 


All multiple linear regression models can be expressed in the following general form:


 


::<math>Y={{\beta }_{0}}+{{\beta }_{1}}{{x}_{1}}+{{\beta }_{2}}{{x}_{2}}+...+{{\beta }_{k}}{{x}_{k}}+\epsilon\,\!</math>




 
where <math>k\,\!</math> denotes the number of terms in the model. For example, the second order model shown above can be written in the general form using <math>{{x}_{3}}=x_{1}^{2}\,\!</math>, <math>{{x}_{4}}=x_{2}^{2}\,\!</math> and <math>{{x}_{5}}={{x}_{1}}{{x}_{2}}\,\!</math> as follows:


::<math>Y=500+5{{x}_{1}}+7{{x}_{2}}-3{{x}_{3}}-5{{x}_{4}}+3{{x}_{5}}+\epsilon\,\!</math>
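The rewriting above can be checked numerically. The following Python sketch (using NumPy) is an illustration under stated assumptions: the sample points are randomly generated rather than taken from any data set, the responses are noise-free, and <math>{{x}_{4}}\,\!</math> is taken as the square of <math>{{x}_{2}}\,\!</math> because the coefficient <math>-5\,\!</math> in the second order model multiplies <math>x_{2}^{2}\,\!</math>. It shows that once the squared and cross-product terms are treated as additional predictor columns, ordinary least squares recovers the coefficients of the curvilinear model.

```python
import numpy as np

# Hypothetical sample points (not from the source data)
rng = np.random.default_rng(0)
x1 = rng.uniform(-2, 2, size=20)
x2 = rng.uniform(-2, 2, size=20)

# Noise-free responses from the second order example model
y = 500 + 5 * x1 + 7 * x2 - 3 * x1**2 - 5 * x2**2 + 3 * x1 * x2

# New predictors: x3 = x1^2, x4 = x2^2, x5 = x1*x2, plus a column of ones
X = np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1 * x2])

# The model is linear in the parameters, so ordinary least squares applies
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta_hat, 6))
```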


==Estimating Regression Models Using Least Squares==


Consider a multiple linear regression model with <math>k\,\!</math> predictor variables:


::<math>Y={{\beta }_{0}}+{{\beta }_{1}}{{x}_{1}}+{{\beta }_{2}}{{x}_{2}}+...+{{\beta }_{k}}{{x}_{k}}+\epsilon\,\!</math>




Let each of the <math>k\,\!</math> predictor variables, <math>{{x}_{1}}\,\!</math>, <math>{{x}_{2}}\,\!</math>... <math>{{x}_{k}}\,\!</math>, have <math>n\,\!</math> levels. Then <math>{{x}_{ij}}\,\!</math> represents the <math>i\,\!</math> th level of the <math>j\,\!</math> th predictor variable <math>{{x}_{j}}\,\!</math>. For example, <math>{{x}_{51}}\,\!</math> represents the fifth level of the first predictor variable <math>{{x}_{1}}\,\!</math>, while <math>{{x}_{19}}\,\!</math> represents the first level of the ninth predictor variable, <math>{{x}_{9}}\,\!</math>. Observations, <math>{{y}_{1}}\,\!</math>, <math>{{y}_{2}}\,\!</math>... <math>{{y}_{n}}\,\!</math>, recorded for each of these <math>n\,\!</math> levels can be expressed in the following way:




::<math>\begin{align}
{{y}_{1}}= & {{\beta }_{0}}+{{\beta }_{1}}{{x}_{11}}+{{\beta }_{2}}{{x}_{12}}+...+{{\beta }_{k}}{{x}_{1k}}+{{\epsilon }_{1}} \\
{{y}_{2}}= & {{\beta }_{0}}+{{\beta }_{1}}{{x}_{21}}+{{\beta }_{2}}{{x}_{22}}+...+{{\beta }_{k}}{{x}_{2k}}+{{\epsilon }_{2}} \\
& \vdots \\
{{y}_{i}}= & {{\beta }_{0}}+{{\beta }_{1}}{{x}_{i1}}+{{\beta }_{2}}{{x}_{i2}}+...+{{\beta }_{k}}{{x}_{ik}}+{{\epsilon }_{i}} \\
& \vdots \\
{{y}_{n}}= & {{\beta }_{0}}+{{\beta }_{1}}{{x}_{n1}}+{{\beta }_{2}}{{x}_{n2}}+...+{{\beta }_{k}}{{x}_{nk}}+{{\epsilon }_{n}}
\end{align}\,\!</math>


 


The system of <math>n\,\!</math> equations shown previously can be represented in matrix notation as follows:




::<math>y=X\beta +\epsilon\,\!</math>


 


:where




   
::<math>y=\left[ \begin{matrix}
  {{y}_{1}} \\
  {{y}_{2}}  \\
  .  \\
  .  \\
  .  \\
  {{y}_{n}}  \\
\end{matrix} \right]\text{      }X=\left[ \begin{matrix}
  1 & {{x}_{11}} & {{x}_{12}} & . & . & . & {{x}_{1k}}  \\
  1 & {{x}_{21}} & {{x}_{22}} & . & . & . & {{x}_{2k}}  \\
  . & . & . & {} & {} & {} & .  \\
  . & . & . & {} & {} & {} & .  \\
  . & . & . & {} & {} & {} & .  \\
  1 & {{x}_{n1}} & {{x}_{n2}} & . & . & . & {{x}_{nk}}  \\
\end{matrix} \right]\,\!</math>




   
::<math>\beta =\left[ \begin{matrix}
  {{\beta }_{0}} \\
  {{\beta }_{1}}  \\
  .  \\
  .  \\
  .  \\
  {{\beta }_{k}}  \\
\end{matrix} \right]\text{    and  }\epsilon =\left[ \begin{matrix}
  {{\epsilon }_{1}}  \\
  {{\epsilon }_{2}}  \\
  .  \\
  .  \\
  .  \\
  {{\epsilon }_{n}}  \\
\end{matrix} \right]\,\!</math>




 
The matrix <math>X\,\!</math> is referred to as the ''design matrix''. It contains information about the levels of the predictor variables at which the observations are obtained. The vector <math>\beta\,\!</math> contains all the regression coefficients. To obtain the regression model, <math>\beta\,\!</math> must be estimated. This is done using the method of least squares. The following equation is used:
   




 
::<math>\hat{\beta }={{({{X}^{\prime }}X)}^{-1}}{{X}^{\prime }}y\,\!</math>




where <math>^{\prime }\,\!</math> represents the transpose of the matrix while <math>^{-1}\,\!</math> represents the matrix inverse. Knowing the estimates, <math>\hat{\beta }\,\!</math>, the multiple linear regression model can now be estimated as:




::<math>\hat{y}=X\hat{\beta }\,\!</math>
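The two equations above can be sketched in Python with NumPy as follows. The data values are hypothetical and chosen only for illustration. The sketch solves the normal equations <math>({{X}^{\prime }}X)\hat{\beta }={{X}^{\prime }}y\,\!</math> with <code>np.linalg.solve</code>, which is algebraically equivalent to applying <math>{{({{X}^{\prime }}X)}^{-1}}\,\!</math> but avoids forming the inverse explicitly.

```python
import numpy as np

# Hypothetical data: n = 6 observations of k = 2 predictor variables
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([12.1, 12.9, 21.2, 21.8, 30.1, 30.9])

# Design matrix: a column of ones for the intercept, then one column per predictor
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least squares estimates from the normal equations (X'X) beta = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Fitted values from the estimated model
y_hat = X @ beta_hat

print(beta_hat)
print(y_hat)
```

For badly conditioned design matrices, a routine such as <code>np.linalg.lstsq</code> is usually a safer choice than working with <math>{{X}^{\prime }}X\,\!</math> directly.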




The estimated regression model is also referred to as the ''fitted model''. The observations, <math>{{y}_{i}}\,\!</math>, may be different from the fitted values <math>{{\hat{y}}_{i}}\,\!</math> obtained from this model. The difference between these two values is the residual, <math>{{e}_{i}}\,\!</math>. The vector of residuals, <math>e\,\!</math>, is obtained as:


::<math>e=y-\hat{y}\,\!</math>


 


The fitted model can also be written as follows, using <math>\hat{\beta }={{({{X}^{\prime }}X)}^{-1}}{{X}^{\prime }}y\,\!</math>:


 


   
::<math>\begin{align}
  \hat{y} &= & X\hat{\beta } \\
  & = & X{{({{X}^{\prime }}X)}^{-1}}{{X}^{\prime }}y \\
& = & Hy 
\end{align}\,\!</math>




 
where <math>H=X{{({{X}^{\prime }}X)}^{-1}}{{X}^{\prime }}\,\!</math>. The matrix, <math>H\,\!</math>, is referred to as the hat matrix. It transforms the vector of the observed response values, <math>y\,\!</math>, to the vector of fitted values, <math>\hat{y}\,\!</math>.
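A minimal NumPy sketch of the hat matrix is given below, using a small hypothetical design matrix and response vector (not data from this chapter). It checks that <math>Hy\,\!</math> agrees with <math>X\hat{\beta }\,\!</math> and that <math>H\,\!</math> is idempotent, which is the standard property of a projection matrix.

```python
import numpy as np

# Hypothetical design matrix (intercept column plus two predictors) and response
X = np.array([
    [1.0, 1.0, 2.0],
    [1.0, 2.0, 1.0],
    [1.0, 3.0, 4.0],
    [1.0, 4.0, 3.0],
    [1.0, 5.0, 6.0],
])
y = np.array([12.1, 12.9, 21.2, 21.8, 30.1])

# Hat matrix H = X (X'X)^{-1} X'
H = X @ np.linalg.inv(X.T @ X) @ X.T

y_hat = H @ y   # fitted values
e = y - y_hat   # residuals

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(y_hat, X @ beta_hat))  # compares H y with X beta_hat
print(np.allclose(H @ H, H))             # checks idempotence of H
```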
===Example===
An analyst studying a chemical process expects the yield to be affected by the levels of two factors, <math>{{x}_{1}}\,\!</math> and <math>{{x}_{2}}\,\!</math>. Observations recorded for various levels of the two factors are shown in the following table. The analyst wants to fit a first order regression model to the data. Interaction between <math>{{x}_{1}}\,\!</math> and <math>{{x}_{2}}\,\!</math> is not expected based on knowledge of similar processes. Units of the factor levels and the yield are ignored for the analysis.




[[Image:doet5.1.png||center|351px|Observed yield data for various levels of two factors.|link=]]


   
   
The data of the above table can be entered into the Weibull++ DOE folio using the multiple linear regression folio tool as shown in the following figure.


[[Image:doe5_7.png|center|618px|Multiple Regression tool in Weibull++ with the data in the table.|link=]]




A scatter plot for the data is shown next.
 


[[Image:doe5.8.png|center|361px|Three-dimensional scatter plot for the observed data in the table.|link=]]




 
The first order regression model applicable to this data set having two predictor variables is:




::<math>Y={{\beta }_{0}}+{{\beta }_{1}}{{x}_{1}}+{{\beta }_{2}}{{x}_{2}}+\epsilon\,\!</math>




 
where the dependent variable, <math>Y\,\!</math>, represents the yield and the predictor variables, <math>{{x}_{1}}\,\!</math> and <math>{{x}_{2}}\,\!</math>, represent the two factors respectively. The <math>X\,\!</math> and <math>y\,\!</math> matrices for the data can be obtained as:




::<math>X=\left[ \begin{matrix}
  1 & 41.9 & 29.1  \\
  1 & 43.4 & 29.3  \\
  . & . & .  \\
  . & . & .  \\
  . & . & .  \\
  1 & 77.8 & 32.9  \\
\end{matrix} \right]\text{    }y=\left[ \begin{matrix}
  251.3  \\
  251.3 \\
  . \\
  .  \\
  .  \\
  349.0  \\
\end{matrix} \right]\,\!</math>




The least square estimates, <math>\hat{\beta }\,\!</math>, can now be obtained:




::<math>\begin{align}
  \hat{\beta } &= & {{({{X}^{\prime }}X)}^{-1}}{{X}^{\prime }}y \\
& = & {{\left[ \begin{matrix}
  17 & 941 & 525.3 \\
  941 & 54270 & 29286  \\
  525.3 & 29286 & 16254  \\
\end{matrix} \right]}^{-1}}\left[ \begin{matrix}
  4902.8  \\
  276610  \\
  152020  \\
\end{matrix} \right] \\
& = & \left[ \begin{matrix}
  -153.51  \\
  1.24  \\
  12.08  \\
\end{matrix} \right]  
\end{align}\,\!</math>


 


:Thus:


 


   
::<math>\hat{\beta }=\left[ \begin{matrix}
  {{{\hat{\beta }}}_{0}} \\
  {{{\hat{\beta }}}_{1}}  \\
  {{{\hat{\beta }}}_{2}}  \\
\end{matrix} \right]=\left[ \begin{matrix}
  -153.51  \\
  1.24  \\
  12.08  \\
\end{matrix} \right]\,\!</math>




and the estimated regression coefficients are <math>{{\hat{\beta }}_{0}}=-153.51\,\!</math>, <math>{{\hat{\beta }}_{1}}=1.24\,\!</math> and <math>{{\hat{\beta }}_{2}}=12.08\,\!</math>. The fitted regression model is:




   
::<math>\begin{align}
  \hat{y} & = & {{{\hat{\beta }}}_{0}}+{{{\hat{\beta }}}_{1}}{{x}_{1}}+{{{\hat{\beta }}}_{2}}{{x}_{2}} \\
  & = & -153.5+1.24{{x}_{1}}+12.08{{x}_{2}} 
\end{align}\,\!</math>


The fitted regression model can be viewed in the Weibull++ DOE folio, as shown next.




 
[[Image:doe5_9.png|center|434px|Equation of the fitted regression model for the data from the table.|link=]]




 
A plot of the fitted regression plane is shown in the following figure.




[[Image:doe5.10.png|center|362px|Fitted regression plane <math>\hat{y}=-153.5+1.24 x_1+12.08 x_2\,\!</math> for the data from the table.|link=]]


 
The fitted regression model can be used to obtain fitted values, <math>{{\hat{y}}_{i}}\,\!</math>, corresponding to an observed response value, <math>{{y}_{i}}\,\!</math>. For example, the fitted value corresponding to the fifth observation is:




   
::<math>\begin{align}
  {{{\hat{y}}}_{i}} &= & -153.5+1.24{{x}_{i1}}+12.08{{x}_{i2}} \\
  {{{\hat{y}}}_{5}} & = & -153.5+1.24{{x}_{51}}+12.08{{x}_{52}} \\
  & = & -153.5+1.24(47.3)+12.08(29.9) \\
& = & 266.3 
\end{align}\,\!</math>




The observed fifth response value is <math>{{y}_{5}}=273.0\,\!</math>. The residual corresponding to this value is:




::<math>\begin{align}
  {{e}_{i}} & = & {{y}_{i}}-{{{\hat{y}}}_{i}} \\
  {{e}_{5}}& = & {{y}_{5}}-{{{\hat{y}}}_{5}} \\
  & = & 273.0-266.3 \\
& = & 6.7
\end{align}\,\!</math>
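The fitted value and residual for the fifth observation can be reproduced with a few lines of Python. This is only an arithmetic check using the rounded coefficients and the fifth-observation values quoted in the text. The variable names are illustrative.

```python
# Rounded coefficients of the fitted model from the example
b0, b1, b2 = -153.5, 1.24, 12.08

# Fifth observation: x51 = 47.3, x52 = 29.9, observed yield y5 = 273.0
x51, x52, y5 = 47.3, 29.9, 273.0

y5_hat = b0 + b1 * x51 + b2 * x52  # fitted value
e5 = y5 - y5_hat                   # residual

print(round(y5_hat, 1), round(e5, 1))
```

Because the coefficients are rounded, values computed this way may differ slightly from those reported by software that carries full precision.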




In Weibull++ DOE folios, fitted values and residuals are shown in the Diagnostic Information table of the detailed summary of results. The values are shown in the following figure.




[[Image:doe5_11.png|center|886px|Fitted values and residuals for the data in the table.|link=]]




The fitted regression model can also be used to predict response values. For example, to obtain the response value for a new observation corresponding to 47 units of <math>{{x}_{1}}\,\!</math> and 31 units of <math>{{x}_{2}}\,\!</math>, the value is calculated using:


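where <math>{{X}^{\prime }}\,\!</math> represents the transpose of the matrix <math>X\,\!</math> while <math>{{({{X}^{\prime }}X)}^{-1}}\,\!</math> represents the inverse of the matrix <math>{{X}^{\prime }}X\,\!</math>. Knowing the estimates, <math>\hat{\beta }\,\!</math>, the multiple linear regression model can now be estimated as:
::<math>\hat{y}=X\hat{\beta }\,\!</math>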


   
::<math>\begin{align}
  \hat{y}(47,31)& = & -153.5+1.24(47)+12.08(31) \\
  & = & 279.26 
\end{align}\,\!</math>


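The estimated regression model is also referred to as the fitted model. The observations, <math>{{y}_{i}}\,\!</math>, may be different from the fitted values <math>{{\hat{y}}_{i}}\,\!</math> obtained from this model. The difference between these two values is the residual, <math>{{e}_{i}}\,\!</math>. The vector of residuals, <math>e\,\!</math>, is obtained as:
::<math>e=y-\hat{y}\,\!</math>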
===Properties of the Least Square Estimators for Beta===
The least square estimates, <math>{{\hat{\beta }}_{0}}\,\!</math>, <math>{{\hat{\beta }}_{1}}\,\!</math>, <math>{{\hat{\beta }}_{2}}\,\!</math>... <math>{{\hat{\beta }}_{k}}\,\!</math>, are unbiased estimators of <math>{{\beta }_{0}}\,\!</math>, <math>{{\beta }_{1}}\,\!</math>, <math>{{\beta }_{2}}\,\!</math>... <math>{{\beta }_{k}}\,\!</math>, provided that the random error terms, <math>{{\epsilon }_{i}}\,\!</math>, are normally and independently distributed. The variances of the <math>\hat{\beta }\,\!</math> s are obtained using the <math>{{({{X}^{\prime }}X)}^{-1}}\,\!</math> matrix. The variance-covariance matrix of the estimated regression coefficients is obtained as follows:
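The fitted model, <math>\hat{y}=X\hat{\beta }\,\!</math>, can also be written as follows, using <math>\hat{\beta }={{({{X}^{\prime }}X)}^{-1}}{{X}^{\prime }}y\,\!</math>:
::<math>\hat{y}=X{{({{X}^{\prime }}X)}^{-1}}{{X}^{\prime }}y=Hy\,\!</math>
where <math>H=X{{({{X}^{\prime }}X)}^{-1}}{{X}^{\prime }}\,\!</math>. The matrix, <math>H\,\!</math>, is referred to as the hat matrix. It transforms the vector of the observed response values, <math>y\,\!</math>, to the vector of fitted values, <math>\hat{y}\,\!</math>.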




Example 5.1
::<math>C={{\hat{\sigma }}^{2}}{{({{X}^{\prime }}X)}^{-1}}\,\!</math>




An analyst studying a chemical process expects the yield to be affected by the levels of two factors, <math>{{x}_{1}}\,\!</math> and <math>{{x}_{2}}\,\!</math>. Observations recorded for various levels of the two factors are shown in Table 5.1. The analyst wants to fit a first order regression model to the data. Interaction between <math>{{x}_{1}}\,\!</math> and <math>{{x}_{2}}\,\!</math> is not expected based on knowledge of similar processes. Units of the factor levels and the yield are ignored for the analysis.
<math>C\,\!</math> is a symmetric matrix whose diagonal elements, <math>{{C}_{jj}}\,\!</math>, represent the variance of the estimated <math>j\,\!</math> th regression coefficient, <math>{{\hat{\beta }}_{j}}\,\!</math>. The off-diagonal elements, <math>{{C}_{ij}}\,\!</math>, represent the covariance between the <math>i\,\!</math> th and <math>j\,\!</math> th estimated regression coefficients, <math>{{\hat{\beta }}_{i}}\,\!</math> and <math>{{\hat{\beta }}_{j}}\,\!</math>. The value of <math>{{\hat{\sigma }}^{2}}\,\!</math> is obtained using the error mean square, <math>M{{S}_{E}}\,\!</math>. The variance-covariance matrix for the data in the table (see [[Multiple_Linear_Regression_Analysis#Estimating_Regression_Models_Using_Least_Squares| Estimating Regression Models Using Least Squares]]) can be viewed in the DOE folio, as shown next.




[[Image:doe5_12.png|center|619px|The variance-covariance matrix for the data in table.|link=]]


Table 5.1: Observed yield data for various levels of two factors.


 
Calculations to obtain the matrix are given in this [[Multiple_Linear_Regression_Analysis#Example| example]]. The positive square root of <math>{{C}_{jj}}\,\!</math> represents the estimated standard deviation of the <math>j\,\!</math> th regression coefficient, <math>{{\hat{\beta }}_{j}}\,\!</math>, and is called the estimated standard error of <math>{{\hat{\beta }}_{j}}\,\!</math> (abbreviated <math>se({{\hat{\beta }}_{j}})\,\!</math> ).
The data of Table 5.1 can be entered into DOE++ using the Multiple Regression tool as shown in Figure 5.7. A scatter plot for the data in Table 5.1 is shown in Figure 5.8. The first order regression model applicable to this data set having two predictor variables is:
::<math>Y={{\beta }_{0}}+{{\beta }_{1}}{{x}_{1}}+{{\beta }_{2}}{{x}_{2}}+\epsilon\,\!</math>


where the dependent variable, <math>Y\,\!</math>, represents the yield and the predictor variables, <math>{{x}_{1}}\,\!</math> and <math>{{x}_{2}}\,\!</math>, represent the two factors respectively. The <math>X\,\!</math> and <math>y\,\!</math> matrices for the data can be obtained as:
::<math>se({{\hat{\beta }}_{j}})=\sqrt{{{C}_{jj}}}\,\!</math>
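The calculation of <math>C\,\!</math> and the standard errors can be sketched in plain Python. The data below are hypothetical (they are not the data from the table), and the helper functions <code>transpose</code>, <code>matmul</code> and <code>inverse</code> are written here only for illustration; a numerical library would normally be used instead.

```python
import math

def transpose(A):
    return [list(row) for row in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def inverse(A):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(i == j) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]

# Hypothetical data (assumption for illustration): n = 6 runs, k = 2 predictors.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [5.1, 5.9, 11.2, 11.8, 17.1, 17.9]

X = [[1.0, a, b] for a, b in zip(x1, x2)]  # design matrix with intercept column
n, p = len(X), len(X[0])                   # p = k + 1 coefficients

XtX_inv = inverse(matmul(transpose(X), X))
Xty = matmul(transpose(X), [[v] for v in y])
beta_hat = [row[0] for row in matmul(XtX_inv, Xty)]  # (X'X)^-1 X'y

fitted = [sum(b * xij for b, xij in zip(beta_hat, row)) for row in X]
ss_e = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
ms_e = ss_e / (n - p)                      # error mean square, estimate of sigma^2

C = [[ms_e * v for v in row] for row in XtX_inv]  # variance-covariance matrix
se = [math.sqrt(C[j][j]) for j in range(p)]       # standard errors of the estimates
```

The diagonal of <code>C</code> holds the variances and <code>se</code> holds their positive square roots, matching the definitions above.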


==Hypothesis Tests in Multiple Linear Regression==
This section discusses hypothesis tests on the regression coefficients in multiple linear regression. As in the case of simple linear regression, these tests can only be carried out if it can be assumed that the random error terms, <math>{{\epsilon }_{i}}\,\!</math>, are normally and independently distributed with a mean of zero and variance of <math>{{\sigma }^{2}}\,\!</math>.
Three types of hypothesis tests can be carried out for multiple linear regression models:


#Test for significance of regression: This test checks the significance of the whole regression model.
#<math>t\,\!</math> test: This test checks the significance of individual regression coefficients.
#<math>F\,\!</math> test: This test can be used to simultaneously check the significance of a number of regression coefficients. It can also be used to test individual coefficients.


Figure: 5.7: Multiple Regression tool in DOE++ with the data in Table 5.1.
===Test for Significance of Regression===
The test for significance of regression in the case of multiple linear regression analysis is carried out using the analysis of variance. The test is used to check if a linear statistical relationship exists between the response variable and at least one of the predictor variables. The statements for the hypotheses are:


 


 
::<math>\begin{align}
   
  & {{H}_{0}}:& {{\beta }_{1}}={{\beta }_{2}}=...={{\beta }_{k}}=0 \\
& {{H}_{1}}:& {{\beta }_{j}}\ne 0\text{ for at least one }j 
\end{align}\,\!</math>


Figure 5.8: Three dimensional scatter plot for the observed data in Table 5.1.


 
The test for <math>{{H}_{0}}\,\!</math> is carried out using the following statistic:
 




The least square estimates, <math>\hat{\beta }\,\!</math>, can now be obtained:
::<math>{{F}_{0}}=\frac{M{{S}_{R}}}{M{{S}_{E}}}\,\!</math>




Thus:
where <math>M{{S}_{R}}\,\!</math> is the regression mean square and <math>M{{S}_{E}}\,\!</math> is the error mean square. If the null hypothesis, <math>{{H}_{0}}\,\!</math>, is true then the statistic <math>{{F}_{0}}\,\!</math> follows the <math>F\,\!</math> distribution with <math>k\,\!</math> degrees of freedom in the numerator and <math>n-(k+1)\,\!</math> degrees of freedom in the denominator. The null hypothesis, <math>{{H}_{0}}\,\!</math>, is rejected if the calculated statistic, <math>{{F}_{0}}\,\!</math>, is such that:




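and the estimated regression coefficients are <math>{{\hat{\beta }}_{0}}=-153.5\,\!</math>, <math>{{\hat{\beta }}_{1}}=1.24\,\!</math> and <math>{{\hat{\beta }}_{2}}=12.08\,\!</math>. The fitted regression model is:
::<math>\hat{y}=-153.5+1.24{{x}_{1}}+12.08{{x}_{2}}\,\!</math>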
::<math>{{F}_{0}}>{{f}_{\alpha ,k,n-(k+1)}}\,\!</math>




====Calculation of the Statistic <math>{{F}_{0}}\,\!</math>====
To calculate the statistic <math>{{F}_{0}}\,\!</math>, the mean squares <math>M{{S}_{R}}\,\!</math> and <math>M{{S}_{E}}\,\!</math> must be known. As explained in [[Simple_Linear_Regression_Analysis| Simple Linear Regression Analysis]], the mean squares are obtained by dividing the sums of squares by their respective degrees of freedom. For example, the total mean square, <math>M{{S}_{T}}\,\!</math>, is obtained as follows:


In DOE++, the fitted regression model can be viewed using the Show Analysis Summary icon in the Control Panel. The model is shown in Figure 5.9.


::<math>M{{S}_{T}}=\frac{S{{S}_{T}}}{dof(S{{S}_{T}})}\,\!</math>




Figure 5.9: Equation of the fitted regression model for the data in Table 5.1.
where <math>S{{S}_{T}}\,\!</math> is the total sum of squares and <math>dof(S{{S}_{T}})\,\!</math> is the number of degrees of freedom associated with <math>S{{S}_{T}}\,\!</math>. In multiple linear regression, the following equation is used to calculate <math>S{{S}_{T}}\,\!</math> :


 
A plot of the fitted regression plane is shown in Figure 5.10.


::<math>S{{S}_{T}}={{y}^{\prime }}\left[ I-(\frac{1}{n})J \right]y\,\!</math>


 


Figure 5.10: Fitted regression plane  for the data of Table 5.1.
where <math>n\,\!</math> is the total number of observations, <math>y\,\!</math> is the vector of observations (that was defined in [[Multiple_Linear_Regression_Analysis#Estimating_Regression_Models_Using_Least_Squares| Estimating Regression Models Using Least Squares]]), <math>I\,\!</math> is the identity matrix of order <math>n\,\!</math> and <math>J\,\!</math> represents an <math>n\times n\,\!</math> square matrix of ones. The number of degrees of freedom associated with <math>S{{S}_{T}}\,\!</math>, <math>dof(S{{S}_{T}})\,\!</math>, is <math>n-1\,\!</math>. Knowing <math>S{{S}_{T}}\,\!</math> and <math>dof(S{{S}_{T}})\,\!</math>, the total mean square, <math>M{{S}_{T}}\,\!</math>, can be calculated.


 
The regression mean square, <math>M{{S}_{R}}\,\!</math>, is obtained by dividing the regression sum of squares, <math>S{{S}_{R}}\,\!</math>, by the respective degrees of freedom, <math>dof(S{{S}_{R}})\,\!</math>, as follows:




::<math>M{{S}_{R}}=\frac{S{{S}_{R}}}{dof(S{{S}_{R}})}\,\!</math>




The regression sum of squares, <math>S{{S}_{R}}\,\!</math>, is calculated using the following equation:


Figure 5.11: Fitted values and residuals for the data in Table 5.1.


::<math>S{{S}_{R}}={{y}^{\prime }}\left[ H-(\frac{1}{n})J \right]y\,\!</math>




where <math>n\,\!</math> is the total number of observations, <math>y\,\!</math> is the vector of observations, <math>H\,\!</math> is the hat matrix and <math>J\,\!</math> represents an <math>n\times n\,\!</math> square matrix of ones. The number of degrees of freedom associated with <math>S{{S}_{R}}\,\!</math>, <math>dof(S{{S}_{R}})\,\!</math>, is <math>k\,\!</math>, where <math>k\,\!</math> is the number of predictor variables in the model. Knowing <math>S{{S}_{R}}\,\!</math> and <math>dof(S{{S}_{R}})\,\!</math>, the regression mean square, <math>M{{S}_{R}}\,\!</math>, can be calculated.
The error mean square, <math>M{{S}_{E}}\,\!</math>, is obtained by dividing the error sum of squares, <math>S{{S}_{E}}\,\!</math>, by the respective degrees of freedom, <math>dof(S{{S}_{E}})\,\!</math>, as follows:




::<math>M{{S}_{E}}=\frac{S{{S}_{E}}}{dof(S{{S}_{E}})}\,\!</math>


Figure 5.12: The variance-covariance matrix for the data of Table 5.1.


The error sum of squares, <math>S{{S}_{E}}\,\!</math>, is calculated using the following equation:




::<math>S{{S}_{E}}={{y}^{\prime }}(I-H)y\,\!</math>




where <math>y\,\!</math> is the vector of observations, <math>I\,\!</math> is the identity matrix of order <math>n\,\!</math> and <math>H\,\!</math> is the hat matrix. The number of degrees of freedom associated with <math>S{{S}_{E}}\,\!</math>, <math>dof(S{{S}_{E}})\,\!</math>, is <math>n-(k+1)\,\!</math>, where <math>n\,\!</math> is the total number of observations and <math>k\,\!</math> is the number of predictor variables in the model. Knowing <math>S{{S}_{E}}\,\!</math> and <math>dof(S{{S}_{E}})\,\!</math>, the error mean square, <math>M{{S}_{E}}\,\!</math>, can be calculated. The error mean square is an estimate of the variance, <math>{{\sigma }^{2}}\,\!</math>, of the random error terms, <math>{{\epsilon }_{i}}\,\!</math>.  


::<math>{{\hat{\sigma }}^{2}}=M{{S}_{E}}\,\!</math>
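The quadratic forms for <math>S{{S}_{T}}\,\!</math>, <math>S{{S}_{R}}\,\!</math> and <math>S{{S}_{E}}\,\!</math> can be evaluated directly once the hat matrix is available. The following is a minimal sketch in plain Python using hypothetical data (not the data from the table); the helper functions are assumptions written for this illustration. It also shows that the total sum of squares splits into the regression and error sums of squares.

```python
def transpose(A):
    return [list(row) for row in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def inverse(A):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(i == j) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]

def quad_form(v, M):
    """Return v' M v for a vector v and a square matrix M."""
    m = len(v)
    return sum(v[i] * M[i][j] * v[j] for i in range(m) for j in range(m))

# Hypothetical data (assumption for illustration): n = 6 runs, k = 2 predictors.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [5.1, 5.9, 11.2, 11.8, 17.1, 17.9]
X = [[1.0, a, b] for a, b in zip(x1, x2)]
n, k = len(y), 2

Xt = transpose(X)
H = matmul(matmul(X, inverse(matmul(Xt, X))), Xt)  # hat matrix X (X'X)^-1 X'

rng = range(n)
ss_t = quad_form(y, [[(i == j) - 1.0 / n for j in rng] for i in rng])      # y'[I - (1/n)J]y
ss_r = quad_form(y, [[H[i][j] - 1.0 / n for j in rng] for i in rng])       # y'[H - (1/n)J]y
ss_e = quad_form(y, [[(i == j) - H[i][j] for j in rng] for i in rng])      # y'(I - H)y

ms_r = ss_r / k               # dof(SS_R) = k
ms_e = ss_e / (n - (k + 1))   # dof(SS_E) = n - (k + 1)
f0 = ms_r / ms_e              # statistic for the significance of regression
```

Because the three matrices satisfy I - (1/n)J = [H - (1/n)J] + (I - H), the computed values satisfy SS_T = SS_R + SS_E up to rounding.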


=====Example=====
The test for the significance of regression, for the regression model obtained for the data in the table (see [[Multiple_Linear_Regression_Analysis#Estimating_Regression_Models_Using_Least_Squares| Estimating Regression Models Using Least Squares]]), is illustrated in this example. The null hypothesis for the model is:


::<math>{{H}_{0}}: {{\beta }_{1}}={{\beta }_{2}}=0\,\!</math>
The statistic to test <math>{{H}_{0}}\,\!</math> is:




::<math>{{F}_{0}}=\frac{M{{S}_{R}}}{M{{S}_{E}}}\,\!</math>
To calculate <math>{{F}_{0}}\,\!</math>, first the sums of squares are calculated so that the mean squares can be obtained. Then the mean squares are used to calculate the statistic <math>{{F}_{0}}\,\!</math> to carry out the significance test.
The regression sum of squares, <math>S{{S}_{R}}\,\!</math>, can be obtained as:




::<math>S{{S}_{R}}={{y}^{\prime }}\left[ H-(\frac{1}{n})J \right]y\,\!</math>
The hat matrix, <math>H\,\!</math>, is calculated as follows using the design matrix <math>X\,\!</math> from the previous [[Multiple_Linear_Regression_Analysis#Example| example]]:




::<math>\begin{align}
  H & = & X{{({{X}^{\prime }}X)}^{-1}}{{X}^{\prime }} \\
& = & \left[ \begin{matrix}
  0.27552 & 0.25154 & . & . & -0.04030  \\
  0.25154 & 0.23021 & . & . & -0.02920  \\
  . & . & . & . & .  \\
  . & . & . & . & .  \\
  -0.04030 & -0.02920 & . & . & 0.30115  \\
\end{matrix} \right] 
\end{align}\,\!</math>




Knowing <math>y\,\!</math>, <math>H\,\!</math> and <math>J\,\!</math>, the regression sum of squares, <math>S{{S}_{R}}\,\!</math>, can be calculated:


::<math>\begin{align}
  S{{S}_{R}} & = & {{y}^{\prime }}\left[ H-(\frac{1}{n})J \right]y \\
& = & 12816.35 
\end{align}\,\!</math>




The number of degrees of freedom associated with <math>S{{S}_{R}}\,\!</math> is <math>k\,\!</math>, which equals two since there are two predictor variables in the data in the table (see [[Multiple_Linear_Regression_Analysis#Estimating_Regression_Models_Using_Least_Squares| Estimating Regression Models Using Least Squares]]). Therefore, the regression mean square is:




::<math>\begin{align}
  M{{S}_{R}}& = & \frac{S{{S}_{R}}}{dof(S{{S}_{R}})} \\
& = & \frac{12816.35}{2} \\
& = & 6408.17 
\end{align}\,\!</math>




Similarly, to calculate the error mean square, <math>M{{S}_{E}}\,\!</math>, the error sum of squares, <math>S{{S}_{E}}\,\!</math>, can be obtained as:


::<math>\begin{align}
  S{{S}_{E}} &= & {{y}^{\prime }}\left[ I-H \right]y \\
& = & 423.37 
\end{align}\,\!</math>




The degrees of freedom associated with <math>S{{S}_{E}}\,\!</math> is <math>n-(k+1)\,\!</math>. Therefore, the error mean square, <math>M{{S}_{E}}\,\!</math>, is:




   
::<math>\begin{align}
  M{{S}_{E}} &= & \frac{S{{S}_{E}}}{dof(S{{S}_{E}})} \\
  & = & \frac{S{{S}_{E}}}{(n-(k+1))} \\
& = & \frac{423.37}{(17-(2+1))} \\
& = & 30.24 
\end{align}\,\!</math>




The statistic to test the significance of regression can now be calculated as:




::<math>\begin{align}
  {{f}_{0}}& = & \frac{M{{S}_{R}}}{M{{S}_{E}}} \\
  & = & \frac{6408.17}{423.37/(17-3)} \\
& = & 211.9 
\end{align}\,\!</math>




The critical value for this test, corresponding to a significance level of 0.1, is:




::<math>\begin{align}
  {{f}_{\alpha ,k,n-(k+1)}} &= & {{f}_{0.1,2,14}} \\
& = & 2.726 
\end{align}\,\!</math>
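The numbers in this example can be reproduced with a short Python check. The sums of squares are taken from the example; the closed-form expression used for the critical value is valid only because the numerator has exactly two degrees of freedom (for that case the upper-tail probability of the <math>F\,\!</math> distribution is <math>{{(1+2x/\nu )}^{-\nu /2}}\,\!</math> with <math>\nu\,\!</math> denominator degrees of freedom). In general a statistical table or library would be needed.

```python
ss_r, ss_e = 12816.35, 423.37  # sums of squares from the example
n, k = 17, 2                   # observations and predictor variables
alpha = 0.1                    # significance level

ms_r = ss_r / k                # regression mean square
dof_e = n - (k + 1)            # error degrees of freedom (14)
ms_e = ss_e / dof_e            # error mean square
f0 = ms_r / ms_e               # test statistic

# Critical value f_{alpha,2,dof_e}; this closed form holds only for 2 numerator dof.
f_crit = (dof_e / 2.0) * (alpha ** (-2.0 / dof_e) - 1.0)

reject_h0 = f0 > f_crit
print(round(ms_e, 2), round(f0, 1), round(f_crit, 3), reject_h0)
```

Since the statistic is far larger than the critical value, the check agrees with the rejection of the null hypothesis.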




Since <math>{{f}_{0}}>{{f}_{0.1,2,14}}\,\!</math>, <math>{{H}_{0}}:\,\!</math> <math>{{\beta }_{1}}={{\beta }_{2}}=0\,\!</math> is rejected and it is concluded that at least one coefficient out of <math>{{\beta }_{1}}\,\!</math> and <math>{{\beta }_{2}}\,\!</math> is significant. In other words, it is concluded that a regression model exists between yield and either one or both of the factors in the table. The analysis of variance is summarized in the following table.




[[Image:doet5.2.png|center|477px|ANOVA table for the significance of regression test.|link=]]


===Test on Individual Regression Coefficients (''t''  Test)===
The <math>t\,\!</math> test is used to check the significance of individual regression coefficients in the multiple linear regression model. Adding a significant variable to a regression model makes the model more effective, while adding an unimportant variable may make the model worse. The hypothesis statements to test the significance of a particular regression coefficient, <math>{{\beta }_{j}}\,\!</math>, are:




   
::<math>\begin{align}
  & {{H}_{0}}: & {{\beta }_{j}}=0 \\
  & {{H}_{1}}: & {{\beta }_{j}}\ne 0 
\end{align}\,\!</math>




Table 5.2: ANOVA table for the significance of regression test in Example 5.2.
The test statistic for this test is based on the <math>t\,\!</math> distribution (and is similar to the one used in the case of simple linear regression models in [[Simple_Linear_Regression_Analysis| Simple Linear Regression Analysis]]):




::<math>{{T}_{0}}=\frac{{{{\hat{\beta }}}_{j}}}{se({{{\hat{\beta }}}_{j}})}\,\!</math>




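where the standard error, <math>se({{\hat{\beta }}_{j}})\,\!</math>, is obtained as the positive square root of the corresponding diagonal element, <math>{{C}_{jj}}\,\!</math>, of the variance-covariance matrix. The analyst would fail to reject the null hypothesis if the test statistic lies in the acceptance region: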


::<math>-{{t}_{\alpha /2,n-(k+1)}}<{{T}_{0}}<{{t}_{\alpha /2,n-(k+1)}}\,\!</math>




This test measures the contribution of a variable while the remaining variables are included in the model. For the model <math>\hat{y}={{\hat{\beta }}_{0}}+{{\hat{\beta }}_{1}}{{x}_{1}}+{{\hat{\beta }}_{2}}{{x}_{2}}+{{\hat{\beta }}_{3}}{{x}_{3}}\,\!</math>, if the test is carried out for <math>{{\beta }_{1}}\,\!</math>, then the test will check the significance of including the variable <math>{{x}_{1}}\,\!</math> in the model that contains <math>{{x}_{2}}\,\!</math> and <math>{{x}_{3}}\,\!</math> (i.e., the model <math>\hat{y}={{\hat{\beta }}_{0}}+{{\hat{\beta }}_{2}}{{x}_{2}}+{{\hat{\beta }}_{3}}{{x}_{3}}\,\!</math> ). Hence the test is also referred to as a partial or marginal test. In DOE folios, this test is displayed in the Regression Information table.


====Example====
The test to check the significance of the estimated regression coefficients for the data is illustrated in this example. The null hypothesis to test the coefficient <math>{{\beta }_{2}}\,\!</math> is:




::<math>{{H}_{0}}:{{\beta }_{2}}=0\,\!</math>




The null hypothesis to test <math>{{\beta }_{1}}\,\!</math> can be obtained in a similar manner. To calculate the test statistic, <math>{{T}_{0}}\,\!</math>, we need to calculate the standard error.
In the [[Multiple_Linear_Regression_Analysis#Example_2|example]], the value of the error mean square, <math>M{{S}_{E}}\,\!</math>, was obtained as 30.24. The error mean square is an estimate of the variance, <math>{{\sigma }^{2}}\,\!</math>.




:Therefore:




::<math>\begin{align}
  {{{\hat{\sigma }}}^{2}} &= & M{{S}_{E}} \\
& = & 30.24 
\end{align}\,\!</math>




The variance-covariance matrix of the estimated regression coefficients is:




   
::<math>\begin{align}
  C &= & {{{\hat{\sigma }}}^{2}}{{({{X}^{\prime }}X)}^{-1}} \\
  & = & 30.24\left[ \begin{matrix}
  336.5 & 1.2 & -13.1  \\
  1.2 & 0.005 & -0.049  \\
  -13.1 & -0.049 & 0.5  \\
\end{matrix} \right] \\
& = & \left[ \begin{matrix}
  10176.75 & 37.145 & -395.83  \\
  37.145 & 0.1557 & -1.481  \\
  -395.83 & -1.481 & 15.463  \\
\end{matrix} \right] 
\end{align}\,\!</math>
 
 
From the diagonal elements of <math>C\,\!</math>, the estimated standard error for <math>{{\hat{\beta }}_{1}}\,\!</math> and <math>{{\hat{\beta }}_{2}}\,\!</math> is:




::<math>\begin{align}
  se({{{\hat{\beta }}}_{1}}) &= & \sqrt{0.1557}=0.3946 \\
  se({{{\hat{\beta }}}_{2}})& = & \sqrt{15.463}=3.93 
\end{align}\,\!</math>




The corresponding test statistics for these coefficients are:




   
::<math>\begin{align}
  {{({{t}_{0}})}_{{{{\hat{\beta }}}_{1}}}} &= & \frac{{{{\hat{\beta }}}_{1}}}{se({{{\hat{\beta }}}_{1}})}=\frac{1.24}{0.3946}=3.1393 \\
  {{({{t}_{0}})}_{{{{\hat{\beta }}}_{2}}}} &= & \frac{{{{\hat{\beta }}}_{2}}}{se({{{\hat{\beta }}}_{2}})}=\frac{12.08}{3.93}=3.0726  
\end{align}\,\!</math>
 
 
The critical values for the present <math>t\,\!</math> test at a significance level of 0.1 are:
 
 
::<math>\begin{align}
  {{t}_{\alpha /2,n-(k+1)}} &= & {{t}_{0.05,14}}=1.761 \\
  -{{t}_{\alpha /2,n-(k+1)}} & = & -{{t}_{0.05,14}}=-1.761 
\end{align}\,\!</math>
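The standard errors and test statistics can be recomputed from the values quoted in this example. The sketch below uses the rounded estimates and diagonal elements shown above, so its results may differ in the later decimal places from the values in the text, which appear to have been computed from unrounded quantities. The critical value 1.761 is taken from the text rather than computed.

```python
import math

beta_hat = {1: 1.24, 2: 12.08}     # rounded coefficient estimates from the text
var_beta = {1: 0.1557, 2: 15.463}  # diagonal elements C_11 and C_22 from the example
t_crit = 1.761                     # t_{0.05,14} as quoted in the text

results = {}
for j in (1, 2):
    se = math.sqrt(var_beta[j])         # estimated standard error of beta_hat_j
    t0 = beta_hat[j] / se               # test statistic for H0: beta_j = 0
    significant = abs(t0) > t_crit      # outside the acceptance region?
    results[j] = (se, t0, significant)
    print(j, round(se, 4), round(t0, 2), significant)
```

Both statistics fall outside the acceptance region, so both coefficients are judged significant at a significance level of 0.1.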
 


Considering <math>{{\hat{\beta }}_{2}}\,\!</math>, it can be seen that <math>{{({{t}_{0}})}_{{{{\hat{\beta }}}_{2}}}}\,\!</math> does not lie in the acceptance region of <math>-{{t}_{0.05,14}}<{{t}_{0}}<{{t}_{0.05,14}}\,\!</math>. The null hypothesis, <math>{{H}_{0}}:{{\beta }_{2}}=0\,\!</math>, is rejected and it is concluded that <math>{{\beta }_{2}}\,\!</math> is significant at <math>\alpha =0.1\,\!</math>. This conclusion can also be arrived at using the <math>p\,\!</math> value noting that the hypothesis is two-sided. The <math>p\,\!</math> value corresponding to the test statistic, <math>{{({{t}_{0}})}_{{{{\hat{\beta }}}_{2}}}} = 3.0726\,\!</math>, based on the <math>t\,\!</math> distribution with 14 degrees of freedom is:




   
::<math>\begin{align}
  p\text{ }value & = & 2\times (1-P(T\le |{{t}_{0}}|)) \\
  & = & 2\times (1-0.9959) \\
& = & 0.0083 
\end{align}\,\!</math>




Since the <math>p\,\!</math> value is less than the significance level, <math>\alpha =0.1\,\!</math>, it is concluded that <math>{{\beta }_{2}}\,\!</math> is significant. The hypothesis test on <math>{{\beta }_{1}}\,\!</math> can be carried out in a similar manner.


As explained in [[Simple_Linear_Regression_Analysis| Simple Linear Regression Analysis]], in DOE folios, the information related to the <math>t\,\!</math> test is displayed in the Regression Information table as shown in the figure below.




[[Image:doe5_13.png|center|884px|Regression results for the data.|link=]]




Figure 5.13: Regression results for the data in Table 5.1.
In this table, the <math>t\,\!</math> test for <math>{{\beta }_{2}}\,\!</math> is displayed in the row for the term Factor 2 because <math>{{\beta }_{2}}\,\!</math> is the coefficient that represents this factor in the regression model. Columns labeled Standard Error, T Value and P Value represent the standard error, the test statistic for the <math>t\,\!</math> test and the <math>p\,\!</math> value for the <math>t\,\!</math> test, respectively. These values have been calculated for <math>{{\beta }_{2}}\,\!</math> in this example. The Coefficient column represents the estimate of regression coefficients. These values are calculated as shown in [[Multiple_Linear_Regression_Analysis#Example|this]] example. The Effect column represents values obtained by multiplying the coefficients by a factor of 2. This value is useful in the case of two factor experiments and is explained in [[Two_Level_Factorial_Experiments| Two-Level Factorial Experiments]]. Columns labeled Low Confidence and High Confidence represent the limits of the confidence intervals for the regression coefficients and are explained in [[Multiple_Linear_Regression_Analysis#Confidence_Intervals_in_Multiple_Linear_Regression|Confidence Intervals in Multiple Linear Regression]]. The Variance Inflation Factor column displays values that give a measure of ''multicollinearity''. This is explained in [[Multiple_Linear_Regression_Analysis#Multicollinearity|Multicollinearity]].


===Test on Subsets of Regression Coefficients (Partial ''F'' Test)===


This test can be considered to be the general form of the <math>t\,\!</math> test mentioned in the previous section. This is because the test simultaneously checks the significance of including many (or even one) regression coefficients in the multiple linear regression model. Adding a variable to a model increases the regression sum of squares, <math>S{{S}_{R}}\,\!</math>. The test is based on this increase in the regression sum of squares. The increase in the regression sum of squares is called the ''extra sum of squares''.
Assume that the vector of the regression coefficients, <math>\beta\,\!</math>, for the multiple linear regression model, <math>y=X\beta +\epsilon\,\!</math>, is partitioned into two vectors with the second vector, <math>{{\theta}_{2}}\,\!</math>, containing the last <math>r\,\!</math> regression coefficients, and the first vector, <math>{{\theta}_{1}}\,\!</math>, containing the first ( <math>k+1-r\,\!</math> ) coefficients as follows:


::<math>\beta =\left[ \begin{matrix}
  {{\theta}_{1}}  \\
  {{\theta}_{2}}  \\
\end{matrix} \right]\,\!</math>




:with:




::<math>{{\theta}_{1}}=[{{\beta }_{0}},{{\beta }_{1}}...{{\beta }_{k-r}}{]}'\text{ and }{{\theta}_{2}}=[{{\beta }_{k-r+1}},{{\beta }_{k-r+2}}...{{\beta }_{k}}{]}'\text{    }\,\!</math>




The hypothesis statements to test the significance of adding the regression coefficients in <math>{{\theta}_{2}}\,\!</math> to a model containing the regression coefficients in <math>{{\theta}_{1}}\,\!</math> may be written as:




::<math>\begin{align}
  & {{H}_{0}}: & {{\theta}_{2}}=0 \\
& {{H}_{1}}: & {{\theta}_{2}}\ne 0 
\end{align}\,\!</math>




The test statistic for this test follows the <math>F\,\!</math> distribution and can be calculated as follows:


::<math>{{F}_{0}}=\frac{S{{S}_{R}}({{\theta}_{2}}|{{\theta}_{1}})/r}{M{{S}_{E}}}\,\!</math>




where <math>S{{S}_{R}}({{\theta}_{2}}|{{\theta}_{1}})\,\!</math> is the increase in the regression sum of squares when the variables corresponding to the coefficients in <math>{{\theta}_{2}}\,\!</math> are added to a model already containing <math>{{\theta}_{1}}\,\!</math>, and <math>M{{S}_{E}}\,\!</math> is obtained from the equation given in [[Simple_Linear_Regression_Analysis#Mean_Squares|Simple Linear Regression Analysis]]. The value of the extra sum of squares is obtained as explained in the next section.


The null hypothesis, <math>{{H}_{0}}\,\!</math>, is rejected if <math>{{F}_{0}}>{{f}_{\alpha ,r,n-(k+1)}}\,\!</math>. Rejection of <math>{{H}_{0}}\,\!</math> leads to the conclusion that at least one of the variables in <math>{{x}_{k-r+1}}\,\!</math>, <math>{{x}_{k-r+2}}\,\!</math>... <math>{{x}_{k}}\,\!</math> contributes significantly to the regression model. In a DOE folio, the results from the partial <math>F\,\!</math> test are displayed in the ANOVA table.


[[Image:doe5_14.png|center|650px|ANOVA Table for Extra Sum of Squares in Weibull++.]]


===Types of Extra Sum of Squares===
The extra sum of squares can be calculated using either the partial (or adjusted) sum of squares or the sequential sum of squares. The type of extra sum of squares used affects the calculation of the test statistic for the partial <math>F\,\!</math> test described above. In DOE folios, selection for the type of extra sum of squares is available as shown in the figure below. The partial sum of squares is used as the default setting. The reason for this is explained in the following section on the partial sum of squares. 


Figure 5.14: Selection of the type of extra sum of squares in DOE++.


 
====Partial Sum of Squares====
The partial sum of squares for a term is the extra sum of squares when all terms, except the term under consideration, are included in the model. For example, consider the model:




::<math>Y={{\beta }_{0}}+{{\beta }_{1}}{{x}_{1}}+{{\beta }_{2}}{{x}_{2}}+{{\beta }_{12}}{{x}_{1}}{{x}_{2}}+\epsilon\,\!</math>
 
 
The sum of squares of regression of this model is denoted by <math>S{{S}_{R}}({{\beta }_{0}},{{\beta }_{1}},{{\beta }_{2}},{{\beta }_{12}})\,\!</math>. Assume that we need to know the partial sum of squares for <math>{{\beta }_{2}}\,\!</math>. The partial sum of squares for <math>{{\beta }_{2}}\,\!</math> is the increase in the regression sum of squares when <math>{{\beta }_{2}}\,\!</math> is added to the model. This increase is the difference in the regression sum of squares for the full model of the equation given above and the model that includes all terms except <math>{{\beta }_{2}}\,\!</math>. These terms are <math>{{\beta }_{0}}\,\!</math>, <math>{{\beta }_{1}}\,\!</math> and <math>{{\beta }_{12}}\,\!</math>. The model that contains these terms is:




::<math>Y={{\beta }_{0}}+{{\beta }_{1}}{{x}_{1}}+{{\beta }_{12}}{{x}_{1}}{{x}_{2}}+\epsilon\,\!</math>




The sum of squares of regression of this model is denoted by <math>S{{S}_{R}}({{\beta }_{0}},{{\beta }_{1}},{{\beta }_{12}})\,\!</math>. The partial sum of squares for <math>{{\beta }_{2}}\,\!</math> can be represented as <math>S{{S}_{R}}({{\beta }_{2}}|{{\beta }_{0}},{{\beta }_{1}},{{\beta }_{12}})\,\!</math> and is calculated as follows:




   
::<math>\begin{align}
  S{{S}_{R}}({{\beta }_{2}}|{{\beta }_{0}},{{\beta }_{1}},{{\beta }_{12}})& = & S{{S}_{R}}({{\beta }_{0}},{{\beta }_{1}},{{\beta }_{2}},{{\beta }_{12}})-S{{S}_{R}}({{\beta }_{0}},{{\beta }_{1}},{{\beta }_{12}})  
\end{align}\,\!</math>




For the present case, <math>{{\theta}_{2}}=[{{\beta }_{2}}{]}'\,\!</math> and <math>{{\theta}_{1}}=[{{\beta }_{0}},{{\beta }_{1}},{{\beta }_{12}}{]}'\,\!</math>. It can be noted that for the partial sum of squares, <math>{{\theta}_{1}}\,\!</math> contains all coefficients other than the coefficient being tested.


A Weibull++ DOE folio has the partial sum of squares as the default selection. This is because the <math>t\,\!</math> test is a partial test, i.e., the <math>t\,\!</math> test on an individual coefficient is carried out by assuming that all the remaining coefficients are included in the model (similar to the way the partial sum of squares is calculated). The results from the <math>t\,\!</math> test are displayed in the Regression Information table. The results from the partial <math>F\,\!</math> test are displayed in the ANOVA table. To keep the results in the two tables consistent with each other, the partial sum of squares is used as the default selection for the results displayed in the ANOVA table.
The partial sum of squares for all terms of a model may not add up to the regression sum of squares for the full model when the regression coefficients are correlated. If it is preferred that the extra sum of squares for all terms in the model always add up to the regression sum of squares for the full model then the sequential sum of squares should be used.


=====Example=====
This example illustrates the partial <math>F\,\!</math> test using the partial sum of squares. The test is conducted for the coefficient <math>{{\beta }_{1}}\,\!</math> corresponding to the predictor variable <math>{{x}_{1}}\,\!</math> for the data. The regression model used for this data set in the [[Multiple_Linear_Regression_Analysis#Example| example]] is:
 
 
::<math>Y={{\beta }_{0}}+{{\beta }_{1}}{{x}_{1}}+{{\beta }_{2}}{{x}_{2}}+\epsilon\,\!</math>
 
 
The null hypothesis to test the significance of <math>{{\beta }_{1}}\,\!</math> is:




::<math>{{H}_{0}}: {{\beta }_{1}}=0\,\!</math>




The statistic to test this hypothesis is:




::<math>{{F}_{0}}=\frac{S{{S}_{R}}({{\beta }_{1}}|{{\beta }_{2}})/r}{M{{S}_{E}}}\,\!</math>




where <math>S{{S}_{R}}({{\beta }_{1}}|{{\beta }_{2}})\,\!</math> represents the partial sum of squares for <math>{{\beta }_{1}}\,\!</math>, <math>r\,\!</math> represents the number of degrees of freedom for <math>S{{S}_{R}}({{\beta }_{1}}|{{\beta }_{2}})\,\!</math> (which is one because there is just one coefficient, <math>{{\beta }_{1}}\,\!</math>, being tested) and <math>M{{S}_{E}}\,\!</math> is the error mean square and has been calculated in the second [[Multiple_Linear_Regression_Analysis#Example_2|example]] as 30.24.


The partial sum of squares for <math>{{\beta }_{1}}\,\!</math> is the difference between the regression sum of squares for the full model, <math>Y={{\beta }_{0}}+{{\beta }_{1}}{{x}_{1}}+{{\beta }_{2}}{{x}_{2}}+\epsilon\,\!</math>, and the regression sum of squares for the model excluding <math>{{\beta }_{1}}\,\!</math>, <math>Y={{\beta }_{0}}+{{\beta }_{2}}{{x}_{2}}+\epsilon\,\!</math>. The regression sum of squares for the full model has been calculated in the second [[Multiple_Linear_Regression_Analysis#Example_2|example]] as 12816.35. Therefore:




::<math>S{{S}_{R}}({{\beta }_{0}},{{\beta }_{1}},{{\beta }_{2}})=12816.35\,\!</math>




The regression sum of squares for the model <math>Y={{\beta }_{0}}+{{\beta }_{2}}{{x}_{2}}+\epsilon\,\!</math> is obtained as shown next. First the design matrix for this model, <math>{{X}_{{{\beta }_{0}},{{\beta }_{2}}}}\,\!</math>, is obtained by dropping the second column in the design matrix of the full model, <math>X\,\!</math> (the full design matrix, <math>X\,\!</math>, was obtained in the [[Multiple_Linear_Regression_Analysis#Example| example]]). The second column of <math>X\,\!</math> corresponds to the coefficient <math>{{\beta }_{1}}\,\!</math> which is no longer in the model. Therefore, the design matrix for the model, <math>Y={{\beta }_{0}}+{{\beta }_{2}}{{x}_{2}}+\epsilon\,\!</math>, is:




::<math>{{X}_{{{\beta }_{0}},{{\beta }_{2}}}}=\left[ \begin{matrix}
  1 & 29.1 \\
  1 & 29.3  \\
  . & . \\
  . & . \\
  1 & 32.9  \\
\end{matrix} \right]\,\!</math>




The hat matrix corresponding to this design matrix is <math>{{H}_{{{\beta }_{0}},{{\beta }_{2}}}}\,\!</math>. It can be calculated using <math>{{H}_{{{\beta }_{0}},{{\beta }_{2}}}}={{X}_{{{\beta }_{0}},{{\beta }_{2}}}}{{(X_{{{\beta }_{0}},{{\beta }_{2}}}^{\prime }{{X}_{{{\beta }_{0}},{{\beta }_{2}}}})}^{-1}}X_{{{\beta }_{0}},{{\beta }_{2}}}^{\prime }\,\!</math>. Once <math>{{H}_{{{\beta }_{0}},{{\beta }_{2}}}}\,\!</math> is known, the regression sum of squares for the model <math>Y={{\beta }_{0}}+{{\beta }_{2}}{{x}_{2}}+\epsilon\,\!</math>, can be calculated as:
   
   


::<math>\begin{align}
  S{{S}_{R}}({{\beta }_{0}},{{\beta }_{2}}) & = & {{y}^{\prime }}\left[ {{H}_{{{\beta }_{0}},{{\beta }_{2}}}}-(\frac{1}{n})J \right]y \\
& = & 12518.32  
\end{align}\,\!</math>
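
The matrix expression above can be checked numerically. The following Python sketch is only an illustration: the six data points are made up (they are not the data of this example), the function name <code>regression_ss_via_hat</code> is hypothetical, and the model has an intercept and a single predictor so that <math>X'X\,\!</math> can be inverted in closed form. It evaluates <math>y'[H-(1/n)J]y\,\!</math> entry by entry and compares the result with the familiar single-predictor form <math>S_{xy}^{2}/S_{xx}\,\!</math>.

```python
# Sketch: SS_R = y'[H - (1/n)J]y for a model with an intercept and one predictor.
# The data are made-up illustration values, not the data of the example.

def regression_ss_via_hat(x, y):
    """Return y'[H - (1/n)J]y for the design matrix X = [1, x]."""
    n = len(y)
    # X'X for X = [1, x] is [[n, sx], [sx, sxx]]; invert the 2x2 matrix directly.
    sx = sum(x)
    sxx = sum(v * v for v in x)
    det = n * sxx - sx * sx
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]
    rows = [(1.0, v) for v in x]

    def h(i, j):
        # Entry (i, j) of the hat matrix: x_i' (X'X)^-1 x_j
        a, b = rows[i], rows[j]
        return (a[0] * (inv[0][0] * b[0] + inv[0][1] * b[1])
                + a[1] * (inv[1][0] * b[0] + inv[1][1] * b[1]))

    # J is the n x n matrix of ones, so every entry of (1/n)J equals 1/n.
    return sum(y[i] * (h(i, j) - 1.0 / n) * y[j]
               for i in range(n) for j in range(n))


x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]       # hypothetical predictor values
y = [3.1, 2.9, 7.2, 6.8, 11.1, 10.9]      # hypothetical responses

ssr_hat = regression_ss_via_hat(x2, y)

# Cross-check with the single-predictor form S_xy^2 / S_xx.
n = len(y)
x_bar = sum(x2) / n
y_bar = sum(y) / n
s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x2, y))
s_xx = sum((a - x_bar) ** 2 for a in x2)
ssr_direct = s_xy ** 2 / s_xx

print(ssr_hat, ssr_direct)
```

Both numbers agree up to rounding error, which is what the identity <math>y'Hy-(1/n)y'Jy=\sum{{({{\hat{y}}_{i}}-\bar{y})}^{2}}\,\!</math> predicts.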




Therefore, the partial sum of squares for <math>{{\beta }_{1}}\,\!</math> is:




::<math>\begin{align}
  S{{S}_{R}}({{\beta }_{1}}|{{\beta }_{2}})& = & S{{S}_{R}}({{\beta }_{0}},{{\beta }_{1}},{{\beta }_{2}})-S{{S}_{R}}({{\beta }_{0}},{{\beta }_{2}}) \\
& = & 12816.35-12518.32 \\
& = & 298.03 
\end{align}\,\!</math>




Knowing the partial sum of squares, the statistic to test the significance of <math>{{\beta }_{1}}\,\!</math> is:




::<math>\begin{align}
  {{f}_{0}} &= & \frac{S{{S}_{R}}({{\beta }_{1}}|{{\beta }_{2}})/r}{M{{S}_{E}}} \\
& = & \frac{298.03/1}{30.24} \\
& = & 9.855 
\end{align}\,\!</math>
 


The <math>p\,\!</math> value corresponding to this statistic based on the <math>F\,\!</math> distribution with 1 degree of freedom in the numerator and 14 degrees of freedom in the denominator is:


   
::<math>\begin{align}
  p\text{ }value &= & 1-P(F\le {{f}_{0}}) \\
  & = & 1-0.9928 \\
& = & 0.0072 
\end{align}\,\!</math>




Assuming that the desired significance is 0.1, since <math>p\,\!</math> value < 0.1, <math>{{H}_{0}}:{{\beta }_{1}}=0\,\!</math> is rejected and it can be concluded that <math>{{\beta }_{1}}\,\!</math> is significant. The test for <math>{{\beta }_{2}}\,\!</math> can be carried out in a similar manner. In the results obtained from the DOE folio, the calculations for this test are displayed in the ANOVA table as shown in the following figure. Note that the conclusion obtained in this example can also be obtained using the <math>t\,\!</math> test as explained in the [[Multiple_Linear_Regression_Analysis#Example_3|example]] in [[Multiple_Linear_Regression_Analysis#Test_on_Individual_Regression_Coefficients_.28t__Test.29|Test on Individual Regression Coefficients (t Test)]]. The ANOVA and Regression Information tables in the DOE folio represent two different ways to test for the significance of the variables included in the multiple linear regression model.
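
The arithmetic of this example can be reproduced with a short Python sketch. It uses only the sums of squares, the error mean square and the degrees of freedom quoted above. The helper names (<code>regularized_incomplete_beta</code>, <code>f_survival</code>) are hypothetical, and the <math>p\,\!</math> value is obtained from the standard continued-fraction evaluation of the regularized incomplete beta function. That numerical method is an assumption of this sketch and is not necessarily how the DOE folio computes the value.

```python
import math


def _beta_continued_fraction(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    if abs(d) < tiny:
        d = tiny
    d = 1.0 / d
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        if abs(d) < tiny:
            d = tiny
        c = 1.0 + aa / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        if abs(d) < tiny:
            d = tiny
        c = 1.0 + aa / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def f_survival(f, d1, d2):
    """P(F > f) for an F distribution with d1 and d2 degrees of freedom."""
    return regularized_incomplete_beta(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


# Values quoted in the example.
ssr_full = 12816.35      # SS_R(beta_0, beta_1, beta_2)
ssr_reduced = 12518.32   # SS_R(beta_0, beta_2)
mse = 30.24              # error mean square
r = 1                    # number of coefficients tested
df_error = 14            # n - (k + 1)

extra_ss = ssr_full - ssr_reduced
f0 = (extra_ss / r) / mse
p_value = f_survival(f0, r, df_error)

print(extra_ss, f0, p_value)
```

The same three lines at the end apply to any partial <math>F\,\!</math> test once the two regression sums of squares are known; only <code>r</code> and <code>df_error</code> change.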


====Sequential Sum of Squares====
The sequential sum of squares for a coefficient is the extra sum of squares when coefficients are added to the model in a sequence. For example, consider the model:




 
::<math>Y={{\beta }_{0}}+{{\beta }_{1}}{{x}_{1}}+{{\beta }_{2}}{{x}_{2}}+{{\beta }_{12}}{{x}_{1}}{{x}_{2}}+{{\beta }_{3}}{{x}_{3}}+{{\beta }_{13}}{{x}_{1}}{{x}_{3}}+{{\beta }_{23}}{{x}_{2}}{{x}_{3}}+{{\beta }_{123}}{{x}_{1}}{{x}_{2}}{{x}_{3}}+\epsilon\,\!</math>


Figure 5.15: ANOVA results for the data in Table 5.1.


The sequential sum of squares for <math>{{\beta }_{13}}\,\!</math> is the increase in the sum of squares when <math>{{\beta }_{13}}\,\!</math> is added to the model observing the sequence of the equation given above. Therefore this extra sum of squares can be obtained by taking the difference between the regression sum of squares for the model after <math>{{\beta }_{13}}\,\!</math> was added and the regression sum of squares for the model before <math>{{\beta }_{13}}\,\!</math> was added to the model. The model after <math>{{\beta }_{13}}\,\!</math> is added is as follows:


 


::<math>Y={{\beta }_{0}}+{{\beta }_{1}}{{x}_{1}}+{{\beta }_{2}}{{x}_{2}}+{{\beta }_{12}}{{x}_{1}}{{x}_{2}}+{{\beta }_{3}}{{x}_{3}}+{{\beta }_{13}}{{x}_{1}}{{x}_{3}}+\epsilon\,\!</math>




This is because to maintain the sequence all coefficients preceding <math>{{\beta }_{13}}\,\!</math> must be included in the model. These are the coefficients <math>{{\beta }_{0}}\,\!</math>, <math>{{\beta }_{1}}\,\!</math>, <math>{{\beta }_{2}}\,\!</math>, <math>{{\beta }_{12}}\,\!</math> and <math>{{\beta }_{3}}\,\!</math>.
Similarly the model before <math>{{\beta }_{13}}\,\!</math> is added must contain all coefficients of the equation given above except <math>{{\beta }_{13}}\,\!</math>. This model can be obtained as follows:




::<math>Y={{\beta }_{0}}+{{\beta }_{1}}{{x}_{1}}+{{\beta }_{2}}{{x}_{2}}+{{\beta }_{12}}{{x}_{1}}{{x}_{2}}+{{\beta }_{3}}{{x}_{3}}+\epsilon\,\!</math>




The sequential sum of squares for <math>{{\beta }_{13}}\,\!</math> can be calculated as follows:




   
::<math>\begin{align}
  S{{S}_{R}}({{\beta }_{13}}|{{\beta }_{0}},{{\beta }_{1}},{{\beta }_{2}},{{\beta }_{12}},{{\beta }_{3}}) & = & S{{S}_{R}}({{\beta }_{0}},{{\beta }_{1}},{{\beta }_{2}},{{\beta }_{12}},{{\beta }_{3}},{{\beta }_{13}})- S{{S}_{R}}({{\beta }_{0}},{{\beta }_{1}},{{\beta }_{2}},{{\beta }_{12}},{{\beta }_{3}}) 
\end{align}\,\!</math>




For the present case, <math>{{\theta}_{2}}=[{{\beta }_{13}}{]}'\,\!</math> and <math>{{\theta}_{1}}=[{{\beta }_{0}},{{\beta }_{1}},{{\beta }_{2}},{{\beta }_{12}},{{\beta }_{3}}{]}'\,\!</math>. It can be noted that for the sequential sum of squares, <math>{{\theta}_{1}}\,\!</math> contains all coefficients preceding the coefficient being tested.


The sequential sums of squares for all terms add up to the regression sum of squares for the full model, but they depend on the order in which the terms enter the model.
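
The difference between the two types of extra sum of squares can be sketched in Python. Everything in this sketch is an assumption made for illustration: the data are made up (two deliberately correlated predictors, six observations), the helper <code>regression_ss</code> is hypothetical, and the least squares fit is obtained by solving the normal equations with plain Gauss-Jordan elimination, which is adequate only for small, well-conditioned problems.

```python
def regression_ss(predictors, y):
    """Regression sum of squares for a model with an intercept and the given predictor columns."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    p = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(a[k][c]))
        a[c], a[piv] = a[piv], a[c]
        for k in range(p):
            if k != c:
                f = a[k][c] / a[c][c]
                a[k] = [u - f * v for u, v in zip(a[k], a[c])]
    beta = [a[i][p] / a[i][i] for i in range(p)]
    y_bar = sum(y) / n
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    return sum((f - y_bar) ** 2 for f in fitted)


# Made-up data with correlated predictors (illustration only).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [3.1, 2.9, 7.2, 6.8, 11.1, 10.9]

ssr_full = regression_ss([x1, x2], y)

# Partial: each term is tested against a model containing all other terms.
partial_x1 = ssr_full - regression_ss([x2], y)
partial_x2 = ssr_full - regression_ss([x1], y)

# Sequential, order x1 then x2: each term is tested against the terms before it.
seq_x1 = regression_ss([x1], y) - regression_ss([], y)
seq_x2 = ssr_full - regression_ss([x1], y)

# Sequential, order x2 then x1: the value attributed to x1 changes.
seq_x1_entered_last = ssr_full - regression_ss([x2], y)

print("full:", ssr_full)
print("partial sum:", partial_x1 + partial_x2)
print("sequential sum:", seq_x1 + seq_x2)
print("x1 first vs. x1 last:", seq_x1, seq_x1_entered_last)
```

For these data the sequential values sum to the full regression sum of squares, the partial values do not, and the sequential value for <math>{{x}_{1}}\,\!</math> depends on whether it enters the model first or last.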


=====Example=====
This example illustrates the partial <math>F\,\!</math> test using the sequential sum of squares. The test is conducted for the coefficient <math>{{\beta }_{1}}\,\!</math> corresponding to the predictor variable <math>{{x}_{1}}\,\!</math> for the data. The regression model used for this data set in the [[Multiple_Linear_Regression_Analysis#Example|example]] is:
 
 
::<math>Y={{\beta }_{0}}+{{\beta }_{1}}{{x}_{1}}+{{\beta }_{2}}{{x}_{2}}+\epsilon \,\!</math>
 
 
The null hypothesis to test the significance of <math>{{\beta }_{1}}\,\!</math> is:
 
 
::<math>{{H}_{0}}:{{\beta }_{1}}=0\,\!</math>
 
 
The statistic to test this hypothesis is:
 
 
::<math>{{F}_{0}}=\frac{S{{S}_{R}}({{\beta }_{0}},{{\beta }_{1}})/r}{M{{S}_{E}}}\,\!</math>
 


where <math>S{{S}_{R}}({{\beta }_{0}},{{\beta }_{1}})\,\!</math> represents the sequential sum of squares for <math>{{\beta }_{1}}\,\!</math> (it equals <math>S{{S}_{R}}({{\beta }_{1}}|{{\beta }_{0}})\,\!</math> because the regression sum of squares of the intercept-only model is zero), <math>r\,\!</math> represents the number of degrees of freedom for <math>S{{S}_{R}}({{\beta }_{0}},{{\beta }_{1}})\,\!</math> (which is one because there is just one coefficient, <math>{{\beta }_{1}}\,\!</math>, being tested) and <math>M{{S}_{E}}\,\!</math> is the error mean square and has been calculated in the second [[Multiple_Linear_Regression_Analysis#Example_2|example]] as 30.24.


The sequential sum of squares for <math>{{\beta }_{1}}\,\!</math> is the difference between the regression sum of squares for the model after adding <math>{{\beta }_{1}}\,\!</math>, <math>Y={{\beta }_{0}}+{{\beta }_{1}}{{x}_{1}}+\epsilon\,\!</math>, and the regression sum of squares for the model before adding <math>{{\beta }_{1}}\,\!</math>, <math>Y={{\beta }_{0}}+\epsilon\,\!</math>.
The regression sum of squares for the model <math>Y={{\beta }_{0}}+{{\beta }_{1}}{{x}_{1}}+\epsilon\,\!</math> is obtained as shown next. First the design matrix for this model, <math>{{X}_{{{\beta }_{0}},{{\beta }_{1}}}}\,\!</math>, is obtained by dropping the third column in the design matrix for the full model, <math>X\,\!</math> (the full design matrix, <math>X\,\!</math>, was obtained in the [[Multiple_Linear_Regression_Analysis#Example|example]]). The third column of <math>X\,\!</math> corresponds to coefficient <math>{{\beta }_{2}}\,\!</math> which is no longer used in the present model. Therefore, the design matrix for the model, <math>Y={{\beta }_{0}}+{{\beta }_{1}}{{x}_{1}}+\epsilon\,\!</math>, is:




::<math>{{X}_{{{\beta }_{0}},{{\beta }_{1}}}}=\left[ \begin{matrix}
  1 & 41.9  \\
  1 & 43.4  \\
  . & .  \\
  . & .  \\
  1 & 77.8  \\
\end{matrix} \right]\,\!</math>




The hat matrix corresponding to this design matrix is <math>{{H}_{{{\beta }_{0}},{{\beta }_{1}}}}\,\!</math>. It can be calculated using <math>{{H}_{{{\beta }_{0}},{{\beta }_{1}}}}={{X}_{{{\beta }_{0}},{{\beta }_{1}}}}{{(X_{{{\beta }_{0}},{{\beta }_{1}}}^{\prime }{{X}_{{{\beta }_{0}},{{\beta }_{1}}}})}^{-1}}X_{{{\beta }_{0}},{{\beta }_{1}}}^{\prime }\,\!</math>. Once <math>{{H}_{{{\beta }_{0}},{{\beta }_{1}}}}\,\!</math> is known, the regression sum of squares for the model <math>Y={{\beta }_{0}}+{{\beta }_{1}}{{x}_{1}}+\epsilon\,\!</math> can be calculated as:




   
::<math>\begin{align}
  S{{S}_{R}}({{\beta }_{0}},{{\beta }_{1}})& = & {{y}^{\prime }}\left[ {{H}_{{{\beta }_{0}},{{\beta }_{1}}}}-(\frac{1}{n})J \right]y \\
  & = & 12530.85 
\end{align}\,\!</math>




[[Image:doe5_16.png|center|650px|Sequential sum of squares for the data.]]




The regression sum of squares for the model <math>Y={{\beta }_{0}}+\epsilon\,\!</math> is equal to zero since this model does not contain any variables. Therefore:




::<math>S{{S}_{R}}({{\beta }_{0}})=0\,\!</math>




The sequential sum of squares for <math>{{\beta }_{1}}\,\!</math> is:




::<math>\begin{align}
  S{{S}_{R}}({{\beta }_{1}}|{{\beta }_{0}}) &= & S{{S}_{R}}({{\beta }_{0}},{{\beta }_{1}})-S{{S}_{R}}({{\beta }_{0}}) \\
& = & 12530.85-0 \\
& = & 12530.85  
\end{align}\,\!</math>




Knowing the sequential sum of squares, the statistic to test the significance of <math>{{\beta }_{1}}\,\!</math> is:




::<math>\begin{align}
  {{f}_{0}} &= & \frac{S{{S}_{R}}({{\beta }_{0}},{{\beta }_{1}})/r}{M{{S}_{E}}} \\
  & = & \frac{12530.85/1}{30.24} \\
& = & 414.366 
\end{align}\,\!</math>




The <math>p\,\!</math> value corresponding to this statistic based on the <math>F\,\!</math> distribution with 1 degree of freedom in the numerator and 14 degrees of freedom in the denominator is:  




::<math>\begin{align}
  p\text{ }value &= & 1-P(F\le {{f}_{0}}) \\
& = & 1-0.999999 \\
& = & 8.46\times {{10}^{-12}} 
\end{align}\,\!</math>
     
   
   
Assuming that the desired significance is 0.1, since <math>p\,\!</math> value < 0.1, <math>{{H}_{0}}:{{\beta }_{1}}=0\,\!</math> is rejected and it can be concluded that <math>{{\beta }_{1}}\,\!</math> is significant. The test for <math>{{\beta }_{2}}\,\!</math> can be carried out in a similar manner. This result is shown in the following figure.
==Confidence Intervals in Multiple Linear Regression==


Calculation of confidence intervals for multiple linear regression models is similar to that for simple linear regression models, explained in [[Simple_Linear_Regression_Analysis| Simple Linear Regression Analysis]].


===Confidence Interval on Regression Coefficients===
A 100 (<math>1-\alpha\,\!</math>) percent confidence interval on the regression coefficient, <math>{{\beta }_{j}}\,\!</math>, is obtained as follows:
::<math>{{\hat{\beta }}_{j}}\pm {{t}_{\alpha /2,n-(k+1)}}\sqrt{{{C}_{jj}}}\,\!</math>
The confidence intervals on the regression coefficients are displayed in the Regression Information table under the Low Confidence and High Confidence columns as shown in the following figure.






[[Image:doe5_17.png|center|710px|Confidence interval for the fitted value corresponding to the fifth observation.|link=]]


Figure 5.16: Sequential sum of squares for the data in Table 5.1.


===Confidence Interval on Fitted Values, <math>{{\hat{y}}_{i}}\,\!</math>===
A 100 (<math>1-\alpha\,\!</math>) percent confidence interval on any fitted value, <math>{{\hat{y}}_{i}}\,\!</math>, is given by:




::<math>{{\hat{y}}_{i}}\pm {{t}_{\alpha /2,n-(k+1)}}\sqrt{{{{\hat{\sigma }}}^{2}}x_{i}^{\prime }{{({{X}^{\prime }}X)}^{-1}}{{x}_{i}}}\,\!</math>




where:  




   
::<math>{{x}_{i}}=\left[ \begin{matrix}
  1 \\
  {{x}_{i1}}  \\
  .  \\
  .  \\
  .  \\
  {{x}_{ik}}  \\
\end{matrix} \right]\,\!</math>


In Example 5.1 (Chapter 5, Estimating Regression Models Using Least Squares), the fitted value corresponding to the fifth observation was calculated as <math>{{\hat{y}}_{5}}=266.3\,\!</math>. The 90% confidence interval on this value can be obtained as shown in Figure 5.17. The values of 47.3 and 29.9 used in the figure are the values of the predictor variables corresponding to the fifth observation in Table 5.1.


In the above [[Multiple_Linear_Regression_Analysis#Example| example]], the fitted value corresponding to the fifth observation was calculated as <math>{{\hat{y}}_{5}}=266.3\,\!</math>. The 90% confidence interval on this value can be obtained as shown in the figure below. The values of 47.3 and 29.9 used in the figure are the values of the predictor variables corresponding to the fifth observation in the [[Multiple_Linear_Regression_Analysis#Example|table]].


===Confidence Interval on New Observations===
As explained in [[Simple_Linear_Regression_Analysis| Simple Linear Regression Analysis]], the confidence interval on a new observation is also referred to as the prediction interval. The prediction interval takes into account both the error from the fitted model and the error associated with future observations. A 100 (<math>1-\alpha\,\!</math>) percent confidence interval on a new observation, <math>{{\hat{y}}_{p}}\,\!</math>, is obtained as follows:


Figure 5.17: Confidence interval for the fitted value corresponding to the fifth observation in Table 5.1.


 
::<math>{{\hat{y}}_{p}}\pm {{t}_{\alpha /2,n-(k+1)}}\sqrt{{{{\hat{\sigma }}}^{2}}(1+x_{p}^{\prime }{{({{X}^{\prime }}X)}^{-1}}{{x}_{p}})}\,\!</math>




Line 859: Line 915:




::<math>{{x}_{p}}=\left[ \begin{matrix}
  1  \\
  {{x}_{p1}}  \\
  .  \\
  .  \\
  .  \\
  {{x}_{pk}}  \\
\end{matrix} \right]\,\!</math>
 
 
<math>{{x}_{p1}}\,\!</math>,..., <math>{{x}_{pk}}\,\!</math> are the levels of the predictor variables at which the new observation, <math>{{\hat{y}}_{p}}\,\!</math>, needs to be obtained.
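The two intervals differ only in the extra 1 under the square root, which a short sketch makes visible. The code below is a minimal illustration in plain Python; the function names, the small <math>{{({{X}^{\prime }}X)}^{-1}}\,\!</math> matrix, the variance estimate and the critical value are all hypothetical and do not come from the example data.

```python
import math

def quadratic_form(x, m):
    """Compute x' M x for a vector x and a square matrix M (nested lists)."""
    size = len(x)
    return sum(x[i] * m[i][j] * x[j] for i in range(size) for j in range(size))

def interval_half_widths(x, xtx_inv, sigma2_hat, t_crit):
    """Return (fitted-value half width, prediction half width) at the point x."""
    q = quadratic_form(x, xtx_inv)
    fitted = t_crit * math.sqrt(sigma2_hat * q)
    prediction = t_crit * math.sqrt(sigma2_hat * (1.0 + q))
    return fitted, prediction

# Hypothetical model with an intercept and one predictor.
xtx_inv = [[0.5, -0.1], [-0.1, 0.04]]
ci, pi = interval_half_widths([1.0, 3.0], xtx_inv, sigma2_hat=4.0, t_crit=2.0)
print(ci, pi)
```

Because of the added 1, the prediction half width is always larger than the half width of the interval on the fitted value at the same point.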




In multiple linear regression, prediction intervals should only be obtained at the levels of the predictor variables where the regression model applies. In the case of multiple linear regression it is easy to miss this. Having values lying within the range of the predictor variables does not necessarily mean that the new observation lies in the region to which the model is applicable. For example, consider Figure 5.18 where the shaded area shows the region to which a two variable regression model is applicable. The point corresponding to <math>p\,\!</math> th level of first predictor variable, <math>{{x}_{1}}\,\!</math>, and <math>p\,\!</math> th level of the second predictor variable, <math>{{x}_{2}}\,\!</math>, does not lie in the shaded area, although both of these levels are within the range of the first and second predictor variables respectively. In this case, the regression model is not applicable at this point.
In multiple linear regression, prediction intervals should only be obtained at the levels of the predictor variables where the regression model applies. In the case of multiple linear regression it is easy to miss this. Having values lying within the range of the predictor variables does not necessarily mean that the new observation lies in the region to which the model is applicable. For example, consider the next figure where the shaded area shows the region to which a two variable regression model is applicable. The point corresponding to <math>p\,\!</math> th level of first predictor variable, <math>{{x}_{1}}\,\!</math>, and <math>p\,\!</math> th level of the second predictor variable, <math>{{x}_{2}}\,\!</math>, does not lie in the shaded area, although both of these levels are within the range of the first and second predictor variables respectively. In this case, the regression model is not applicable at this point.




Figure 5.18: Predicted values and region of model application in multiple linear regression.
[[Image:doe5.18.png|center|519px|Predicted values and region of model application in multiple linear regression.|link=]]


==Measures of Model Adequacy==
As in the case of simple linear regression, analysis of a fitted multiple linear regression model is important before inferences based on the model are undertaken. This section presents some techniques that can be used to check the appropriateness of the multiple linear regression model.


   
===Coefficient of Multiple Determination, ''R''<sup>2</sup>===
The coefficient of multiple determination is similar to the coefficient of determination used in the case of simple linear regression. It is defined as:
 
 
::<math>\begin{align}
  {{R}^{2}} & = & \frac{S{{S}_{R}}}{S{{S}_{T}}} \\
  & = & 1-\frac{S{{S}_{E}}}{S{{S}_{T}}} 
\end{align}\,\!</math>
 
 
<math>{{R}^{2}}\,\!</math> indicates the amount of total variability explained by the regression model. The positive square root of <math>{{R}^{2}}\,\!</math> is called the multiple correlation coefficient and measures the linear association between <math>Y\,\!</math> and the predictor variables, <math>{{x}_{1}}\,\!</math>, <math>{{x}_{2}}\,\!</math>... <math>{{x}_{k}}\,\!</math>.
 
The value of <math>{{R}^{2}}\,\!</math> increases as more terms are added to the model, even if the new term does not contribute significantly to the model. An increase in the value of <math>{{R}^{2}}\,\!</math> cannot be taken as a sign to conclude that the new model is superior to the older model. A better statistic to use is the adjusted <math>{{R}^{2}}\,\!</math> statistic defined as follows:




   
::<math>\begin{align}
  R_{adj}^{2} &= & 1-\frac{M{{S}_{E}}}{M{{S}_{T}}} \\
  & = & 1-\frac{S{{S}_{E}}/(n-(k+1))}{S{{S}_{T}}/(n-1)} \\
& = & 1-(\frac{n-1}{n-(k+1)})(1-{{R}^{2}}) 
\end{align}\,\!</math>


The adjusted <math>{{R}^{2}}\,\!</math> only increases when significant terms are added to the model. Addition of unimportant terms may lead to a decrease in the value of <math>R_{adj}^{2}\,\!</math>.
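As a quick check of the definitions above, the following sketch computes both statistics from the sums of squares. The function names and the numbers are hypothetical and serve only to show that the mean-square form and the <math>(n-1)/(n-(k+1))\,\!</math> form of the adjusted statistic agree.

```python
def r_squared(ss_e, ss_t):
    """Coefficient of multiple determination, 1 - SS_E / SS_T."""
    return 1.0 - ss_e / ss_t

def adjusted_r_squared(ss_e, ss_t, n, k):
    """Adjusted R^2, 1 - MS_E / MS_T, for n observations and k predictors."""
    ms_e = ss_e / (n - (k + 1))
    ms_t = ss_t / (n - 1)
    return 1.0 - ms_e / ms_t

# Hypothetical sums of squares for a model with 17 observations and 2 predictors.
r2 = r_squared(20.0, 100.0)
r2_adj = adjusted_r_squared(20.0, 100.0, n=17, k=2)
print(r2, r2_adj)
```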


In a DOE folio, <math>{{R}^{2}}\,\!</math> and <math>R_{adj}^{2}\,\!</math> values are displayed as R-sq and R-sq(adj), respectively. Other values displayed along with these values are S, PRESS and R-sq(pred). As explained in [[Simple_Linear_Regression_Analysis| Simple Linear Regression Analysis]], the value of S is the square root of the error mean square, <math>M{{S}_{E}}\,\!</math>, and represents the "standard error of the model."


PRESS is an abbreviation for prediction error sum of squares. It is the error sum of squares calculated using the PRESS residuals in place of the residuals, <math>{{e}_{i}}\,\!</math>, in the equation for the error sum of squares. The PRESS residual, <math>{{e}_{(i)}}\,\!</math>, for a particular observation, <math>{{y}_{i}}\,\!</math>, is obtained by fitting the regression model to the remaining observations. Then the value for a new observation, <math>{{\hat{y}}_{p}}\,\!</math>, corresponding to the observation in question, <math>{{y}_{i}}\,\!</math>, is obtained based on the new regression model. The difference between <math>{{y}_{i}}\,\!</math> and <math>{{\hat{y}}_{p}}\,\!</math> gives <math>{{e}_{(i)}}\,\!</math>. The PRESS residual, <math>{{e}_{(i)}}\,\!</math>, can also be obtained using <math>{{h}_{ii}}\,\!</math>, the diagonal element of the hat matrix, <math>H\,\!</math>, as follows:




::<math>{{e}_{(i)}}=\frac{{{e}_{i}}}{1-{{h}_{ii}}}\,\!</math>




R-sq(pred), also referred to as prediction <math>{{R}^{2}}\,\!</math>, is obtained using PRESS as shown next:




::<math>R_{pred}^{2}=1-\frac{PRESS}{S{{S}_{T}}}\,\!</math>
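The leverage shortcut means PRESS can be computed without refitting the model once per observation. The sketch below assumes the ordinary residuals and the hat-matrix diagonals are already available; the function names and the three-point data set are hypothetical.

```python
def press_residuals(residuals, leverages):
    """PRESS residuals e_(i) = e_i / (1 - h_ii)."""
    return [e / (1.0 - h) for e, h in zip(residuals, leverages)]

def press(residuals, leverages):
    """Prediction error sum of squares: sum of squared PRESS residuals."""
    return sum(r * r for r in press_residuals(residuals, leverages))

def r_squared_pred(residuals, leverages, ss_t):
    """Prediction R^2 = 1 - PRESS / SS_T."""
    return 1.0 - press(residuals, leverages) / ss_t

# Hypothetical residuals, leverages and total sum of squares.
e = [1.0, -2.0, 1.0]
h = [0.5, 0.2, 0.5]
print(press(e, h), r_squared_pred(e, h, ss_t=57.0))
```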




The values of R-sq, R-sq(adj) and S are indicators of how well the regression model fits the observed data. The values of PRESS and R-sq(pred) are indicators of how well the regression model predicts new observations. For example, higher values of PRESS or lower values of R-sq(pred) indicate a model that predicts poorly. The figure below shows these values for the data. The values indicate that the regression model fits the data well and also predicts well.


[[Image:doe5_19.png|center|650px|Coefficient of multiple determination and related results for the data.]]


===Residual Analysis===
Plots of residuals, <math>{{e}_{i}}\,\!</math>, similar to the ones discussed in [[Simple_Linear_Regression_Analysis| Simple Linear Regression Analysis]] for simple linear regression, are used to check the adequacy of a fitted multiple linear regression model. The residuals are expected to be normally distributed with a mean of zero and a constant variance of <math>{{\sigma }^{2}}\,\!</math>. In addition, they should not show any patterns or trends when plotted against any variable or in a time or run-order sequence. Residual plots may also be obtained using standardized and studentized residuals. Standardized residuals, <math>{{d}_{i}}\,\!</math>, are obtained using the following equation:  




   
::<math>\begin{align}
  {{d}_{i}}&= & \frac{{{e}_{i}}}{\sqrt{{{{\hat{\sigma }}}^{2}}}} \\
  & = & \frac{{{e}_{i}}}{\sqrt{M{{S}_{E}}}} 
\end{align}\,\!</math>


The values of R-sq, R-sq(adj) and S are indicators of how well the regression model fits the observed data. The values of PRESS and R-sq(pred) are indicators of how well the regression model predicts new observations. For example, higher values of PRESS or lower values of R-sq(pred) indicate a model that predicts poorly. Figure 5.19 shows these values for the data in Table 5.1. The values indicate that the regression model fits the data well and also predicts well.


Standardized residuals are scaled so that the standard deviation of the residuals is approximately equal to one. This helps to identify possible outliers or unusual observations. However, standardized residuals may understate the true residual magnitude; hence, studentized residuals, <math>{{r}_{i}}\,\!</math>, are used in their place. Studentized residuals are calculated as follows:




Figure 5.19: Coefficient of multiple determination and related results for the data in Table 5.1.
::<math>\begin{align}
{{r}_{i}} & = & \frac{{{e}_{i}}}{\sqrt{{{{\hat{\sigma }}}^{2}}(1-{{h}_{ii}})}} \\
& = & \frac{{{e}_{i}}}{\sqrt{M{{S}_{E}}(1-{{h}_{ii}})}}
\end{align}\,\!</math>


Residual values for the data of Table 5.1 are shown in Figure 5.20. These values are available using the Diagnostics icon in the Control Panel. Standardized residual plots for the data are shown in Figures 5.21 to 5.23. DOE++ compares the residual values to the critical values on the <math>t\,\!</math> distribution for studentized and external studentized residuals. For other residuals the normal distribution is used. For example, for the data in Table 5.1, the critical values on the <math>t\,\!</math> distribution at a significance of 0.1 are <math>{{t}_{0.05,14}}=1.761\,\!</math> and <math>-{{t}_{0.05,14}}=-1.761\,\!</math> (as calculated in Example 5.3, Chapter 5, Test on Individual Regression Coefficients). The studentized residual values corresponding to the 3rd and 17th observations lie outside the critical values. Therefore, the 3rd and 17th observations are outliers. This can also be seen on the residual plots in Figures 5.22 and 5.23.


where <math>{{h}_{ii}}\,\!</math> is the <math>i\,\!</math> th diagonal element of the hat matrix, <math>H\,\!</math>. External studentized (or the studentized deleted) residuals may also be used. These residuals are based on the PRESS residuals mentioned in [[Multiple_Linear_Regression_Analysis#Coefficient_of_Multiple_Determination.2C_R2|Coefficient of Multiple Determination, ''R''<sup>2</sup>]]. The reason for using the external studentized residuals is that if the <math>i\,\!</math> th observation is an outlier, it may influence the fitted model. In this case, the residual <math>{{e}_{i}}\,\!</math> will be small and may not disclose that <math>i\,\!</math> th observation is an outlier. The external studentized residual for the <math>i\,\!</math> th observation, <math>{{t}_{i}}\,\!</math>, is obtained as follows:




Figure 5.20: Residual values for the data in Table 5.1.
::<math>{{t}_{i}}={{e}_{i}}{{\left[ \frac{n-(k+1)-1}{S{{S}_{E}}(1-{{h}_{ii}})-e_{i}^{2}} \right]}^{0.5}}\,\!</math>
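The standardized and studentized residuals can be compared with a short sketch. The function names are hypothetical; the inputs <math>{{e}_{1}}=1.3127\,\!</math>, <math>M{{S}_{E}}=30.3\,\!</math> and <math>{{h}_{11}}=0.2755\,\!</math> are the rounded values quoted for the first observation in the Cook's distance example of this chapter, so the result matches the quoted 0.2804 only to about three decimal places.

```python
import math

def standardized_residual(e_i, ms_e):
    """d_i = e_i / sqrt(MS_E)."""
    return e_i / math.sqrt(ms_e)

def studentized_residual(e_i, ms_e, h_ii):
    """r_i = e_i / sqrt(MS_E * (1 - h_ii))."""
    return e_i / math.sqrt(ms_e * (1.0 - h_ii))

d1 = standardized_residual(1.3127, 30.3)
r1 = studentized_residual(1.3127, 30.3, 0.2755)
print(d1, r1)
```

Since <math>0<1-{{h}_{ii}}<1\,\!</math> for such an observation, the studentized residual is larger in magnitude than the standardized one, which is the understatement the text refers to.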


 


Residual values for the data are shown in the figure below. Standardized residual plots for the data are shown in the next two figures. The Weibull++ DOE folio compares the residual values to the critical values on the <math>t\,\!</math> distribution for studentized and external studentized residuals.


Figure 5.21: Residual probability plot for the data in Table 5.1.


 
[[Image:doe5_20.png|center|877px|Residual values for the data.|link=]]




[[Image:doe5_21.png|center|650px|Residual probability plot for the data.|link=]]


Figure 5.22: Residual versus fitted values plot for the data in Table 5.1.


 
For other residuals the normal distribution is used. For example, for the data, the critical values on the <math>t\,\!</math> distribution at a significance of 0.1 are <math>{{t}_{0.05,14}}=1.761\,\!</math> and <math>-{{t}_{0.05,14}}=-1.761\,\!</math> (as calculated in the [[Multiple_Linear_Regression_Analysis#Example_3|example]], [[Multiple_Linear_Regression_Analysis#Test_on_Individual_Regression_Coefficients_.28t__Test.29|Test on Individual Regression Coefficients (''t'' Test)]]). The studentized residual values corresponding to the 3rd and 17th observations lie outside the critical values. Therefore, the 3rd and 17th observations are outliers. This can also be seen on the residual plots in the next two figures.


[[Image:doe5_22.png|center|650px|Residual versus fitted values plot for the data.|link=]]


Figure 5.23: Residual versus run order plot for the data in Table 5.1.


[[Image:doe5_23.png|center|650px|Residual versus run order plot for the data.|link=]]


 
===Outlying ''x'' Observations===
Residuals help to identify outlying <math>y\,\!</math> observations. Outlying <math>x\,\!</math> observations can be detected using leverage. Leverage values are the diagonal elements of the hat matrix, <math>{{h}_{ii}}\,\!</math>. The <math>{{h}_{ii}}\,\!</math> values always lie between 0 and 1. Values of <math>{{h}_{ii}}\,\!</math> greater than <math>2(k+1)/n\,\!</math> are considered to be indicators of outlying <math>x\,\!</math> observations.  
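The leverage rule is easy to apply once the <math>{{h}_{ii}}\,\!</math> values are known. The following sketch uses a hypothetical function name and made-up leverage values for a model with one predictor; it is not based on the example data.

```python
def high_leverage_indices(leverages, k):
    """Return zero-based indices whose leverage exceeds 2(k+1)/n."""
    n = len(leverages)
    cutoff = 2.0 * (k + 1) / n
    return [i for i, h in enumerate(leverages) if h > cutoff]

# Hypothetical leverages for n = 5 observations and k = 1 predictor.
flagged = high_leverage_indices([0.2, 0.9, 0.3, 0.25, 0.35], k=1)
print(flagged)
```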


===Influential Observations Detection===
Once an outlier is identified, it is important to determine if the outlier has a significant effect on the regression model. One measure to detect influential observations is Cook's distance measure, which is computed as follows:
To use Cook's distance measure, the <math>{{D}_{i}}\,\!</math> values are compared to percentile values on the <math>F\,\!</math> distribution with <math>(k+1,n-(k+1))\,\!</math> degrees of freedom. If the percentile value is less than 10 or 20 percent, then the <math>i\,\!</math> th case has little influence on the fitted values. However, if the percentile value is close to 50 percent or greater, the <math>i\,\!</math> th case is influential, and fitted values with and without the <math>i\,\!</math> th case will differ substantially. [10]




Example 5.6
::<math>{{D}_{i}}=\frac{r_{i}^{2}}{(k+1)}\left[ \frac{{{h}_{ii}}}{(1-{{h}_{ii}})} \right]\,\!</math>
 
 
To use Cook's distance measure, the <math>{{D}_{i}}\,\!</math> values are compared to percentile values on the <math>F\,\!</math> distribution with <math>(k+1,n-(k+1))\,\!</math> degrees of freedom. If the percentile value is less than 10 or 20 percent, then the <math>i\,\!</math> th case has little influence on the fitted values. However, if the percentile value is close to 50 percent or greater, the <math>i\,\!</math> th case is influential, and fitted values with and without the <math>i\,\!</math> th case will differ substantially.
 
 
====Example====
Cook's distance measure can be calculated as shown next. The distance measure is calculated for the first observation of the data. The remaining values along with the leverage values are shown in the figure below (displaying Leverage and Cook's distance measure for the data).




Cook's distance measure can be calculated as shown next. The distance measure is calculated for the first observation of the data in Table 5.1. The remaining values along with the leverage values are shown in Figure 5.24.
[[Image:doe5_24.png|center|874px|Leverage and Cook's distance measure for the data.|link=]]




The studentized residual corresponding to the first observation is:
::<math>\begin{align}
{{r}_{1}} & = & \frac{{{e}_{1}}}{\sqrt{M{{S}_{E}}(1-{{h}_{11}})}} \\
& = & \frac{1.3127}{\sqrt{30.3(1-0.2755)}} \\
& = & 0.2804 
\end{align}\,\!</math>


Cook's distance measure for the first observation can now be calculated as:




   
::<math>\begin{align}
  {{D}_{1}} & = & \frac{r_{1}^{2}}{(k+1)}\left[ \frac{{{h}_{11}}}{(1-{{h}_{11}})} \right] \\
& = & \frac{{{0.2804}^{2}}}{(2+1)}\left[ \frac{0.2755}{(1-0.2755)} \right] \\
& = & 0.01 
\end{align}\,\!</math>
 
 
The 50th percentile value for <math>{{F}_{3,14}}\,\!</math> is 0.83. Since all <math>{{D}_{i}}\,\!</math> values are less than this value, there are no influential observations.
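The calculation for the first observation can be reproduced with a few lines of Python. The function name is hypothetical; the inputs are the rounded values used in the example above, and 0.83 is the 50th percentile quoted in the text.

```python
def cooks_distance(r_i, h_ii, k):
    """D_i = r_i^2 / (k + 1) * h_ii / (1 - h_ii)."""
    return (r_i ** 2 / (k + 1)) * (h_ii / (1.0 - h_ii))

d1 = cooks_distance(r_i=0.2804, h_ii=0.2755, k=2)
is_influential = d1 >= 0.83  # compare against the 50th percentile of F(3, 14)
print(round(d1, 2), is_influential)
```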
 
===Lack-of-Fit Test===
The lack-of-fit test for simple linear regression discussed in [[Simple_Linear_Regression_Analysis| Simple Linear Regression Analysis]] may also be applied to multiple linear regression to check the appropriateness of the fitted response surface and see if a higher order model is required. Data for <math>m\,\!</math> replicates may be collected as follows for all <math>n\,\!</math> levels of the predictor variables:
 
 
::<math>\begin{align}
  &  & {{y}_{11}},{{y}_{12}},....,{{y}_{1m}}\text{    }m\text{ repeated observations at the first level } \\
&  & {{y}_{21}},{{y}_{22}},....,{{y}_{2m}}\text{    }m\text{ repeated observations at the second level} \\
&  & ... \\
&  & {{y}_{i1}},{{y}_{i2}},....,{{y}_{im}}\text{      }m\text{ repeated observations at the }i\text{th level} \\
&  & ... \\
&  & {{y}_{n1}},{{y}_{n2}},....,{{y}_{nm}}\text{    }m\text{ repeated observations at the }n\text{th level } 
\end{align}\,\!</math>
 
 
The sum of squares due to pure error, <math>S{{S}_{PE}}\,\!</math>, can be obtained as discussed in [[Simple_Linear_Regression_Analysis| Simple Linear Regression Analysis]] as:
 
 
::<math>S{{S}_{PE}}=\underset{i=1}{\overset{n}{\mathop \sum }}\,\underset{j=1}{\overset{m}{\mathop \sum }}\,{{({{y}_{ij}}-{{\bar{y}}_{i}})}^{2}}\,\!</math>
 
 
The number of degrees of freedom associated with <math>S{{S}_{PE}}\,\!</math> is:
 


::<math> dof(S{S}_{PE}) = nm-n \,\! </math>


Figure 5.24: Leverage and Cook's distance measure for the data in Table 5.1.


Knowing <math>S{{S}_{PE}}\,\!</math>, the sum of squares due to lack-of-fit, <math>S{{S}_{LOF}}\,\!</math>, can be obtained as:




::<math>S{{S}_{LOF}}=S{{S}_{E}}-S{{S}_{PE}}\,\!</math>




The number of degrees of freedom associated with <math>S{{S}_{LOF}}\,\!</math> is:


::<math>\begin{align}
dof(S{{S}_{LOF}}) & = & dof(S{{S}_{E}})-dof(S{{S}_{PE}}) \\
& = & nm-(k+1)-(nm-n) \\
& = & n-(k+1)
\end{align}\,\!</math>




Line 1,027: Line 1,122:




 
::<math>\begin{align}
{{F}_{0}} & = & \frac{S{{S}_{LOF}}/dof(S{{S}_{LOF}})}{S{{S}_{PE}}/dof(S{{S}_{PE}})} \\
& = & \frac{M{{S}_{LOF}}}{M{{S}_{PE}}}
\end{align}\,\!</math>
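The lack-of-fit quantities can be assembled as in the sketch below. It is a hedged illustration: the function names, the replicated responses and the error sum of squares are hypothetical, and the error degrees of freedom are taken as the total number of observations minus <math>(k+1)\,\!</math>. In practice <math>S{{S}_{E}}\,\!</math> would come from the fitted model.

```python
def pure_error_ss(groups):
    """SS_PE: squared deviations of replicates from their own level mean."""
    total = 0.0
    for ys in groups:
        mean = sum(ys) / len(ys)
        total += sum((y - mean) ** 2 for y in ys)
    return total

def lack_of_fit_f(groups, ss_e, k):
    """F0 = MS_LOF / MS_PE for replicated data and a model with k predictors."""
    n_levels = len(groups)
    n_obs = sum(len(ys) for ys in groups)
    ss_pe = pure_error_ss(groups)
    dof_pe = n_obs - n_levels
    dof_lof = (n_obs - (k + 1)) - dof_pe
    ss_lof = ss_e - ss_pe
    return (ss_lof / dof_lof) / (ss_pe / dof_pe)

# Hypothetical data: 4 levels, 2 replicates each, and an assumed SS_E of 20.
groups = [[1.0, 3.0], [4.0, 6.0], [9.0, 11.0], [10.0, 12.0]]
print(lack_of_fit_f(groups, ss_e=20.0, k=1))
```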




==Other Topics in Multiple Linear Regression==
 
===Polynomial Regression Models===
Polynomial regression models are used when the response is curvilinear. The equation shown next presents a second order polynomial regression model with one predictor variable:


::<math>Y={{\beta }_{0}}+{{\beta }_{1}}{{x}_{1}}+{{\beta }_{11}}x_{1}^{2}+\epsilon\,\!</math>




Usually, coded values are used in these models. Values of the variables are coded by centering (expressing the levels of the variable as deviations from the mean value of the variable) and then scaling (dividing the deviations obtained by half of the range of the variable).




::<math>coded\text{ }value=\frac{actual\text{ }value-mean}{half\text{ }of\text{ }range}\,\!</math>
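The coding rule can be written as a small helper. The function name and the sample levels are hypothetical; the sketch follows the formula above literally, using the mean of the supplied values as the center.

```python
def code_values(values):
    """Code values as (actual - mean) / (half of range)."""
    half_range = (max(values) - min(values)) / 2.0
    if half_range == 0:
        raise ValueError("all values are equal, so the range is zero")
    mean = sum(values) / len(values)
    return [(v - mean) / half_range for v in values]

# Hypothetical factor levels 10, 20 and 30 map to -1, 0 and 1.
print(code_values([10.0, 20.0, 30.0]))
```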
 
 
The reason for using coded predictor variables is that <math>x\,\!</math> and <math>{{x}^{2}}\,\!</math> are often highly correlated and, if uncoded values are used, there may be computational difficulties while calculating the <math>{{({{X}^{\prime }}X)}^{-1}}\,\!</math> matrix to obtain the estimates, <math>\hat{\beta }\,\!</math>, of the regression coefficients using the least squares equation <math>\hat{\beta }={{({{X}^{\prime }}X)}^{-1}}{{X}^{\prime }}y\,\!</math>.
 
===Qualitative Factors===
The multiple linear regression model also supports the use of qualitative factors.  For example, gender may need to be included as a factor in a regression model. One of the ways to include qualitative factors in a regression model is to employ indicator variables. Indicator variables take on values of 0 or 1. For example, an indicator variable may be used with a value of 1 to indicate female and a value of 0 to indicate male.
 
 
::<math>{{x}_{1}}=\{\begin{array}{*{35}{l}}
  1\text{      Female}  \\
  0\text{      Male}  \\
\end{array}\,\!</math>
 
 
In general ( <math>n-1\,\!</math> ) indicator variables are required to represent a qualitative factor with <math>n\,\!</math> levels. As an example, a qualitative factor representing three types of machines may be represented as follows using two indicator variables:
 


::<math>\begin{align}
{{x}_{1}} & = & 1,\text{  }{{x}_{2}} & = & 0\text{    Machine Type I} \\
{{x}_{1}} & = & 0,\text{  }{{x}_{2}} & = & 1\text{    Machine Type II} \\
{{x}_{1}} & = & 0,\text{  }{{x}_{2}} & = & 0\text{    Machine Type III} 
\end{align}\,\!</math>




An alternative coding scheme for this example is to use a value of -1 for all indicator variables when representing the last level of the factor:
::<math>\begin{align}
  {{x}_{1}} & = & 1,\text{  }{{x}_{2}}& = &0\text{          Machine Type I} \\
  {{x}_{1}}& = & 0,\text{  }{{x}_{2}}& = &1\text{          Machine Type II} \\
  {{x}_{1}}& = & -1,\text{  }{{x}_{2}}& = &-1\text{    Machine Type III} 
\end{align}\,\!</math>
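Both coding schemes for the three machine types can be generated by one helper. This is a sketch only: the function name and the <code>effect</code> flag are hypothetical, and the last entry of <code>levels</code> is treated as the level that receives all zeros or all -1 values.

```python
def indicator_coding(level, levels, effect=False):
    """Return the n-1 indicator values for `level` among the n `levels`."""
    idx = levels.index(level)
    width = len(levels) - 1
    if idx == width:
        # Last level: all zeros, or all -1 in the alternative scheme.
        return [-1] * width if effect else [0] * width
    return [1 if j == idx else 0 for j in range(width)]

machines = ["I", "II", "III"]
for m in machines:
    print(m, indicator_coding(m, machines), indicator_coding(m, machines, effect=True))
```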


Indicator variables are also referred to as dummy variables or binary variables.


====Example====
Consider the data from two types of reactors of a chemical process shown in the table below, where the yield values are recorded for various levels of factor <math>{{x}_{1}}\,\!</math>. Assuming there are no interactions between the reactor type and <math>{{x}_{1}}\,\!</math>, a regression model can be fitted to this data as shown next.
 
Since the reactor type is a qualitative factor with two levels, it can be represented by using one indicator variable. Let <math>{{x}_{2}}\,\!</math> be the indicator variable representing the reactor type, with 0 representing the first type of reactor and 1 representing the second type of reactor.
 


Example 5.7
::<math>{{x}_{2}}=\{\begin{array}{*{35}{l}}
  0\text{      Reactor Type I}  \\
  1\text{      Reactor Type II}  \\
\end{array}\,\!</math>




Consider data from two types of reactors of a chemical process shown in Table 5.3 where the yield values are recorded for various levels of factor <math>{{x}_{1}}\,\!</math>. Assuming there are no interactions between the reactor type and <math>{{x}_{1}}\,\!</math>, a regression model can be fitted to this data as shown next.
[[Image:doet5.3.png|center|323px|Yield data from the two types of reactors for a chemical process.|link=]]




Data entry in the DOE folio for this example is shown in the figure after the table below. The regression model for this data is:


Data entry in DOE++ for this example is shown in Figure 5.25. The regression model for this data is:


::<math>Y={{\beta }_{0}}+{{\beta }_{1}}{{x}_{1}}+{{\beta }_{2}}{{x}_{2}}+\epsilon\,\!</math>


 
   
   
The <math>X\,\!</math> and <math>y\,\!</math> matrices for the given data are:
[[Image:doe5_25.png|center|700px|Data from the table above as entered in Weibull++.]]


Table 5.3: Yield data from two types of reactors for a chemical process.


 
The estimated regression coefficients for the model can be obtained as:


Figure 5.25: Data from Table 5.3 as entered in DOE++.


    
::<math>\begin{align}
   \hat{\beta }& = & {{({{X}^{\prime }}X)}^{-1}}{{X}^{\prime }}y \\
& = & \left[ \begin{matrix}
   153.7  \\
   2.4  \\
   -27.5 \\
\end{matrix} \right]
\end{align}\,\!</math>




Therefore, the fitted regression model is:




::<math>\hat{y}=153.7+2.4{{x}_{1}}-27.5{{x}_{2}}\,\!</math>
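Because <math>{{x}_{2}}\,\!</math> takes only the values 0 and 1, the fitted model can be read as two parallel lines in <math>{{x}_{1}}\,\!</math>, one per reactor type, separated vertically by the coefficient of <math>{{x}_{2}}\,\!</math>. The following is a minimal Python sketch of that reading. The function name <code>predict_yield</code> and the factor level of 10 are illustrative assumptions and are not part of the original example.

```python
# Coefficients of the fitted model quoted above: y_hat = 153.7 + 2.4*x1 - 27.5*x2
B0, B1, B2 = 153.7, 2.4, -27.5

def predict_yield(x1, reactor_type_ii):
    """Fitted yield; reactor_type_ii is the indicator x2 (False -> 0, True -> 1)."""
    x2 = 1 if reactor_type_ii else 0
    return B0 + B1 * x1 + B2 * x2

x1 = 10.0  # hypothetical factor level, chosen only for illustration
type_i = predict_yield(x1, False)   # line for Reactor Type I
type_ii = predict_yield(x1, True)   # line for Reactor Type II
shift = type_ii - type_i            # equals the coefficient of x2 for any x1
```

For any value of <math>{{x}_{1}}\,\!</math>, the difference between the two predictions is the coefficient of the indicator variable, which is why the indicator coefficient is interpreted as the shift in mean yield between the two reactor types.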




Note that since <math>{{x}_{2}}\,\!</math> represents a qualitative predictor variable, the fitted regression model cannot be plotted simultaneously against <math>{{x}_{1}}\,\!</math> and <math>{{x}_{2}}\,\!</math> in a two-dimensional space (because the resulting surface plot will be meaningless for the dimension in <math>{{x}_{2}}\,\!</math> ). To illustrate this, a scatter plot of the data against <math>{{x}_{2}}\,\!</math> is shown in the following figure.




[[Image:doe5_26.png|center|700px|Scatter plot of the observed yield values against <math>x_2\,\!</math> (reactor type)]]


Figure 5.26: Scatter plot of the observed yield values in Table 5.3 against <math>{{x}_{2}}\,\!</math> (reactor type).


 
It can be noted that, in the case of qualitative factors, the nature of the relationship between the response (yield) and the qualitative factor (reactor type) cannot be categorized as linear, quadratic, cubic, etc. The only conclusion that can be drawn for such factors is whether they contribute significantly to the regression model. This can be determined by employing the partial <math>F\,\!</math> test discussed in [[Multiple_Linear_Regression_Analysis#Test_on_Subsets_of_Regression_Coefficients_.28Partial_F_Test.29|Multiple Linear Regression Analysis]] (using the extra sum of squares of the indicator variables representing these factors). The results of the test for the present example are shown in the ANOVA table. The results show that <math>{{x}_{2}}\,\!</math> (reactor type) contributes significantly to the fitted regression model.




Figure 5.27: DOE++ results for the data in Table 5.3.
[[Image:doe5_27.png|center|700px|DOE results for the data.]]


 
===Multicollinearity===
At times the predictor variables included in a multiple linear regression model may be found to be dependent on each other. Multicollinearity is said to exist in a multiple regression model with strong dependencies between the predictor variables.
Multicollinearity affects the regression coefficients and the extra sum of squares of the predictor variables. In a model with multicollinearity, the estimate of the regression coefficient of a predictor variable depends on what other predictor variables are included in the model. The dependence may even lead to a change in the sign of the regression coefficient. In such models, an estimated regression coefficient may not be found to be significant individually (when using the <math>t\,\!</math> test on the individual coefficient or looking at the <math>p\,\!</math> value) even though a statistical relation is found to exist between the response variable and the set of the predictor variables (when using the <math>F\,\!</math> test for the set of predictor variables). Therefore, you should be careful while looking at individual predictor variables in models that have multicollinearity. Care should also be taken while looking at the extra sum of squares for a predictor variable that is correlated with other variables. This is because in models with multicollinearity the extra sum of squares is not unique and depends on the other predictor variables included in the model.
Multicollinearity can be detected using the variance inflation factor (abbreviated <math>VIF\,\!</math> ). <math>VIF\,\!</math> for a coefficient <math>{{\beta }_{j}}\,\!</math> is defined as:
::<math>VIF=\frac{1}{(1-R_{j}^{2})}\,\!</math>
where <math>R_{j}^{2}\,\!</math> is the coefficient of multiple determination resulting from regressing the <math>j\,\!</math>th predictor variable, <math>{{x}_{j}}\,\!</math>, on the remaining <math>k-1\,\!</math> predictor variables. Mean values of <math>VIF\,\!</math> considerably greater than 1 indicate multicollinearity problems.
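The definition above can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the calculation used by the software: the function name <code>variance_inflation_factors</code> and the randomly generated data are hypothetical, and each <math>R_{j}^{2}\,\!</math> is obtained from an ordinary least squares fit with an intercept.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return VIF_j = 1 / (1 - R_j^2) for each column of X (predictors only, no intercept column)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        target = X[:, j]                      # x_j plays the role of the response
        others = np.delete(X, j, axis=1)      # the remaining k-1 predictors
        design = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        fitted = design @ coef
        ss_res = np.sum((target - fitted) ** 2)
        ss_tot = np.sum((target - target.mean()) ** 2)
        r2 = 1.0 - ss_res / ss_tot            # coefficient of multiple determination R_j^2
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

# Hypothetical data: x2 is nearly a linear function of x1, x3 is unrelated noise.
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = 2.0 * x1 + 0.1 * rng.normal(size=50)
x3 = rng.normal(size=50)
vifs = variance_inflation_factors(np.column_stack([x1, x2, x3]))
```

In this hypothetical data set the two nearly dependent predictors receive large variance inflation factors, while the unrelated predictor stays close to 1.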
A few methods of dealing with multicollinearity include increasing the number of observations in a way designed to break up dependencies among predictor variables, combining the linearly dependent predictor variables into one variable, eliminating variables from the model that are unimportant or using coded variables.
====Example====
Variance inflation factors can be obtained for the data below.
[[Image:doet5.1.png|center|351px|Observed yield data for various levels of two factors.|link=]]
To calculate the variance inflation factor for <math>{{x}_{1}}\,\!</math>, <math>R_{1}^{2}\,\!</math> has to be calculated. <math>R_{1}^{2}\,\!</math> is the coefficient of determination for the model when <math>{{x}_{1}}\,\!</math> is regressed on the remaining variables. In the case of this example there is just one remaining variable, which is <math>{{x}_{2}}\,\!</math>. If a regression model is fit to the data, taking <math>{{x}_{1}}\,\!</math> as the response variable and <math>{{x}_{2}}\,\!</math> as the predictor variable, then the design matrix and the vector of observations are:


   
::<math>{{X}_{{{R}_{1}}}}=\left[ \begin{matrix}
  1 & 29.1 \\
  1 & 29.3  \\
  . & .  \\
  . & .  \\
  . & .  \\
  1 & 32.9  \\
\end{matrix} \right]\text{    }{{y}_{{{R}_{1}}}}=\left[ \begin{matrix}
  41.9  \\
  43.4  \\
  .  \\
  .  \\
  .  \\
  77.8  \\
\end{matrix} \right]\,\!</math>
 
 
The regression sum of squares for this model can be obtained as:




   
::<math>\begin{align}
S{{S}_{R}}= & y_{{{R}_{1}}}^{\prime }\left[ {{H}_{{{R}_{1}}}}-(\frac{1}{n})J \right]{{y}_{{{R}_{1}}}} \\
= & 1988.6  
\end{align}\,\!</math>




where <math>{{H}_{{{R}_{1}}}}\,\!</math> is the hat matrix (and is calculated using <math>{{H}_{{{R}_{1}}}}={{X}_{{{R}_{1}}}}{{(X_{{{R}_{1}}}^{\prime }{{X}_{{{R}_{1}}}})}^{-1}}X_{{{R}_{1}}}^{\prime }\,\!</math> ) and <math>J\,\!</math> is the matrix of ones. The total sum of squares for the model can be calculated as:




   
::<math>\begin{align}
S{{S}_{T}}= & y_{{{R}_{1}}}^{\prime }\left[ I-(\frac{1}{n})J \right]{{y}_{{{R}_{1}}}} \\
= & 2182.9
\end{align}\,\!</math>




where <math>I\,\!</math> is the identity matrix. Therefore:




::<math>\begin{align}
R_{1}^{2}= & \frac{S{{S}_{R}}}{S{{S}_{T}}} \\
= & \frac{1988.6}{2182.9} \\
= & 0.911 
\end{align}\,\!</math>


Then the variance inflation factor for <math>{{x}_{1}}\,\!</math> is:


::<math>\begin{align}
VI{{F}_{1}}= & \frac{1}{(1-R_{1}^{2})} \\
= & \frac{1}{1-0.911} \\
= & 11.2  
\end{align}\,\!</math>
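The arithmetic of this example can be checked with a few lines of Python. The sketch below uses only the two sums of squares quoted above; the variable names are illustrative.

```python
ss_r = 1988.6  # regression sum of squares from regressing x1 on x2
ss_t = 2182.9  # total sum of squares for x1

r1_squared = ss_r / ss_t           # coefficient of determination R_1^2
vif_1 = 1.0 / (1.0 - r1_squared)   # variance inflation factor for x1

print(f"R1^2 = {r1_squared:.3f}, VIF1 = {vif_1:.1f}")  # prints: R1^2 = 0.911, VIF1 = 11.2
```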




The variance inflation factor for <math>{{x}_{2}}\,\!</math>, <math>VI{{F}_{2}}\,\!</math>, can be obtained in a similar manner. In the DOE folios, the variance inflation factors are displayed in the VIF column of the Regression Information table as shown in the following figure. Since the values of the variance inflation factors obtained are considerably greater than 1, multicollinearity is an issue for the data.




Figure 5.28: Variance inflation factors for the data in Table 5.1.
[[Image:doe5_28.png|center|888px|Variance inflation factors for the data in.|link=]]

Latest revision as of 22:22, 9 August 2018

New format available! This reference is now available in a new format that offers faster page load, improved display for calculations and images, more targeted search and the latest content available as a PDF. As of September 2023, this Reliawiki page will not continue to be updated. Please update all links and bookmarks to the latest reference at help.reliasoft.com/reference/experiment_design_and_analysis

Chapter 4: Multiple Linear Regression Analysis




This chapter expands on the analysis of simple linear regression models and discusses the analysis of multiple linear regression models. A major portion of the results displayed in Weibull++ DOE folios are explained in this chapter because these results are associated with multiple linear regression. One of the applications of multiple linear regression models is Response Surface Methodology (RSM). RSM is a method used to locate the optimum value of the response and is one of the final stages of experimentation. It is discussed in Response Surface Methods. Towards the end of this chapter, the concept of using indicator variables in regression models is explained. Indicator variables are used to represent qualitative factors in regression models. The concept of using indicator variables is important to gain an understanding of ANOVA models, which are the models used to analyze data obtained from experiments. These models can be thought of as first order multiple linear regression models where all the factors are treated as qualitative factors. ANOVA models are discussed in the One Factor Designs and General Full Factorial Designs chapters.

Multiple Linear Regression Model

A linear regression model that contains more than one predictor variable is called a multiple linear regression model. The following model is a multiple linear regression model with two predictor variables, [math]\displaystyle{ {{x}_{1}}\,\! }[/math] and [math]\displaystyle{ {{x}_{2}}\,\! }[/math].


[math]\displaystyle{ Y={{\beta }_{0}}+{{\beta }_{1}}{{x}_{1}}+{{\beta }_{2}}{{x}_{2}}+\epsilon\,\! }[/math]


The model is linear because it is linear in the parameters [math]\displaystyle{ {{\beta }_{0}}\,\! }[/math], [math]\displaystyle{ {{\beta }_{1}}\,\! }[/math] and [math]\displaystyle{ {{\beta }_{2}}\,\! }[/math]. The model describes a plane in the three-dimensional space of [math]\displaystyle{ Y\,\! }[/math], [math]\displaystyle{ {{x}_{1}}\,\! }[/math] and [math]\displaystyle{ {{x}_{2}}\,\! }[/math]. The parameter [math]\displaystyle{ {{\beta }_{0}}\,\! }[/math] is the intercept of this plane. Parameters [math]\displaystyle{ {{\beta }_{1}}\,\! }[/math] and [math]\displaystyle{ {{\beta }_{2}}\,\! }[/math] are referred to as partial regression coefficients. Parameter [math]\displaystyle{ {{\beta }_{1}}\,\! }[/math] represents the change in the mean response corresponding to a unit change in [math]\displaystyle{ {{x}_{1}}\,\! }[/math] when [math]\displaystyle{ {{x}_{2}}\,\! }[/math] is held constant. Parameter [math]\displaystyle{ {{\beta }_{2}}\,\! }[/math] represents the change in the mean response corresponding to a unit change in [math]\displaystyle{ {{x}_{2}}\,\! }[/math] when [math]\displaystyle{ {{x}_{1}}\,\! }[/math] is held constant. Consider the following example of a multiple linear regression model with two predictor variables, [math]\displaystyle{ {{x}_{1}}\,\! }[/math] and [math]\displaystyle{ {{x}_{2}}\,\! }[/math] :


[math]\displaystyle{ Y=30+5{{x}_{1}}+7{{x}_{2}}+\epsilon \,\! }[/math]


This regression model is a first order multiple linear regression model. This is because the maximum power of the variables in the model is 1. (The regression plane corresponding to this model is shown in the figure below.) Also shown is an observed data point and the corresponding random error, [math]\displaystyle{ \epsilon\,\! }[/math]. The true regression model is usually never known (and therefore the values of the random error terms corresponding to observed data points remain unknown). However, the regression model can be estimated by calculating the parameters of the model for an observed data set. This is explained in Estimating Regression Models Using Least Squares.

One of the following figures shows the contour plot for the regression model in the above equation. The contour plot shows lines of constant mean response values as a function of [math]\displaystyle{ {{x}_{1}}\,\! }[/math] and [math]\displaystyle{ {{x}_{2}}\,\! }[/math]. The contour lines for the given regression model are straight lines as seen on the plot. Straight contour lines result for first order regression models with no interaction terms.

A linear regression model may also take the following form:


[math]\displaystyle{ Y={{\beta }_{0}}+{{\beta }_{1}}{{x}_{1}}+{{\beta }_{2}}{{x}_{2}}+{{\beta }_{12}}{{x}_{1}}{{x}_{2}}+\epsilon\,\! }[/math]


A cross-product term, [math]\displaystyle{ {{x}_{1}}{{x}_{2}}\,\! }[/math], is included in the model. This term represents an interaction effect between the two variables [math]\displaystyle{ {{x}_{1}}\,\! }[/math] and [math]\displaystyle{ {{x}_{2}}\,\! }[/math]. Interaction means that the effect produced by a change in the predictor variable on the response depends on the level of the other predictor variable(s). As an example of a linear regression model with interaction, consider the model given by the equation [math]\displaystyle{ Y=30+5{{x}_{1}}+7{{x}_{2}}+3{{x}_{1}}{{x}_{2}}+\epsilon\,\! }[/math]. The regression plane and contour plot for this model are shown in the following two figures, respectively.


Regression plane for the model [math]\displaystyle{ Y=30+5 x_1+7 x_2+\epsilon\,\! }[/math]


Contour plot for the model [math]\displaystyle{ Y=30+5 x_1+7 x_2+\epsilon\,\! }[/math]


Now consider the regression model shown next:


[math]\displaystyle{ Y={{\beta }_{0}}+{{\beta }_{1}}{{x}_{1}}+{{\beta }_{2}}x_{1}^{2}+{{\beta }_{3}}x_{1}^{3}+\epsilon\,\! }[/math]


This model is also a linear regression model and is referred to as a polynomial regression model. Polynomial regression models contain squared and higher order terms of the predictor variables, making the response surface curvilinear. As an example of a polynomial regression model with an interaction term, consider the following equation:


[math]\displaystyle{ Y=500+5{{x}_{1}}+7{{x}_{2}}-3x_{1}^{2}-5x_{2}^{2}+3{{x}_{1}}{{x}_{2}}+\epsilon\,\! }[/math]


This model is a second order model because the maximum power of the terms in the model is two. The regression surface for this model is shown in the following figure. Such regression models are used in RSM to find the optimum value of the response, [math]\displaystyle{ Y\,\! }[/math] (for details see Response Surface Methods for Optimization). Notice that, although the shape of the regression surface is curvilinear, the regression model is still linear because the model is linear in the parameters. The contour plot for this model is shown in the second of the following two figures.


Regression plane for the model [math]\displaystyle{ Y=30+5 x_1+7 x_2+3 x_1 x_2+\epsilon\,\! }[/math]


Contour plot for the model [math]\displaystyle{ Y=30+5 x_1+7 x_2+3 x_1 x_2+\epsilon\,\! }[/math]


All multiple linear regression models can be expressed in the following general form:


[math]\displaystyle{ Y={{\beta }_{0}}+{{\beta }_{1}}{{x}_{1}}+{{\beta }_{2}}{{x}_{2}}+...+{{\beta }_{k}}{{x}_{k}}+\epsilon\,\! }[/math]


where [math]\displaystyle{ k\,\! }[/math] denotes the number of terms in the model. For example, the second order model shown earlier can be written in the general form using [math]\displaystyle{ {{x}_{3}}=x_{1}^{2}\,\! }[/math], [math]\displaystyle{ {{x}_{4}}=x_{2}^{2}\,\! }[/math] and [math]\displaystyle{ {{x}_{5}}={{x}_{1}}{{x}_{2}}\,\! }[/math] as follows:


[math]\displaystyle{ Y=500+5{{x}_{1}}+7{{x}_{2}}-3{{x}_{3}}-5{{x}_{4}}+3{{x}_{5}}+\epsilon\,\! }[/math]
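The substitution can be illustrated with a short Python sketch showing that the second order model is an ordinary dot product between a row of derived variables and the coefficient vector, which is what "linear in the parameters" means in practice. The function names and the evaluation point are illustrative assumptions; x4 is taken as the square of x2 so that it matches the squared term in x2 of the second order model, and the error term is omitted.

```python
import numpy as np

beta = np.array([500.0, 5.0, 7.0, -3.0, -5.0, 3.0])  # beta_0 ... beta_5 from the model above

def general_form_row(x1, x2):
    """Row of the design matrix: [1, x1, x2, x3, x4, x5] with x3=x1^2, x4=x2^2, x5=x1*x2."""
    return np.array([1.0, x1, x2, x1**2, x2**2, x1 * x2])

def mean_response_direct(x1, x2):
    """Second order model evaluated term by term (error term omitted)."""
    return 500 + 5 * x1 + 7 * x2 - 3 * x1**2 - 5 * x2**2 + 3 * x1 * x2

x1, x2 = 1.5, -2.0  # arbitrary illustrative point
via_general_form = general_form_row(x1, x2) @ beta
via_direct = mean_response_direct(x1, x2)
```

Both evaluations give the same mean response, so the curvilinear model can be fitted with the same least squares machinery as a first order model once the derived columns are formed.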

Estimating Regression Models Using Least Squares

Consider a multiple linear regression model with [math]\displaystyle{ k\,\! }[/math] predictor variables:


[math]\displaystyle{ Y={{\beta }_{0}}+{{\beta }_{1}}{{x}_{1}}+{{\beta }_{2}}{{x}_{2}}+...+{{\beta }_{k}}{{x}_{k}}+\epsilon\,\! }[/math]


Let each of the [math]\displaystyle{ k\,\! }[/math] predictor variables, [math]\displaystyle{ {{x}_{1}}\,\! }[/math], [math]\displaystyle{ {{x}_{2}}\,\! }[/math]... [math]\displaystyle{ {{x}_{k}}\,\! }[/math], have [math]\displaystyle{ n\,\! }[/math] levels. Then [math]\displaystyle{ {{x}_{ij}}\,\! }[/math] represents the [math]\displaystyle{ i\,\! }[/math] th level of the [math]\displaystyle{ j\,\! }[/math] th predictor variable [math]\displaystyle{ {{x}_{j}}\,\! }[/math]. For example, [math]\displaystyle{ {{x}_{51}}\,\! }[/math] represents the fifth level of the first predictor variable [math]\displaystyle{ {{x}_{1}}\,\! }[/math], while [math]\displaystyle{ {{x}_{19}}\,\! }[/math] represents the first level of the ninth predictor variable, [math]\displaystyle{ {{x}_{9}}\,\! }[/math]. Observations, [math]\displaystyle{ {{y}_{1}}\,\! }[/math], [math]\displaystyle{ {{y}_{2}}\,\! }[/math]... [math]\displaystyle{ {{y}_{n}}\,\! }[/math], recorded for each of these [math]\displaystyle{ n\,\! }[/math] levels can be expressed in the following way:


[math]\displaystyle{ \begin{align} {{y}_{1}}= & {{\beta }_{0}}+{{\beta }_{1}}{{x}_{11}}+{{\beta }_{2}}{{x}_{12}}+...+{{\beta }_{k}}{{x}_{1k}}+{{\epsilon }_{1}} \\ {{y}_{2}}= & {{\beta }_{0}}+{{\beta }_{1}}{{x}_{21}}+{{\beta }_{2}}{{x}_{22}}+...+{{\beta }_{k}}{{x}_{2k}}+{{\epsilon }_{2}} \\ & .. \\ {{y}_{i}}= & {{\beta }_{0}}+{{\beta }_{1}}{{x}_{i1}}+{{\beta }_{2}}{{x}_{i2}}+...+{{\beta }_{k}}{{x}_{ik}}+{{\epsilon }_{i}} \\ & .. \\ {{y}_{n}}= & {{\beta }_{0}}+{{\beta }_{1}}{{x}_{n1}}+{{\beta }_{2}}{{x}_{n2}}+...+{{\beta }_{k}}{{x}_{nk}}+{{\epsilon }_{n}} \end{align}\,\! }[/math]


The system of [math]\displaystyle{ n\,\! }[/math] equations shown previously can be represented in matrix notation as follows:


[math]\displaystyle{ y=X\beta +\epsilon\,\! }[/math]


where


[math]\displaystyle{ y=\left[ \begin{matrix} {{y}_{1}} \\ {{y}_{2}} \\ . \\ . \\ . \\ {{y}_{n}} \\ \end{matrix} \right]\text{ }X=\left[ \begin{matrix} 1 & {{x}_{11}} & {{x}_{12}} & . & . & . & {{x}_{1k}} \\ 1 & {{x}_{21}} & {{x}_{22}} & . & . & . & {{x}_{2k}} \\ . & . & . & {} & {} & {} & . \\ . & . & . & {} & {} & {} & . \\ . & . & . & {} & {} & {} & . \\ 1 & {{x}_{n1}} & {{x}_{n2}} & . & . & . & {{x}_{nk}} \\ \end{matrix} \right]\,\! }[/math]


[math]\displaystyle{ \beta =\left[ \begin{matrix} {{\beta }_{0}} \\ {{\beta }_{1}} \\ . \\ . \\ . \\ {{\beta }_{k}} \\ \end{matrix} \right]\text{ and }\epsilon =\left[ \begin{matrix} {{\epsilon }_{1}} \\ {{\epsilon }_{2}} \\ . \\ . \\ . \\ {{\epsilon }_{n}} \\ \end{matrix} \right]\,\! }[/math]


The matrix [math]\displaystyle{ X\,\! }[/math] is referred to as the design matrix. It contains information about the levels of the predictor variables at which the observations are obtained. The vector [math]\displaystyle{ \beta\,\! }[/math] contains all the regression coefficients. To obtain the regression model, [math]\displaystyle{ \beta\,\! }[/math] should be known. [math]\displaystyle{ \beta\,\! }[/math] is estimated using least square estimates. The following equation is used:


[math]\displaystyle{ \hat{\beta }={{({{X}^{\prime }}X)}^{-1}}{{X}^{\prime }}y\,\! }[/math]


where [math]\displaystyle{ ^{\prime }\,\! }[/math] represents the transpose of the matrix while [math]\displaystyle{ ^{-1}\,\! }[/math] represents the matrix inverse. Knowing the estimates, [math]\displaystyle{ \hat{\beta }\,\! }[/math], the multiple linear regression model can now be estimated as:


[math]\displaystyle{ \hat{y}=X\hat{\beta }\,\! }[/math]


The estimated regression model is also referred to as the fitted model. The observations, [math]\displaystyle{ {{y}_{i}}\,\! }[/math], may be different from the fitted values [math]\displaystyle{ {{\hat{y}}_{i}}\,\! }[/math] obtained from this model. The difference between these two values is the residual, [math]\displaystyle{ {{e}_{i}}\,\! }[/math]. The vector of residuals, [math]\displaystyle{ e\,\! }[/math], is obtained as:


[math]\displaystyle{ e=y-\hat{y}\,\! }[/math]


The fitted model can also be written as follows, using [math]\displaystyle{ \hat{\beta }={{({{X}^{\prime }}X)}^{-1}}{{X}^{\prime }}y\,\! }[/math]:


[math]\displaystyle{ \begin{align} \hat{y} &= & X\hat{\beta } \\ & = & X{{({{X}^{\prime }}X)}^{-1}}{{X}^{\prime }}y \\ & = & Hy \end{align}\,\! }[/math]


where [math]\displaystyle{ H=X{{({{X}^{\prime }}X)}^{-1}}{{X}^{\prime }}\,\! }[/math]. The matrix, [math]\displaystyle{ H\,\! }[/math], is referred to as the hat matrix. It transforms the vector of the observed response values, [math]\displaystyle{ y\,\! }[/math], to the vector of fitted values, [math]\displaystyle{ \hat{y}\,\! }[/math].
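The matrix expressions in this section can be sketched in Python with NumPy as shown below. The data values are hypothetical and chosen only for illustration, and the variable names are assumptions. The normal equations are solved directly rather than by forming the inverse explicitly, which gives the same estimates and is generally preferred numerically.

```python
import numpy as np

# Hypothetical data: n = 6 observations, k = 2 predictors (values are illustrative only).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([5.1, 6.9, 12.2, 13.8, 19.1, 20.7])

X = np.column_stack([np.ones_like(x1), x1, x2])  # design matrix with a leading column of ones

XtX = X.T @ X
beta_hat = np.linalg.solve(XtX, X.T @ y)   # solves (X'X) beta = X'y
H = X @ np.linalg.solve(XtX, X.T)          # hat matrix H = X (X'X)^{-1} X'
y_hat = H @ y                              # fitted values
e = y - y_hat                              # residuals
```

In this sketch the fitted values obtained from the hat matrix agree with those obtained from the estimated coefficients, and the residuals are orthogonal to the columns of the design matrix, as expected for a least squares fit.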

Example

An analyst studying a chemical process expects the yield to be affected by the levels of two factors, [math]\displaystyle{ {{x}_{1}}\,\! }[/math] and [math]\displaystyle{ {{x}_{2}}\,\! }[/math]. Observations recorded for various levels of the two factors are shown in the following table. The analyst wants to fit a first order regression model to the data. Interaction between [math]\displaystyle{ {{x}_{1}}\,\! }[/math] and [math]\displaystyle{ {{x}_{2}}\,\! }[/math] is not expected based on knowledge of similar processes. Units of the factor levels and the yield are ignored for the analysis.


Observed yield data for various levels of two factors.


The data of the above table can be entered into the Weibull++ DOE folio using the multiple linear regression folio tool as shown in the following figure.


Multiple Regression tool in Weibull++ with the data in the table.


A scatter plot for the data is shown next.


Three-dimensional scatter plot for the observed data in the table.


The first order regression model applicable to this data set having two predictor variables is:


[math]\displaystyle{ Y={{\beta }_{0}}+{{\beta }_{1}}{{x}_{1}}+{{\beta }_{2}}{{x}_{2}}+\epsilon\,\! }[/math]


where the dependent variable, [math]\displaystyle{ Y\,\! }[/math], represents the yield and the predictor variables, [math]\displaystyle{ {{x}_{1}}\,\! }[/math] and [math]\displaystyle{ {{x}_{2}}\,\! }[/math], represent the two factors respectively. The [math]\displaystyle{ X\,\! }[/math] and [math]\displaystyle{ y\,\! }[/math] matrices for the data can be obtained as:


[math]\displaystyle{ X=\left[ \begin{matrix} 1 & 41.9 & 29.1 \\ 1 & 43.4 & 29.3 \\ . & . & . \\ . & . & . \\ . & . & . \\ 1 & 77.8 & 32.9 \\ \end{matrix} \right]\text{ }y=\left[ \begin{matrix} 251.3 \\ 251.3 \\ . \\ . \\ . \\ 349.0 \\ \end{matrix} \right]\,\! }[/math]


The least square estimates, [math]\displaystyle{ \hat{\beta }\,\! }[/math], can now be obtained:


[math]\displaystyle{ \begin{align} \hat{\beta } &= & {{({{X}^{\prime }}X)}^{-1}}{{X}^{\prime }}y \\ & = & {{\left[ \begin{matrix} 17 & 941 & 525.3 \\ 941 & 54270 & 29286 \\ 525.3 & 29286 & 16254 \\ \end{matrix} \right]}^{-1}}\left[ \begin{matrix} 4902.8 \\ 276610 \\ 152020 \\ \end{matrix} \right] \\ & = & \left[ \begin{matrix} -153.51 \\ 1.24 \\ 12.08 \\ \end{matrix} \right] \end{align}\,\! }[/math]


Thus:


[math]\displaystyle{ \hat{\beta }=\left[ \begin{matrix} {{{\hat{\beta }}}_{0}} \\ {{{\hat{\beta }}}_{1}} \\ {{{\hat{\beta }}}_{2}} \\ \end{matrix} \right]=\left[ \begin{matrix} -153.51 \\ 1.24 \\ 12.08 \\ \end{matrix} \right]\,\! }[/math]


and the estimated regression coefficients are [math]\displaystyle{ {{\hat{\beta }}_{0}}=-153.51\,\! }[/math], [math]\displaystyle{ {{\hat{\beta }}_{1}}=1.24\,\! }[/math] and [math]\displaystyle{ {{\hat{\beta }}_{2}}=12.08\,\! }[/math]. The fitted regression model is:


[math]\displaystyle{ \begin{align} \hat{y} & = & {{{\hat{\beta }}}_{0}}+{{{\hat{\beta }}}_{1}}{{x}_{1}}+{{{\hat{\beta }}}_{2}}{{x}_{2}} \\ & = & -153.5+1.24{{x}_{1}}+12.08{{x}_{2}} \end{align}\,\! }[/math]


The fitted regression model can be viewed in the Weibull++ DOE folio, as shown next.


Equation of the fitted regression model for the data from the table.


A plot of the fitted regression plane is shown in the following figure.


Fitted regression plane [math]\displaystyle{ \hat{y}=-153.5+1.24 x_1+12.08 x_2\,\! }[/math] for the data from the table.


The fitted regression model can be used to obtain fitted values, [math]\displaystyle{ {{\hat{y}}_{i}}\,\! }[/math], corresponding to an observed response value, [math]\displaystyle{ {{y}_{i}}\,\! }[/math]. For example, the fitted value corresponding to the fifth observation is:


[math]\displaystyle{ \begin{align} {{{\hat{y}}}_{i}} &= & -153.5+1.24{{x}_{i1}}+12.08{{x}_{i2}} \\ {{{\hat{y}}}_{5}} & = & -153.5+1.24{{x}_{51}}+12.08{{x}_{52}} \\ & = & -153.5+1.24(47.3)+12.08(29.9) \\ & = & 266.3 \end{align}\,\! }[/math]


The observed fifth response value is [math]\displaystyle{ {{y}_{5}}=273.0\,\! }[/math]. The residual corresponding to this value is:


[math]\displaystyle{ \begin{align} {{e}_{i}} & = & {{y}_{i}}-{{{\hat{y}}}_{i}} \\ {{e}_{5}}& = & {{y}_{5}}-{{{\hat{y}}}_{5}} \\ & = & 273.0-266.3 \\ & = & 6.7 \end{align}\,\! }[/math]


In Weibull++ DOE folios, fitted values and residuals are shown in the Diagnostic Information table of the detailed summary of results. The values are shown in the following figure.


Fitted values and residuals for the data in the table.


The fitted regression model can also be used to predict response values. For example, to obtain the response value for a new observation corresponding to 47 units of [math]\displaystyle{ {{x}_{1}}\,\! }[/math] and 31 units of [math]\displaystyle{ {{x}_{2}}\,\! }[/math], the value is calculated using:


[math]\displaystyle{ \begin{align} \hat{y}(47,31)& = & -153.5+1.24(47)+12.08(31) \\ & = & 279.26 \end{align}\,\! }[/math]
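The fitted value, residual and prediction worked out in this example can be reproduced with the short Python sketch below. It uses the rounded coefficients quoted above; the function name is an illustrative assumption.

```python
# Rounded coefficients of the fitted model from this example.
b0, b1, b2 = -153.5, 1.24, 12.08

def fitted(x1, x2):
    """Evaluate the fitted regression model at (x1, x2)."""
    return b0 + b1 * x1 + b2 * x2

y_hat_5 = fitted(47.3, 29.9)   # fitted value for the fifth observation
e_5 = 273.0 - y_hat_5          # residual: observed minus fitted
y_new = fitted(47.0, 31.0)     # prediction at a new point

print(round(y_hat_5, 1), round(e_5, 1), round(y_new, 2))  # prints: 266.3 6.7 279.26
```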

Properties of the Least Square Estimators for Beta

The least square estimates, [math]\displaystyle{ {{\hat{\beta }}_{0}}\,\! }[/math], [math]\displaystyle{ {{\hat{\beta }}_{1}}\,\! }[/math], [math]\displaystyle{ {{\hat{\beta }}_{2}}\,\! }[/math]... [math]\displaystyle{ {{\hat{\beta }}_{k}}\,\! }[/math], are unbiased estimators of [math]\displaystyle{ {{\beta }_{0}}\,\! }[/math], [math]\displaystyle{ {{\beta }_{1}}\,\! }[/math], [math]\displaystyle{ {{\beta }_{2}}\,\! }[/math]... [math]\displaystyle{ {{\beta }_{k}}\,\! }[/math], provided that the random error terms, [math]\displaystyle{ {{\epsilon }_{i}}\,\! }[/math], are normally and independently distributed. The variances of the [math]\displaystyle{ \hat{\beta }\,\! }[/math] s are obtained using the [math]\displaystyle{ {{({{X}^{\prime }}X)}^{-1}}\,\! }[/math] matrix. The variance-covariance matrix of the estimated regression coefficients is obtained as follows:


[math]\displaystyle{ C={{\hat{\sigma }}^{2}}{{({{X}^{\prime }}X)}^{-1}}\,\! }[/math]


[math]\displaystyle{ C\,\! }[/math] is a symmetric matrix whose diagonal elements, [math]\displaystyle{ {{C}_{jj}}\,\! }[/math], represent the variance of the estimated [math]\displaystyle{ j\,\! }[/math] th regression coefficient, [math]\displaystyle{ {{\hat{\beta }}_{j}}\,\! }[/math]. The off-diagonal elements, [math]\displaystyle{ {{C}_{ij}}\,\! }[/math], represent the covariance between the [math]\displaystyle{ i\,\! }[/math] th and [math]\displaystyle{ j\,\! }[/math] th estimated regression coefficients, [math]\displaystyle{ {{\hat{\beta }}_{i}}\,\! }[/math] and [math]\displaystyle{ {{\hat{\beta }}_{j}}\,\! }[/math]. The value of [math]\displaystyle{ {{\hat{\sigma }}^{2}}\,\! }[/math] is obtained using the error mean square, [math]\displaystyle{ M{{S}_{E}}\,\! }[/math]. The variance-covariance matrix for the data in the table (see Estimating Regression Models Using Least Squares) can be viewed in the DOE folio, as shown next.


The variance-covariance matrix for the data in the table.


Calculations to obtain the matrix are given in this example. The positive square root of [math]\displaystyle{ {{C}_{jj}}\,\! }[/math] represents the estimated standard deviation of the [math]\displaystyle{ j\,\! }[/math] th regression coefficient, [math]\displaystyle{ {{\hat{\beta }}_{j}}\,\! }[/math], and is called the estimated standard error of [math]\displaystyle{ {{\hat{\beta }}_{j}}\,\! }[/math] (abbreviated [math]\displaystyle{ se({{\hat{\beta }}_{j}})\,\! }[/math] ).


[math]\displaystyle{ se({{\hat{\beta }}_{j}})=\sqrt{{{C}_{jj}}}\,\! }[/math]
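The two formulas above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the routine used by the DOE folio: the function name `coefficient_covariance` and the six-point data set are hypothetical and are not the table used in this chapter.

```python
import numpy as np

def coefficient_covariance(X, y):
    """Return (beta_hat, C, se) for the least squares fit of y on X.

    X is the design matrix and must already contain the column of ones.
    """
    n, p = X.shape                            # p = k + 1 coefficients
    xtx_inv = np.linalg.inv(X.T @ X)          # (X'X)^-1
    beta_hat = xtx_inv @ X.T @ y              # least squares estimates
    residuals = y - X @ beta_hat
    mse = residuals @ residuals / (n - p)     # sigma_hat^2 = MS_E
    C = mse * xtx_inv                         # variance-covariance matrix
    se = np.sqrt(np.diag(C))                  # se(beta_hat_j) = sqrt(C_jj)
    return beta_hat, C, se

# Hypothetical toy data, used only to exercise the function.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9])
X = np.column_stack([np.ones_like(x1), x1, x2])

beta_hat, C, se = coefficient_covariance(X, y)
```

The diagonal of `C` holds the coefficient variances and the off-diagonal entries hold the covariances, so `C` should come out symmetric.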

Hypothesis Tests in Multiple Linear Regression

This section discusses hypothesis tests on the regression coefficients in multiple linear regression. As in the case of simple linear regression, these tests can only be carried out if it can be assumed that the random error terms, [math]\displaystyle{ {{\epsilon }_{i}}\,\! }[/math], are normally and independently distributed with a mean of zero and variance of [math]\displaystyle{ {{\sigma }^{2}}\,\! }[/math]. Three types of hypothesis tests can be carried out for multiple linear regression models:

  1. Test for significance of regression: This test checks the significance of the whole regression model.
  2. [math]\displaystyle{ t\,\! }[/math] test: This test checks the significance of individual regression coefficients.
  3. [math]\displaystyle{ F\,\! }[/math] test: This test can be used to simultaneously check the significance of a number of regression coefficients. It can also be used to test individual coefficients.

Test for Significance of Regression

The test for significance of regression in the case of multiple linear regression analysis is carried out using the analysis of variance. The test is used to check if a linear statistical relationship exists between the response variable and at least one of the predictor variables. The statements for the hypotheses are:


[math]\displaystyle{ \begin{align} & {{H}_{0}}:& {{\beta }_{1}}={{\beta }_{2}}=...={{\beta }_{k}}=0 \\ & {{H}_{1}}:& {{\beta }_{j}}\ne 0\text{ for at least one }j \end{align}\,\! }[/math]


The test for [math]\displaystyle{ {{H}_{0}}\,\! }[/math] is carried out using the following statistic:


[math]\displaystyle{ {{F}_{0}}=\frac{M{{S}_{R}}}{M{{S}_{E}}}\,\! }[/math]


where [math]\displaystyle{ M{{S}_{R}}\,\! }[/math] is the regression mean square and [math]\displaystyle{ M{{S}_{E}}\,\! }[/math] is the error mean square. If the null hypothesis, [math]\displaystyle{ {{H}_{0}}\,\! }[/math], is true then the statistic [math]\displaystyle{ {{F}_{0}}\,\! }[/math] follows the [math]\displaystyle{ F\,\! }[/math] distribution with [math]\displaystyle{ k\,\! }[/math] degrees of freedom in the numerator and [math]\displaystyle{ n-(k+1)\,\! }[/math] degrees of freedom in the denominator. The null hypothesis, [math]\displaystyle{ {{H}_{0}}\,\! }[/math], is rejected if the calculated statistic, [math]\displaystyle{ {{F}_{0}}\,\! }[/math], is such that:


[math]\displaystyle{ {{F}_{0}}\gt {{f}_{\alpha ,k,n-(k+1)}}\,\! }[/math]


Calculation of the Statistic [math]\displaystyle{ {{F}_{0}}\,\! }[/math]

To calculate the statistic [math]\displaystyle{ {{F}_{0}}\,\! }[/math], the mean squares [math]\displaystyle{ M{{S}_{R}}\,\! }[/math] and [math]\displaystyle{ M{{S}_{E}}\,\! }[/math] must be known. As explained in Simple Linear Regression Analysis, the mean squares are obtained by dividing the sum of squares by their degrees of freedom. For example, the total mean square, [math]\displaystyle{ M{{S}_{T}}\,\! }[/math], is obtained as follows:


[math]\displaystyle{ M{{S}_{T}}=\frac{S{{S}_{T}}}{dof(S{{S}_{T}})}\,\! }[/math]


where [math]\displaystyle{ S{{S}_{T}}\,\! }[/math] is the total sum of squares and [math]\displaystyle{ dof(S{{S}_{T}})\,\! }[/math] is the number of degrees of freedom associated with [math]\displaystyle{ S{{S}_{T}}\,\! }[/math]. In multiple linear regression, the following equation is used to calculate [math]\displaystyle{ S{{S}_{T}}\,\! }[/math] :


[math]\displaystyle{ S{{S}_{T}}={{y}^{\prime }}\left[ I-(\frac{1}{n})J \right]y\,\! }[/math]


where [math]\displaystyle{ n\,\! }[/math] is the total number of observations, [math]\displaystyle{ y\,\! }[/math] is the vector of observations (that was defined in Estimating Regression Models Using Least Squares), [math]\displaystyle{ I\,\! }[/math] is the identity matrix of order [math]\displaystyle{ n\,\! }[/math] and [math]\displaystyle{ J\,\! }[/math] represents an [math]\displaystyle{ n\times n\,\! }[/math] square matrix of ones. The number of degrees of freedom associated with [math]\displaystyle{ S{{S}_{T}}\,\! }[/math], [math]\displaystyle{ dof(S{{S}_{T}})\,\! }[/math], is ( [math]\displaystyle{ n-1\,\! }[/math] ). Knowing [math]\displaystyle{ S{{S}_{T}}\,\! }[/math] and [math]\displaystyle{ dof(S{{S}_{T}})\,\! }[/math] the total mean square, [math]\displaystyle{ M{{S}_{T}}\,\! }[/math], can be calculated.

The regression mean square, [math]\displaystyle{ M{{S}_{R}}\,\! }[/math], is obtained by dividing the regression sum of squares, [math]\displaystyle{ S{{S}_{R}}\,\! }[/math], by the respective degrees of freedom, [math]\displaystyle{ dof(S{{S}_{R}})\,\! }[/math], as follows:


[math]\displaystyle{ M{{S}_{R}}=\frac{S{{S}_{R}}}{dof(S{{S}_{R}})}\,\! }[/math]


The regression sum of squares, [math]\displaystyle{ S{{S}_{R}}\,\! }[/math], is calculated using the following equation:


[math]\displaystyle{ S{{S}_{R}}={{y}^{\prime }}\left[ H-(\frac{1}{n})J \right]y\,\! }[/math]


where [math]\displaystyle{ n\,\! }[/math] is the total number of observations, [math]\displaystyle{ y\,\! }[/math] is the vector of observations, [math]\displaystyle{ H\,\! }[/math] is the hat matrix and [math]\displaystyle{ J\,\! }[/math] represents an [math]\displaystyle{ n\times n\,\! }[/math] square matrix of ones. The number of degrees of freedom associated with [math]\displaystyle{ S{{S}_{R}}\,\! }[/math], [math]\displaystyle{ dof(S{{S}_{R}})\,\! }[/math], is [math]\displaystyle{ k\,\! }[/math], where [math]\displaystyle{ k\,\! }[/math] is the number of predictor variables in the model. Knowing [math]\displaystyle{ S{{S}_{R}}\,\! }[/math] and [math]\displaystyle{ dof(S{{S}_{R}})\,\! }[/math] the regression mean square, [math]\displaystyle{ M{{S}_{R}}\,\! }[/math], can be calculated. The error mean square, [math]\displaystyle{ M{{S}_{E}}\,\! }[/math], is obtained by dividing the error sum of squares, [math]\displaystyle{ S{{S}_{E}}\,\! }[/math], by the respective degrees of freedom, [math]\displaystyle{ dof(S{{S}_{E}})\,\! }[/math], as follows:


[math]\displaystyle{ M{{S}_{E}}=\frac{S{{S}_{E}}}{dof(S{{S}_{E}})}\,\! }[/math]


The error sum of squares, [math]\displaystyle{ S{{S}_{E}}\,\! }[/math], is calculated using the following equation:


[math]\displaystyle{ S{{S}_{E}}={{y}^{\prime }}(I-H)y\,\! }[/math]


where [math]\displaystyle{ y\,\! }[/math] is the vector of observations, [math]\displaystyle{ I\,\! }[/math] is the identity matrix of order [math]\displaystyle{ n\,\! }[/math] and [math]\displaystyle{ H\,\! }[/math] is the hat matrix. The number of degrees of freedom associated with [math]\displaystyle{ S{{S}_{E}}\,\! }[/math], [math]\displaystyle{ dof(S{{S}_{E}})\,\! }[/math], is [math]\displaystyle{ n-(k+1)\,\! }[/math], where [math]\displaystyle{ n\,\! }[/math] is the total number of observations and [math]\displaystyle{ k\,\! }[/math] is the number of predictor variables in the model. Knowing [math]\displaystyle{ S{{S}_{E}}\,\! }[/math] and [math]\displaystyle{ dof(S{{S}_{E}})\,\! }[/math], the error mean square, [math]\displaystyle{ M{{S}_{E}}\,\! }[/math], can be calculated. The error mean square is an estimate of the variance, [math]\displaystyle{ {{\sigma }^{2}}\,\! }[/math], of the random error terms, [math]\displaystyle{ {{\epsilon }_{i}}\,\! }[/math].


[math]\displaystyle{ {{\hat{\sigma }}^{2}}=M{{S}_{E}}\,\! }[/math]
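The matrix expressions for the sums of squares can be checked numerically. The sketch below assumes NumPy and uses a hypothetical helper, `regression_anova`, with made-up data (not the table used in this chapter). It forms the hat matrix explicitly, which is fine for small examples but is not how production software would normally do it.

```python
import numpy as np

def regression_anova(X, y):
    """Sums of squares, mean squares and F0 from the matrix formulas above."""
    n, p = X.shape
    k = p - 1                                  # number of predictor variables
    H = X @ np.linalg.inv(X.T @ X) @ X.T       # hat matrix
    I = np.eye(n)
    J = np.ones((n, n))
    ss_t = y @ (I - J / n) @ y                 # total sum of squares
    ss_r = y @ (H - J / n) @ y                 # regression sum of squares
    ss_e = y @ (I - H) @ y                     # error sum of squares
    ms_r = ss_r / k                            # dof(SS_R) = k
    ms_e = ss_e / (n - (k + 1))                # dof(SS_E) = n - (k + 1)
    return {"SS_T": ss_t, "SS_R": ss_r, "SS_E": ss_e,
            "MS_R": ms_r, "MS_E": ms_e, "F0": ms_r / ms_e}

# Hypothetical toy data, used only to exercise the function.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9])
X = np.column_stack([np.ones_like(x1), x1, x2])

anova = regression_anova(X, y)
```

A useful sanity check is that the total sum of squares equals the regression sum of squares plus the error sum of squares, up to rounding.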
Example

The test for the significance of regression, for the regression model obtained for the data in the table (see Estimating Regression Models Using Least Squares), is illustrated in this example. The null hypothesis for the model is:


[math]\displaystyle{ {{H}_{0}}: {{\beta }_{1}}={{\beta }_{2}}=0\,\! }[/math]


The statistic to test [math]\displaystyle{ {{H}_{0}}\,\! }[/math] is:


[math]\displaystyle{ {{F}_{0}}=\frac{M{{S}_{R}}}{M{{S}_{E}}}\,\! }[/math]


To calculate [math]\displaystyle{ {{F}_{0}}\,\! }[/math], first the sums of squares are calculated so that the mean squares can be obtained. Then the mean squares are used to calculate the statistic [math]\displaystyle{ {{F}_{0}}\,\! }[/math] to carry out the significance test. The regression sum of squares, [math]\displaystyle{ S{{S}_{R}}\,\! }[/math], can be obtained as:


[math]\displaystyle{ S{{S}_{R}}={{y}^{\prime }}\left[ H-(\frac{1}{n})J \right]y\,\! }[/math]


The hat matrix, [math]\displaystyle{ H\,\! }[/math] is calculated as follows using the design matrix [math]\displaystyle{ X\,\! }[/math] from the previous example:


[math]\displaystyle{ \begin{align} H & = & X{{({{X}^{\prime }}X)}^{-1}}{{X}^{\prime }} \\ & = & \left[ \begin{matrix} 0.27552 & 0.25154 & . & . & -0.04030 \\ 0.25154 & 0.23021 & . & . & -0.02920 \\ . & . & . & . & . \\ . & . & . & . & . \\ -0.04030 & -0.02920 & . & . & 0.30115 \\ \end{matrix} \right] \end{align}\,\! }[/math]


Knowing [math]\displaystyle{ y\,\! }[/math], [math]\displaystyle{ H\,\! }[/math] and [math]\displaystyle{ J\,\! }[/math], the regression sum of squares, [math]\displaystyle{ S{{S}_{R}}\,\! }[/math], can be calculated:


[math]\displaystyle{ \begin{align} S{{S}_{R}} & = & {{y}^{\prime }}\left[ H-(\frac{1}{n})J \right]y \\ & = & 12816.35 \end{align}\,\! }[/math]


The number of degrees of freedom associated with [math]\displaystyle{ S{{S}_{R}}\,\! }[/math] is [math]\displaystyle{ k\,\! }[/math], which equals two since there are two predictor variables in the data in the table (see Multiple Linear Regression Analysis). Therefore, the regression mean square is:


[math]\displaystyle{ \begin{align} M{{S}_{R}}& = & \frac{S{{S}_{R}}}{dof(S{{S}_{R}})} \\ & = & \frac{12816.35}{2} \\ & = & 6408.17 \end{align}\,\! }[/math]


Similarly, to calculate the error mean square, [math]\displaystyle{ M{{S}_{E}}\,\! }[/math], the error sum of squares, [math]\displaystyle{ S{{S}_{E}}\,\! }[/math], can be obtained as:


[math]\displaystyle{ \begin{align} S{{S}_{E}} &= & {{y}^{\prime }}\left[ I-H \right]y \\ & = & 423.37 \end{align}\,\! }[/math]


The degrees of freedom associated with [math]\displaystyle{ S{{S}_{E}}\,\! }[/math] is [math]\displaystyle{ n-(k+1)\,\! }[/math]. Therefore, the error mean square, [math]\displaystyle{ M{{S}_{E}}\,\! }[/math], is:


[math]\displaystyle{ \begin{align} M{{S}_{E}} &= & \frac{S{{S}_{E}}}{dof(S{{S}_{E}})} \\ & = & \frac{S{{S}_{E}}}{(n-(k+1))} \\ & = & \frac{423.37}{(17-(2+1))} \\ & = & 30.24 \end{align}\,\! }[/math]


The statistic to test the significance of regression can now be calculated as:


[math]\displaystyle{ \begin{align} {{f}_{0}}& = & \frac{M{{S}_{R}}}{M{{S}_{E}}} \\ & = & \frac{6408.17}{423.37/(17-3)} \\ & = & 211.9 \end{align}\,\! }[/math]


The critical value for this test, corresponding to a significance level of 0.1, is:


[math]\displaystyle{ \begin{align} {{f}_{\alpha ,k,n-(k+1)}} &= & {{f}_{0.1,2,14}} \\ & = & 2.726 \end{align}\,\! }[/math]


Since [math]\displaystyle{ {{f}_{0}}\gt {{f}_{0.1,2,14}}\,\! }[/math], [math]\displaystyle{ {{H}_{0}}:\,\! }[/math] [math]\displaystyle{ {{\beta }_{1}}={{\beta }_{2}}=0\,\! }[/math] is rejected and it is concluded that at least one coefficient out of [math]\displaystyle{ {{\beta }_{1}}\,\! }[/math] and [math]\displaystyle{ {{\beta }_{2}}\,\! }[/math] is significant. In other words, it is concluded that a regression model exists between yield and either one or both of the factors in the table. The analysis of variance is summarized in the following table.


ANOVA table for the significance of regression test.
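The arithmetic of this example can be reproduced with a short script. The sums of squares, sample size and significance level are the values quoted above. Using SciPy's `scipy.stats.f` for the critical value and the p value is an assumption made for this sketch and is not necessarily how the DOE folio computes them.

```python
from scipy import stats

# Values quoted in the example above.
ss_r, ss_e = 12816.35, 423.37
n, k = 17, 2
alpha = 0.1

ms_r = ss_r / k                          # regression mean square
ms_e = ss_e / (n - (k + 1))              # error mean square
f0 = ms_r / ms_e                         # test statistic

# Upper-tail critical value f_{alpha, k, n-(k+1)} and the p value.
f_crit = stats.f.ppf(1 - alpha, k, n - (k + 1))
p_value = stats.f.sf(f0, k, n - (k + 1))

reject_h0 = f0 > f_crit
print(f"F0 = {f0:.1f}, critical value = {f_crit:.3f}, reject H0: {reject_h0}")
```

The rejection rule based on the critical value and the rule `p_value < alpha` always lead to the same decision.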

Test on Individual Regression Coefficients (t Test)

The [math]\displaystyle{ t\,\! }[/math] test is used to check the significance of individual regression coefficients in the multiple linear regression model. Adding a significant variable to a regression model makes the model more effective, while adding an unimportant variable may make the model worse. The hypothesis statements to test the significance of a particular regression coefficient, [math]\displaystyle{ {{\beta }_{j}}\,\! }[/math], are:


[math]\displaystyle{ \begin{align} & {{H}_{0}}: & {{\beta }_{j}}=0 \\ & {{H}_{1}}: & {{\beta }_{j}}\ne 0 \end{align}\,\! }[/math]


The test statistic for this test is based on the [math]\displaystyle{ t\,\! }[/math] distribution (and is similar to the one used in the case of simple linear regression models in Simple Linear Regression Analysis):


[math]\displaystyle{ {{T}_{0}}=\frac{{{{\hat{\beta }}}_{j}}}{se({{{\hat{\beta }}}_{j}})}\,\! }[/math]


where the standard error, [math]\displaystyle{ se({{\hat{\beta }}_{j}})\,\! }[/math], is obtained from the diagonal elements of the variance-covariance matrix, [math]\displaystyle{ C\,\! }[/math]. The analyst would fail to reject the null hypothesis if the test statistic lies in the acceptance region:


[math]\displaystyle{ -{{t}_{\alpha /2,n-(k+1)}}\lt {{T}_{0}}\lt {{t}_{\alpha /2,n-(k+1)}}\,\! }[/math]


This test measures the contribution of a variable while the remaining variables are included in the model. For the model [math]\displaystyle{ \hat{y}={{\hat{\beta }}_{0}}+{{\hat{\beta }}_{1}}{{x}_{1}}+{{\hat{\beta }}_{2}}{{x}_{2}}+{{\hat{\beta }}_{3}}{{x}_{3}}\,\! }[/math], if the test is carried out for [math]\displaystyle{ {{\beta }_{1}}\,\! }[/math], then the test will check the significance of including the variable [math]\displaystyle{ {{x}_{1}}\,\! }[/math] in the model that contains [math]\displaystyle{ {{x}_{2}}\,\! }[/math] and [math]\displaystyle{ {{x}_{3}}\,\! }[/math] (i.e., the model [math]\displaystyle{ \hat{y}={{\hat{\beta }}_{0}}+{{\hat{\beta }}_{2}}{{x}_{2}}+{{\hat{\beta }}_{3}}{{x}_{3}}\,\! }[/math] ). Hence the test is also referred to as a partial or marginal test. In DOE folios, this test is displayed in the Regression Information table.

Example

The test to check the significance of the estimated regression coefficients for the data is illustrated in this example. The null hypothesis to test the coefficient [math]\displaystyle{ {{\beta }_{2}}\,\! }[/math] is:


[math]\displaystyle{ {{H}_{0}}:{{\beta }_{2}}=0\,\! }[/math]


The null hypothesis to test [math]\displaystyle{ {{\beta }_{1}}\,\! }[/math] can be obtained in a similar manner. To calculate the test statistic, [math]\displaystyle{ {{T}_{0}}\,\! }[/math], we need to calculate the standard error. In the example, the value of the error mean square, [math]\displaystyle{ M{{S}_{E}}\,\! }[/math], was obtained as 30.24. The error mean square is an estimate of the variance, [math]\displaystyle{ {{\sigma }^{2}}\,\! }[/math].


Therefore:


[math]\displaystyle{ \begin{align} {{{\hat{\sigma }}}^{2}} &= & M{{S}_{E}} \\ & = & 30.24 \end{align}\,\! }[/math]


The variance-covariance matrix of the estimated regression coefficients is:


[math]\displaystyle{ \begin{align} C &= & {{{\hat{\sigma }}}^{2}}{{({{X}^{\prime }}X)}^{-1}} \\ & = & 30.24\left[ \begin{matrix} 336.5 & 1.2 & -13.1 \\ 1.2 & 0.005 & -0.049 \\ -13.1 & -0.049 & 0.5 \\ \end{matrix} \right] \\ & = & \left[ \begin{matrix} 10176.75 & 37.145 & -395.83 \\ 37.145 & 0.1557 & -1.481 \\ -395.83 & -1.481 & 15.463 \\ \end{matrix} \right] \end{align}\,\! }[/math]


From the diagonal elements of [math]\displaystyle{ C\,\! }[/math], the estimated standard error for [math]\displaystyle{ {{\hat{\beta }}_{1}}\,\! }[/math] and [math]\displaystyle{ {{\hat{\beta }}_{2}}\,\! }[/math] is:


[math]\displaystyle{ \begin{align} se({{{\hat{\beta }}}_{1}}) &= & \sqrt{0.1557}=0.3946 \\ se({{{\hat{\beta }}}_{2}})& = & \sqrt{15.463}=3.93 \end{align}\,\! }[/math]


The corresponding test statistics for these coefficients are:


[math]\displaystyle{ \begin{align} {{({{t}_{0}})}_{{{{\hat{\beta }}}_{1}}}} &= & \frac{{{{\hat{\beta }}}_{1}}}{se({{{\hat{\beta }}}_{1}})}=\frac{1.24}{0.3946}=3.1393 \\ {{({{t}_{0}})}_{{{{\hat{\beta }}}_{2}}}} &= & \frac{{{{\hat{\beta }}}_{2}}}{se({{{\hat{\beta }}}_{2}})}=\frac{12.08}{3.93}=3.0726 \end{align}\,\! }[/math]


The critical values for the present [math]\displaystyle{ t\,\! }[/math] test at a significance of 0.1 are:


[math]\displaystyle{ \begin{align} {{t}_{\alpha /2,n-(k+1)}} &= & {{t}_{0.05,14}}=1.761 \\ -{{t}_{\alpha /2,n-(k+1)}} & = & -{{t}_{0.05,14}}=-1.761 \end{align}\,\! }[/math]


Considering [math]\displaystyle{ {{\hat{\beta }}_{2}}\,\! }[/math], it can be seen that [math]\displaystyle{ {{({{t}_{0}})}_{{{{\hat{\beta }}}_{2}}}}\,\! }[/math] does not lie in the acceptance region of [math]\displaystyle{ -{{t}_{0.05,14}}\lt {{t}_{0}}\lt {{t}_{0.05,14}}\,\! }[/math]. The null hypothesis, [math]\displaystyle{ {{H}_{0}}:{{\beta }_{2}}=0\,\! }[/math], is rejected and it is concluded that [math]\displaystyle{ {{\beta }_{2}}\,\! }[/math] is significant at [math]\displaystyle{ \alpha =0.1\,\! }[/math]. This conclusion can also be arrived at using the [math]\displaystyle{ p\,\! }[/math] value noting that the hypothesis is two-sided. The [math]\displaystyle{ p\,\! }[/math] value corresponding to the test statistic, [math]\displaystyle{ {{({{t}_{0}})}_{{{{\hat{\beta }}}_{2}}}} = 3.0726\,\! }[/math], based on the [math]\displaystyle{ t\,\! }[/math] distribution with 14 degrees of freedom is:


[math]\displaystyle{ \begin{align} p\text{ }value & = & 2\times (1-P(T\le |{{t}_{0}}|)) \\ & = & 2\times (1-0.9959) \\ & = & 0.0083 \end{align}\,\! }[/math]


Since the [math]\displaystyle{ p\,\! }[/math] value is less than the significance, [math]\displaystyle{ \alpha =0.1\,\! }[/math], it is concluded that [math]\displaystyle{ {{\beta }_{2}}\,\! }[/math] is significant. The hypothesis test on [math]\displaystyle{ {{\beta }_{1}}\,\! }[/math] can be carried out in a similar manner.
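The test on [math]\displaystyle{ {{\beta }_{2}}\,\! }[/math] can be reproduced as follows. The diagonal element of [math]\displaystyle{ C\,\! }[/math] and the coefficient estimate are the rounded values quoted in this example, so the statistic may differ from the value above in the third or fourth significant digit. Using SciPy's `scipy.stats.t` is an assumption made for this sketch.

```python
import math
from scipy import stats

# Rounded values quoted in the example above.
c_22 = 15.463          # diagonal element of C for beta_2
beta2_hat = 12.08      # estimate of beta_2
n, k = 17, 2
alpha = 0.1
dof = n - (k + 1)      # error degrees of freedom

se_beta2 = math.sqrt(c_22)                    # estimated standard error
t0 = beta2_hat / se_beta2                     # test statistic
t_crit = stats.t.ppf(1 - alpha / 2, dof)      # t_{alpha/2, n-(k+1)}
p_value = 2 * stats.t.sf(abs(t0), dof)        # two-sided p value

in_acceptance_region = -t_crit < t0 < t_crit
print(f"t0 = {t0:.3f}, t_crit = {t_crit:.3f}, p value = {p_value:.4f}")
```

Here `in_acceptance_region` is false, so the null hypothesis is rejected, in agreement with the p value comparison.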

As explained in Simple Linear Regression Analysis, in DOE folios, the information related to the [math]\displaystyle{ t\,\! }[/math] test is displayed in the Regression Information table as shown in the figure below.


Regression results for the data.


In this table, the [math]\displaystyle{ t\,\! }[/math] test for [math]\displaystyle{ {{\beta }_{2}}\,\! }[/math] is displayed in the row for the term Factor 2 because [math]\displaystyle{ {{\beta }_{2}}\,\! }[/math] is the coefficient that represents this factor in the regression model. Columns labeled Standard Error, T Value and P Value represent the standard error, the test statistic for the [math]\displaystyle{ t\,\! }[/math] test and the [math]\displaystyle{ p\,\! }[/math] value for the [math]\displaystyle{ t\,\! }[/math] test, respectively. These values have been calculated for [math]\displaystyle{ {{\beta }_{2}}\,\! }[/math] in this example. The Coefficient column represents the estimate of regression coefficients. These values are calculated as shown in this example. The Effect column represents values obtained by multiplying the coefficients by a factor of 2. This value is useful in the case of two factor experiments and is explained in Two-Level Factorial Experiments. Columns labeled Low Confidence and High Confidence represent the limits of the confidence intervals for the regression coefficients and are explained in Confidence Intervals in Multiple Linear Regression. The Variance Inflation Factor column displays values that give a measure of multicollinearity. This is explained in Multicollinearity.

Test on Subsets of Regression Coefficients (Partial F Test)

This test can be considered to be the general form of the [math]\displaystyle{ t\,\! }[/math] test mentioned in the previous section. This is because the test simultaneously checks the significance of including many (or even one) regression coefficients in the multiple linear regression model. Adding a variable to a model increases the regression sum of squares, [math]\displaystyle{ S{{S}_{R}}\,\! }[/math]. The test is based on this increase in the regression sum of squares. The increase in the regression sum of squares is called the extra sum of squares. Assume that the vector of the regression coefficients, [math]\displaystyle{ \beta\,\! }[/math], for the multiple linear regression model, [math]\displaystyle{ y=X\beta +\epsilon\,\! }[/math], is partitioned into two vectors with the second vector, [math]\displaystyle{ {{\theta}_{2}}\,\! }[/math], containing the last [math]\displaystyle{ r\,\! }[/math] regression coefficients, and the first vector, [math]\displaystyle{ {{\theta}_{1}}\,\! }[/math], containing the first ( [math]\displaystyle{ k+1-r\,\! }[/math] ) coefficients as follows:


[math]\displaystyle{ \beta =\left[ \begin{matrix} {{\theta}_{1}} \\ {{\theta}_{2}} \\ \end{matrix} \right]\,\! }[/math]


with:


[math]\displaystyle{ {{\theta}_{1}}=[{{\beta }_{0}},{{\beta }_{1}}...{{\beta }_{k-r}}{]}'\text{ and }{{\theta}_{2}}=[{{\beta }_{k-r+1}},{{\beta }_{k-r+2}}...{{\beta }_{k}}{]}'\text{ }\,\! }[/math]


The hypothesis statements to test the significance of adding the regression coefficients in [math]\displaystyle{ {{\theta}_{2}}\,\! }[/math] to a model containing the regression coefficients in [math]\displaystyle{ {{\theta}_{1}}\,\! }[/math] may be written as:


[math]\displaystyle{ \begin{align} & {{H}_{0}}: & {{\theta}_{2}}=0 \\ & {{H}_{1}}: & {{\theta}_{2}}\ne 0 \end{align}\,\! }[/math]


The test statistic for this test follows the [math]\displaystyle{ F\,\! }[/math] distribution and can be calculated as follows:


[math]\displaystyle{ {{F}_{0}}=\frac{S{{S}_{R}}({{\theta}_{2}}|{{\theta}_{1}})/r}{M{{S}_{E}}}\,\! }[/math]


where [math]\displaystyle{ S{{S}_{R}}({{\theta}_{2}}|{{\theta}_{1}})\,\! }[/math] is the increase in the regression sum of squares when the variables corresponding to the coefficients in [math]\displaystyle{ {{\theta}_{2}}\,\! }[/math] are added to a model already containing [math]\displaystyle{ {{\theta}_{1}}\,\! }[/math], and [math]\displaystyle{ M{{S}_{E}}\,\! }[/math] is obtained from the equation given in Simple Linear Regression Analysis. The value of the extra sum of squares is obtained as explained in the next section.

The null hypothesis, [math]\displaystyle{ {{H}_{0}}\,\! }[/math], is rejected if [math]\displaystyle{ {{F}_{0}}\gt {{f}_{\alpha ,r,n-(k+1)}}\,\! }[/math]. Rejection of [math]\displaystyle{ {{H}_{0}}\,\! }[/math] leads to the conclusion that at least one of the variables in [math]\displaystyle{ {{x}_{k-r+1}}\,\! }[/math], [math]\displaystyle{ {{x}_{k-r+2}}\,\! }[/math]... [math]\displaystyle{ {{x}_{k}}\,\! }[/math] contributes significantly to the regression model. In a DOE folio, the results from the partial [math]\displaystyle{ F\,\! }[/math] test are displayed in the ANOVA table.

ANOVA Table for Extra Sum of Squares in Weibull++.
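The partial [math]\displaystyle{ F\,\! }[/math] test can be sketched generically as the difference between the regression sums of squares of a full and a reduced model. The helper names `regression_ss` and `partial_f_test`, the column-index interface and the data are hypothetical choices for this illustration. The sketch assumes NumPy and SciPy and a design matrix whose first column is the column of ones.

```python
import numpy as np
from scipy import stats

def regression_ss(X, y):
    """SS_R = y'[H - (1/n)J]y for a design matrix X with an intercept column."""
    n = X.shape[0]
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    return y @ (H - np.ones((n, n)) / n) @ y

def partial_f_test(X, y, tested_cols, alpha=0.1):
    """Test H0: the coefficients in tested_cols are zero, given the others.

    tested_cols lists column indices of X and must not include column 0,
    the intercept, which stays in the reduced model.
    """
    n, p = X.shape                                   # p = k + 1
    r = len(tested_cols)
    keep = [j for j in range(p) if j not in tested_cols]
    extra_ss = regression_ss(X, y) - regression_ss(X[:, keep], y)
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    ms_e = y @ (np.eye(n) - H) @ y / (n - p)         # MS_E of the full model
    f0 = (extra_ss / r) / ms_e
    f_crit = stats.f.ppf(1 - alpha, r, n - p)
    p_value = stats.f.sf(f0, r, n - p)
    return f0, f_crit, p_value

# Hypothetical toy data, used only to exercise the functions.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9])
X = np.column_stack([np.ones_like(x1), x1, x2])

f0, f_crit, p_value = partial_f_test(X, y, tested_cols=[1])
```

With `tested_cols=[1]` the call tests the coefficient of the first predictor given that the second predictor is already in the model.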

Types of Extra Sum of Squares

The extra sum of squares can be calculated using either the partial (or adjusted) sum of squares or the sequential sum of squares. The type of extra sum of squares used affects the calculation of the test statistic for the partial [math]\displaystyle{ F\,\! }[/math] test described above. In DOE folios, selection for the type of extra sum of squares is available as shown in the figure below. The partial sum of squares is used as the default setting. The reason for this is explained in the following section on the partial sum of squares.


Partial Sum of Squares

The partial sum of squares for a term is the extra sum of squares when all terms, except the term under consideration, are included in the model. For example, consider the model:


[math]\displaystyle{ Y={{\beta }_{0}}+{{\beta }_{1}}{{x}_{1}}+{{\beta }_{2}}{{x}_{2}}+{{\beta }_{12}}{{x}_{1}}{{x}_{2}}+\epsilon\,\! }[/math]


The sum of squares of regression of this model is denoted by [math]\displaystyle{ S{{S}_{R}}({{\beta }_{0}},{{\beta }_{1}},{{\beta }_{2}},{{\beta }_{12}})\,\! }[/math]. Assume that we need to know the partial sum of squares for [math]\displaystyle{ {{\beta }_{2}}\,\! }[/math]. The partial sum of squares for [math]\displaystyle{ {{\beta }_{2}}\,\! }[/math] is the increase in the regression sum of squares when [math]\displaystyle{ {{\beta }_{2}}\,\! }[/math] is added to the model. This increase is the difference in the regression sum of squares for the full model of the equation given above and the model that includes all terms except [math]\displaystyle{ {{\beta }_{2}}\,\! }[/math]. These terms are [math]\displaystyle{ {{\beta }_{0}}\,\! }[/math], [math]\displaystyle{ {{\beta }_{1}}\,\! }[/math] and [math]\displaystyle{ {{\beta }_{12}}\,\! }[/math]. The model that contains these terms is:


[math]\displaystyle{ Y={{\beta }_{0}}+{{\beta }_{1}}{{x}_{1}}+{{\beta }_{12}}{{x}_{1}}{{x}_{2}}+\epsilon\,\! }[/math]


The sum of squares of regression of this model is denoted by [math]\displaystyle{ S{{S}_{R}}({{\beta }_{0}},{{\beta }_{1}},{{\beta }_{12}})\,\! }[/math]. The partial sum of squares for [math]\displaystyle{ {{\beta }_{2}}\,\! }[/math] can be represented as [math]\displaystyle{ S{{S}_{R}}({{\beta }_{2}}|{{\beta }_{0}},{{\beta }_{1}},{{\beta }_{12}})\,\! }[/math] and is calculated as follows:


[math]\displaystyle{ \begin{align} S{{S}_{R}}({{\beta }_{2}}|{{\beta }_{0}},{{\beta }_{1}},{{\beta }_{12}})& = & S{{S}_{R}}({{\beta }_{0}},{{\beta }_{1}},{{\beta }_{2}},{{\beta }_{12}})-S{{S}_{R}}({{\beta }_{0}},{{\beta }_{1}},{{\beta }_{12}}) \end{align}\,\! }[/math]


For the present case, [math]\displaystyle{ {{\theta}_{2}}=[{{\beta }_{2}}{]}'\,\! }[/math] and [math]\displaystyle{ {{\theta}_{1}}=[{{\beta }_{0}},{{\beta }_{1}},{{\beta }_{12}}{]}'\,\! }[/math]. It can be noted that for the partial sum of squares [math]\displaystyle{ {{\theta}_{1}}\,\! }[/math] contains all coefficients other than the coefficient being tested.

A Weibull++ DOE folio has the partial sum of squares as the default selection. This is because the [math]\displaystyle{ t\,\! }[/math] test is a partial test, i.e., the [math]\displaystyle{ t\,\! }[/math] test on an individual coefficient is carried out by assuming that all the remaining coefficients are included in the model (similar to the way the partial sum of squares is calculated). The results from the [math]\displaystyle{ t\,\! }[/math] test are displayed in the Regression Information table. The results from the partial [math]\displaystyle{ F\,\! }[/math] test are displayed in the ANOVA table. To keep the results in the two tables consistent with each other, the partial sum of squares is used as the default selection for the results displayed in the ANOVA table. The partial sum of squares for all terms of a model may not add up to the regression sum of squares for the full model when the regression coefficients are correlated. If it is preferred that the extra sum of squares for all terms in the model always add up to the regression sum of squares for the full model then the sequential sum of squares should be used.
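The remark about the two types of extra sum of squares can be illustrated numerically. The sketch below assumes NumPy. The helper names and the data, which use two correlated predictors, are hypothetical. The sequential sum of squares, defined in the Sequential Sum of Squares section, is computed here by adding the columns of the design matrix in their given order.

```python
import numpy as np

def regression_ss(X, y):
    """SS_R = y'[H - (1/n)J]y for a design matrix X with an intercept column."""
    n = X.shape[0]
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    return y @ (H - np.ones((n, n)) / n) @ y

def partial_ss(X, y, j):
    """Extra SS for column j when all other columns are already in the model."""
    others = [c for c in range(X.shape[1]) if c != j]
    return regression_ss(X, y) - regression_ss(X[:, others], y)

def sequential_ss(X, y, j):
    """Extra SS for column j when only the columns before it are in the model."""
    return regression_ss(X[:, :j + 1], y) - regression_ss(X[:, :j], y)

# Hypothetical toy data with correlated predictors x1 and x2.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9])
X = np.column_stack([np.ones_like(x1), x1, x2])

terms = range(1, X.shape[1])              # skip column 0, the intercept
ss_r_full = regression_ss(X, y)
sum_partial = sum(partial_ss(X, y, j) for j in terms)
sum_sequential = sum(sequential_ss(X, y, j) for j in terms)
print(ss_r_full, sum_partial, sum_sequential)
```

For this data the sequential sums of squares add up to the regression sum of squares of the full model, while the partial sums of squares do not.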

Example

This example illustrates the [math]\displaystyle{ F\,\! }[/math] test using the partial sum of squares. The test is conducted for the coefficient [math]\displaystyle{ {{\beta }_{1}}\,\! }[/math] corresponding to the predictor variable [math]\displaystyle{ {{x}_{1}}\,\! }[/math] for the data. The regression model used for this data set in the example is:


[math]\displaystyle{ Y={{\beta }_{0}}+{{\beta }_{1}}{{x}_{1}}+{{\beta }_{2}}{{x}_{2}}+\epsilon\,\! }[/math]


The null hypothesis to test the significance of [math]\displaystyle{ {{\beta }_{1}}\,\! }[/math] is:


[math]\displaystyle{ {{H}_{0}}: {{\beta }_{1}}=0\,\! }[/math]


The statistic to test this hypothesis is:


[math]\displaystyle{ {{F}_{0}}=\frac{S{{S}_{R}}({{\beta }_{1}}|{{\beta }_{2}})/r}{M{{S}_{E}}}\,\! }[/math]


where [math]\displaystyle{ S{{S}_{R}}({{\beta }_{1}}|{{\beta }_{2}})\,\! }[/math] represents the partial sum of squares for [math]\displaystyle{ {{\beta }_{1}}\,\! }[/math], [math]\displaystyle{ r\,\! }[/math] represents the number of degrees of freedom for [math]\displaystyle{ S{{S}_{R}}({{\beta }_{1}}|{{\beta }_{2}})\,\! }[/math] (which is one because there is just one coefficient, [math]\displaystyle{ {{\beta }_{1}}\,\! }[/math], being tested) and [math]\displaystyle{ M{{S}_{E}}\,\! }[/math] is the error mean square and has been calculated in the second example as 30.24.

The partial sum of squares for [math]\displaystyle{ {{\beta }_{1}}\,\! }[/math] is the difference between the regression sum of squares for the full model, [math]\displaystyle{ Y={{\beta }_{0}}+{{\beta }_{1}}{{x}_{1}}+{{\beta }_{2}}{{x}_{2}}+\epsilon\,\! }[/math], and the regression sum of squares for the model excluding [math]\displaystyle{ {{\beta }_{1}}\,\! }[/math], [math]\displaystyle{ Y={{\beta }_{0}}+{{\beta }_{2}}{{x}_{2}}+\epsilon\,\! }[/math]. The regression sum of squares for the full model has been calculated in the second example as 12816.35. Therefore:


[math]\displaystyle{ S{{S}_{R}}({{\beta }_{0}},{{\beta }_{1}},{{\beta }_{2}})=12816.35\,\! }[/math]


The regression sum of squares for the model [math]\displaystyle{ Y={{\beta }_{0}}+{{\beta }_{2}}{{x}_{2}}+\epsilon\,\! }[/math] is obtained as shown next. First the design matrix for this model, [math]\displaystyle{ {{X}_{{{\beta }_{0}},{{\beta }_{2}}}}\,\! }[/math], is obtained by dropping the second column in the design matrix of the full model, [math]\displaystyle{ X\,\! }[/math] (the full design matrix, [math]\displaystyle{ X\,\! }[/math], was obtained in the example). The second column of [math]\displaystyle{ X\,\! }[/math] corresponds to the coefficient [math]\displaystyle{ {{\beta }_{1}}\,\! }[/math] which is no longer in the model. Therefore, the design matrix for the model, [math]\displaystyle{ Y={{\beta }_{0}}+{{\beta }_{2}}{{x}_{2}}+\epsilon\,\! }[/math], is:


[math]\displaystyle{ {{X}_{{{\beta }_{0}},{{\beta }_{2}}}}=\left[ \begin{matrix} 1 & 29.1 \\ 1 & 29.3 \\ . & . \\ . & . \\ 1 & 32.9 \\ \end{matrix} \right]\,\! }[/math]


The hat matrix corresponding to this design matrix is [math]\displaystyle{ {{H}_{{{\beta }_{0}},{{\beta }_{2}}}}\,\! }[/math]. It can be calculated using [math]\displaystyle{ {{H}_{{{\beta }_{0}},{{\beta }_{2}}}}={{X}_{{{\beta }_{0}},{{\beta }_{2}}}}{{(X_{{{\beta }_{0}},{{\beta }_{2}}}^{\prime }{{X}_{{{\beta }_{0}},{{\beta }_{2}}}})}^{-1}}X_{{{\beta }_{0}},{{\beta }_{2}}}^{\prime }\,\! }[/math]. Once [math]\displaystyle{ {{H}_{{{\beta }_{0}},{{\beta }_{2}}}}\,\! }[/math] is known, the regression sum of squares for the model [math]\displaystyle{ Y={{\beta }_{0}}+{{\beta }_{2}}{{x}_{2}}+\epsilon\,\! }[/math], can be calculated as:


[math]\displaystyle{ \begin{align} S{{S}_{R}}({{\beta }_{0}},{{\beta }_{2}}) & = & {{y}^{\prime }}\left[ {{H}_{{{\beta }_{0}},{{\beta }_{2}}}}-(\frac{1}{n})J \right]y \\ & = & 12518.32 \end{align}\,\! }[/math]


Therefore, the partial sum of squares for [math]\displaystyle{ {{\beta }_{1}}\,\! }[/math] is:


[math]\displaystyle{ \begin{align} S{{S}_{R}}({{\beta }_{1}}|{{\beta }_{2}})& = & S{{S}_{R}}({{\beta }_{0}},{{\beta }_{1}},{{\beta }_{2}})-S{{S}_{R}}({{\beta }_{0}},{{\beta }_{2}}) \\ & = & 12816.35-12518.32 \\ & = & 298.03 \end{align}\,\! }[/math]


Knowing the partial sum of squares, the statistic to test the significance of [math]\displaystyle{ {{\beta }_{1}}\,\! }[/math] is:


[math]\displaystyle{ \begin{align} {{f}_{0}} &= & \frac{S{{S}_{R}}({{\beta }_{1}}|{{\beta }_{2}})/r}{M{{S}_{E}}} \\ & = & \frac{298.03/1}{30.24} \\ & = & 9.855 \end{align}\,\! }[/math]


The [math]\displaystyle{ p\,\! }[/math] value corresponding to this statistic based on the [math]\displaystyle{ F\,\! }[/math] distribution with 1 degree of freedom in the numerator and 14 degrees of freedom in the denominator is:

[math]\displaystyle{ \begin{align} p\text{ }value &= & 1-P(F\le {{f}_{0}}) \\ & = & 1-0.9928 \\ & = & 0.0072 \end{align}\,\! }[/math]


Assuming that the desired significance is 0.1, since [math]\displaystyle{ p\,\! }[/math] value < 0.1, [math]\displaystyle{ {{H}_{0}}:{{\beta }_{1}}=0\,\! }[/math] is rejected and it can be concluded that [math]\displaystyle{ {{\beta }_{1}}\,\! }[/math] is significant. The test for [math]\displaystyle{ {{\beta }_{2}}\,\! }[/math] can be carried out in a similar manner. In the results obtained from the DOE folio, the calculations for this test are displayed in the ANOVA table as shown in the following figure. Note that the conclusion obtained in this example can also be obtained using the [math]\displaystyle{ t\,\! }[/math] test as explained in the example in Test on Individual Regression Coefficients (t Test). The ANOVA and Regression Information tables in the DOE folio represent two different ways to test for the significance of the variables included in the multiple linear regression model.
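The agreement between the two tests can be checked with the numbers from the examples: for a single coefficient, the partial [math]\displaystyle{ F\,\! }[/math] statistic equals the square of the [math]\displaystyle{ t\,\! }[/math] statistic, and the two p values coincide. The sketch uses the rounded values quoted above and assumes SciPy for the distribution functions.

```python
from scipy import stats

# Rounded values quoted in the examples above.
t0_beta1 = 3.1393        # t statistic for beta_1
ss_partial = 298.03      # partial sum of squares for beta_1
ms_e = 30.24             # error mean square
dof_error = 14           # n - (k + 1) = 17 - 3

f0 = (ss_partial / 1) / ms_e                      # partial F statistic, r = 1
p_from_f = stats.f.sf(f0, 1, dof_error)           # upper-tail F probability
p_from_t = 2 * stats.t.sf(t0_beta1, dof_error)    # two-sided t probability

print(f"f0 = {f0:.3f}, t0^2 = {t0_beta1 ** 2:.3f}")
print(f"p (F test) = {p_from_f:.4f}, p (t test) = {p_from_t:.4f}")
```

Small differences in the last digits come from the rounding of the quoted inputs.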

Sequential Sum of Squares

The sequential sum of squares for a coefficient is the extra sum of squares when coefficients are added to the model in a sequence. For example, consider the model:


[math]\displaystyle{ Y={{\beta }_{0}}+{{\beta }_{1}}{{x}_{1}}+{{\beta }_{2}}{{x}_{2}}+{{\beta }_{12}}{{x}_{1}}{{x}_{2}}+{{\beta }_{3}}{{x}_{3}}+{{\beta }_{13}}{{x}_{1}}{{x}_{3}}+{{\beta }_{23}}{{x}_{2}}{{x}_{3}}+{{\beta }_{123}}{{x}_{1}}{{x}_{2}}{{x}_{3}}+\epsilon\,\! }[/math]


The sequential sum of squares for [math]\displaystyle{ {{\beta }_{13}}\,\! }[/math] is the increase in the sum of squares when [math]\displaystyle{ {{\beta }_{13}}\,\! }[/math] is added to the model observing the sequence of the equation given above. Therefore this extra sum of squares can be obtained by taking the difference between the regression sum of squares for the model after [math]\displaystyle{ {{\beta }_{13}}\,\! }[/math] was added and the regression sum of squares for the model before [math]\displaystyle{ {{\beta }_{13}}\,\! }[/math] was added to the model. The model after [math]\displaystyle{ {{\beta }_{13}}\,\! }[/math] is added is as follows:


[math]\displaystyle{ Y={{\beta }_{0}}+{{\beta }_{1}}{{x}_{1}}+{{\beta }_{2}}{{x}_{2}}+{{\beta }_{12}}{{x}_{1}}{{x}_{2}}+{{\beta }_{3}}{{x}_{3}}+{{\beta }_{13}}{{x}_{1}}{{x}_{3}}+\epsilon\,\! }[/math]


This is because to maintain the sequence all coefficients preceding [math]\displaystyle{ {{\beta }_{13}}\,\! }[/math] must be included in the model. These are the coefficients [math]\displaystyle{ {{\beta }_{0}}\,\! }[/math], [math]\displaystyle{ {{\beta }_{1}}\,\! }[/math], [math]\displaystyle{ {{\beta }_{2}}\,\! }[/math], [math]\displaystyle{ {{\beta }_{12}}\,\! }[/math] and [math]\displaystyle{ {{\beta }_{3}}\,\! }[/math]. Similarly the model before [math]\displaystyle{ {{\beta }_{13}}\,\! }[/math] is added must contain all coefficients of the equation given above except [math]\displaystyle{ {{\beta }_{13}}\,\! }[/math]. This model can be obtained as follows:


[math]\displaystyle{ Y={{\beta }_{0}}+{{\beta }_{1}}{{x}_{1}}+{{\beta }_{2}}{{x}_{2}}+{{\beta }_{12}}{{x}_{1}}{{x}_{2}}+{{\beta }_{3}}{{x}_{3}}+\epsilon\,\! }[/math]


The sequential sum of squares for [math]\displaystyle{ {{\beta }_{13}}\,\! }[/math] can be calculated as follows:


[math]\displaystyle{ \begin{align} S{{S}_{R}}({{\beta }_{13}}|{{\beta }_{0}},{{\beta }_{1}},{{\beta }_{2}},{{\beta }_{12}},{{\beta }_{3}}) & = & S{{S}_{R}}({{\beta }_{0}},{{\beta }_{1}},{{\beta }_{2}},{{\beta }_{12}},{{\beta }_{3}},{{\beta }_{13}})- S{{S}_{R}}({{\beta }_{0}},{{\beta }_{1}},{{\beta }_{2}},{{\beta }_{12}},{{\beta }_{3}}) \end{align}\,\! }[/math]


For the present case, [math]\displaystyle{ {{\theta}_{2}}=[{{\beta }_{13}}{]}'\,\! }[/math] and [math]\displaystyle{ {{\theta}_{1}}=[{{\beta }_{0}},{{\beta }_{1}},{{\beta }_{2}},{{\beta }_{12}},{{\beta }_{3}}{]}'\,\! }[/math]. It can be noted that for the sequential sum of squares [math]\displaystyle{ {{\theta}_{1}}\,\! }[/math] contains all coefficients preceding the coefficient being tested.

The sequential sums of squares for all terms add up to the regression sum of squares for the full model, but each sequential sum of squares depends on the order in which the terms are entered.
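
Both properties (additivity and order dependence) can be checked numerically. The following is a minimal sketch on synthetic data; the function names `regression_ss` and `sequential_ss` are hypothetical, and the two predictors are made correlated on purpose so that the order effect is visible.

```python
import numpy as np

def regression_ss(X, y):
    """SS_R = sum((fitted - mean(y))^2) for a model that includes an intercept."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    return float(np.sum((fitted - y.mean()) ** 2))

def sequential_ss(columns, y):
    """Extra sum of squares as each column is added after the intercept, in order."""
    X = np.ones((len(y), 1))
    previous, out = 0.0, []          # SS_R of the intercept-only model is 0
    for col in columns:
        X = np.column_stack([X, col])
        current = regression_ss(X, y)
        out.append(current - previous)
        previous = current
    return out

rng = np.random.default_rng(1)
n = 20
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)   # correlated with x1 (assumption)
y = 2 + 1.5 * x1 - 1.0 * x2 + rng.normal(scale=0.5, size=n)

order_a = sequential_ss([x1, x2], y)   # x1 entered first
order_b = sequential_ss([x2, x1], y)   # x2 entered first
full = regression_ss(np.column_stack([np.ones(n), x1, x2]), y)
print(order_a, sum(order_a), full)
print(order_b, sum(order_b), full)
```

In either order the values sum to the regression sum of squares of the full model, while the value attributed to [math]\displaystyle{ {{x}_{1}}\,\! }[/math] changes with its position in the sequence.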

Example

This example illustrates the partial [math]\displaystyle{ F\,\! }[/math] test using the sequential sum of squares. The test is conducted for the coefficient [math]\displaystyle{ {{\beta }_{1}}\,\! }[/math] corresponding to the predictor variable [math]\displaystyle{ {{x}_{1}}\,\! }[/math] for the data. The regression model used for this data set in the example is:


[math]\displaystyle{ Y={{\beta }_{0}}+{{\beta }_{1}}{{x}_{1}}+{{\beta }_{2}}{{x}_{2}}+\epsilon \,\! }[/math]


The null hypothesis to test the significance of [math]\displaystyle{ {{\beta }_{1}}\,\! }[/math] is:


[math]\displaystyle{ {{H}_{0}}:{{\beta }_{1}}=0\,\! }[/math]


The statistic to test this hypothesis is:


[math]\displaystyle{ {{F}_{0}}=\frac{S{{S}_{R}}({{\beta }_{0}},{{\beta }_{1}})/r}{M{{S}_{E}}}\,\! }[/math]


where [math]\displaystyle{ S{{S}_{R}}({{\beta }_{0}},{{\beta }_{1}})\,\! }[/math] represents the sequential sum of squares for [math]\displaystyle{ {{\beta }_{1}}\,\! }[/math], [math]\displaystyle{ r\,\! }[/math] represents the number of degrees of freedom for [math]\displaystyle{ S{{S}_{R}}({{\beta }_{0}},{{\beta }_{1}})\,\! }[/math] (which is one because there is just one coefficient, [math]\displaystyle{ {{\beta }_{1}}\,\! }[/math], being tested) and [math]\displaystyle{ M{{S}_{E}}\,\! }[/math] is the error mean square and has been calculated in the second example as 30.24.

The sequential sum of squares for [math]\displaystyle{ {{\beta }_{1}}\,\! }[/math] is the difference between the regression sum of squares for the model after adding [math]\displaystyle{ {{\beta }_{1}}\,\! }[/math], [math]\displaystyle{ Y={{\beta }_{0}}+{{\beta }_{1}}{{x}_{1}}+\epsilon\,\! }[/math], and the regression sum of squares for the model before adding [math]\displaystyle{ {{\beta }_{1}}\,\! }[/math], [math]\displaystyle{ Y={{\beta }_{0}}+\epsilon\,\! }[/math]. The regression sum of squares for the model [math]\displaystyle{ Y={{\beta }_{0}}+{{\beta }_{1}}{{x}_{1}}+\epsilon\,\! }[/math] is obtained as shown next. First the design matrix for this model, [math]\displaystyle{ {{X}_{{{\beta }_{0}},{{\beta }_{1}}}}\,\! }[/math], is obtained by dropping the third column in the design matrix for the full model, [math]\displaystyle{ X\,\! }[/math] (the full design matrix, [math]\displaystyle{ X\,\! }[/math], was obtained in the example). The third column of [math]\displaystyle{ X\,\! }[/math] corresponds to coefficient [math]\displaystyle{ {{\beta }_{2}}\,\! }[/math] which is no longer used in the present model. Therefore, the design matrix for the model, [math]\displaystyle{ Y={{\beta }_{0}}+{{\beta }_{1}}{{x}_{1}}+\epsilon\,\! }[/math], is:


[math]\displaystyle{ {{X}_{{{\beta }_{0}},{{\beta }_{1}}}}=\left[ \begin{matrix} 1 & 41.9 \\ 1 & 43.4 \\ . & . \\ . & . \\ 1 & 77.8 \\ \end{matrix} \right]\,\! }[/math]


The hat matrix corresponding to this design matrix is [math]\displaystyle{ {{H}_{{{\beta }_{0}},{{\beta }_{1}}}}\,\! }[/math]. It can be calculated using [math]\displaystyle{ {{H}_{{{\beta }_{0}},{{\beta }_{1}}}}={{X}_{{{\beta }_{0}},{{\beta }_{1}}}}{{(X_{{{\beta }_{0}},{{\beta }_{1}}}^{\prime }{{X}_{{{\beta }_{0}},{{\beta }_{1}}}})}^{-1}}X_{{{\beta }_{0}},{{\beta }_{1}}}^{\prime }\,\! }[/math]. Once [math]\displaystyle{ {{H}_{{{\beta }_{0}},{{\beta }_{1}}}}\,\! }[/math] is known, the regression sum of squares for the model [math]\displaystyle{ Y={{\beta }_{0}}+{{\beta }_{1}}{{x}_{1}}+\epsilon\,\! }[/math] can be calculated as:


[math]\displaystyle{ \begin{align} S{{S}_{R}}({{\beta }_{0}},{{\beta }_{1}})& = & {{y}^{\prime }}\left[ {{H}_{{{\beta }_{0}},{{\beta }_{1}}}}-(\frac{1}{n})J \right]y \\ & = & 12530.85 \end{align}\,\! }[/math]


Sequential sum of squares for the data.


The regression sum of squares for the model [math]\displaystyle{ Y={{\beta }_{0}}+\epsilon\,\! }[/math] is equal to zero since this model does not contain any variables. Therefore:


[math]\displaystyle{ S{{S}_{R}}({{\beta }_{0}})=0\,\! }[/math]


The sequential sum of squares for [math]\displaystyle{ {{\beta }_{1}}\,\! }[/math] is:


[math]\displaystyle{ \begin{align} S{{S}_{R}}({{\beta }_{1}}|{{\beta }_{0}}) &= & S{{S}_{R}}({{\beta }_{0}},{{\beta }_{1}})-S{{S}_{R}}({{\beta }_{0}}) \\ & = & 12530.85-0 \\ & = & 12530.85 \end{align}\,\! }[/math]


Knowing the sequential sum of squares, the statistic to test the significance of [math]\displaystyle{ {{\beta }_{1}}\,\! }[/math] is:


[math]\displaystyle{ \begin{align} {{f}_{0}} &= & \frac{S{{S}_{R}}({{\beta }_{0}},{{\beta }_{1}})/r}{M{{S}_{E}}} \\ & = & \frac{12530.85/1}{30.24} \\ & = & 414.366 \end{align}\,\! }[/math]


The [math]\displaystyle{ p\,\! }[/math] value corresponding to this statistic based on the [math]\displaystyle{ F\,\! }[/math] distribution with 1 degree of freedom in the numerator and 14 degrees of freedom in the denominator is:


[math]\displaystyle{ \begin{align} p\text{ }value &= & 1-P(F\le {{f}_{0}}) \\ & = & 1-0.999999 \\ & = & 8.46\times {{10}^{-12}} \end{align}\,\! }[/math]


Assuming that the desired significance is 0.1, since [math]\displaystyle{ p\,\! }[/math] value < 0.1, [math]\displaystyle{ {{H}_{0}}:{{\beta }_{1}}=0\,\! }[/math] is rejected and it can be concluded that [math]\displaystyle{ {{\beta }_{1}}\,\! }[/math] is significant. The test for [math]\displaystyle{ {{\beta }_{2}}\,\! }[/math] can be carried out in a similar manner. This result is shown in the following figure.

Confidence Intervals in Multiple Linear Regression

Calculation of confidence intervals for multiple linear regression models is similar to that for simple linear regression models explained in Simple Linear Regression Analysis.

Confidence Interval on Regression Coefficients

A 100 ([math]\displaystyle{ 1-\alpha\,\! }[/math]) percent confidence interval on the regression coefficient, [math]\displaystyle{ {{\beta }_{j}}\,\! }[/math], is obtained as follows:


[math]\displaystyle{ {{\hat{\beta }}_{j}}\pm {{t}_{\alpha /2,n-(k+1)}}\sqrt{{{C}_{jj}}}\,\! }[/math]


The confidence intervals on the regression coefficients are displayed in the Regression Information table under the Low Confidence and High Confidence columns as shown in the following figure.
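
As a rough illustration, the interval can be computed as follows. The data are synthetic, with 17 observations and two predictors chosen so that the critical value [math]\displaystyle{ {{t}_{0.05,14}}=1.761\,\! }[/math] quoted in this chapter applies. The sketch assumes that [math]\displaystyle{ {{C}_{jj}}\,\! }[/math] is the [math]\displaystyle{ j\,\! }[/math] th diagonal element of [math]\displaystyle{ {{\hat{\sigma }}^{2}}{{({{X}^{\prime }}X)}^{-1}}\,\! }[/math].

```python
import numpy as np

# Synthetic data (assumption): n = 17 and k = 2, so n-(k+1) = 14.
rng = np.random.default_rng(2)
n, k = 17, 2
x1 = rng.uniform(40, 80, n)
x2 = rng.uniform(28, 34, n)
y = 5 + 4 * x1 + 3 * x2 + rng.normal(0, 5, n)

X = np.column_stack([np.ones(n), x1, x2])
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat
mse = float(resid @ resid) / (n - (k + 1))   # estimate of sigma^2

C = mse * XtX_inv        # assumed definition of C (covariance of beta_hat)
t_crit = 1.761           # t_{alpha/2, n-(k+1)} for alpha = 0.1, 14 dof
half_width = t_crit * np.sqrt(np.diag(C))
low, high = beta_hat - half_width, beta_hat + half_width

for j in range(k + 1):
    print(f"beta_{j}: {beta_hat[j]:.3f}  [{low[j]:.3f}, {high[j]:.3f}]")
```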


Confidence interval for the fitted value corresponding to the fifth observation.


Confidence Interval on Fitted Values, [math]\displaystyle{ {{\hat{y}}_{i}}\,\! }[/math]

A 100 ([math]\displaystyle{ 1-\alpha\,\! }[/math]) percent confidence interval on any fitted value, [math]\displaystyle{ {{\hat{y}}_{i}}\,\! }[/math], is given by:


[math]\displaystyle{ {{\hat{y}}_{i}}\pm {{t}_{\alpha /2,n-(k+1)}}\sqrt{{{{\hat{\sigma }}}^{2}}x_{i}^{\prime }{{({{X}^{\prime }}X)}^{-1}}{{x}_{i}}}\,\! }[/math]


where:


[math]\displaystyle{ {{x}_{i}}=\left[ \begin{matrix} 1 \\ {{x}_{i1}} \\ . \\ . \\ . \\ {{x}_{ik}} \\ \end{matrix} \right]\,\! }[/math]


In the above example, the fitted value corresponding to the fifth observation was calculated as [math]\displaystyle{ {{\hat{y}}_{5}}=266.3\,\! }[/math]. The 90% confidence interval on this value can be obtained as shown in the figure below. The values of 47.3 and 29.9 used in the figure are the values of the predictor variables corresponding to the fifth observation in the table.

Confidence Interval on New Observations

As explained in Simple Linear Regression Analysis, the confidence interval on a new observation is also referred to as the prediction interval. The prediction interval takes into account both the error from the fitted model and the error associated with future observations. A 100 ([math]\displaystyle{ 1-\alpha\,\! }[/math]) percent confidence interval on a new observation, [math]\displaystyle{ {{\hat{y}}_{p}}\,\! }[/math], is obtained as follows:


[math]\displaystyle{ {{\hat{y}}_{p}}\pm {{t}_{\alpha /2,n-(k+1)}}\sqrt{{{{\hat{\sigma }}}^{2}}(1+x_{p}^{\prime }{{({{X}^{\prime }}X)}^{-1}}{{x}_{p}})}\,\! }[/math]


where:


[math]\displaystyle{ {{x}_{p}}=\left[ \begin{matrix} 1 \\ {{x}_{p1}} \\ . \\ . \\ . \\ {{x}_{pk}} \\ \end{matrix} \right]\,\! }[/math]


[math]\displaystyle{ {{x}_{p1}}\,\! }[/math],..., [math]\displaystyle{ {{x}_{pk}}\,\! }[/math] are the levels of the predictor variables at which the new observation, [math]\displaystyle{ {{\hat{y}}_{p}}\,\! }[/math], needs to be obtained.
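
The only difference between the interval on a fitted value and the prediction interval is the extra 1 under the square root, so the prediction interval is always the wider of the two. A minimal sketch on synthetic data follows; the function name `interval_half_widths` and the chosen point `x_p` are hypothetical.

```python
import numpy as np

def interval_half_widths(X, y, x_p, t_crit):
    """Return (y_hat_p, half-width of CI on fitted value, half-width of prediction interval)."""
    n, p = X.shape                               # p = k + 1 coefficients
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    sigma2_hat = float(resid @ resid) / (n - p)  # error mean square
    q = float(x_p @ XtX_inv @ x_p)               # x_p'(X'X)^{-1}x_p
    y_hat_p = float(x_p @ beta_hat)
    ci = t_crit * np.sqrt(sigma2_hat * q)
    pi = t_crit * np.sqrt(sigma2_hat * (1 + q))
    return y_hat_p, ci, pi

# Synthetic data (assumption): n = 17, k = 2, so t_{0.05,14} = 1.761 applies.
rng = np.random.default_rng(3)
n = 17
X = np.column_stack([np.ones(n), rng.uniform(40, 80, n), rng.uniform(28, 34, n)])
y = X @ np.array([5.0, 4.0, 3.0]) + rng.normal(0, 5, n)

x_p = np.array([1.0, 60.0, 31.0])   # leading 1 for the intercept, then x_p1, x_p2
y_hat_p, ci, pi = interval_half_widths(X, y, x_p, t_crit=1.761)
print(f"fitted value CI:     {y_hat_p - ci:.2f} to {y_hat_p + ci:.2f}")
print(f"prediction interval: {y_hat_p - pi:.2f} to {y_hat_p + pi:.2f}")
```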


In multiple linear regression, prediction intervals should only be obtained at the levels of the predictor variables where the regression model applies. This requirement is easy to overlook, because each predictor variable lying within its own range does not necessarily mean that the new observation lies in the region to which the model is applicable. For example, consider the next figure where the shaded area shows the region to which a two variable regression model is applicable. The point corresponding to the [math]\displaystyle{ p\,\! }[/math] th level of the first predictor variable, [math]\displaystyle{ {{x}_{1}}\,\! }[/math], and the [math]\displaystyle{ p\,\! }[/math] th level of the second predictor variable, [math]\displaystyle{ {{x}_{2}}\,\! }[/math], does not lie in the shaded area, although both of these levels are within the range of the first and second predictor variables respectively. In this case, the regression model is not applicable at this point.


Predicted values and region of model application in multiple linear regression.

Measures of Model Adequacy

As in the case of simple linear regression, analysis of a fitted multiple linear regression model is important before inferences based on the model are undertaken. This section presents some techniques that can be used to check the appropriateness of the multiple linear regression model.

Coefficient of Multiple Determination, R2

The coefficient of multiple determination is similar to the coefficient of determination used in the case of simple linear regression. It is defined as:


[math]\displaystyle{ \begin{align} {{R}^{2}} & = & \frac{S{{S}_{R}}}{S{{S}_{T}}} \\ & = & 1-\frac{S{{S}_{E}}}{S{{S}_{T}}} \end{align}\,\! }[/math]


[math]\displaystyle{ {{R}^{2}}\,\! }[/math] indicates the amount of total variability explained by the regression model. The positive square root of [math]\displaystyle{ {{R}^{2}}\,\! }[/math] is called the multiple correlation coefficient and measures the linear association between [math]\displaystyle{ Y\,\! }[/math] and the predictor variables, [math]\displaystyle{ {{x}_{1}}\,\! }[/math], [math]\displaystyle{ {{x}_{2}}\,\! }[/math]... [math]\displaystyle{ {{x}_{k}}\,\! }[/math].

The value of [math]\displaystyle{ {{R}^{2}}\,\! }[/math] increases as more terms are added to the model, even if the new term does not contribute significantly to the model. An increase in the value of [math]\displaystyle{ {{R}^{2}}\,\! }[/math] cannot be taken as a sign to conclude that the new model is superior to the older model. A better statistic to use is the adjusted [math]\displaystyle{ {{R}^{2}}\,\! }[/math] statistic defined as follows:


[math]\displaystyle{ \begin{align} R_{adj}^{2} &= & 1-\frac{M{{S}_{E}}}{M{{S}_{T}}} \\ & = & 1-\frac{S{{S}_{E}}/(n-(k+1))}{S{{S}_{T}}/(n-1)} \\ & = & 1-(\frac{n-1}{n-(k+1)})(1-{{R}^{2}}) \end{align}\,\! }[/math]


The adjusted [math]\displaystyle{ {{R}^{2}}\,\! }[/math] only increases when significant terms are added to the model. Addition of unimportant terms may lead to a decrease in the value of [math]\displaystyle{ R_{adj}^{2}\,\! }[/math].
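
The behavior of the two statistics can be explored with a short sketch. Here a predictor that is unrelated to the response is appended to a one-predictor model. The data and the function name `r2_and_adjusted` are assumptions made for illustration only.

```python
import numpy as np

def r2_and_adjusted(X, y):
    """Return (R^2, adjusted R^2) for a design matrix X that includes the intercept column."""
    n, p = X.shape                                 # p = k + 1
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    ss_e = float(np.sum((y - X @ beta) ** 2))
    ss_t = float(np.sum((y - y.mean()) ** 2))
    r2 = 1 - ss_e / ss_t
    r2_adj = 1 - (ss_e / (n - p)) / (ss_t / (n - 1))
    return r2, r2_adj

rng = np.random.default_rng(4)
n = 17
x1 = rng.uniform(40, 80, n)
noise_col = rng.normal(size=n)          # predictor unrelated to the response
y = 5 + 4 * x1 + rng.normal(0, 5, n)

X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([X_small, noise_col])
r2_s, adj_s = r2_and_adjusted(X_small, y)
r2_b, adj_b = r2_and_adjusted(X_big, y)
print("one predictor :", r2_s, adj_s)
print("plus noise col:", r2_b, adj_b)
```

The plain [math]\displaystyle{ {{R}^{2}}\,\! }[/math] of the larger model can never be smaller than that of the nested smaller model, whereas the adjusted value is free to move in either direction.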

In a DOE folio, [math]\displaystyle{ {{R}^{2}}\,\! }[/math] and [math]\displaystyle{ R_{adj}^{2}\,\! }[/math] values are displayed as R-sq and R-sq(adj), respectively. Other values displayed along with these values are S, PRESS and R-sq(pred). As explained in Simple Linear Regression Analysis, the value of S is the square root of the error mean square, [math]\displaystyle{ M{{S}_{E}}\,\! }[/math], and represents the "standard error of the model."

PRESS is an abbreviation for prediction error sum of squares. It is the error sum of squares calculated using the PRESS residuals in place of the residuals, [math]\displaystyle{ {{e}_{i}}\,\! }[/math], in the equation for the error sum of squares. The PRESS residual, [math]\displaystyle{ {{e}_{(i)}}\,\! }[/math], for a particular observation, [math]\displaystyle{ {{y}_{i}}\,\! }[/math], is obtained by fitting the regression model to the remaining observations. Then the value for a new observation, [math]\displaystyle{ {{\hat{y}}_{p}}\,\! }[/math], corresponding to the observation in question, [math]\displaystyle{ {{y}_{i}}\,\! }[/math], is obtained based on the new regression model. The difference between [math]\displaystyle{ {{y}_{i}}\,\! }[/math] and [math]\displaystyle{ {{\hat{y}}_{p}}\,\! }[/math] gives [math]\displaystyle{ {{e}_{(i)}}\,\! }[/math]. The PRESS residual, [math]\displaystyle{ {{e}_{(i)}}\,\! }[/math], can also be obtained using [math]\displaystyle{ {{h}_{ii}}\,\! }[/math], the diagonal element of the hat matrix, [math]\displaystyle{ H\,\! }[/math], as follows:


[math]\displaystyle{ {{e}_{(i)}}=\frac{{{e}_{i}}}{1-{{h}_{ii}}}\,\! }[/math]


R-sq(pred), also referred to as prediction [math]\displaystyle{ {{R}^{2}}\,\! }[/math], is obtained using PRESS as shown next:


[math]\displaystyle{ R_{pred}^{2}=1-\frac{PRESS}{S{{S}_{T}}}\,\! }[/math]


The values of R-sq, R-sq(adj) and S are indicators of how well the regression model fits the observed data. The values of PRESS and R-sq(pred) are indicators of how well the regression model predicts new observations. For example, higher values of PRESS or lower values of R-sq(pred) indicate a model that predicts poorly. The figure below shows these values for the data. The values indicate that the regression model fits the data well and also predicts well.

Coefficient of multiple determination and related results for the data.
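
The shortcut [math]\displaystyle{ {{e}_{(i)}}={{e}_{i}}/(1-{{h}_{ii}})\,\! }[/math] means that PRESS does not require refitting the model once per observation. The sketch below computes PRESS both ways on synthetic data so the two results can be compared; the function names are hypothetical.

```python
import numpy as np

def press_shortcut(X, y):
    """PRESS from ordinary residuals and the hat-matrix diagonal."""
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    e = y - H @ y
    return float(np.sum((e / (1 - np.diag(H))) ** 2))

def press_refit(X, y):
    """PRESS by leaving each observation out and refitting the model."""
    total = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        total += float(y[i] - X[i] @ beta) ** 2
    return total

# Synthetic data (assumption).
rng = np.random.default_rng(5)
n = 17
X = np.column_stack([np.ones(n), rng.uniform(40, 80, n), rng.uniform(28, 34, n)])
y = X @ np.array([5.0, 4.0, 3.0]) + rng.normal(0, 5, n)

press_a = press_shortcut(X, y)
press_b = press_refit(X, y)
ss_t = float(np.sum((y - y.mean()) ** 2))
r2_pred = 1 - press_a / ss_t
print(press_a, press_b, r2_pred)
```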

Residual Analysis

Plots of residuals, [math]\displaystyle{ {{e}_{i}}\,\! }[/math], similar to the ones discussed in Simple Linear Regression Analysis for simple linear regression, are used to check the adequacy of a fitted multiple linear regression model. The residuals are expected to be normally distributed with a mean of zero and a constant variance of [math]\displaystyle{ {{\sigma }^{2}}\,\! }[/math]. In addition, they should not show any patterns or trends when plotted against any variable or in a time or run-order sequence. Residual plots may also be obtained using standardized and studentized residuals. Standardized residuals, [math]\displaystyle{ {{d}_{i}}\,\! }[/math], are obtained using the following equation:


[math]\displaystyle{ \begin{align} {{d}_{i}}&= & \frac{{{e}_{i}}}{\sqrt{{{{\hat{\sigma }}}^{2}}}} \\ & = & \frac{{{e}_{i}}}{\sqrt{M{{S}_{E}}}} \end{align}\,\! }[/math]


Standardized residuals are scaled so that the standard deviation of the residuals is approximately equal to one. This helps to identify possible outliers or unusual observations. However, standardized residuals may understate the true residual magnitude, hence studentized residuals, [math]\displaystyle{ {{r}_{i}}\,\! }[/math], are used in their place. Studentized residuals are calculated as follows:


[math]\displaystyle{ \begin{align} {{r}_{i}} & = & \frac{{{e}_{i}}}{\sqrt{{{{\hat{\sigma }}}^{2}}(1-{{h}_{ii}})}} \\ & = & \frac{{{e}_{i}}}{\sqrt{M{{S}_{E}}(1-{{h}_{ii}})}} \end{align} \,\! }[/math]


where [math]\displaystyle{ {{h}_{ii}}\,\! }[/math] is the [math]\displaystyle{ i\,\! }[/math] th diagonal element of the hat matrix, [math]\displaystyle{ H\,\! }[/math]. External studentized (or studentized deleted) residuals may also be used. These residuals are based on the PRESS residuals mentioned in Coefficient of Multiple Determination, R2. The reason for using the external studentized residuals is that if the [math]\displaystyle{ i\,\! }[/math] th observation is an outlier, it may influence the fitted model. In this case, the residual [math]\displaystyle{ {{e}_{i}}\,\! }[/math] will be small and may not disclose that the [math]\displaystyle{ i\,\! }[/math] th observation is an outlier. The external studentized residual for the [math]\displaystyle{ i\,\! }[/math] th observation, [math]\displaystyle{ {{t}_{i}}\,\! }[/math], is obtained as follows:


[math]\displaystyle{ {{t}_{i}}={{e}_{i}}{{\left[ \frac{n-(k+1)-1}{S{{S}_{E}}(1-{{h}_{ii}})-e_{i}^{2}} \right]}^{0.5}}\,\! }[/math]
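
The residual types discussed in this section can be computed together, as in the sketch below (synthetic data, hypothetical function name `residual_table`). For the external studentized residual the code uses [math]\displaystyle{ n-(k+1)-1\,\! }[/math] in the numerator, which is the number of error degrees of freedom of the model fitted with the [math]\displaystyle{ i\,\! }[/math] th observation deleted. The final lines cross-check that value against an explicit refit without the first observation.

```python
import numpy as np

def residual_table(X, y):
    """Return ordinary, standardized, studentized, external studentized residuals and leverage."""
    n, p = X.shape                            # p = k + 1
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    h = np.diag(H)
    e = y - H @ y                             # ordinary residuals
    ss_e = float(e @ e)
    mse = ss_e / (n - p)
    d = e / np.sqrt(mse)                      # standardized
    r = e / np.sqrt(mse * (1 - h))            # studentized
    t = e * np.sqrt((n - p - 1) / (ss_e * (1 - h) - e ** 2))  # external studentized
    return e, d, r, t, h

# Synthetic data (assumption).
rng = np.random.default_rng(6)
n, p = 17, 3
X = np.column_stack([np.ones(n), rng.uniform(40, 80, n), rng.uniform(28, 34, n)])
y = X @ np.array([5.0, 4.0, 3.0]) + rng.normal(0, 5, n)
e, d, r, t, h = residual_table(X, y)

# Cross-check t[0]: refit without observation 0 and use that model's error mean square.
keep = np.arange(n) != 0
beta_del, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
res_del = y[keep] - X[keep] @ beta_del
mse_del = float(res_del @ res_del) / (n - 1 - p)
t_check = e[0] / np.sqrt(mse_del * (1 - h[0]))
print(t[0], t_check)
```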


Residual values for the data are shown in the figure below. Standardized residual plots for the data are shown in the next two figures. The Weibull++ DOE folio compares the residual values to the critical values on the [math]\displaystyle{ t\,\! }[/math] distribution for studentized and external studentized residuals.


Residual values for the data.


Residual probability plot for the data.


For other residuals the normal distribution is used. For example, for the data, the critical values on the [math]\displaystyle{ t\,\! }[/math] distribution at a significance of 0.1 are [math]\displaystyle{ {{t}_{0.05,14}}=1.761\,\! }[/math] and [math]\displaystyle{ -{{t}_{0.05,14}}=-1.761\,\! }[/math] (as calculated in the example, Test on Individual Regression Coefficients (t Test)). The studentized residual values corresponding to the 3rd and 17th observations lie outside the critical values. Therefore, the 3rd and 17th observations are outliers. This can also be seen on the residual plots in the next two figures.

Residual versus fitted values plot for the data.


Residual versus run order plot for the data.

Outlying x Observations

Residuals help to identify outlying [math]\displaystyle{ y\,\! }[/math] observations. Outlying [math]\displaystyle{ x\,\! }[/math] observations can be detected using leverage. Leverage values are the diagonal elements of the hat matrix, [math]\displaystyle{ {{h}_{ii}}\,\! }[/math]. The [math]\displaystyle{ {{h}_{ii}}\,\! }[/math] values always lie between 0 and 1. Values of [math]\displaystyle{ {{h}_{ii}}\,\! }[/math] greater than [math]\displaystyle{ 2(k+1)/n\,\! }[/math] are considered to be indicators of outlying [math]\displaystyle{ x\,\! }[/math] observations.
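
A short sketch of the leverage check follows. The data are synthetic, and the first observation is deliberately given an extreme [math]\displaystyle{ {{x}_{1}}\,\! }[/math] value so that it is flagged. The function name `leverage` is hypothetical.

```python
import numpy as np

def leverage(X):
    """Diagonal elements h_ii of the hat matrix H = X(X'X)^{-1}X'."""
    return np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)

rng = np.random.default_rng(7)
n, k = 17, 2
x1 = rng.uniform(40, 80, n)
x2 = rng.uniform(28, 34, n)
x1[0] = 150.0                       # outlying x observation (assumption)

X = np.column_stack([np.ones(n), x1, x2])
h = leverage(X)
cutoff = 2 * (k + 1) / n
flagged = np.where(h > cutoff)[0]   # zero-based indices of outlying x observations
print("cutoff:", cutoff)
print("flagged observations:", flagged)
print("sum of leverages:", h.sum())  # equals k + 1, the number of coefficients
```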

Influential Observations Detection

Once an outlier is identified, it is important to determine if the outlier has a significant effect on the regression model. One measure to detect influential observations is Cook's distance measure which is computed as follows:


[math]\displaystyle{ {{D}_{i}}=\frac{r_{i}^{2}}{(k+1)}\left[ \frac{{{h}_{ii}}}{(1-{{h}_{ii}})} \right]\,\! }[/math]


To use Cook's distance measure, the [math]\displaystyle{ {{D}_{i}}\,\! }[/math] values are compared to percentile values on the [math]\displaystyle{ F\,\! }[/math] distribution with [math]\displaystyle{ (k+1,n-(k+1))\,\! }[/math] degrees of freedom. If the percentile value is less than 10 or 20 percent, then the [math]\displaystyle{ i\,\! }[/math] th case has little influence on the fitted values. However, if the percentile value is close to 50 percent or greater, the [math]\displaystyle{ i\,\! }[/math] th case is influential, and fitted values with and without the [math]\displaystyle{ i\,\! }[/math] th case will differ substantially.


Example

Cook's distance measure can be calculated as shown next. The distance measure is calculated for the first observation of the data. The remaining values along with the leverage values are shown in the figure below (displaying Leverage and Cook's distance measure for the data).


Leverage and Cook's distance measure for the data.


The studentized residual corresponding to the first observation is:


[math]\displaystyle{ \begin{align} {{r}_{1}} & = & \frac{{{e}_{1}}}{\sqrt{M{{S}_{E}}(1-{{h}_{11}})}} \\ & = & \frac{1.3127}{\sqrt{30.24(1-0.2755)}} \\ & = & 0.2804 \end{align}\,\! }[/math]


Cook's distance measure for the first observation can now be calculated as:


[math]\displaystyle{ \begin{align} {{D}_{1}} & = & \frac{r_{1}^{2}}{(k+1)}\left[ \frac{{{h}_{11}}}{(1-{{h}_{11}})} \right] \\ & = & \frac{{{0.2804}^{2}}}{(2+1)}\left[ \frac{0.2755}{(1-0.2755)} \right] \\ & = & 0.01 \end{align}\,\! }[/math]


The 50th percentile value for [math]\displaystyle{ {{F}_{3,14}}\,\! }[/math] is 0.83. Since all [math]\displaystyle{ {{D}_{i}}\,\! }[/math] values are less than this value there are no influential observations.
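
The arithmetic of this example can be reproduced with a few lines of Python. The function names are hypothetical; the inputs are the values quoted above for the first observation, with the error mean square of 30.24 from the earlier examples.

```python
import math

def studentized_residual(e_i, mse, h_ii):
    """r_i = e_i / sqrt(MS_E * (1 - h_ii))."""
    return e_i / math.sqrt(mse * (1 - h_ii))

def cooks_distance(r_i, h_ii, k):
    """D_i = r_i^2 / (k + 1) * h_ii / (1 - h_ii)."""
    return (r_i ** 2 / (k + 1)) * (h_ii / (1 - h_ii))

r1 = studentized_residual(e_i=1.3127, mse=30.24, h_ii=0.2755)
d1 = cooks_distance(r_i=0.2804, h_ii=0.2755, k=2)
print(round(r1, 4), round(d1, 2))
```

The resulting [math]\displaystyle{ {{D}_{1}}\,\! }[/math] is far below the 50th percentile value of 0.83, in line with the conclusion above.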

Lack-of-Fit Test

The lack-of-fit test for simple linear regression discussed in Simple Linear Regression Analysis may also be applied to multiple linear regression to check the appropriateness of the fitted response surface and see if a higher order model is required. Data for [math]\displaystyle{ m\,\! }[/math] replicates may be collected as follows for all [math]\displaystyle{ n\,\! }[/math] levels of the predictor variables:


[math]\displaystyle{ \begin{align} & & {{y}_{11}},{{y}_{12}},....,{{y}_{1m}}\text{ }m\text{ repeated observations at the first level } \\ & & {{y}_{21}},{{y}_{22}},....,{{y}_{2m}}\text{ }m\text{ repeated observations at the second level} \\ & & ... \\ & & {{y}_{i1}},{{y}_{i2}},....,{{y}_{im}}\text{ }m\text{ repeated observations at the }i\text{th level} \\ & & ... \\ & & {{y}_{n1}},{{y}_{n2}},....,{{y}_{nm}}\text{ }m\text{ repeated observations at the }n\text{th level } \end{align}\,\! }[/math]


The sum of squares due to pure error, [math]\displaystyle{ S{{S}_{PE}}\,\! }[/math], can be obtained as discussed in Simple Linear Regression Analysis:


[math]\displaystyle{ S{{S}_{PE}}=\underset{i=1}{\overset{n}{\mathop \sum }}\,\underset{j=1}{\overset{m}{\mathop \sum }}\,{{({{y}_{ij}}-{{\bar{y}}_{i}})}^{2}}\,\! }[/math]


The number of degrees of freedom associated with [math]\displaystyle{ S{{S}_{PE}}\,\! }[/math] is:


[math]\displaystyle{ dof(S{S}_{PE}) = nm-n \,\! }[/math]


Knowing [math]\displaystyle{ S{{S}_{PE}}\,\! }[/math], the sum of squares due to lack-of-fit, [math]\displaystyle{ S{{S}_{LOF}}\,\! }[/math], can be obtained as:


[math]\displaystyle{ S{{S}_{LOF}}=S{{S}_{E}}-S{{S}_{PE}}\,\! }[/math]


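Because [math]\displaystyle{ m\,\! }[/math] observations are taken at each of the [math]\displaystyle{ n\,\! }[/math] levels, the data set contains [math]\displaystyle{ nm\,\! }[/math] observations and [math]\displaystyle{ dof(S{{S}_{E}})=nm-(k+1)\,\! }[/math]. The number of degrees of freedom associated with [math]\displaystyle{ S{{S}_{LOF}}\,\! }[/math] is therefore:


[math]\displaystyle{ \begin{align} dof(S{{S}_{LOF}}) & = & dof(S{{S}_{E}})-dof(S{{S}_{PE}}) \\ & = & nm-(k+1)-(nm-n) \\ & = & n-(k+1) \end{align} \,\! }[/math]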


The test statistic for the lack-of-fit test is:


[math]\displaystyle{ \begin{align} {{F}_{0}} & = & \frac{S{{S}_{LOF}}/dof(S{{S}_{LOF}})}{S{{S}_{PE}}/dof(S{{S}_{PE}})} \\ & = & \frac{M{{S}_{LOF}}}{M{{S}_{PE}}} \end{align}\,\! }[/math]
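
The lack-of-fit calculation can be sketched as follows. The data are synthetic: a curved response is generated at six levels with three replicates each, and a first order model is fitted, so a large [math]\displaystyle{ {{F}_{0}}\,\! }[/math] is to be expected. The sketch counts [math]\displaystyle{ nm\,\! }[/math] observations in total, so the error degrees of freedom are [math]\displaystyle{ nm-(k+1)\,\! }[/math].

```python
import numpy as np

rng = np.random.default_rng(8)
levels = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])   # n = 6 distinct levels
n, m, k = levels.size, 3, 1                          # m replicates, one predictor
x = np.repeat(levels, m)                             # 1,1,1,2,2,2,...
y = 2 + 0.5 * x + 0.3 * x ** 2 + rng.normal(0, 0.5, x.size)  # curved on purpose

# Fit the first order model and obtain the error sum of squares.
X = np.column_stack([np.ones(x.size), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
ss_e = float(np.sum((y - X @ beta) ** 2))

# Pure error: variation of the replicates around their own level means.
y_by_level = y.reshape(n, m)
ss_pe = float(np.sum((y_by_level - y_by_level.mean(axis=1, keepdims=True)) ** 2))
dof_pe = n * m - n

# Lack of fit: what remains of the error sum of squares.
ss_lof = ss_e - ss_pe
dof_lof = (n * m - (k + 1)) - dof_pe
f0 = (ss_lof / dof_lof) / (ss_pe / dof_pe)
print(ss_pe, ss_lof, dof_pe, dof_lof, f0)
```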


Other Topics in Multiple Linear Regression

Polynomial Regression Models

Polynomial regression models are used when the response is curvilinear. The equation shown next presents a second order polynomial regression model with one predictor variable:


[math]\displaystyle{ Y={{\beta }_{0}}+{{\beta }_{1}}{{x}_{1}}+{{\beta }_{11}}x_{1}^{2}+\epsilon\,\! }[/math]


Usually, coded values are used in these models. Values of the variables are coded by centering or expressing the levels of the variable as deviations from the mean value of the variable and then scaling or dividing the deviations obtained by half of the range of the variable.


[math]\displaystyle{ coded\text{ }value=\frac{actual\text{ }value-mean}{half\text{ }of\text{ }range}\,\! }[/math]


The reason for using coded predictor variables is that many times [math]\displaystyle{ x\,\! }[/math] and [math]\displaystyle{ {{x}^{2}}\,\! }[/math] are highly correlated and, if uncoded values are used, there may be computational difficulties while calculating the [math]\displaystyle{ {{({{X}^{\prime }}X)}^{-1}}\,\! }[/math] matrix to obtain the estimates, [math]\displaystyle{ \hat{\beta }\,\! }[/math], of the regression coefficients using the least squares equation [math]\displaystyle{ \hat{\beta }={{({{X}^{\prime }}X)}^{-1}}{{X}^{\prime }}y\,\! }[/math].
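
The effect of coding can be seen on a small hypothetical set of equally spaced levels. The sketch compares the correlation between the linear and quadratic terms, and the condition number of the design matrix, before and after coding.

```python
import numpy as np

x = np.linspace(100.0, 200.0, 11)            # hypothetical actual levels
mean = x.mean()
half_range = (x.max() - x.min()) / 2
coded = (x - mean) / half_range              # coded values run from -1 to +1

corr_raw = np.corrcoef(x, x ** 2)[0, 1]
corr_coded = np.corrcoef(coded, coded ** 2)[0, 1]

def design(v):
    """Design matrix for the second order model in one variable."""
    return np.column_stack([np.ones_like(v), v, v ** 2])

cond_raw = np.linalg.cond(design(x))
cond_coded = np.linalg.cond(design(coded))
print("correlation of x and x^2 :", corr_raw, "->", corr_coded)
print("condition number of X    :", cond_raw, "->", cond_coded)
```

For levels that are symmetric about their mean, as here, the coded linear and quadratic terms are uncorrelated. For asymmetric levels the correlation is reduced but not necessarily removed.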

Qualitative Factors

The multiple linear regression model also supports the use of qualitative factors. For example, gender may need to be included as a factor in a regression model. One of the ways to include qualitative factors in a regression model is to employ indicator variables. Indicator variables take on values of 0 or 1. For example, an indicator variable may be used with a value of 1 to indicate female and a value of 0 to indicate male.


[math]\displaystyle{ {{x}_{1}}=\{\begin{array}{*{35}{l}} 1\text{ Female} \\ 0\text{ Male} \\ \end{array}\,\! }[/math]


In general ( [math]\displaystyle{ n-1\,\! }[/math] ) indicator variables are required to represent a qualitative factor with [math]\displaystyle{ n\,\! }[/math] levels. As an example, a qualitative factor representing three types of machines may be represented as follows using two indicator variables:


[math]\displaystyle{ \begin{align} {{x}_{1}} & = & 1,\text{ }{{x}_{2}} & = & 0\text{ Machine Type I} \\ {{x}_{1}} & = & 0,\text{ }{{x}_{2}} & = & 1\text{ Machine Type II} \\ {{x}_{1}} & = & 0,\text{ }{{x}_{2}} & = & 0\text{ Machine Type III} \end{align}\,\! }[/math]


An alternative coding scheme for this example is to use a value of -1 for all indicator variables when representing the last level of the factor:


[math]\displaystyle{ \begin{align} {{x}_{1}} & = & 1,\text{ }{{x}_{2}}& = &0\text{ Machine Type I} \\ {{x}_{1}}& = & 0,\text{ }{{x}_{2}}& = &1\text{ Machine Type II} \\ {{x}_{1}}& = & -1,\text{ }{{x}_{2}}& = &-1\text{ Machine Type III} \end{align}\,\! }[/math]


Indicator variables are also referred to as dummy variables or binary variables.
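
Both coding schemes can be generated by one small helper. The function name `indicator_columns` and the scheme labels "zero" and "minus_one" are hypothetical names chosen for this sketch.

```python
def indicator_columns(levels, value, scheme="zero"):
    """Encode `value` (one of `levels`) using len(levels) - 1 indicator variables.

    scheme="zero":      the last level is represented by all zeros.
    scheme="minus_one": the last level is represented by all -1 values.
    """
    idx = levels.index(value)
    width = len(levels) - 1
    if idx < width:
        return [1 if j == idx else 0 for j in range(width)]
    return [0] * width if scheme == "zero" else [-1] * width

machines = ["Machine Type I", "Machine Type II", "Machine Type III"]
for name in machines:
    print(name,
          indicator_columns(machines, name),
          indicator_columns(machines, name, scheme="minus_one"))
```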

Example

Consider the data from two types of reactors of a chemical process, shown in the table below, where the yield values are recorded for various levels of factor [math]\displaystyle{ {{x}_{1}}\,\! }[/math]. Assuming there are no interactions between the reactor type and [math]\displaystyle{ {{x}_{1}}\,\! }[/math], a regression model can be fitted to this data as shown next.

Since the reactor type is a qualitative factor with two levels, it can be represented by using one indicator variable. Let [math]\displaystyle{ {{x}_{2}}\,\! }[/math] be the indicator variable representing the reactor type, with 0 representing the first type of reactor and 1 representing the second type of reactor.


[math]\displaystyle{ {{x}_{2}}=\{\begin{array}{*{35}{l}} 0\text{ Reactor Type I} \\ 1\text{ Reactor Type II} \\ \end{array}\,\! }[/math]


Yield data from the two types of reactors for a chemical process.


Data entry in the DOE folio for this example is shown in the figure after the table below. The regression model for this data is:


[math]\displaystyle{ Y={{\beta }_{0}}+{{\beta }_{1}}{{x}_{1}}+{{\beta }_{2}}{{x}_{2}}+\epsilon\,\! }[/math]


The [math]\displaystyle{ X\,\! }[/math] and [math]\displaystyle{ y\,\! }[/math] matrices for the given data are:


Data from the table above as entered in Weibull++.


The estimated regression coefficients for the model can be obtained as:


[math]\displaystyle{ \begin{align} \hat{\beta }& = & {{({{X}^{\prime }}X)}^{-1}}{{X}^{\prime }}y \\ & = & \left[ \begin{matrix} 153.7 \\ 2.4 \\ -27.5 \\ \end{matrix} \right] \end{align}\,\! }[/math]


Therefore, the fitted regression model is:


[math]\displaystyle{ \hat{y}=153.7+2.4{{x}_{1}}-27.5{{x}_{2}}\,\! }[/math]
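
Substituting the two values of the indicator variable into the fitted model may help in reading it:


[math]\displaystyle{ \begin{align} {{x}_{2}}=0\text{ (Reactor Type I): }\hat{y} & = & 153.7+2.4{{x}_{1}} \\ {{x}_{2}}=1\text{ (Reactor Type II): }\hat{y} & = & 153.7-27.5+2.4{{x}_{1}} \\ & = & 126.2+2.4{{x}_{1}} \end{align}\,\! }[/math]


Under the no-interaction assumption, the fitted model therefore describes two parallel lines with a common slope of 2.4, and the coefficient of [math]\displaystyle{ {{x}_{2}}\,\! }[/math] is the estimated difference in yield (27.5 lower for the second reactor type) at any fixed level of [math]\displaystyle{ {{x}_{1}}\,\! }[/math].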


Note that since [math]\displaystyle{ {{x}_{2}}\,\! }[/math] represents a qualitative predictor variable, the fitted regression model cannot be plotted simultaneously against [math]\displaystyle{ {{x}_{1}}\,\! }[/math] and [math]\displaystyle{ {{x}_{2}}\,\! }[/math] in a two-dimensional space (because the resulting surface plot will be meaningless for the dimension in [math]\displaystyle{ {{x}_{2}}\,\! }[/math] ). To illustrate this, a scatter plot of the data against [math]\displaystyle{ {{x}_{2}}\,\! }[/math] is shown in the following figure.


Scatter plot of the observed yield values against [math]\displaystyle{ x_2\,\! }[/math] (reactor type)


It can be noted that, in the case of qualitative factors, the nature of the relationship between the response (yield) and the qualitative factor (reactor type) cannot be categorized as linear, or quadratic, or cubic, etc. The only conclusion that can be drawn for these factors is whether they contribute significantly to the regression model. This can be determined by employing the partial [math]\displaystyle{ F\,\! }[/math] test discussed in Multiple Linear Regression Analysis (using the extra sum of squares of the indicator variables representing these factors). The results of the test for the present example are shown in the ANOVA table. The results show that [math]\displaystyle{ {{x}_{2}}\,\! }[/math] (reactor type) contributes significantly to the fitted regression model.


DOE results for the data.

Multicollinearity

At times the predictor variables included in a multiple linear regression model may be found to be dependent on each other. Multicollinearity is said to exist in a multiple regression model with strong dependencies between the predictor variables. Multicollinearity affects the regression coefficients and the extra sum of squares of the predictor variables. In a model with multicollinearity the estimate of the regression coefficient of a predictor variable depends on what other predictor variables are included in the model. The dependence may even lead to a change in the sign of the regression coefficient. In such models, an estimated regression coefficient may not be found to be significant individually (when using the [math]\displaystyle{ t\,\! }[/math] test on the individual coefficient or looking at the [math]\displaystyle{ p\,\! }[/math] value) even though a statistical relation is found to exist between the response variable and the set of the predictor variables (when using the [math]\displaystyle{ F\,\! }[/math] test for the set of predictor variables). Therefore, you should be careful while looking at individual predictor variables in models that have multicollinearity. Care should also be taken while looking at the extra sum of squares for a predictor variable that is correlated with other variables. This is because in models with multicollinearity the extra sum of squares is not unique and depends on the other predictor variables included in the model.


Multicollinearity can be detected using the variance inflation factor (abbreviated [math]\displaystyle{ VIF\,\! }[/math] ). [math]\displaystyle{ VIF\,\! }[/math] for a coefficient [math]\displaystyle{ {{\beta }_{j}}\,\! }[/math] is defined as:


[math]\displaystyle{ VIF=\frac{1}{(1-R_{j}^{2})}\,\! }[/math]


where [math]\displaystyle{ R_{j}^{2}\,\! }[/math] is the coefficient of multiple determination resulting from regressing the [math]\displaystyle{ j\,\! }[/math] th predictor variable, [math]\displaystyle{ {{x}_{j}}\,\! }[/math], on the remaining [math]\displaystyle{ k-1\,\! }[/math] predictor variables. Mean values of [math]\displaystyle{ VIF\,\! }[/math] considerably greater than 1 indicate multicollinearity problems. Methods of dealing with multicollinearity include increasing the number of observations in a way designed to break up dependencies among the predictor variables, combining the linearly dependent predictor variables into one variable, eliminating unimportant variables from the model, and using coded variables.
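The definition above can be sketched in code as follows. This is a minimal illustration rather than the procedure used by any particular software: the function name `variance_inflation_factors` and both data sets are hypothetical. Each predictor is regressed on the remaining predictors (with an intercept), and the resulting coefficient of determination is converted into a variance inflation factor.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return VIF_j for each column of X (rows are observations, no intercept column)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = np.empty(k)
    for j in range(k):
        target = X[:, j]                      # x_j plays the role of the response
        others = np.delete(X, j, axis=1)      # the remaining k-1 predictors
        design = np.column_stack([np.ones(n), others])
        coeffs, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ coeffs
        ss_error = residuals @ residuals
        ss_total = np.sum((target - target.mean()) ** 2)
        r_squared = 1.0 - ss_error / ss_total
        vifs[j] = 1.0 / (1.0 - r_squared)
    return vifs


# Hypothetical orthogonal (coded) two-level design: no dependence between columns.
coded = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]])

# Hypothetical strongly dependent predictors: the second column tracks the first.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = x1 + np.array([0.1, -0.1, 0.1, -0.1, 0.1, -0.1])
dependent = np.column_stack([x1, x2])

vif_coded = variance_inflation_factors(coded)
vif_dependent = variance_inflation_factors(dependent)
print("orthogonal design:   ", vif_coded)
print("dependent predictors:", vif_dependent)
```

In this sketch the orthogonal design gives factors of 1, while the dependent pair gives factors far above 1.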

Example

Variance inflation factors can be obtained for the data below.

Observed yield data for various levels of two factors.

To calculate the variance inflation factor for [math]\displaystyle{ {{x}_{1}}\,\! }[/math], [math]\displaystyle{ R_{1}^{2}\,\! }[/math] has to be calculated. [math]\displaystyle{ R_{1}^{2}\,\! }[/math] is the coefficient of determination for the model when [math]\displaystyle{ {{x}_{1}}\,\! }[/math] is regressed on the remaining variables. In this example there is just one remaining variable, [math]\displaystyle{ {{x}_{2}}\,\! }[/math]. If a regression model is fit to the data, taking [math]\displaystyle{ {{x}_{1}}\,\! }[/math] as the response variable and [math]\displaystyle{ {{x}_{2}}\,\! }[/math] as the predictor variable, then the design matrix and the vector of observations are:


[math]\displaystyle{ {{X}_{{{R}_{1}}}}=\left[ \begin{matrix} 1 & 29.1 \\ 1 & 29.3 \\ . & . \\ . & . \\ . & . \\ 1 & 32.9 \\ \end{matrix} \right]\text{ }{{y}_{{{R}_{1}}}}=\left[ \begin{matrix} 41.9 \\ 43.4 \\ . \\ . \\ . \\ 77.8 \\ \end{matrix} \right]\,\! }[/math]


The regression sum of squares for this model can be obtained as:


[math]\displaystyle{ \begin{align} S{{S}_{R}}= & y_{{{R}_{1}}}^{\prime }\left[ {{H}_{{{R}_{1}}}}-(\frac{1}{n})J \right]{{y}_{{{R}_{1}}}} \\ = & 1988.6 \end{align}\,\! }[/math]


where [math]\displaystyle{ {{H}_{{{R}_{1}}}}\,\! }[/math] is the hat matrix (and is calculated using [math]\displaystyle{ {{H}_{{{R}_{1}}}}={{X}_{{{R}_{1}}}}{{(X_{{{R}_{1}}}^{\prime }{{X}_{{{R}_{1}}}})}^{-1}}X_{{{R}_{1}}}^{\prime }\,\! }[/math] ) and [math]\displaystyle{ J\,\! }[/math] is the matrix of ones. The total sum of squares for the model can be calculated as:


[math]\displaystyle{ \begin{align} S{{S}_{T}}= & y_{{{R}_{1}}}^{\prime }\left[ I-(\frac{1}{n})J \right]{{y}_{{{R}_{1}}}} \\ = & 2182.9 \end{align}\,\! }[/math]


where [math]\displaystyle{ I\,\! }[/math] is the identity matrix. Therefore:


[math]\displaystyle{ \begin{align} R_{1}^{2}= & \frac{S{{S}_{R}}}{S{{S}_{T}}} \\ = & \frac{1988.6}{2182.9} \\ = & 0.911 \end{align}\,\! }[/math]


Then the variance inflation factor for [math]\displaystyle{ {{x}_{1}}\,\! }[/math] is:


[math]\displaystyle{ \begin{align} VI{{F}_{1}}= & \frac{1}{(1-R_{1}^{2})} \\ = & \frac{1}{1-0.911} \\ = & 11.2 \end{align}\,\! }[/math]
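The matrix expressions used in this example can be sketched as follows. The function name `r_squared_via_hat_matrix` is an assumption made for illustration. Because the full data table is not reproduced here, the final lines check only the arithmetic, using the two sums of squares reported above.

```python
import numpy as np


def r_squared_via_hat_matrix(x_response, x_others):
    """R^2 from regressing one predictor on the others, using H, I and J."""
    y = np.asarray(x_response, dtype=float)
    n = y.size
    X = np.column_stack([np.ones(n), np.asarray(x_others, dtype=float)])
    H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix
    J = np.ones((n, n))                    # n x n matrix of ones
    I = np.eye(n)                          # identity matrix
    ss_r = y @ (H - J / n) @ y             # regression sum of squares
    ss_t = y @ (I - J / n) @ y             # total sum of squares
    return ss_r / ss_t


# Arithmetic check using only the sums of squares reported in the text.
ss_r_reported, ss_t_reported = 1988.6, 2182.9
r2_1 = ss_r_reported / ss_t_reported
vif_1 = 1.0 / (1.0 - r2_1)
print(round(r2_1, 3), round(vif_1, 1))  # prints 0.911 11.2
```

Calling `r_squared_via_hat_matrix` with the [math]\displaystyle{ {{x}_{1}}\,\! }[/math] column as `x_response` and the [math]\displaystyle{ {{x}_{2}}\,\! }[/math] column as `x_others` would reproduce [math]\displaystyle{ R_{1}^{2}\,\! }[/math] directly from the data.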


The variance inflation factor for [math]\displaystyle{ {{x}_{2}}\,\! }[/math], [math]\displaystyle{ VI{{F}_{2}}\,\! }[/math], can be obtained in a similar manner. In the DOE folios, the variance inflation factors are displayed in the VIF column of the Regression Information table as shown in the following figure. Since the values of the variance inflation factors obtained are considerably greater than 1, multicollinearity is an issue for the data.
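As a side note on the two-predictor case: when a model has exactly two predictor variables, regressing [math]\displaystyle{ {{x}_{1}}\,\! }[/math] on [math]\displaystyle{ {{x}_{2}}\,\! }[/math] and regressing [math]\displaystyle{ {{x}_{2}}\,\! }[/math] on [math]\displaystyle{ {{x}_{1}}\,\! }[/math] give the same coefficient of determination, namely the squared sample correlation between the two variables (written here as [math]\displaystyle{ r_{12}^{2}\,\! }[/math], a symbol introduced only for this note). The two variance inflation factors are therefore equal:

[math]\displaystyle{ \begin{align} R_{1}^{2}= & R_{2}^{2}=r_{12}^{2} \\ VI{{F}_{2}}= & \frac{1}{(1-R_{2}^{2})}=\frac{1}{(1-R_{1}^{2})}=VI{{F}_{1}} \end{align}\,\! }[/math]

This equality holds only for two predictors. With three or more predictors, each [math]\displaystyle{ R_{j}^{2}\,\! }[/math] has to be computed separately.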


Variance inflation factors for the data in.