Template:Normal distribution estimation of the parameters

Rank Regression on Y
Performing rank regression on Y requires that a straight line be fitted to a set of data points such that the sum of the squares of the vertical deviations from the points to the line is minimized.

The least squares parameter estimation method (regression analysis) was discussed in Chapter 3 and the following equations for regression on Y were derived:


 * $$\begin{align}\hat{a}= & \bar{y}-\hat{b}\bar{x} \\ =& \frac{\sum_{i=1}^N y_{i}}{N}-\hat{b}\frac{\sum_{i=1}^{N}x_{i}}{N} \end{align}$$


 * and:


 * $$\hat{b}=\frac{\sum_{i=1}^{N}x_{i}y_{i}-\tfrac{\sum_{i=1}^{N}x_{i}\sum_{i=1}^{N}y_{i}}{N}}{\sum_{i=1}^{N}x_{i}^{2}-\tfrac{\left( \sum_{i=1}^{N}x_{i} \right)^{2}}{N}}$$

In the case of the normal distribution, the equations for $${{y}_{i}}$$  and  $${{x}_{i}}$$  are:


 * $${{y}_{i}}={{\Phi }^{-1}}\left[ F({{T}_{i}}) \right]$$


 * and:


 * $${{x}_{i}}={{T}_{i}}$$

where the values for $$F({{T}_{i}})$$  are estimated from the median ranks. Once $$\widehat{a}$$  and  $$\widehat{b}$$  are obtained,  $$\widehat{\sigma }$$  and  $$\widehat{\mu }$$  can easily be obtained from Eqns. (an) and (bn).
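The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the Weibull++ implementation: the function name `rank_regression_y` is hypothetical, and the median ranks are approximated with Benard's formula, $$(i-0.3)/(N+0.4)$$, which is an assumption made here in place of rank tables. The conversions $$\widehat{\sigma }=1/\widehat{b}$$ and $$\widehat{\mu }=-\widehat{a}\cdot \widehat{\sigma }$$ are the ones used in the worked example that follows.

```python
from statistics import NormalDist, fmean


def rank_regression_y(times):
    """Estimate (mu, sigma) of a normal distribution by rank regression on Y.

    Assumes complete (uncensored) data.
    """
    t = sorted(times)
    n = len(t)

    # Assumption: Benard's approximation to the median rank of the i-th failure.
    median_ranks = [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]

    # y_i = Phi^{-1}[F(T_i)], x_i = T_i
    std_normal = NormalDist()
    y = [std_normal.inv_cdf(f) for f in median_ranks]

    # Centered sums; algebraically identical to the computational forms
    # sum(x*y) - sum(x)*sum(y)/N and sum(x^2) - (sum(x))^2/N.
    x_bar, y_bar = fmean(t), fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(t, y))
    sxx = sum((xi - x_bar) ** 2 for xi in t)

    b = sxy / sxx              # slope
    a = y_bar - b * x_bar      # intercept

    sigma = 1.0 / b
    mu = -a * sigma
    return mu, sigma


if __name__ == "__main__":
    times = [5, 10, 15, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100]
    mu_hat, sigma_hat = rank_regression_y(times)
    print(f"mu = {mu_hat:.4f}, sigma = {sigma_hat:.4f}")
```

Because Benard's formula only approximates the tabulated median ranks, the estimates can differ slightly from values obtained with rank tables.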

The Correlation Coefficient
The estimator of the sample correlation coefficient, $$\hat{\rho }$$, is given by:


 * $$\hat{\rho }=\frac{\underset{i=1}{\overset{N}{\mathop{\sum }}}\,({{x}_{i}}-\overline{x})({{y}_{i}}-\overline{y})}{\sqrt{\underset{i=1}{\overset{N}{\mathop{\sum }}}\,{{({{x}_{i}}-\overline{x})}^{2}}\cdot \underset{i=1}{\overset{N}{\mathop{\sum }}}\,{{({{y}_{i}}-\overline{y})}^{2}}}}$$
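As an illustration, the estimator above translates directly into code. The function name `correlation_coefficient` is hypothetical, and the sketch assumes two equal-length sequences with non-zero spread in both variables.

```python
from math import sqrt
from statistics import fmean


def correlation_coefficient(x, y):
    """Sample correlation coefficient of paired observations (x_i, y_i)."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)
```

Values close to +1 or -1 indicate that the points lie nearly on a straight line, which is what a good fit on the probability plot looks like.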

Example 2
Fourteen units were reliability tested and the following life test data were obtained:

Assuming the data follow a normal distribution, estimate the parameters and determine the correlation coefficient, $$\rho $$, using rank regression on Y.

Solution to Example 2
Construct a table like the one shown next.

Table 8.2 - Least Squares Analysis

$$\begin{matrix}
\text{N} & T_{i} & F(T_{i}) & y_{i} & T_{i}^{2} & y_{i}^{2} & T_{i}y_{i} \\
1 & 5 & 0.0483 & -1.6619 & 25 & 2.7619 & -8.3095 \\
2 & 10 & 0.1170 & -1.1901 & 100 & 1.4163 & -11.9010 \\
3 & 15 & 0.1865 & -0.8908 & 225 & 0.7935 & -13.3620 \\
4 & 20 & 0.2561 & -0.6552 & 400 & 0.4292 & -13.1030 \\
5 & 25 & 0.3258 & -0.4512 & 625 & 0.2036 & -11.2800 \\
6 & 30 & 0.3954 & -0.2647 & 900 & 0.0701 & -7.9422 \\
7 & 35 & 0.4651 & -0.0873 & 1225 & 0.0076 & -3.0542 \\
8 & 40 & 0.5349 & 0.0873 & 1600 & 0.0076 & 3.4905 \\
9 & 50 & 0.6046 & 0.2647 & 2500 & 0.0701 & 13.2370 \\
10 & 60 & 0.6742 & 0.4512 & 3600 & 0.2036 & 27.0720 \\
11 & 70 & 0.7439 & 0.6552 & 4900 & 0.4292 & 45.8605 \\
12 & 80 & 0.8135 & 0.8908 & 6400 & 0.7935 & 71.2640 \\
13 & 90 & 0.8830 & 1.1901 & 8100 & 1.4163 & 107.1090 \\
14 & 100 & 0.9517 & 1.6619 & 10000 & 2.7619 & 166.1900 \\
\sum & 630 & {} & 0 & 40600 & 11.3646 & 365.2711 \\
\end{matrix}$$


 * The median rank values ( $$F({{T}_{i}})$$ ) can be found in rank tables, available in many statistical texts, or they can be estimated by using the Quick Statistical Reference in Weibull++.
 * The $${{y}_{i}}$$ values were obtained from the standard normal distribution's area tables by entering the value of $$F(z)$$ and reading off the corresponding $$z$$ value ( $${{y}_{i}}$$ ). As with the median rank values, these standard normal values can be obtained with the Quick Statistical Reference.
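The table lookup for $${{y}_{i}}$$ can also be done numerically. The sketch below is one possible way, assuming Python 3.8 or later, whose standard library provides the inverse standard normal CDF as `statistics.NormalDist().inv_cdf`. The median ranks are copied from the $$F({{T}_{i}})$$ column of Table 8.2; the variable names are illustrative.

```python
from statistics import NormalDist

# F(T_i) column of Table 8.2 (median ranks for N = 14), rounded to 4 decimals.
median_ranks = [
    0.0483, 0.1170, 0.1865, 0.2561, 0.3258, 0.3954, 0.4651,
    0.5349, 0.6046, 0.6742, 0.7439, 0.8135, 0.8830, 0.9517,
]

std_normal = NormalDist()  # mean 0, standard deviation 1

# y_i = Phi^{-1}[F(T_i)]
y = [std_normal.inv_cdf(f) for f in median_ranks]

for f, yi in zip(median_ranks, y):
    print(f"F = {f:.4f}  ->  y = {yi:+.4f}")
```

Since the tabulated median ranks are rounded to four decimals, the computed values may differ from the $${{y}_{i}}$$ column in the last digit or two.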

Given the values in Table 8.2, calculate $$\widehat{a}$$  and  $$\widehat{b}$$  using Eqns. (aan) and (bbn):


 * $$\begin{align}

& \widehat{b}= & \frac{\underset{i=1}{\overset{14}{\mathop{\sum }}}\,{{T}_{i}}{{y}_{i}}-(\underset{i=1}{\overset{14}{\mathop{\sum }}}\,{{T}_{i}})(\underset{i=1}{\overset{14}{\mathop{\sum }}}\,{{y}_{i}})/14}{\underset{i=1}{\overset{14}{\mathop{\sum }}}\,T_{i}^{2}-{{(\underset{i=1}{\overset{14}{\mathop{\sum }}}\,{{T}_{i}})}^{2}}/14} \\ & &  \\  & \widehat{b}= & \frac{365.2711-(630)(0)/14}{40,600-{{(630)}^{2}}/14}=0.02982 \end{align}$$


 * and:


 * $$\widehat{a}=\overline{y}-\widehat{b}\overline{T}=\frac{\underset{i=1}{\overset{N}{\mathop{\sum }}}\,{{y}_{i}}}{N}-\widehat{b}\frac{\underset{i=1}{\overset{N}{\mathop{\sum }}}\,{{T}_{i}}}{N}$$


 * or:


 * $$\widehat{a}=\frac{0}{14}-(0.02982)\frac{630}{14}=-1.3419$$

Therefore, from Eqn. (bn):


 * $$\widehat{\sigma}=\frac{1}{\hat{b}}=\frac{1}{0.02982}=33.5367$$


 * and from Eqn. (an):


 * $$\widehat{\mu }=-\widehat{a}\cdot \widehat{\sigma }=-(-1.3419)\cdot 33.5367\simeq 45$$

or $$\widehat{\mu }=45$$ hours.

The correlation coefficient can be estimated using Eqn. (RHOn):


 * $$\widehat{\rho }=0.979$$
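This value follows from the column totals of Table 8.2. As a sketch of the arithmetic, each sum of deviations in Eqn. (RHOn) is first rewritten in its equivalent computational form:

 * $$\begin{align} \sum_{i=1}^{14}({{T}_{i}}-\overline{T})({{y}_{i}}-\overline{y})= & \sum_{i=1}^{14}{{T}_{i}}{{y}_{i}}-\frac{\sum_{i=1}^{14}{{T}_{i}}\sum_{i=1}^{14}{{y}_{i}}}{14}=365.2711-\frac{(630)(0)}{14}=365.2711 \\ \sum_{i=1}^{14}{{({{T}_{i}}-\overline{T})}^{2}}= & \sum_{i=1}^{14}T_{i}^{2}-\frac{{{(630)}^{2}}}{14}=40,600-28,350=12,250 \\ \sum_{i=1}^{14}{{({{y}_{i}}-\overline{y})}^{2}}= & \sum_{i=1}^{14}y_{i}^{2}-\frac{{{(0)}^{2}}}{14}=11.3646 \\ \widehat{\rho }= & \frac{365.2711}{\sqrt{12,250\cdot 11.3646}}\simeq 0.979 \end{align}$$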

The preceding example can be repeated using Weibull++.


 * Create a new folio for Times-to-Failure data, and enter the data given in Table 8.1.
 * Choose Normal from the Distributions list.
 * Go to the Analysis page and select Rank Regression on Y (RRY).
 * Click the Calculate icon located on the Main page.



The probability plot is shown next.



Rank Regression on X
As was mentioned previously, performing a rank regression on X requires that a straight line be fitted to a set of data points such that the sum of the squares of the horizontal deviations from the points to the fitted line is minimized.

Again, the first task is to bring our function, Eqn. (Fnorm), into a linear form. This step is exactly the same as in the regression on Y analysis, and Eqns. (norm), (yn), (an), and (bn) apply here as they did for regression on Y. The two analyses diverge at the least squares fit step: in this case, we treat $$x$$ as the dependent variable and $$y$$ as the independent variable. The best-fitting straight line for the data, for regression on X, is the straight line:


 * $$x=\widehat{a}+\widehat{b}y$$

The corresponding equations for $$\widehat{a}$$  and  $$\widehat{b}$$  are:


 * $$\hat{a}=\overline{x}-\hat{b}\overline{y}=\frac{\underset{i=1}{\overset{N}{\mathop{\sum }}}\,{{x}_{i}}}{N}-\hat{b}\frac{\underset{i=1}{\overset{N}{\mathop{\sum }}}\,{{y}_{i}}}{N}$$


 * and:


 * $$\hat{b}=\frac{\sum_{i=1}^{N}x_{i}y_{i}-\tfrac{\sum_{i=1}^{N}x_{i}\sum_{i=1}^{N}y_{i}}{N}}{\sum_{i=1}^{N}y_{i}^{2}-\tfrac{\left( \sum_{i=1}^{N}y_{i} \right)^{2}}{N}}$$


 * where:


 * $${{y}_{i}}={{\Phi }^{-1}}\left[ F({{T}_{i}}) \right]$$


 * and:


 * $${{x}_{i}}={{T}_{i}}$$

and the $$F({{T}_{i}})$$ values are estimated from the median ranks. Once $$\widehat{a}$$ and $$\widehat{b}$$ are obtained, solve Eqn. (xlinen) for $$y$$, which gives:


 * $$y=-\frac{\widehat{a}}{\widehat{b}}+\frac{1}{\widehat{b}}x$$

Solving for the parameters from Eqns. (an) and (bn), we get:


 * $$a=-\frac{\widehat{a}}{\widehat{b}}=-\frac{\mu }{\sigma }\Rightarrow \mu =\widehat{a}$$


 * and:


 * $$b=\frac{1}{\widehat{b}}=\frac{1}{\sigma }\Rightarrow \sigma =\widehat{b}$$

The correlation coefficient is evaluated as before using Eqn. (RHOn).

Example 3
Using the data of Example 2 and assuming a normal distribution, estimate the parameters and determine the correlation coefficient, $$\rho $$, using rank regression on X.

Solution to Example 3
Table 8.2 constructed in Example 2 applies to this example also. Using the values on this table, we get:


 * $$\begin{align} \hat{b}= & \frac{\sum_{i=1}^{14}T_{i}y_{i}-\tfrac{\sum_{i=1}^{14}T_{i}\sum_{i=1}^{14}y_{i}}{14}}{\sum_{i=1}^{14}y_{i}^{2}-\tfrac{\left( \sum_{i=1}^{14}y_{i} \right)^{2}}{14}} \\ \widehat{b}= & \frac{365.2711-(630)(0)/14}{11.3646-{{(0)}^{2}}/14}=32.1411 \end{align}$$


 * and:


 * $$\hat{a}=\overline{x}-\hat{b}\overline{y}=\frac{\underset{i=1}{\overset{14}{\mathop{\sum }}}\,{{T}_{i}}}{14}-\widehat{b}\frac{\underset{i=1}{\overset{14}{\mathop{\sum }}}\,{{y}_{i}}}{14}$$


 * or:


 * $$\widehat{a}=\frac{630}{14}-(32.1411)\frac{(0)}{14}=45$$

Therefore, from Eqn. (bnx):


 * $$\widehat{\sigma }=\widehat{b}=32.1411$$


 * and from Eqn. (anx):


 * $$\widehat{\mu }=\widehat{a}=45\text{ hours}$$

The correlation coefficient is found using Eqn. (RHOn):


 * $$\widehat{\rho }=0.979$$

Note that the results for regression on X are not necessarily the same as the results for regression on Y. The only time when the two regressions are the same (i.e. will yield the same equation for a line) is when the data lie perfectly on a straight line. Using Weibull++, Rank Regression on X (RRX) can be selected from the Analysis page.
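The difference between the two regressions can be checked numerically for Examples 2 and 3. The sketch below uses only the column totals of Table 8.2; the variable names are illustrative and not part of any Weibull++ interface. It also evaluates the product of the two slopes, which for least squares lines equals $${{\widehat{\rho }}^{2}}$$; this is why the two fitted lines coincide only when the points are perfectly collinear.

```python
from math import sqrt

# Column totals from Table 8.2 (N = 14).
n = 14
sum_t, sum_y = 630.0, 0.0
sum_t2, sum_y2, sum_ty = 40600.0, 11.3646, 365.2711

# Corrected sums of squares and cross-products.
sxy = sum_ty - sum_t * sum_y / n
sxx = sum_t2 - sum_t ** 2 / n
syy = sum_y2 - sum_y ** 2 / n

# Rank regression on Y: y = a + b*x, sigma = 1/b, mu = -a*sigma.
b_rry = sxy / sxx
a_rry = sum_y / n - b_rry * sum_t / n
sigma_rry = 1.0 / b_rry
mu_rry = -a_rry * sigma_rry

# Rank regression on X: x = a + b*y, sigma = b, mu = a.
b_rrx = sxy / syy
a_rrx = sum_t / n - b_rrx * sum_y / n
sigma_rrx, mu_rrx = b_rrx, a_rrx

rho = sxy / sqrt(sxx * syy)

print(f"RRY: mu = {mu_rry:.4f}, sigma = {sigma_rry:.4f}")
print(f"RRX: mu = {mu_rrx:.4f}, sigma = {sigma_rrx:.4f}")
print(f"rho = {rho:.4f}, slope product = {b_rry * b_rrx:.4f}, rho^2 = {rho ** 2:.4f}")
```

In this data set both methods give the same mean because the $${{y}_{i}}$$ values sum to zero; only the estimates of the standard deviation differ.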



The plot of the solution for this example is shown next.


Maximum Likelihood Estimation
As outlined in Chapter 3, maximum likelihood estimation works by developing a likelihood function based on the available data and finding the values of the parameter estimates that maximize the likelihood function. This can be achieved by using iterative methods to determine the parameter estimate values that maximize the likelihood function, which can be rather difficult and time-consuming, particularly when dealing with the three-parameter distribution. Another method of finding the parameter estimates involves taking the partial derivatives of the likelihood function with respect to the parameters, setting the resulting equations equal to zero, and solving simultaneously to determine the values of the parameter estimates. The log-likelihood functions and associated partial derivatives used to determine maximum likelihood estimates for the normal distribution are covered in Appendix C.
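For complete (uncensored) data only, the partial-derivative approach can be sketched in closed form. The symbol $$\Lambda $$ for the log-likelihood is a notational assumption made here; Appendix C remains the reference for the general case, including censored data.

 * $$\begin{align} \Lambda = & \sum_{i=1}^{N}\ln \left[ \frac{1}{\sigma \sqrt{2\pi }}{{e}^{-\frac{1}{2}{{\left( \frac{{{T}_{i}}-\mu }{\sigma } \right)}^{2}}}} \right]=-N\ln \sigma -\frac{N}{2}\ln (2\pi )-\frac{1}{2{{\sigma }^{2}}}\sum_{i=1}^{N}{{({{T}_{i}}-\mu )}^{2}} \\ \frac{\partial \Lambda }{\partial \mu }= & \frac{1}{{{\sigma }^{2}}}\sum_{i=1}^{N}({{T}_{i}}-\mu )=0\Rightarrow \widehat{\mu }=\frac{1}{N}\sum_{i=1}^{N}{{T}_{i}}=\bar{T} \\ \frac{\partial \Lambda }{\partial \sigma }= & -\frac{N}{\sigma }+\frac{1}{{{\sigma }^{3}}}\sum_{i=1}^{N}{{({{T}_{i}}-\mu )}^{2}}=0\Rightarrow {{\widehat{\sigma }}^{2}}=\frac{1}{N}\sum_{i=1}^{N}{{({{T}_{i}}-\bar{T})}^{2}} \end{align}$$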

Special Note About Bias
Estimators (i.e. parameter estimates) have properties such as unbiasedness, minimum variance, sufficiency, consistency, squared error constancy, efficiency and completeness [7][5]. Numerous books and papers deal with these properties and this coverage is beyond the scope of this reference.

However, we would like to briefly address one of these properties, unbiasedness. An estimator $$\widehat{\theta }=d({{X}_{1}},{{X}_{2}},...,{{X}_{n}})$$ is said to be unbiased if it satisfies the condition $$E\left[ \widehat{\theta } \right]=\theta $$ for all $$\theta \in \Omega .$$ Note that $$E\left[ X \right]$$ denotes the expected value of X and is defined (for continuous distributions) by:


 * $$E\left[ X \right]=\int_{\varpi }x\cdot f(x)\,dx,\qquad X\in \varpi .$$

It can be shown [7][5] that the maximum likelihood estimator for the mean of the normal (and lognormal) distribution does satisfy the unbiasedness criterion, or $$E\left[ \widehat{\mu } \right]=\mu .$$ The same is not true for the estimate of the variance $$\hat{\sigma }_{T}^{2}$$. The maximum likelihood estimate for the variance for the normal distribution is given by:


 * $$\hat{\sigma }_{T}^{2}=\frac{1}{N}\underset{i=1}{\overset{N}{\mathop \sum }}\,{{({{T}_{i}}-\bar{T})}^{2}}$$

with a standard deviation of:


 * $${{\hat{\sigma }}_{T}}=\sqrt{\frac{1}{N}\underset{i=1}{\overset{N}{\mathop \sum }}\,{{({{T}_{i}}-\bar{T})}^{2}}}$$

These estimates, however, have been shown to be biased. It can be shown [7][5] that the unbiased estimate of the variance and standard deviation for complete data is given by:


 * $$\begin{align}

\hat{\sigma }_{T}^{2}= & \left[ \frac{N}{N-1} \right]\cdot \left[ \frac{1}{N}\underset{i=1}{\overset{N}{\mathop \sum }}\,{{({{T}_{i}}-\bar{T})}^{2}} \right]=\frac{1}{N-1}\underset{i=1}{\overset{N}{\mathop \sum }}\,{{({{T}_{i}}-\bar{T})}^{2}} \\ {{{\hat{\sigma }}}_{T}}= & \sqrt{\left[ \frac{N}{N-1} \right]\cdot \left[ \frac{1}{N}\underset{i=1}{\overset{N}{\mathop \sum }}\,{{({{T}_{i}}-\bar{T})}^{2}} \right]} \\ = & \sqrt{\frac{1}{N-1}\underset{i=1}{\overset{N}{\mathop \sum }}\,{{({{T}_{i}}-\bar{T})}^{2}}} \end{align}$$

Note that for larger values of $$N$$,  $$\sqrt{\left[ N/(N-1) \right]}$$  tends to 1.
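The two forms of the standard deviation can be compared on the failure times listed in the $${{T}_{i}}$$ column of Table 8.2. This sketch relies on Python's standard library, where `statistics.pstdev` divides by $$N$$ (the maximum likelihood form) and `statistics.stdev` divides by $$N-1$$; the variable names are illustrative.

```python
from math import sqrt
from statistics import pstdev, stdev

# Failure times from the T_i column of Table 8.2 (complete data, N = 14).
times = [5, 10, 15, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100]
n = len(times)

sigma_mle = pstdev(times)     # sqrt( (1/N)     * sum (T_i - T_bar)^2 )
sigma_nm1 = stdev(times)      # sqrt( (1/(N-1)) * sum (T_i - T_bar)^2 )

# The two differ exactly by the factor sqrt(N / (N - 1)).
correction = sqrt(n / (n - 1))

print(f"N divisor:     {sigma_mle:.4f}")
print(f"N - 1 divisor: {sigma_nm1:.4f}")
print(f"ratio = {sigma_nm1 / sigma_mle:.6f}, sqrt(N/(N-1)) = {correction:.6f}")
```

With only 14 points the correction factor is still noticeably above 1, which is why the choice of divisor matters for small samples.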

Weibull++ by default returns the standard deviation as defined by Eqn. (NormSt2). The Use Unbiased Std on Normal Data option in the User Setup under the Calculations tab allows biasing to be considered when estimating the parameters.

When this option is selected, Weibull++ returns the standard deviation as defined by Eqn. (NormSt2). This is only true for complete data sets. For all other data types, Weibull++ by default returns the standard deviation as defined by Eqn. (normbias2) regardless of the selection status of this option. The next figure shows this setting in Weibull++.
