Template:MLE Parameter Estimation

MLE (Maximum Likelihood) Parameter Estimation for Complete Data
From a statistical point of view, the method of maximum likelihood estimation (MLE) is, with some exceptions, considered to be the most robust of the parameter estimation techniques discussed here. This method is presented in this section for complete data, that is, data consisting only of single times-to-failure.

Background on Theory
The basic idea behind MLE is to obtain the most likely values of the parameters, for a given distribution, that will best describe the data. As an example, consider the following data (-3, 0, 4) and assume that you are trying to estimate the mean of the data. Now, if you have to choose the most likely value for the mean from -5, 1 and 10, which one would you choose? In this case, the most likely value is 1 (given your limit on choices). Similarly, under MLE, one determines the most likely values for the parameters of the assumed distribution.
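
The choice in the example above can be checked numerically. The following is a minimal sketch that assumes, purely for illustration, that the data come from a normal distribution with the standard deviation fixed at 1 (the text does not name a distribution). The function name `normal_log_likelihood` is hypothetical.

```python
import math

def normal_log_likelihood(data, mean, sigma=1.0):
    """Log-likelihood of `data` under a normal pdf (sigma = 1 is an assumption)."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * sigma ** 2)
        - (x - mean) ** 2 / (2.0 * sigma ** 2)
        for x in data
    )

data = [-3, 0, 4]
candidates = [-5, 1, 10]

# Evaluate the log-likelihood for each candidate mean and keep the largest.
scores = {m: normal_log_likelihood(data, m) for m in candidates}
best = max(scores, key=scores.get)
print(best)
```

Among the three candidates, 1 gives the largest log-likelihood because it lies closest to the sample mean of 1/3. If the choice were unrestricted, the normal-distribution MLE of the mean would be the sample mean itself.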

It is mathematically formulated as follows:

If $$x$$ is a continuous random variable with $$pdf$$:


 * $$f(x;{{\theta }_{1}},{{\theta }_{2}},...,{{\theta }_{k}})$$

where $${\theta _1},{\theta _2},...,{\theta _k}$$ are $$k$$ unknown parameters which need to be estimated, with $$R$$ independent observations, $${x_1},{x_2},...,{x_R}$$, which correspond in the case of life data analysis to failure times. The likelihood function is given by:


 * $$L({\theta _1},{\theta _2},...,{\theta _k}|{x_1},{x_2},...,{x_R})=L=\prod_{i=1}^{R} f({x_i};{\theta _1},{\theta _2},...,{\theta _k})$$


 * $$i=1,2,...,R$$

The logarithmic likelihood function is given by:


 * $$\Lambda = \ln L =\sum_{i = 1}^R \ln f({x_i};{\theta _1},{\theta _2},...,{\theta _k})$$

The maximum likelihood estimators (or parameter values) of $${{\theta }_{1}},{{\theta }_{2}},...,{{\theta }_{k}},$$ are obtained by maximizing $$L$$ or $$\Lambda .$$

By maximizing $$\Lambda ,$$ which is much easier to work with than $$L$$, the maximum likelihood estimators (MLE) of $${{\theta }_{1}},{{\theta }_{2}},...,{{\theta }_{k}}$$ are the simultaneous solutions of $$k$$ equations such that:


 * $$\frac{\partial \Lambda}{\partial \theta_j}=0, \quad j=1,2,...,k$$
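
As a worked illustration (an example added here, not part of the original derivation), consider the one-parameter exponential distribution with $$pdf$$ $$f(x;\lambda )=\lambda {e^{-\lambda x}}$$, so that $$k=1$$ and $${\theta _1}=\lambda$$. With $$R$$ complete observations, the log-likelihood and its derivative are:


 * $$\Lambda =\sum_{i = 1}^R \ln \left( \lambda {e^{-\lambda {x_i}}} \right)=R\ln \lambda -\lambda \sum_{i = 1}^R {x_i}, \qquad \frac{\partial \Lambda}{\partial \lambda}=\frac{R}{\lambda }-\sum_{i = 1}^R {x_i}=0$$

Solving the single equation gives $$\hat{\lambda }=R/\sum_{i = 1}^R {x_i}$$, the reciprocal of the sample mean. Closed-form solutions of this kind are the exception; for most distributions the $$k$$ equations have to be solved numerically.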

Even though it is common practice to plot the MLE solutions using median ranks (points are plotted according to median ranks and the line according to the MLE solutions), this is not completely representative. As can be seen from the equations above, the MLE method is independent of any kind of ranks. For this reason, the MLE solution often appears not to track the data on the probability plot. This is perfectly acceptable since the two methods are independent of each other, and in no way suggests that the solution is wrong.

Comments on the MLE Method
The MLE method has many large sample properties that make it attractive for use. It is asymptotically consistent, which means that as the sample size gets larger, the estimates converge to the right values. It is asymptotically efficient, which means that for large samples, it produces the most precise estimates. It is asymptotically unbiased, which means that for large samples one expects to get the right value on average. The distribution of the estimates themselves is normal, if the sample is large enough, and this is the basis for the usual Fisher Matrix confidence bounds discussed later. These are all excellent large sample properties.

Unfortunately, the size of the sample necessary to achieve these properties can be quite large: thirty to fifty to more than a hundred exact failure times, depending on the application. With fewer points, the estimates can be badly biased. It is known, for example, that MLE estimates of the shape parameter for the Weibull distribution are badly biased for small sample sizes, and the effect can grow with the amount of censoring. This bias can cause major discrepancies in analysis. There are also pathological situations in which the asymptotic properties of the MLE do not apply. One of these is estimating the location parameter for the three-parameter Weibull distribution when the shape parameter has a value close to 1. These problems, too, can cause major discrepancies.
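
The small-sample bias of the Weibull shape estimate can be illustrated with a short simulation. The sketch below is not from the original text: the true parameters (shape 2, scale 100), the sample size of 10, the 500 replications, the seed and the helper name `weibull_shape_mle` are all assumptions chosen for illustration. It uses only the Python standard library and solves the shape likelihood equation for complete data by bisection.

```python
import math
import random

def weibull_shape_mle(times, iterations=60):
    """MLE of the Weibull shape parameter for complete data, by bisection.

    Solves  sum(x^b ln x) / sum(x^b) - 1/b - mean(ln x) = 0  for b, the
    likelihood equation for the shape once the scale has been eliminated.
    """
    logs = [math.log(t) for t in times]
    mean_log = sum(logs) / len(logs)
    max_log = max(logs)

    def score(b):
        # Weights are scaled by the largest observation to avoid overflow.
        weights = [math.exp(b * (lx - max_log)) for lx in logs]
        weighted = sum(w * lx for w, lx in zip(weights, logs))
        return weighted / sum(weights) - 1.0 / b - mean_log

    low, high = 1e-3, 1.0
    while score(high) < 0.0:      # expand until the root is bracketed
        high *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (low + high)
        if score(mid) < 0.0:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)

random.seed(12345)                # assumed seed, for repeatability only
true_shape, true_scale = 2.0, 100.0
sample_size, replications = 10, 500

estimates = []
for _ in range(replications):
    # random.weibullvariate(alpha, beta): alpha is the scale, beta the shape.
    sample = [random.weibullvariate(true_scale, true_shape)
              for _ in range(sample_size)]
    estimates.append(weibull_shape_mle(sample))

mean_estimate = sum(estimates) / replications
print(round(mean_estimate / true_shape, 2))   # ratio of average estimate to truth
```

With samples this small, the average estimated shape should come out above the true value, which is the direction of bias described above. The exact ratio printed depends on the seed and the settings chosen.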

However, MLE can handle suspensions and interval data better than rank regression, particularly when dealing with a heavily censored data set with few exact failure times or when the censoring times are unevenly distributed. It can also provide estimates with one or no observed failures, which rank regression cannot do. As a rule of thumb, our recommendation is to use rank regression techniques when the sample sizes are small and without heavy censoring (censoring is discussed in Chapter 4). When heavy or uneven censoring is present, when a high proportion of interval data is present and/or when the sample size is sufficient, MLE should be preferred.

MLE Analysis of Right Censored Data
When performing maximum likelihood analysis on data with suspended items, the likelihood function needs to be expanded to take into account the suspended items. The overall estimation technique does not change, but another term is added to the likelihood function to account for the suspended items. Beyond that, the method of solving for the parameter estimates remains the same. For example, consider a distribution where $$x$$ is a continuous random variable with $$pdf$$ and $$cdf$$:


 * $$\begin{align} & f(x;{{\theta }_{1}},{{\theta }_{2}},...,{{\theta }_{k}}) \\ & F(x;{{\theta }_{1}},{{\theta }_{2}},...,{{\theta }_{k}}) \end{align}$$

where $${{\theta }_{1}},{{\theta }_{2}},...,{{\theta }_{k}}$$ are the $$k$$ unknown parameters which need to be estimated from $$R$$ observed failures at $${{T}_{1}},{{T}_{2}},...,{{T}_{R}}$$ and $$M$$ observed suspensions at $${{S}_{1}},{{S}_{2}},...,{{S}_{M}}$$. The likelihood function is then formulated as follows:


 * $$\begin{align} L({{\theta }_{1}},...,{{\theta }_{k}}|{{T}_{1}},...,{{T}_{R}},{{S}_{1}},...,{{S}_{M}})= & \prod_{i=1}^{R} f({{T}_{i}};{{\theta }_{1}},{{\theta }_{2}},...,{{\theta }_{k}}) \\ & \cdot \prod_{j=1}^{M} [1-F({{S}_{j}};{{\theta }_{1}},{{\theta }_{2}},...,{{\theta }_{k}})] \end{align}$$

The parameter estimates are obtained by maximizing this function, as described in Chapter 3. In most cases, no closed-form solution exists for this maximum or for the parameters. Solutions specific to each distribution utilizing MLE are presented in Appendix C.
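
The exponential distribution is one of the few cases where the censored likelihood can be maximized in closed form, which makes it convenient for a quick check of the formulation above. The following sketch uses hypothetical data and a hypothetical function name (`exponential_log_likelihood`); the exponential model itself is an assumption made for illustration only.

```python
import math

def exponential_log_likelihood(rate, failures, suspensions):
    """ln L for an exponential model: pdf terms for failures, 1 - cdf for suspensions."""
    failure_part = sum(math.log(rate) - rate * t for t in failures)   # ln f(T_i)
    suspension_part = sum(-rate * s for s in suspensions)             # ln[1 - F(S_j)]
    return failure_part + suspension_part

# Hypothetical data (hours): three failures, two units removed while still running.
failures = [120.0, 340.0, 510.0]
suspensions = [600.0, 600.0]

# For the exponential model the maximum has a closed form:
# rate = R / (sum of failure times + sum of suspension times).
rate_hat = len(failures) / (sum(failures) + sum(suspensions))

# Numerical check: the log-likelihood is lower on either side of rate_hat.
center = exponential_log_likelihood(rate_hat, failures, suspensions)
below = exponential_log_likelihood(0.9 * rate_hat, failures, suspensions)
above = exponential_log_likelihood(1.1 * rate_hat, failures, suspensions)
print(rate_hat, center > below and center > above)
```

Here the estimate is 3/2170 failures per hour. Dropping the two suspensions would give 3/970 instead, overstating the failure rate, which is why the extra term is needed.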

MLE Analysis of Interval and Left Censored Data
The inclusion of left and interval censored data in an MLE solution for parameter estimates involves adding a term to the likelihood equation to account for the data types in question. When using interval data, it is assumed that each failure occurred within an interval, that is, between time $$A$$ and time $$B$$ (or between time $$0$$ and time $$B$$ if left censored), where $$A<B$$. In the case of interval data, and given $$P$$ interval observations, the likelihood function is modified by multiplying it by an additional term as follows:


 * $$\begin{align} L({{\theta }_{1}},{{\theta }_{2}},...,{{\theta }_{k}}|{{x}_{1}},{{x}_{2}},...,{{x}_{P}})= & \prod_{i=1}^{P} \{F({{x}_{i}};{{\theta }_{1}},{{\theta }_{2}},...,{{\theta }_{k}}) \\ & \ \ -F({{x}_{i-1}};{{\theta }_{1}},{{\theta }_{2}},...,{{\theta }_{k}})\} \end{align}$$
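
A small numerical sketch may help make the interval term concrete. It assumes an exponential distribution and hypothetical inspection data, neither of which comes from the original text, and it maximizes the log of the interval term with a simple ternary search (the helper names `interval_log_likelihood` and `maximize` are illustrative).

```python
import math

def interval_log_likelihood(rate, intervals):
    """Sum of ln[F(B) - F(A)] for an exponential cdf F(t) = 1 - exp(-rate * t)."""
    return sum(
        math.log(math.exp(-rate * a) - math.exp(-rate * b))
        for a, b in intervals
    )

def maximize(func, low, high, iterations=200):
    """Ternary search for the maximum of a unimodal function on [low, high]."""
    for _ in range(iterations):
        third = (high - low) / 3.0
        left, right = low + third, high - third
        if func(left) < func(right):
            low = left
        else:
            high = right
    return 0.5 * (low + high)

# Hypothetical inspection data: each failure is only known to lie in (A, B).
# The first entry starts at 0, so it is a left censored observation.
intervals = [(0.0, 100.0), (100.0, 200.0), (100.0, 200.0), (200.0, 300.0)]

rate_hat = maximize(lambda r: interval_log_likelihood(r, intervals), 1e-6, 1.0)
print(rate_hat)
```

For these equal-width intervals the likelihood equation can also be solved by hand, giving a rate of $$\ln 2/100$$, so the printed value should be close to 0.00693. The ternary search relies on the log-likelihood being unimodal in the rate, which holds for this exponential example but is not guaranteed for other distributions.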

Note that if only interval data are present, this term will represent the entire likelihood function for the MLE solution. The next section gives a formulation of the complete likelihood function for all possible censoring schemes.

The Complete Likelihood Function
We have now seen that obtaining MLE parameter estimates for different types of data involves incorporating different terms in the likelihood function to account for complete data, right censored data, and left or interval censored data. After including the terms for the different types of data, the likelihood function can be expressed in its complete form:


 * $$\begin{align} L= & \prod_{i=1}^{R} f({{T}_{i}};{{\theta }_{1}},...,{{\theta }_{k}})\cdot \prod_{j=1}^{M} [1-F({{S}_{j}};{{\theta }_{1}},...,{{\theta }_{k}})] \\ & \cdot \prod_{l=1}^{P} \left\{ F(I_{l_U};{{\theta }_{1}},...,{{\theta }_{k}})-F(I_{l_L};{{\theta }_{1}},...,{{\theta }_{k}}) \right\} \end{align}$$


 * where


 * $$L\to L({{\theta }_{1}},...,{{\theta }_{k}}|{{T}_{1}},...,{{T}_{R}},{{S}_{1}},...,{{S}_{M}},{{I}_{1}},...,{{I}_{P}}),$$


 * and
 * $$R$$ is the number of units with exact failures
 * $$M$$ is the number of suspended units
 * $$P$$ is the number of units with left censored or interval times-to-failure
 * $${{\theta }_{1}},...,{{\theta }_{k}}$$ are the parameters of the distribution
 * $${{T}_{i}}$$ is the $${{i}^{th}}$$ time to failure
 * $${{S}_{j}}$$ is the $${{j}^{th}}$$ time of suspension
 * $$I_{l_U}$$ is the ending of the time interval of the $${{l}^{th}}$$ group
 * $$I_{l_L}$$ is the beginning of the time interval of the $${{l}^{th}}$$ group

The total number of units is $$N=R+M+P$$. Note that in this formulation, if any of $$R$$, $$M$$ or $$P$$ is zero, the product term associated with it is taken to be one, not zero.
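
The complete likelihood function translates directly into code when written in logarithmic form: each product becomes a sum, and an empty group contributes zero to the sum, which corresponds to a factor of one in $$L$$. The sketch below assumes a two-parameter Weibull distribution and a hypothetical data set; the function names and parameter values are illustrative and do not come from the original text.

```python
import math

def weibull_pdf(t, shape, scale):
    return (shape / scale) * (t / scale) ** (shape - 1.0) * math.exp(-((t / scale) ** shape))

def weibull_cdf(t, shape, scale):
    return 1.0 - math.exp(-((t / scale) ** shape))

def complete_log_likelihood(shape, scale, failures=(), suspensions=(), intervals=()):
    """ln L with one sum per data type; an empty group adds 0 (a factor of 1 in L)."""
    total = 0.0
    for t in failures:                       # exact failures: ln f(T_i)
        total += math.log(weibull_pdf(t, shape, scale))
    for s in suspensions:                    # suspensions: ln[1 - F(S_j)]
        total += math.log(1.0 - weibull_cdf(s, shape, scale))
    for start, end in intervals:             # intervals: ln[F(end) - F(start)]
        total += math.log(weibull_cdf(end, shape, scale) - weibull_cdf(start, shape, scale))
    return total

# Hypothetical data set with N = R + M + P = 3 + 2 + 2 units.
failures = [150.0, 320.0, 480.0]
suspensions = [500.0, 500.0]
intervals = [(0.0, 100.0), (200.0, 300.0)]   # (0, B) is a left censored unit

# Evaluate at assumed trial values shape = 1.5, scale = 400.
full = complete_log_likelihood(1.5, 400.0, failures, suspensions, intervals)
parts = (
    complete_log_likelihood(1.5, 400.0, failures=failures)
    + complete_log_likelihood(1.5, 400.0, suspensions=suspensions)
    + complete_log_likelihood(1.5, 400.0, intervals=intervals)
)
print(full, parts)
```

The two printed values agree up to rounding, showing that the three data types contribute independent terms. An optimizer would then search over the shape and scale for the pair that maximizes `complete_log_likelihood`; that step is omitted here.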

See also
 * More details on Maximum Likelihood Parameter Estimation
 * Discussion on using grouped data with MLE at Grouped Data Parameter Estimation.