Probability Plotting: Difference between revisions

From ReliaWiki
Jump to navigation Jump to search
(this page is marked for deletion)
 
(8 intermediate revisions by 3 users not shown)
Line 1: Line 1:
One method of calculating the parameter of the exponential distribution is by using probability plotting. To better illustrate this procedure, consider the following example.
[[Category:For Deletion]]
 
<br>
 
===Probability Plotting===
The least mathematically intensive method for parameter estimation is the method of probability plotting. As the term implies, probability plotting involves a physical plot of the data on specially constructed '''probability plotting paper'''. This method is easily implemented by hand, given that one can obtain the appropriate probability plotting paper.
====Illustrating the method for the 2-parameter Weibull distribution====
The method of probability plotting takes the ''cdf'' of the distribution and attempts to linearize it by employing a specially constructed paper. This is best illustrated using the 2-parameter Weibull distribution. 
 
In the case of the two-parameter Weibull distribution, the ''cdf'' (also the unreliability ''Q(T)'') is given by:
::<math>F(T)=Q(T)=1-{e^{-\left(\tfrac{T}{\eta}\right)^{\beta}}}</math>
 
=====Linearizing the Weibull unreliability function=====
This function can then be linearized (''i.e.'' put in the common form of <math>y=mx+b</math> format) as follows:
::<math>\begin{align}
Q(T)= &  1-{e^{-\left(\tfrac{T}{\eta}\right)^{\beta}}}  \\
  \ln (1-Q(T))= & \ln \left[ {e^{-\left(\tfrac{T}{\eta}\right)^{\beta}}} \right]  \\
  \ln (1-Q(T))=& -\left(\tfrac{T}{\eta}\right)^{\beta}  \\
  \ln ( -\ln (1-Q(T)))= & \beta \left(\ln \left( \frac{T}{\eta }\right)\right) \\
  \ln \left( \ln \left( \frac{1}{1-Q(T)}\right) \right) = & \beta\ln{ T} -\beta(\eta )  \\
\end{align}</math>
 
 
 
:Then by setting
::<math>y=\ln \left( \ln \left( \frac{1}{1-Q(T)} \right) \right)</math>
:and
::<math>x=\ln \left( T \right)</math>
the equation can then be rewritten as,
::<math>y=\beta x-\beta \ln \left( \eta  \right)</math>
which is now a linear equation with a slope of
::Slope<math>=m=\beta </math>
and an intercept of
::Intercept<math>=b=\beta \cdot ln(\eta)</math>.
<br>
 
=====Constructing the paper=====
The next task is to construct the Weibull probability plotting paper with the appropriate ''y'' - and ''x''-axes. The ''x-axis'' trasnformation is simply logarithmic. The ''y-axis'', is a bit more complex requiring a double log reciprocal transformation, or ,
 
 
::<math>y=\ln \left(\ln \left( \frac{1}{1-Q(T)} ) \right) \right)</math>
 
 
where ''Q(T)'' is the unreliability. 
 
 
Such papers have been created by different vendors and are called '' probability plotting papers''.  [http://http://www.weibull.com/ Weibull.com] has different plotting papers available for [http://www.weibull.com/GPaper/index.htm download].   
 
[[Image:WeibullPaper2C.png|thumb|center|300px|Weibull probability plotting paper example.]]
 
To illustrate, consider the following probability plot on a slightly different type of Weibull probability paper.
 
[[Image:ldachp3fig1.gif|thumb|center|300px|Probability plot on the Weibull probability plotting paper.]]
 
This paper is constructed based on the mentioned ''y'' - and ''x''-transformations, where the y-axis represents unreliability and the x-axis represents time. Both of these values must be known for each time-to-failure point we want to plot.
 
Then, given the ''y'' and ''x'' value for each point, the points can easily be put on the plot. Once the points have been placed on the plot, the best possible straight line is drawn through these points. Once the line has been drawn, the slope of the line can be obtained (some probability papers include a slope indicator to simplify this calculation). This is the parameter <math>\beta,</math> which is the value of the slope.
To determine the scale parameter, <math>\eta </math> (also called the ''characteristic life''), one reads the time (from the ''x-axis'' corresponding to ''Q(T)=63.2%''.
 
Note that from before at
::<math>\begin{align}
  Q(T=\eta)= & 1-{{e}^{-{{\left( \tfrac{\eta }{\eta } \right)}^{\beta }}}} \\
  = & 1-{{e}^{-1}} \\
  = & 0.632 \\
  = & 63.2% 
\end{align}</math>
 
Thus, if we enter the ''y'' axis at ''Q(T)=63.2%,'' the corresponding value of ''T'' will be equal to <math>\eta.</math> Thus, using this simple methodology, the parameters of the Weibull distribution can be estimated.
 
====Determining the ''x'' and ''y'' Position of the Plot Points====
 
The points on the plot represent our data or, more specifically, our times-to-failure data. If, for example, we tested four units that failed at 10, 20, 30 and 40 hours, we would use these times as our ''x'' values or time values.
 
Determining what the appropriate ''y'' plotting positions, or the unreliability values, is a little more complex. To determine the ''y'' plotting positions, we must first determine a value indicating the corresponding unreliability for that failure. In other words, we need to obtain the cumulative percent failed for each time-to-failure. In this example, and by 10 hours, the cumulative percent failed is 25%, by 20 hours 50%, and so forth. This is a simple method illustrating the idea. The problem with this simple method is the fact that the 100% point is not defined on most probability plots, thus an alternative and more robust approach must be used. The most widely used method of determining this value is the method of obtaining the ''median rank'' for each failure.  This is discussed next.
 
====Median Ranks====
Median ranks are used to obtain an estimate of the unreliability, <math>Q({T_j})</math> for each failure. It is the value that the true probability of failure, <math>Q({{T}_{j}}),</math> should have at the <math>{{j}^{th}}</math> failure out of a sample of <math>N</math> units at a ''50%'' confidence level. This essentially means that this is our best estimate for the unreliability. Half of the time the true value will be greater than the 50% confidence estimate, the other half of the time the true value will be less than the estimate. This estimate is based on a solution of the binomial equation.
 
The rank can be found for any percentage point, <math>P</math>, greater than zero and less than one, by solving the cumulative binomial equation for <math>Z</math> . This represents the rank, or unreliability estimate, for the <math>{{j}^{th}}</math> failurein the following equation for the cumulative binomial:
 
::<math>P=\underset{k=j}{\overset{N}{\mathop \sum }}\,\left( \begin{matrix}
  N  \\
  k  \\
\end{matrix} \right){{Z}^{k}}{{\left( 1-Z \right)}^{N-k}}</math>
 
where <math>N</math> is the sample size and <math>j</math> the order number.
 
The median rank is obtained by solving this equation for <math>Z</math> at <math>P=0.50,</math>
 
::<math>0.50=\underset{k=j}{\overset{N}{\mathop \sum }}\,\left( \begin{matrix}
  N  \\
  k  \\
\end{matrix} \right){{Z}^{k}}{{\left( 1-Z \right)}^{N-k}}</math>
 
 
For example, if ''N=4'' and we have four failures, we would solve the median rank equation four times; once for each failure with ''j=1, 2, 3'' and ''4'', for the value of ''Z''. This result can then be used as the unreliability estimate for each failure or the ''y'' plotting position. (See also the The Weibull distribution chapter for a step-by-step example of this method.) The solution of cumuative binomial equation for ''Z'' requires the use of numerical methods.
 
=====Beta and F distributions Approach=====
A more straightforward and easier method of estimating median ranks is by applying two transformations to the cumulative binomial equation, first to the beta distribution and then to the F distribution, resulting in,
::<math>\begin{array}{*{35}{l}}
  MR & = & \tfrac{1}{1+\tfrac{N-j+1}{j}{{F}_{0.50;m;n}}}  \\
  m & = & 2(N-j+1)  \\
  n & = & 2j  \\
\end{array}</math>
:where
::<math>{F_{0.50;m;n}}</math> denotes the ''F'' distribution at the ''0.50'' point, with ''m'' and ''n'' degrees of freedom, for failure ''j'' out of ''N'' units.
 
=====Benard's approximation for median ranks=====
Another quick, and less accurate, approximation of the median ranks is also given by:
::<math>MR = \frac{{j - 0.3}}{{N + 0.4}}</math>
This approximation of the median ranks is also known as ''Benard's approximation''.
 
====Kaplan-Meier====
The Kaplan-Meier estimator (also known as the product limit estimator) is used as an alternative to the median ranks method for calculating the estimates of the unreliability for probability plotting purposes. The equation of the estimator is given by,
::<math>\widehat{F}({{t}_{i}})=1-\underset{j=1}{\overset{i}{\mathop \prod }}\,\frac{{{n}_{j}}-{{r}_{j}}}{{{n}_{j}}},\text{ }i=1,...,m</math>
:where,
<br>
::<math>\begin{align}
  m =  & {\text{total number of data points}} \\
  n =  & {\text{the total number of units}} \\
  {n_i} =  & n - \sum_{j = 0}^{i - 1}{s_j} - \sum_{j = 0}^{i - 1}{r_j}, \text{i = 1,...,m }\\
  {r_j} =  & {\text{ number of failures in the }}{j^{th}}{\text{ data group, and}} \\
  {s_j} =  & {\text{number of surviving units in the }}{j^{th}}{\text{ data group}} \\
\end{align}
</math>
Weibull++ provides the option to select whether the median ranks or the Kaplan-Meier estimator is used for the unreliability estimates for probability plotting and regression. By default, the median ranks are used.
 
====Probability Plots for Other Distributions====
This same methodology can be applied to other distributions which have ''cdf'' equations that can be linearized. Different probability papers exist for each distribution, since different distributions have different ''cdf'' equations. Weibull++ automatically creates these plots for you when choosing a probability plot for a particular distribution. Special scales on these plots allow the parameter estimates to be derived directly from the plots, similar to the way <math>\beta </math> and <math>\eta </math> were obtained from the Weibull probability plot. These are discussed in the individual distribution chapters.
 
====Some Shortfalls of Manual Probability Plotting====
Besides the most obvious drawback to probability plotting, which is the amount of effort required, manual probability plotting is not always consistent in the results. Two people plotting a straight line through a set of points will not always draw this line the same way, and thus will come up with slightly different results. This method was used primarily before the widespread use of computers that could easily perform the calculations for more complicated parameter estimation methods, such as the least squares and maximum likelihood methods.
 
 
 
====Example 1====
<br>
 
Let's assume six identical units are reliability tested at the same application and operation
stress levels. All of these units fail during the test after operating for the following times (in hours),  <math>{{T}_{i}}</math> : 96, 257, 498, 763, 1051 and 1744.
<br>
The steps for determining the parameters of the exponential  <math>pdf</math>  representing the
data, using probability plotting, are as follows:
<br>
::• Rank the times-to-failure in ascending order as shown next.
<br>
 
::<math>\begin{matrix}
  \text{Time-to-} & \text{Failure Order Number}  \\
  \text{failure, hr} & \text{out of a Sample Size of 6}  \\
  \text{96} & \text{1}  \\
  \text{257} & \text{2}  \\
  \text{498} & \text{3}  \\
  \text{763} & \text{4}  \\
  \text{1,051} & \text{5}  \\
  \text{1,744} & \text{6}  \\
\end{matrix}</math>
 
<br>
::• Obtain their median rank plotting positions.
<br>
Median rank  positions are used instead of other ranking methods because median ranks are at a
specific confidence level (50%).
<br>
::• The times-to-failure, with their corresponding median ranks, are shown next:
 
<br>
::<math>\begin{matrix}
  \text{Time-to-} & \text{Median}  \\
  \text{failure, hr} & \text{Rank, }%  \\
  \text{96} & \text{10}\text{.91}  \\
  \text{257} & \text{26}\text{.44}  \\
  \text{498} & \text{42}\text{.14}  \\
  \text{763} & \text{57}\text{.86}  \\
  \text{1,051} & \text{73}\text{.56}  \\
  \text{1,744} & \text{89}\text{.10}  \\
\end{matrix}</math>
 
<br>
::• On an exponential probability paper, plot the times on the x-axis and their corresponding
rank value on the y-axis. Fig. 4 displays an example of an exponential probability paper. The
paper is simply a log-linear paper. (The solution is given in Fig. 2.)
<br>
[[File:ALTA4.1.gif|center]]
<br>
<center>Fig. 4: Sample exponential probability paper.</center>
<br>
::• Draw the best possible straight line that goes through the  <math>t=0</math>  and  <math>
(t)=100%</math>  point and through the plotted points (as shown in Fig. 5).
<br>
::• At the  <math>Q(t)=63.2%</math>  or  <math>R(t)=36.8%</math>  ordinate point, draw a
straight horizontal line until this line intersects the fitted straight line. Draw a vertical line through this intersection until it crosses the abscissa. The value at the intersection of the abscissa is the estimate of the mean. For this case,  <math>\widehat{\mu }=833</math>  hr which means that  <math>\lambda =\tfrac{1}{\mu }=0.0012</math> . (This is always at 63.2% since  <math>(T)=1-{{e}^{-\tfrac{\mu }{\mu }}}=1-{{e}^{-1}}=0.632=63.2%).</math>
<br>
[[File:ALTA4.2.gif|center]]
<br>
<center>Fig. 5: Probability plot for Example 1.</center>
<br>
 
<br>
Now any reliability value for any mission time  <math>t</math>  can be obtained. For example, the
reliability for a mission of 15 hr, or any other time, can now be obtained either from the plot or analytically (i.e. using the equations given in Section  <math>5.1.1</math> ).
 
<br>
To obtain the value from the plot, draw a vertical line from the abscissa, at  <math>t=15</math>
hr, to the fitted line. Draw a horizontal line from this intersection to the ordinate and read
<math>R(t)</math> . In this case,  <math>R(t=15)=98.15%</math> . This can also be obtained
analytically, from the exponential reliability function.
 
<br>
 
====MLE Parameter Estimation====
 
<br>
The parameter of the exponential distribution can also be estimated using the maximum likelihood estimation (MLE) method. This log-likelihood function is:
 
<br>
::<math>\ln (L)=\Lambda =\underset{i=1}{\overset{{{F}_{e}}}{\mathop \sum }}\,{{N}_{i}}\ln \left[ \lambda {{e}^{-\lambda {{T}_{i}}}} \right]-\underset{i=1}{\overset{S}{\mathop \sum }}\,N_{i}^{\prime }\lambda T_{i}^{\prime }+\overset{FI}{\mathop{\underset{i=1}{\mathop{\underset{}{\overset{}{\mathop \sum }}\,}}\,}}\,N_{i}^{\prime \prime }\ln [R_{Li}^{\prime \prime }-R_{Ri}^{\prime \prime }]</math>
<br>
where:
<br>
::<math>R_{Li}^{\prime \prime }={{e}^{-\lambda T_{Li}^{\prime \prime }}}</math>
 
<br>
::<math>R_{Ri}^{\prime \prime }={{e}^{-\lambda T_{Ri}^{\prime \prime }}}</math>
 
<br>
and
<br>
::• <math>{{F}_{e}}</math>  is the number of groups of times-to-failure data points.
::• <math>{{N}_{i}}</math>  is the number of times-to-failure in the  <math>{{i}^{th}}</math>  time-to-failure data group.
::• <math>\lambda </math>  is the failure rate parameter (unknown a priori, the only parameter to be found).
::• <math>{{T}_{i}}</math>  is the time of the  <math>{{i}^{th}}</math>  group of time-to-failure data.
::• <math>S</math>  is the number of groups of suspension data points.
::• <math>N_{i}^{\prime }</math>  is the number of suspensions in the  <math>{{i}^{th}}</math>  group of suspension data points.
::• <math>T_{i}^{\prime }</math>  is the time of the  <math>{{i}^{th}}</math>  suspension data group.
::• <math>FI</math>  is the number of interval data groups.
::• <math>N_{i}^{\prime \prime }</math>  is the number of intervals in the i <math>^{th}</math>  group of data intervals.
::• <math>T_{Li}^{\prime \prime }</math>  is the beginning of the i <math>^{th}</math>  interval.
::• <math>T_{Ri}^{\prime \prime }</math>  is the ending of the i <math>^{th}</math>  interval.
 
<br>
The solution will be found by solving for a parameter  <math>\widehat{\lambda }</math>  so that  <math>\tfrac{\partial \Lambda }{\partial \lambda }=0</math>  where:
<br>
::<math>\frac{\partial \Lambda }{\partial \lambda }=\underset{i=1}{\overset{{{F}_{e}}}{\mathop \sum }}\,{{N}_{i}}\left( \frac{1}{\lambda }-{{T}_{i}} \right)-\underset{i=1}{\overset{S}{\mathop \sum }}\,N_{i}^{\prime }T_{i}^{\prime }-\overset{FI}{\mathop{\underset{i=1}{\mathop{\underset{}{\overset{}{\mathop \sum }}\,}}\,}}\,N_{i}^{\prime \prime }\frac{T_{Li}^{\prime \prime }R_{Li}^{\prime \prime }-T_{Ri}^{\prime \prime }R_{Ri}^{\prime \prime }}{R_{Li}^{\prime \prime }-R_{Ri}^{\prime \prime }}</math>
 
<br>
 
====Example 2====
 
<br>
Using the same data as in the probability plotting example (Example 1), and assuming an exponential distribution, estimate the parameter using the MLE method.
<br>
''Solution''
<br>
In this example we have non-grouped data without suspensions. Thus Eqn. (exp-mle) becomes:
<br>
::<math>\frac{\partial \Lambda }{\partial \lambda }=\underset{i=1}{\overset{{{F}_{e}}}{\mathop \sum }}\,\left[ \frac{1}{\lambda }-\left( {{T}_{i}} \right) \right]=\underset{i=1}{\overset{14}{\mathop \sum }}\,\left[ \frac{1}{\lambda }-\left( {{T}_{i}} \right) \right]=0</math>
 
<br>
Substituting the values for  <math>T</math>  we get:
 
<br>
::<math>\begin{align}
  \frac{6}{\lambda }= & 4409,\text{ or:} \\
  \lambda = & 0.00136\text{ failure/hr} 
\end{align}</math>

Latest revision as of 16:29, 19 February 2014