Gap Analysis

This article also appears in the Reliability Growth and Repairable System Analysis Reference book.

Most reliability growth models used for estimating and tracking reliability growth from test data assume that the data set represents all actual system failure times, consistent with a uniform definition of failure (complete data). In practice, this may not always be the case, and too few or too many failures may be reported over some interval of test time. This can distort the estimates of the growth rate and of the current system reliability. This section discusses a practical reliability growth estimation and analysis procedure based on the assumption that anomalies may exist within the data over some interval of the test period, but that the remaining failure data follow the Crow-AMSAA reliability growth model. In particular, it is assumed that the beginning and ending points of the interval in which the anomalies lie are generated independently of the underlying reliability growth process. The approach for estimating the parameters of the growth model when the data over some interval are problematic is simply not to use the failure information from that interval. The analysis retains the contribution of the interval to the total test time, but no assumptions are made regarding the actual number of failures over the interval. This is often referred to as gap analysis.

Consider the case where a system is tested for time $$T\,\!$$ and the actual failure times are recorded. The time $$T\,\!$$ may possibly be an observed failure time. Also, the end points of the gap interval may or may not correspond to a recorded failure time. The underlying assumption is that the data used in the maximum likelihood estimation follow the Crow-AMSAA model with a Weibull intensity function $$\lambda \beta {{t}^{\beta -1}}\,\!$$. It is not assumed that zero failures occurred during the gap interval. Rather, it is assumed that the actual number of failures is unknown, and hence no information at all regarding these failures is used to estimate $$\lambda \,\!$$ and $$\beta \,\!$$.

Let $${{S}_{1}}\,\!$$ and $${{S}_{2}}\,\!$$ denote the end points of the gap interval, with $${{S}_{1}}<{{S}_{2}}\,\!$$. Let $$0<{{X}_{1}}<{{X}_{2}}<\ldots <{{X}_{{{N}_{1}}}}\le {{S}_{1}}\,\!$$ be the $${{N}_{1}}\,\!$$ failure times over $$(0,\,{{S}_{1}})\,\!$$ and let $${{S}_{2}}<{{Y}_{1}}<{{Y}_{2}}<\ldots <{{Y}_{{{N}_{2}}}}\le T\,\!$$ be the $${{N}_{2}}\,\!$$ failure times over $$({{S}_{2}},\,T)\,\!$$. The maximum likelihood estimates of $$\lambda \,\!$$ and $$\beta \,\!$$ are values $$\widehat{\lambda }\,\!$$ and $$\widehat{\beta }\,\!$$ satisfying the following equations.


 * $$\widehat{\lambda }=\frac{{{N}_{1}}+{{N}_{2}}}{S_{1}^{\widehat{\beta }}+{{T}^{\widehat{\beta }}}-S_{2}^{\widehat{\beta }}}\,\!$$


 * $$\widehat{\beta }=\frac{{{N}_{1}}+{{N}_{2}}}{\widehat{\lambda }\left[ S_{1}^{\widehat{\beta }}\ln {{S}_{1}}+{{T}^{\widehat{\beta }}}\ln T-S_{2}^{\widehat{\beta }}\ln {{S}_{2}} \right]-\left[ \underset{i=1}{\overset{{{N}_{1}}}{\mathop{\sum }}}\,\ln {{X}_{i}}+\underset{i=1}{\overset{{{N}_{2}}}{\mathop{\sum }}}\,\ln {{Y}_{i}} \right]}\,\!$$

In general, these equations cannot be solved explicitly for $$\widehat{\lambda }\,\!$$ and $$\widehat{\beta }\,\!$$, but must be solved by an iterative procedure.
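One possible iterative procedure is sketched below. It is only an illustration, not the method used by any particular software: the function name `gap_mle`, the bisection scheme and the bracketing limits are assumptions made here. The sketch substitutes the equation for $$\widehat{\lambda }\,\!$$ into the equation for $$\widehat{\beta }\,\!$$, which leaves a single equation in $$\widehat{\beta }\,\!$$ that is solved by bisection.

```python
import math

def gap_mle(xs, ys, s1, s2, t_end, tol=1e-10):
    """Solve the two gap-analysis likelihood equations by bisection on beta.

    xs: failure times in (0, s1]; ys: failure times in (s2, t_end].
    Returns (lambda_hat, beta_hat). Hypothetical helper, for illustration only.
    """
    times = list(xs) + list(ys)
    n = len(times)  # N1 + N2
    sum_log = sum(math.log(t) for t in times)

    def exposure(beta):
        # Denominator of the lambda equation: S1^beta + T^beta - S2^beta
        return s1**beta + t_end**beta - s2**beta

    def score(beta):
        # Beta equation with lambda eliminated, rearranged so the root is score = 0
        weighted = (s1**beta * math.log(s1) + t_end**beta * math.log(t_end)
                    - s2**beta * math.log(s2))
        return n / beta + sum_log - n * weighted / exposure(beta)

    lo, hi = 1e-6, 1.0  # score is positive for very small beta
    while score(hi) > 0:  # widen the bracket until the sign changes
        hi *= 2.0
        if hi > 64.0:
            raise ValueError("could not bracket a root for beta")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    return n / exposure(beta), beta
```

When $${{S}_{1}}={{S}_{2}}\,\!$$ (no gap), the denominator reduces to $${{T}^{\widehat{\beta }}}\,\!$$ and the sketch should reproduce the usual time terminated closed-form estimates, which gives a convenient check. Any other root finder, such as Newton's method, could be used in place of bisection.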

Missing Data Example
Consider a system under development that was subjected to a reliability growth test for $$T=1,000\,\!$$ hours. Each month, the successive failure times, on a cumulative test time basis, were reported. According to the test plan, 125 hours of test time were accumulated on each prototype system each month. The total reliability growth test program lasted for 7 months. One prototype was tested for each of the months 1, 3, 4, 5, 6 and 7 with 125 hours of test time. During the second month, two prototypes were tested for a total of 250 hours of test time. The next table shows the successive $$N=86\,\!$$ failure times that were reported for $$T=1000\,\!$$ hours of testing.

Failure times $${{X}_{i}},\ i=1,2,\ldots ,86\,\!$$, with $$N=86\,\!$$ and $$T=1000\,\!$$.

The observed and cumulative number of failures for each month are:


 * 1)	Determine the maximum likelihood estimators for the Crow-AMSAA model.
 * 2)	Evaluate the goodness-of-fit for the model.
 * 3)	Consider $$(500,\ 625)\,\!$$ as the gap interval and determine the maximum likelihood estimates of $$\lambda \,\!$$ and $$\beta \,\!$$.

Solution


 * 1)	For the time terminated test:


 * $$\begin{align} \widehat{\beta }&=0.7597 \\ \widehat{\lambda }&=0.4521 \end{align}\,\!$$
 * 2)	The Cramér-von Mises goodness-of-fit test for this data set yields:


 * $$C_{M}^{2}=\tfrac{1}{12M}+\underset{i=1}{\overset{M}{\mathop{\sum }}}\,{{\left[ {{\left( \tfrac{{{T}_{i}}}{T} \right)}^{\widehat{\beta }}}-\tfrac{2i-1}{2M} \right]}^{2}}= 0.6989\,\!$$


 * The critical value at the 10% significance level is 0.173. Because the test statistic exceeds this value, the test indicates that the analyst should reject the hypothesis that the data set follows the Crow-AMSAA reliability growth model. The following plot shows $$\ln N(t)\,\!$$ versus $$\ln t\,\!$$ with the fitted line $$\ln \hat{\lambda }+\hat{\beta }\ln t\,\!$$, where $$\widehat{\lambda }=0.4521\,\!$$ and $$\widehat{\beta }=0.7597\,\!$$ are the maximum likelihood estimates.




 * During the fourth month (between 500 and 625 hours), 38 failures were reported. This number is very high in comparison to the failures reported in the other months. A quick investigation found that a number of new data collectors were assigned to the project during this month. It was also discovered that extensive design changes were made during this period, which involved the removal of a large number of parts. It is possible that these removals, which were not failures, were incorrectly reported as failed parts. Based on knowledge of the system and the test program, it was clear that such a large number of actual system failures was extremely unlikely. The consensus was that this anomaly was due to the failure reporting. For this analysis, it was decided to treat the actual number of failures over this month as unknown, but consistent with the remaining data and the Crow-AMSAA reliability growth model.
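The Cramér-von Mises statistic used in part 2 can be computed along the following lines. This is a minimal sketch: the function name `cramer_von_mises` is hypothetical, and $$M\,\!$$ is taken to be the number of failure times supplied, which is an assumption made here for a time terminated test. The 86 failure times of the example are not reproduced in this section, so the sketch does not attempt to recompute the value 0.6989.

```python
def cramer_von_mises(times, t_end, beta):
    """Cramer-von Mises statistic C_M^2 for failure times on (0, t_end].

    Assumes M equals the number of failure times (time terminated test).
    Hypothetical helper, for illustration only.
    """
    m = len(times)
    ordered = sorted(times)
    total = 1.0 / (12.0 * m)
    for i, t in enumerate(ordered, start=1):
        # Squared distance between the transformed time and its expected position
        total += ((t / t_end) ** beta - (2 * i - 1) / (2.0 * m)) ** 2
    return total
```

The returned value would then be compared with the tabulated critical value (0.173 at the 10% significance level in this example); a statistic above the critical value leads to rejecting the Crow-AMSAA hypothesis.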


 * 3)	Considering the problem interval $$(500,625)\,\!$$ as the gap interval, we will use the data over the interval $$(0,500)\,\!$$ and over the interval $$(625,1000).\,\!$$ The equations for analyzing missing data are the appropriate equations to estimate $$\lambda \,\!$$ and $$\beta \,\!$$ because the failure times are known. In this case $${{S}_{1}}=500,\,{{S}_{2}}=625\,\!$$ and $$T=1000,\ {{N}_{1}}=35,\,{{N}_{2}}=13\,\!$$. The maximum likelihood estimates of $$\lambda \,\!$$ and $$\beta \,\!$$ are:


 * $$\begin{align} \widehat{\beta }&=0.5596 \\ \widehat{\lambda }&=1.1052 \end{align}\,\!$$


 * The next figure is a plot of the cumulative number of failures versus time. This plot is approximately linear, which also indicates a good fit of the model.
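The reported estimates can be sanity-checked against the equation for $$\widehat{\lambda }\,\!$$ without the individual failure times, since that equation needs only the failure counts, the interval end points and $$\widehat{\beta }\,\!$$. The sketch below uses the values quoted in this example; the helper name `lambda_from_beta` is hypothetical.

```python
def lambda_from_beta(n_failures, beta, s1, s2, t_end):
    """Lambda equation of the gap analysis; s1 == s2 means no gap is removed."""
    return n_failures / (s1**beta + t_end**beta - s2**beta)

# Part 1: all 86 failures, no gap, so the denominator reduces to T^beta.
lam_full = lambda_from_beta(86, 0.7597, 500.0, 500.0, 1000.0)

# Part 3: N1 + N2 = 35 + 13 failures outside the gap interval (500, 625).
lam_gap = lambda_from_beta(35 + 13, 0.5596, 500.0, 625.0, 1000.0)

print(lam_full, lam_gap)
```

Both values should land close to the reported 0.4521 and 1.1052. Small differences are expected because $$\widehat{\beta }\,\!$$ is quoted to only four decimal places.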