Fleet Data Analysis

Fleet analysis is similar to the repairable systems analysis described in the previous chapter. The main difference is that a fleet of systems is considered and the models are applied to the fleet failures rather than to the failures of an individual system. In other words, repairable system analysis models the number of system failures versus system time, whereas fleet analysis models the number of fleet failures versus fleet time.

The main motivation for fleet analysis is to enable the application of the Crow Extended model for fielded data. In many cases, reliability improvements might be necessary on systems that are already in the field. These types of reliability improvements are essentially delayed fixes (BD modes) as described in the Crow Extended chapter.

=Introduction=
Recall from the previous chapter that in order to make projections using the Crow Extended model, the $$\beta $$  of the combined A and BD modes should be equal to 1. Since the failure intensity of a fielded system might change over time (e.g., increase as the system wears out), this assumption might be violated, in which case the Crow Extended model cannot be used. However, if a fleet of systems is considered and the number of fleet failures versus fleet time is modeled, the failures might become random. This is because a fleet contains a mixture of new and old systems, and when the failures of this mixture are viewed on a cumulative fleet timeline, they may appear random.

The next two figures illustrate this concept. The first figure shows the number of failures over system age. It can be clearly seen that as the systems age, the intensity of the failures increases (wearout). The superposition system line, which brings the failures from the different systems under a single timeline, also illustrates this observation. On the other hand, if the failures of the same four systems are combined from a fleet perspective, and fleet failures are considered over cumulative fleet hours, then the failures appear to be random. The second figure illustrates this in the System Operation plot when the Cum. Time Line is considered. In this case, the $$\beta $$  of the fleet will be approximately equal to 1, and the Crow Extended model can be used for quantifying the effects of future reliability improvements on the fleet.



=Methodology=
The figures above illustrate that the difference between repairable system data analysis and fleet analysis lies in how the data set is treated. In fleet analysis, the time-to-failure data from each system are stacked onto a single cumulative timeline. For example, consider the two systems in the following table.

Convert to Accumulated Timeline
The data set is first converted to an accumulated timeline, as follows:
 * System 1 is considered first. The accumulated timeline is therefore 3 and 7 hours.
 * System 1's End Time is 10 hours. System 2's first failure is at 4 hours. This failure time is added to System 1's End Time to give an accumulated failure time of 14 hours.
 * The second failure for System 2 occurred 5 hours after the first failure. This time interval is added to the accumulated timeline to give 19 hours.
 * The third failure for System 2 occurred 4 hours after the second failure. The accumulated failure time is 19 + 4 = 23 hours.
 * System 2's end time is 15 hours, or 2 hours after the last failure. The total accumulated operating time for the fleet is 25 hours (23 + 2 = 25).

In general, the accumulated operating time $${{Y}_{j}}$$  is calculated by:


 * $${{Y}_{j}}={{X}_{i,q}}+\underset{r=1}{\overset{q-1}{\mathop \sum }}\,{{T}_{r}},\text{ }j=1,2,...,N$$


 * where:


 * $${{X}_{i,q}}$$ is the  $${{i}^{th}}$$  failure of the  $${{q}^{th}}$$  system
 * $${{T}_{q}}$$ is the end time of the  $${{q}^{th}}$$  system
 * $$K$$ is the total number of systems
 * $$N$$ is the total number of failures from all systems ( $$N=\underset{q=1}{\overset{K}{\mathop{\sum }}}\,{{N}_{q}}$$, where  $${{N}_{q}}$$  is the number of failures of the  $${{q}^{th}}$$  system )

As this example demonstrates, the accumulated timeline depends on the order in which the systems are stacked. If the data in Table 13.2 are stacked with System 2 first, the accumulated timeline becomes 4, 9, 13, 18, 22, with an end time of 25. In the next step of the analysis, the accumulated timeline is grouped into time intervals, which largely removes the effect of the ordering, but only when the order of the systems was random to begin with. If there is some logic or pattern in the order of the systems, it will remain even after the cumulative timeline is converted to grouped data.

For example, consider a system that wears out with age, so that failures occur more frequently as the system ages. A fleet of such systems contains both new and old systems. If the data set is arranged from the newest to the oldest system, then even after grouping, the pattern of fewer failures in the early intervals and more failures in the later intervals will still be present. If the objective of the analysis is to determine the difference between newer and older systems, that ordering is acceptable. However, if the objective is to determine the reliability of the fleet, the systems should be randomly ordered.
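The stacking steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `accumulated_timeline` and the `(failure_times, end_time)` layout are assumptions made here, and the two systems are entered with the values implied by the steps (System 1 fails at 3 and 7 hours and ends at 10 hours; System 2 fails at 4, 9 and 13 hours and ends at 15 hours).

```python
from typing import List, Sequence, Tuple

# Each system is (failure_times, end_time), with times measured on that
# system's own clock starting at zero. This layout is an assumption.
System = Tuple[Sequence[float], float]


def accumulated_timeline(systems: Sequence[System]) -> Tuple[List[float], float]:
    """Stack systems end to end, in the order given.

    Returns the accumulated failure times Y_1..Y_N and the total fleet time.
    """
    offset = 0.0  # sum of the end times of the systems already stacked
    stacked: List[float] = []
    for failure_times, end_time in systems:
        for x in failure_times:
            if not 0 <= x <= end_time:
                raise ValueError("failure time outside the system's operating period")
            stacked.append(offset + x)
        offset += end_time
    return stacked, offset


system_1 = ([3, 7], 10)
system_2 = ([4, 9, 13], 15)

print(accumulated_timeline([system_1, system_2]))  # ([3.0, 7.0, 14.0, 19.0, 23.0], 25.0)
print(accumulated_timeline([system_2, system_1]))  # ([4.0, 9.0, 13.0, 18.0, 22.0], 25.0)
```

Both orderings give the same total fleet time of 25 hours but different accumulated failure times, which is why the ordering of the systems matters.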

Analyze the Grouped Data
Once the accumulated timeline has been generated, it is converted into grouped data. To accomplish this, a group interval is required. The group interval length should be chosen so that it is representative of the data. Note that the intervals do not have to be of equal length. Once the data points have been grouped, the parameters can be obtained using maximum likelihood estimation as described in the Grouped Data Analysis section of the Crow-AMSAA (NHPP) chapter. The data from the table above can be grouped into 5-hour intervals. This interval length is sufficiently large to ensure that there are failures within each interval. The grouped data set is given in the following table.
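As a rough illustration of the grouping step, the sketch below counts the accumulated failure times that fall in each interval. The helper name `group_failures` is hypothetical, and assigning a failure that lands exactly on a boundary to the interval that ends there is an assumption made for this sketch, not a rule stated in this chapter.

```python
from bisect import bisect_left
from typing import List, Sequence


def group_failures(failure_times: Sequence[float],
                   interval_ends: Sequence[float]) -> List[int]:
    """Count failures per interval (T_{i-1}, T_i], where T_0 = 0."""
    ends = list(interval_ends)
    if ends != sorted(ends) or len(set(ends)) != len(ends):
        raise ValueError("interval end points must be strictly increasing")
    counts = [0] * len(ends)
    for y in failure_times:
        if not 0 < y <= ends[-1]:
            raise ValueError("failure time outside the grouped range")
        # bisect_left returns the first interval whose end point is >= y
        counts[bisect_left(ends, y)] += 1
    return counts


# Accumulated timeline of the two-system example, grouped in 5-hour intervals.
timeline = [3, 7, 14, 19, 23]
print(group_failures(timeline, [5, 10, 15, 20, 25]))  # [1, 1, 1, 1, 1]
```

Because the function takes the interval end points rather than a fixed width, unequal intervals work the same way.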

The Crow-AMSAA model for Grouped Failure Times is used for the data in Table 13.3 and the parameters of the model are solved by satisfying the following maximum likelihood equations (See Crow-AMSAA (NHPP)).


 * $$\begin{matrix}

\widehat{\lambda }=\frac{n}{T_{k}^{\widehat{\beta }}} \\ \underset{i=1}{\overset{k}{\mathop \sum }}\,{{n}_{i}}\left[ \frac{T_{i}^{\widehat{\beta }}\ln {{T}_{i}}-T_{i-1}^{\widehat{\beta }}\ln {{T}_{i-1}}}{T_{i}^{\widehat{\beta }}-T_{i-1}^{\widehat{\beta }}}-\ln {{T}_{k}} \right]=0 \\ \end{matrix}$$
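The second equation has no closed-form solution for $$\widehat{\beta }$$, so it is solved numerically. The following is a minimal sketch, not the procedure used by any particular software: it uses plain bisection, takes $${{T}_{0}}=0$$ with $$T_{0}^{\beta }\ln {{T}_{0}}$$ set to its limit of 0, and assumes the root lies between 0.001 and 10. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def _t_pow_log(t: float, beta: float) -> float:
    """Return t**beta * ln(t), using the limiting value 0 at t = 0."""
    return 0.0 if t == 0 else t ** beta * math.log(t)


def score(beta: float, ends: Sequence[float], counts: Sequence[int]) -> float:
    """Left-hand side of the second likelihood equation."""
    log_tk = math.log(ends[-1])
    total, prev = 0.0, 0.0  # prev plays the role of T_{i-1}; T_0 = 0
    for t, n in zip(ends, counts):
        num = _t_pow_log(t, beta) - _t_pow_log(prev, beta)
        den = t ** beta - prev ** beta
        total += n * (num / den - log_tk)
        prev = t
    return total


def fit_grouped(ends: Sequence[float], counts: Sequence[int],
                lo: float = 1e-3, hi: float = 10.0,
                tol: float = 1e-10) -> Tuple[float, float]:
    """Solve score(beta) = 0 by bisection, then compute lambda."""
    if score(lo, ends, counts) <= 0 or score(hi, ends, counts) >= 0:
        raise ValueError("no root bracketed; check the data or widen [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid, ends, counts) > 0:
            lo = mid  # score decreases with beta, so the root is to the right
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    lam = sum(counts) / ends[-1] ** beta
    return lam, beta


# Two-system example: one failure in each of five 5-hour intervals.
lam, beta = fit_grouped([5, 10, 15, 20, 25], [1, 1, 1, 1, 1])
print(round(lam, 4), round(beta, 4))  # 0.2 1.0
```

With one failure in every equal-length interval, the estimate is $$\widehat{\beta }=1$$ (a constant failure intensity) and $$\widehat{\lambda }=5/25=0.2$$, which serves as a quick sanity check on the solver.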

Example
The following table presents data for a fleet of 27 systems. A cycle is a complete history from overhaul to overhaul. The failure history for the last completed cycle for each system is recorded. This is a random sample of data from the fleet. These systems are in the order in which they were selected. Suppose the intervals to group the current data are 10000, 20000, 30000, 40000 and the final interval is defined by the termination time. Conduct the fleet analysis.

Solution

The sample fleet data set can be grouped into intervals ending at 10000, 20000, 30000, 40000 and 52110. The following table gives the grouped data.

Based on the above time intervals, the maximum likelihood estimates of $$\widehat{\lambda }$$  and  $$\widehat{\beta }$$  for this data set are then given by:


 * $$\begin{matrix}

\widehat{\lambda }=0.00147 \\ \widehat{\beta }=0.93328 \\ \end{matrix}$$

The next figure shows the System Operation plot.



=Applying the Crow Extended Model to Fleet Data=
As mentioned previously, the main motivation for fleet analysis is to apply the Crow Extended model for in-service reliability improvements. The methodology is identical to the application of the Crow Extended model for Grouped Data described in a previous chapter. Consider the fleet data from the example above. In order to apply the Crow Extended model, put the $$N=37$$  failure times on a cumulative time scale over  $$(0,T)$$, where  $$T=52110$$. In the example, each $${{T}_{i}}$$  corresponds to a failure time  $${{X}_{i,j}}$$. This is often not the situation. However, in all cases the accumulated operating time $${{Y}_{q}}$$  at a failure time  $${{X}_{i,r}}$$  is:


 * $$\begin{align}

& {{Y}_{q}}= & {{X}_{i,r}}+\underset{j=1}{\overset{r-1}{\mathop \sum }}\,{{T}_{j}},\ \ \ q=1,2,\ldots ,N \\ & N= & \underset{j=1}{\overset{K}{\mathop \sum }}\,{{N}_{j}} \end{align}$$

Here $$q$$  indexes the failures in their successive order on the accumulated timeline. Thus, in this example $$N=37,\,{{Y}_{1}}=1396,\,{{Y}_{2}}=5893,\,{{Y}_{3}}=6418,\ldots ,{{Y}_{37}}=52110$$. See the table below.

Each system failure time in Table 13.4 corresponds to a problem and a cause (failure mode). The management strategy can be to not fix the failure mode (A mode) or to fix the failure mode with a delayed corrective action (BD mode). There are $${{N}_{A}}=4$$  failures due to A failure modes. There are $${{N}_{BD}}=33$$  total failures due to  $$M=13$$  distinct BD failure modes. Some of the distinct BD modes had repeats of the same problem. For example, mode BD1 had 12 occurrences of the same problem. Therefore, in this example, there are 13 distinct corrective actions corresponding to 13 distinct BD failure modes.

The objective of the Crow Extended model is to estimate the impact of the 13 distinct corrective actions. The analyst will choose an average effectiveness factor (EF) based on the proposed corrective actions and historical experience. Historical industry and government data support a typical average effectiveness factor of $$\overline{d}=0.70$$  for many systems. In this example, an average EF of $$\bar{d}=0.4$$ was assumed in order to be conservative regarding the impact of the proposed corrective actions. Since there are no BC failure modes (corrective actions applied during the test), the projected failure intensity is:


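 * $$\widehat{r}(T)=\left( \frac{{{N}_{A}}}{T}+\underset{i=1}{\overset{M}{\mathop \sum }}\,(1-{{d}_{i}})\frac{{{N}_{i}}}{T} \right)+\overline{d}h(T)$$

where $${{N}_{i}}$$  is the number of failures observed for the  $${{i}^{th}}$$  BD mode and  $${{d}_{i}}$$  is its effectiveness factor. The first term is estimated by:


 * $${{\widehat{\lambda }}_{A}}=\frac{{{N}_{A}}}{T}=0.000077$$

With the average EF applied to each BD mode, the second term is:


 * $$\underset{i=1}{\overset{M}{\mathop \sum }}\,(1-{{d}_{i}})\frac{{{N}_{i}}}{T}=(1-\overline{d})\frac{{{N}_{BD}}}{T}=0.00038$$

Together, these two terms estimate the growth potential failure intensity:


 * $$\begin{align}

& {{\widehat{\gamma }}_{GP}}(T)= & \frac{{{N}_{A}}}{T}+\underset{i=1}{\overset{M}{\mathop \sum }}\,(1-{{d}_{i}})\frac{{{N}_{i}}}{T} \\ & = & 0.00046 \end{align}$$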

To estimate the last term $$\overline{d}h(T)$$  of the Crow Extended model, partition the data in Table 13.6 into  $$D$$  successive intervals. The length of the $${{q}^{th}}$$  interval is  $${{L}_{q}},$$   $$\,q=1,2,\ldots ,D$$. It is not required that the intervals be of the same length, but there should be several (e.g., at least 5) cycles per interval on average. Also, let $${{S}_{q}}={{L}_{1}}+{{L}_{2}}+\ldots +{{L}_{q}}$$  be the accumulated time through the  $${{q}^{th}}$$  interval, so that  $${{S}_{1}}={{L}_{1}},$$   $${{S}_{2}}={{L}_{1}}+{{L}_{2}},$$  etc. For the $${{q}^{th}}$$  interval, note the number of distinct BD modes,  $$M{{I}_{q}}$$, appearing for the first time,  $$q=1,2,\ldots ,D$$. See Table 13.7.
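Counting the first occurrences per interval can be sketched as follows. The records below are made up purely for illustration and are not the data from the example tables. The function name, the `(accumulated time, mode label)` layout and the convention that BD mode labels start with "BD" are all assumptions of this sketch.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


def first_occurrence_counts(failures: Sequence[Tuple[float, str]],
                            interval_ends: Sequence[float]) -> List[int]:
    """Return MI_q: the number of BD modes first seen in each interval."""
    ends = list(interval_ends)  # accumulated interval end times S_1..S_D
    seen = set()
    counts = [0] * len(ends)
    for time, mode in sorted(failures):
        if not mode.startswith("BD") or mode in seen:
            continue  # skip A modes and repeats of an already seen BD mode
        seen.add(mode)
        counts[bisect_left(ends, time)] += 1
    return counts


# Hypothetical records, NOT the data from the example tables.
records = [(1200, "BD1"), (3400, "A"), (5100, "BD2"), (9800, "BD1"),
           (12500, "BD3"), (18000, "BD2"), (26000, "BD4")]
print(first_occurrence_counts(records, [10000, 20000, 30000]))  # [2, 1, 1]
```

The counts sum to the number of distinct BD modes, and only these first occurrences enter the estimation of $$\widehat{h}(T)$$.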

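The term $$\widehat{h}(T)$$  is calculated as  $$\widehat{h}(T)=\widehat{\lambda }\widehat{\beta }{{T}^{\widehat{\beta }-1}}$$, where the values  $$\widehat{\lambda }$$  and  $$\widehat{\beta }$$  satisfy the grouped data maximum likelihood equations given in the Methodology section, with the interval counts  $${{n}_{i}}$$  replaced by  $$M{{I}_{q}}$$  and the interval end times  $${{T}_{i}}$$  replaced by  $${{S}_{q}}$$. This is the grouped data version of the Crow-AMSAA model applied only to the first occurrence of distinct BD modes.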

For the data in Table 13.6 the first 4 intervals had a length of 10000 and the last interval was 12110. Therefore, $$D=5$$. This choice gives an average of about 5 overhaul cycles per interval. See Table 13.8.


 * Thus:


 * $$\begin{align}

& \widehat{\lambda }= & 0.00330 \\ & \widehat{\beta }= & 0.76219 \end{align}$$


 * This gives:


 * $$\begin{align}

& \widehat{h}(T)= & \widehat{\lambda }\widehat{\beta }{{T}^{\widehat{\beta }-1}} \\ & = & 0.00019 \end{align}$$

Consequently, for $$\overline{d}=0.4$$  the last term of the Crow Extended model is given by:


 * $$\overline{d}h(T)=0.000076$$

The projected failure intensity is:


 * $$\begin{align}

& \widehat{r}(T)= & \frac{{{N}_{A}}}{T}+\underset{i=1}{\overset{M}{\mathop \sum }}\,(1-{{d}_{i}})\frac{{{N}_{i}}}{T}+\overline{d}h(T) \\ & = & 0.000077+0.6\times (0.00063)+0.4\times (0.00019) \\ & = & 0.000533 \end{align}$$

This estimates that the 13 proposed corrective actions will reduce the fleet failure intensity (failures per hour of fleet operation) from the current $$\widehat{r}(0)=\tfrac{{{N}_{A}}+{{N}_{BD}}}{T}=0.00071$$  to  $$\widehat{r}(T)=0.00053$$. The average time between failures is estimated to increase from the current 1408.38 hours to 1876.93 hours.
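The arithmetic of this projection can be checked with a short script. This is a sketch under the assumptions of the example (no BC modes and a single average EF applied to every BD mode). The function name is hypothetical, and the inputs are the rounded values quoted in the text.

```python
def projected_failure_intensity(n_a: int, n_bd: int, total_time: float,
                                avg_ef: float, lam_bd: float,
                                beta_bd: float) -> float:
    """Crow Extended projection with no BC modes and one EF for all BD modes."""
    # Rate at which new BD modes are still appearing at time T.
    h = lam_bd * beta_bd * total_time ** (beta_bd - 1)
    # Growth potential: A-mode intensity plus the BD intensity left after fixes.
    growth_potential = n_a / total_time + (1 - avg_ef) * n_bd / total_time
    return growth_potential + avg_ef * h


T = 52110
current = (4 + 33) / T
projected = projected_failure_intensity(4, 33, T, 0.4, 0.00330, 0.76219)

print(f"current   r = {current:.5f}, MTBF = {1 / current:.0f} h")
print(f"projected r = {projected:.5f}, MTBF = {1 / projected:.0f} h")
```

Because the parameter estimates are entered here to only five decimal places, the projected MTBF from this script can differ slightly from the 1876.93 hours reported above.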

=Confidence Bounds=
For fleet data analysis using the Crow-AMSAA model, the confidence bounds are calculated using the same procedure as described in the Crow-AMSAA (NHPP) chapter. For fleet data analysis using the Crow Extended model, the confidence bounds are calculated using the same procedure as described in the Crow Extended chapter.

=Examples=

Predicting the Number of Failures for Fleet Operation
11 systems from the field were chosen for the purposes of a fleet analysis. Each system had at least one failure. All of the systems had a start time equal to zero and the last failure for each system corresponds to the end time. Group the data based on a fixed interval of 3,000 hours and assume a fixed effectiveness factor equal to 0.4. Do the following:

1)	Estimate the parameters of the Crow Extended model.

2)	Based on the analysis, does it appear that the systems were randomly ordered?

3)	After the implementation of the delayed fixes, how many failures would you expect within the next 4,000 hours of fleet operation?

Solution
 * 1)	The next figure shows the estimated Crow Extended parameters.




 * 2)	The estimated parameter is  $$\beta =0.8569$$, which is close to 1, so it does appear that the systems were randomly ordered. You can verify that the confidence bounds on  $$\beta $$  include 1 by going to the QCP and calculating the parameter bounds or by viewing the Beta Bounds plot. You can also check graphically whether the systems were randomly ordered by using the System Operation plot shown below. Looking at the Cum. Time Line, the failures do not appear to follow a trend. Therefore, the systems can be assumed to be randomly ordered.