Grouped Data Parameter Estimation

This article appears in the Life data analysis reference.

The grouped data type in Weibull++ is used for tests where there are groups of units having the same time-to-failure, or units are grouped together in intervals, or there are groups of units suspended at the same time. However, you must be cautious in using the different parameter estimation methods because different methods treat grouped data in different ways. ReliaSoft designed Weibull++ to treat grouped data in different ways to maximize the options available to you.

When Using Rank Regression (Least Squares)
When using grouped data, Weibull++ plots the data point corresponding to the highest rank position in each group. For example, given three groups of 10 units that fail at 100, 200 and 300 hours, respectively, the three plotted points are the end points of the groups: the 10th rank position out of 30, the 20th rank position out of 30 and the 30th rank position out of 30. This procedure is identical to standard procedures for using grouped data, as discussed in Kececioglu [19].

In cases where the grouped data are interval censored, it is assumed that the failures occurred at some time in the interval between the previous and current time-to-failure. In our example, this is the same as saying that 10 units failed in the interval between zero and 100 hours, another 10 units failed in the interval between 100 and 200 hours, and 10 more units failed in the interval between 200 and 300 hours. The rank regression analysis automatically takes this into account. If this assumption of interval failure is incorrect (i.e., 10 units failed at exactly 100 hours, 10 failed at exactly 200 hours and 10 failed at exactly 300 hours), then it is recommended that you enter the data as non-grouped when using rank regression, or select the Ungroup on Regression check box on the Analysis page of the folio's control panel.
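The rank positions in the example can be reproduced with a running total of the group sizes. The following is only a minimal sketch of that bookkeeping, not the Weibull++ implementation; the function name `group_rank_positions` and the `(time, count)` tuple layout are assumptions made for illustration.

```python
from itertools import accumulate

def group_rank_positions(groups):
    """Return (time, highest rank position, total units) for each group.

    `groups` is a list of (time, count) pairs sorted by time.
    This layout is hypothetical and chosen only for this sketch.
    """
    counts = [count for _, count in groups]
    total = sum(counts)
    # The highest rank position in a group is the cumulative number
    # of units up to and including that group.
    return [(time, rank, total)
            for (time, _), rank in zip(groups, accumulate(counts))]

# Three groups of 10 units failing at 100, 200 and 300 hours.
groups = [(100, 10), (200, 10), (300, 10)]
positions = group_rank_positions(groups)
for time, rank, total in positions:
    print(f"{time} hr: rank position {rank} out of {total}")
```

Each plotted point pairs the group's time with the rank of its last unit, which is why the first group is plotted at the 10th position rather than the 1st.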

The Mathematics
Median ranks are used to obtain an estimate of the unreliability, $$Q({{T}_{j}}),\,\!$$ for each failure at a $$50\%\,\!$$ confidence level. In the case of grouped data, the ranks are estimated for each group of failures instead of for each failure. For example, consider a group of 10 failures at 100 hours, 10 at 200 hours and 10 at 300 hours. Weibull++ estimates the median ranks ($$Z\,\!$$ values) by solving the cumulative binomial equation for $$Z\,\!$$, with the appropriate values for the order number, $$j\,\!$$, and the total number of test units, $$N\,\!$$. For 10 failures at 100 hours, the median rank, $$Z,\,\!$$ is estimated by using:


 * $$0.50=\sum_{k=j}^{N}\binom{N}{k}{{Z}^{k}}{{\left( 1-Z \right)}^{N-k}}\,\!$$

with:


 * $$N=30,\text{ }j=10\,\!$$

One $$Z\,\!$$ is obtained for the group. It is the median rank of the 10th rank position out of 30 (i.e., the estimated unreliability at 100 hours).

For 10 failures at 200 hours, $$Z\,\!$$ is estimated by using:


 * $$0.50=\sum_{k=j}^{N}\binom{N}{k}{{Z}^{k}}{{\left( 1-Z \right)}^{N-k}}\,\!$$

where:


 * $$N=30,\text{ }j=20\,\!$$

This gives the median rank of the 20th rank position out of 30 (i.e., the estimated unreliability at 200 hours).

For 10 failures at 300 hours, $$Z\,\!$$ is estimated by using:


 * $$0.50=\sum_{k=j}^{N}\binom{N}{k}{{Z}^{k}}{{\left( 1-Z \right)}^{N-k}}\,\!$$

where:


 * $$N=30,\text{ }j=30\,\!$$

This gives the median rank of the 30th rank position out of 30 (i.e., the estimated unreliability at 300 hours).
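The cumulative binomial equation above has no closed-form solution for general values of the order number, so it is solved numerically. The following is a minimal sketch that uses simple bisection, relying on the fact that the right-hand side increases with $$Z\,\!$$ between 0 and 1. The function names, the tolerance and the choice of bisection are assumptions for illustration; the solver that Weibull++ uses internally is not described here.

```python
from math import comb

def cumulative_binomial(z, j, n):
    """Right-hand side of the equation: sum over k = j..n of C(n,k) z^k (1-z)^(n-k)."""
    return sum(comb(n, k) * z**k * (1 - z)**(n - k) for k in range(j, n + 1))

def median_rank(j, n, level=0.50, tol=1e-12):
    """Solve cumulative_binomial(z, j, n) = level for z by bisection.

    The sum increases monotonically in z on [0, 1], so the root is unique.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cumulative_binomial(mid, j, n) < level:
            lo = mid   # z is too small, move the lower bound up
        else:
            hi = mid   # z is large enough, move the upper bound down
    return (lo + hi) / 2

# Order numbers for the three groups in the example, with 30 units in total.
N = 30
ranks = {j: median_rank(j, N) for j in (10, 20, 30)}
for j, z in ranks.items():
    print(f"order number {j} of {N}: Z = {z:.4f}")
```

As a quick check, for the last group the sum has a single term, so the equation reduces to $$Z^{30}=0.50\,\!$$ and $$Z=0.50^{1/30}\approx 0.977\,\!$$.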

When Using Maximum Likelihood
When using maximum likelihood methods, each individual time is explicitly used in the calculation of the parameters. Theoretically, there is no difference between entering a group of 10 units failing at 100 hours and entering 10 individual failures at 100 hours. This is inherent in the standard MLE method. In other words, no matter how the data were entered (i.e., as grouped or non-grouped), the results should be identical. In practice, because of finite numerical precision in the computation, the grouped and ungrouped data may give slightly different results. When using maximum likelihood, we highly recommend entering redundant data in groups, as this significantly speeds up the calculations.
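The equivalence follows from the form of the log-likelihood: a group of identical failure times contributes the group size multiplied by a single log-density term. The sketch below illustrates this for a two-parameter Weibull distribution with complete (failure-only) data. The distribution choice, the function names and the trial values of `beta` and `eta` are assumptions for illustration, and this is not the Weibull++ implementation.

```python
import math

def weibull_log_pdf(t, beta, eta):
    """Log of the two-parameter Weibull pdf at time t (shape beta, scale eta)."""
    return math.log(beta / eta) + (beta - 1) * math.log(t / eta) - (t / eta) ** beta

def log_likelihood_grouped(groups, beta, eta):
    """Each (time, count) group contributes count * log f(time): one pdf evaluation per group."""
    return sum(count * weibull_log_pdf(t, beta, eta) for t, count in groups)

def log_likelihood_ungrouped(times, beta, eta):
    """Each individual time contributes log f(time): one pdf evaluation per unit."""
    return sum(weibull_log_pdf(t, beta, eta) for t in times)

groups = [(100, 10), (200, 10), (300, 10)]
times = [t for t, count in groups for _ in range(count)]  # 30 individual entries

beta, eta = 1.5, 250.0  # arbitrary trial parameter values, not fitted estimates
ll_grouped = log_likelihood_grouped(groups, beta, eta)
ll_ungrouped = log_likelihood_ungrouped(times, beta, eta)
print(ll_grouped, ll_ungrouped)
```

The two values agree up to floating-point rounding, because the terms are summed in a different order and number. The grouped form needs 3 density evaluations per likelihood evaluation instead of 30, which is where the speed advantage comes from.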