Non-Parametric Recurrent Event Data Analysis

This article appears in the Life Data Analysis Reference book.

Non-parametric recurrent event data analysis (RDA) provides a graphical estimate of the mean cumulative number or cost of recurrences per unit versus age. As discussed in Nelson [31], in the reliability field, the Mean Cumulative Function (MCF) can be used to:


 * Evaluate whether the population repair (or cost) rate increases or decreases with age (this is useful for product retirement and burn-in decisions).
 * Estimate the average number or cost of repairs per unit during warranty or some time period.
 * Compare two or more sets of data from different designs, production periods, maintenance policies, environments, operating conditions, etc.
 * Predict future numbers and costs of repairs, such as the expected number of failures next month, quarter, or year.
 * Reveal unexpected information and insight.

The Mean Cumulative Function (MCF)
In a non-parametric analysis of recurrent event data, each population unit can be described by a cumulative history function for the cumulative number of recurrences. It is a staircase function that depicts the cumulative number of recurrences of a particular event, such as repairs, over time. The figure below depicts a unit's cumulative history function.



The non-parametric model for a population of units is described as the population of cumulative history functions (curves), that is, the staircase functions of every unit in the population. At age $$t\,\!$$, the units have a distribution of their cumulative number of events: a fraction of the population has accumulated 0 recurrences, another fraction has accumulated 1 recurrence, another fraction has accumulated 2 recurrences, etc. This distribution differs at different ages $$t\,\!$$, and has a mean $$M(t)\,\!$$ called the mean cumulative function (MCF). $$M(t)\,\!$$ is the point-wise average of all population cumulative history functions (see figure below).



For the case of uncensored data, the mean cumulative function values $$M({{t}_{i}})\,\!$$ at different recurrence ages $${{t}_{i}}\,\!$$ are estimated by averaging the cumulative number of recurrences at $${{t}_{i}}\,\!$$ over all units in the population. When the histories are censored, the following steps are applied.

1st Step - Order all ages:

Order all recurrence and censoring ages from smallest to largest. If a recurrence age for a unit is the same as its censoring (suspension) age, then the recurrence age goes first. If multiple units have a common recurrence or censoring age, then these units could be put in a certain order or be sorted randomly.

2nd Step - Calculate the number, $${{r}_{i}}\,\!$$, of units that passed through age $${{t}_{i}}\,\!$$ :


 * $$\begin{align}

& {{r}_{i}}={{r}_{i-1}} && \text{if }{{t}_{i}}\text{ is a recurrence age} \\ & {{r}_{i}}={{r}_{i-1}}-1 && \text{if }{{t}_{i}}\text{ is a censoring age} \end{align}\,\!$$

$$N\,\!$$ is the total number of units, and $${{r}_{1}} = N\,\!$$ at the first observed age, which could be a recurrence or a suspension.

3rd Step - Calculate the MCF estimate, $${{M}^{*}}(t)\,\!$$:

For each sample recurrence age $${{t}_{i}}\,\!$$, calculate the mean cumulative function estimate as follows:


 * $${{M}^{*}}({{t}_{i}})=\frac{1}{{{r}_{i}}}+{{M}^{*}}({{t}_{i-1}})\,\!$$

where $${{M}^{*}}({{t}_{1}})=\tfrac{1}{{{r}_{1}}}\,\!$$ at the earliest observed recurrence age, $${{t}_{1}}\,\!$$.
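The three steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used by any particular software: the function name `mcf_estimate`, the `histories` input layout and the three-unit data set are hypothetical. The sketch assumes each unit has exactly one censoring age and that the number of units at risk drops by one after every suspension.

```python
def mcf_estimate(histories):
    """Return (age, r_i, M*(t_i)) for every recurrence age.

    histories: dict mapping a unit id to (recurrence ages, censoring age).
    This input layout is an assumption made for the sketch.
    """
    rows = []
    for unit, (recurrences, censor_age) in histories.items():
        # Flag 0 = recurrence, 1 = suspension, so that a recurrence
        # sorts ahead of a suspension at the same age (1st step).
        rows.extend((age, 0, unit) for age in recurrences)
        rows.append((censor_age, 1, unit))
    rows.sort(key=lambda row: (row[0], row[1]))

    at_risk = len(histories)  # r_1 = N
    mcf = 0.0
    estimate = []
    for age, is_suspension, _unit in rows:
        if is_suspension:
            at_risk -= 1              # 2nd step: a suspension removes one unit
        else:
            mcf += 1.0 / at_risk      # 3rd step: M*(t_i) = 1/r_i + M*(t_{i-1})
            estimate.append((age, at_risk, mcf))
    return estimate


# Hypothetical data: unit "A" is repaired at ages 5 and 12 and censored at 20, etc.
histories = {"A": ([5, 12], 20), "B": ([8], 15), "C": ([], 10)}
for age, r, m in mcf_estimate(histories):
    print(age, r, round(m, 4))
```

In this made-up data set the suspension of unit C at age 10 reduces the number at risk from 3 to 2, so the recurrence at age 12 adds 1/2 rather than 1/3 to the estimate.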

Confidence Limits for the MCF
Upper and lower confidence limits for $$M({{t}_{i}})\,\!$$ are:


 * $$\begin{align}

& {{M}_{U}}({{t}_{i}})={{M}^{*}}({{t}_{i}})\cdot {{e}^{\tfrac{{{K}_{\alpha }}\cdot \sqrt{Var[{{M}^{*}}({{t}_{i}})]}}{{{M}^{*}}({{t}_{i}})}}} \\ & {{M}_{L}}({{t}_{i}})=\frac{{{M}^{*}}({{t}_{i}})}{{{e}^{\tfrac{{{K}_{\alpha }}\cdot \sqrt{Var[{{M}^{*}}({{t}_{i}})]}}{{{M}^{*}}({{t}_{i}})}}}} \end{align}\,\!$$

where $$\alpha \,\!$$ ( $$50\%<\alpha <100\%\,\!$$ ) is the confidence level, $${{K}_{\alpha }}\,\!$$ is the $$\alpha \,\!$$ percentile of the standard normal distribution and $$Var[{{M}^{*}}({{t}_{i}})]\,\!$$ is the variance of the MCF estimate at recurrence age $${{t}_{i}}\,\!$$. The variance is calculated as follows:


 * $$Var[{{M}^{*}}({{t}_{i}})]=Var[{{M}^{*}}({{t}_{i-1}})]+\frac{1}{r_{i}^{2}}\left[ \sum\limits_{j\in {{R}_{i}}}{{{\left( {{d}_{ji}}-\frac{1}{{{r}_{i}}} \right)}^{2}}} \right]\,\!$$

where $${{r}_{i}}\,\!$$ is the number of units that passed through age $${{t}_{i}}\,\!$$ (as defined in the 2nd step above), $${{R}_{i}}\,\!$$ is the set of units that have not been suspended by age $${{t}_{i}}\,\!$$ and $${{d}_{ji}}\,\!$$ is defined as follows:


 * $$\begin{align}

& {{d}_{ji}}= 1\text{ if the }{{j}^{\text{th }}}\text{unit had an event recurrence at age }{{t}_{i}} \\ & {{d}_{ji}}= 0\text{  if the }{{j}^{\text{th }}}\text{unit did not have an event recurrence at age }{{t}_{i}} \end{align}\,\!$$
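The variance recursion and the confidence limits can be sketched in Python as follows. As before, this is a hedged illustration rather than a reference implementation: the function name `mcf_with_limits`, the `histories` layout and the sample data are hypothetical, the confidence level is assumed to be passed as a fraction (0.95 for 95%), and $${{K}_{\alpha }}\,\!$$ is taken literally as the $$\alpha \,\!$$ percentile of the standard normal distribution. Events are processed one at a time, so exactly one $${{d}_{ji}}\,\!$$ equals 1 in each update, and only the last value is kept for an age shared by several events.

```python
import math
from statistics import NormalDist


def mcf_with_limits(histories, confidence=0.95):
    """Return (age, M*, Var[M*], M_L, M_U) for every recurrence age.

    histories: dict mapping a unit id to (recurrence ages, censoring age).
    This input layout is an assumption made for the sketch.
    """
    rows = []
    for unit, (recurrences, censor_age) in histories.items():
        rows.extend((age, 0, unit) for age in recurrences)  # 0 = recurrence
        rows.append((censor_age, 1, unit))                  # 1 = suspension
    rows.sort(key=lambda row: (row[0], row[1]))             # recurrences first on ties

    k_alpha = NormalDist().inv_cdf(confidence)  # standard normal percentile
    active = set(histories)                     # R_i: units not yet suspended
    mcf, variance = 0.0, 0.0
    table = []
    for age, is_suspension, unit in rows:
        if is_suspension:
            active.discard(unit)
            continue
        r = len(active)
        mcf += 1.0 / r
        # d_ji is 1 for the unit that had this recurrence and 0 for the others.
        squares = sum(((1.0 if j == unit else 0.0) - 1.0 / r) ** 2 for j in active)
        variance += squares / r ** 2
        factor = math.exp(k_alpha * math.sqrt(variance) / mcf)
        row = (age, mcf, variance, mcf / factor, mcf * factor)
        if table and table[-1][0] == age:
            table[-1] = row   # tied ages: keep only the final value
        else:
            table.append(row)
    return table


# Hypothetical data: unit "A" is repaired at ages 5 and 12 and censored at 20, etc.
histories = {"A": ([5, 12], 20), "B": ([8], 15), "C": ([], 10)}
for age, m, var, lower, upper in mcf_with_limits(histories):
    print(age, round(m, 4), round(var, 4), round(lower, 4), round(upper, 4))
```

Because the limits are obtained by multiplying and dividing the estimate by the same exponential factor, they are asymmetric around $${{M}^{*}}({{t}_{i}})\,\!$$ and the lower limit can never fall below zero.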

If multiple events occur at the same age $${{t}_{i}}\,\!$$, $${{d}_{ji}}\,\!$$ is calculated sequentially for each event. For each event, only one $${{d}_{ji}}\,\!$$ can take a value of 1. Once all the events at $${{t}_{i}}\,\!$$ have been processed, the final calculated MCF and its variance are the values for age $${{t}_{i}}\,\!$$. This is illustrated in the following example.