Warranty Data Analysis

The Weibull++ warranty analysis folio provides four different data entry formats for warranty claims data. It allows the user to automatically perform life data analysis and predict future failures (through the use of conditional probability analysis), and it provides a method for detecting outliers. The four data entry formats for storing sales and returns information are:


 * Nevada Chart Format
 * Time-to-Failure Format
 * Dates of Failure Format
 * Usage Format

These formats are explained in the next sections.

Nevada Chart Format
The Nevada format allows the user to convert shipping and warranty return data into the standard reliability data form of failures and suspensions so that it can easily be analyzed with traditional life data analysis methods. For each time period in which a number of products are shipped, there will be a certain number of returns or failures in subsequent time periods, while the rest of the population that was shipped will continue to operate in the following time periods. For example, if 500 units are shipped in May, and 10 of those units are warranty returns in June, that is equivalent to 10 failures at a time of one month. The other 490 units will go on to operate and possibly fail in the months that follow. This information can be arranged in a diagonal chart, as shown in the following figure.



At the end of the analysis period, all of the units that were shipped and have not failed in the time since shipment are considered to be suspensions. This process is repeated for each shipment and the results tabulated for each particular failure and suspension time prior to reliability analysis. This process may sound confusing, but it is actually just a matter of careful bookkeeping. The following example illustrates this process.

Example 1: Nevada Chart Format Calculations Example

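A company keeps track of its shipments and warranty returns on a month-by-month basis. The following table records the shipments in June, July and August, and the warranty returns through September:

$$\begin{matrix} {} & {} & RETURNS & {} & {} \\ {} & SHIP & \text{Jul} & \text{Aug} & \text{Sep} \\ \text{Jun} & \text{100} & \text{3} & \text{3} & \text{5} \\ \text{Jul} & \text{140} & \text{-} & \text{2} & \text{4} \\ \text{Aug} & \text{150} & \text{-} & \text{-} & \text{4} \\ \end{matrix}$$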

We will examine the data month by month. In June 100 units were sold, and in July 3 of these units were returned. This gives 3 failures at one month for the June shipment, which we will denote as $${{F}_{JUN,1}}=3$$. Likewise, 3 failures occurred in August and 5 occurred in September for this shipment, or $${{F}_{JUN,2}}=3$$  and  $${{F}_{JUN,3}}=5$$. Consequently, at the end of our three-month analysis period, there were a total of 11 failures for the 100 units shipped in June. This means that 89 units are presumably still operating, and can be considered suspensions at three months, or $${{S}_{JUN,3}}=89$$. For the shipment of 140 in July, 2 were returned the following month, or $${{F}_{JUL,1}}=2$$, and 4 more were returned the month after that, or  $${{F}_{JUL,2}}=4$$. After two months, there are 134 ( $$140-2-4=134$$ ) units from the July shipment still operating, or $${{S}_{JUL,2}}=134$$. For the final shipment of 150 in August, 4 fail in September, or $${{F}_{AUG,1}}=4$$, with the remaining 146 units being suspensions at one month, or  $${{S}_{AUG,1}}=146$$.

It is now a simple matter to add up the number of failures for 1, 2, and 3 months, then add the suspensions to get our reliability data set:

$$\begin{matrix} \text{Failures at 1 month:} & {{F}_{1}}={{F}_{JUN,1}}+{{F}_{JUL,1}}+{{F}_{AUG,1}}=3+2+4=9 \\ \text{Suspensions at 1 month:} & {{S}_{1}}={{S}_{AUG,1}}=146 \\ \text{Failures at 2 months:} & {{F}_{2}}={{F}_{JUN,2}}+{{F}_{JUL,2}}=3+4=7 \\ \text{Suspensions at 2 months:} & {{S}_{2}}={{S}_{JUL,2}}=134 \\ \text{Failures at 3 months:} & {{F}_{3}}={{F}_{JUN,3}}=5 \\ \text{Suspensions at 3 months:} & {{S}_{3}}={{S}_{JUN,3}}=89 \\ \end{matrix}$$

These calculations are automatically performed in Weibull++.
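The bookkeeping in this example can also be sketched in a few lines of Python. This is only an illustration of the tabulation, not how Weibull++ implements it; the month indices (June = 0 through September = 3) and the function name `nevada_to_life_data` are assumptions made for the sketch.

```python
from collections import defaultdict

# Shipments and returns from the example. Month indices are an
# illustrative encoding: June = 0, July = 1, August = 2, September = 3.
shipments = {0: 100, 1: 140, 2: 150}
returns = {          # returns[ship_month][return_month] = quantity returned
    0: {1: 3, 2: 3, 3: 5},
    1: {2: 2, 3: 4},
    2: {3: 4},
}
end_month = 3        # end of the analysis period (September)


def nevada_to_life_data(shipments, returns, end_month):
    """Tabulate failures and suspensions by age in months."""
    failures = defaultdict(int)
    suspensions = defaultdict(int)
    for ship_month, shipped in shipments.items():
        lot_returns = returns.get(ship_month, {})
        # A return is a failure at an age equal to the months since shipment.
        for return_month, quantity in lot_returns.items():
            failures[return_month - ship_month] += quantity
        # Units not returned by the end date are suspensions at the lot's age.
        survivors = shipped - sum(lot_returns.values())
        suspensions[end_month - ship_month] += survivors
    return dict(failures), dict(suspensions)


failures, suspensions = nevada_to_life_data(shipments, returns, end_month)
for age in sorted(failures):
    print(f"{age} month(s): {failures[age]} failures, {suspensions[age]} suspensions")
```

Each shipment lot contributes one suspension group, whose age is the time between its shipment and the end of the analysis period.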

Time-to-Failure Format
This format is similar to the standard folio data entry format (the number of units, the failure times and the suspension times are all entered by the user). The difference is that when the data are used within the context of warranty analysis, the user can also generate forecasts.

Example 2:

Dates of Failure Format
Another common way of reporting field information is to enter a date and quantity of sales or shipments (Quantity In-Service data) and the date and quantity of returns (Quantity Returned data). In order to identify which lot the unit comes from, a failure is identified by a return date and the date when it was put in service. The date that the unit went into service is then associated with the lot going into service during that time period. You can use the optional Subset ID column in the data sheet to record any information to identify the lots.

Example 3:

Usage Format
Often, the driving factor for reliability is usage rather than time. For example, in the automotive industry, the failure behavior of the majority of the products is mileage-dependent rather than time-dependent. The usage format allows the user to convert shipping and warranty return data into the standard reliability data form of failures and suspensions when the return information is based on usage rather than return dates or periods. Similar to the dates of failure format, a failure is identified by the return number and the date when it was put in service in order to identify which lot the unit comes from. The date that the returned unit went into service associates the returned unit with the lot it belonged to when it started operation. However, the return data is in terms of usage and not date of return. Therefore the usage of the units needs to be specified as a constant usage per unit time or as a distribution. This allows for determining the expected usage of the surviving units.

Suppose that you have been collecting sales (units in service) and returns data. For the returns data, you can determine the number of failures and their usage by reading the odometer value, for example. Determining the number of surviving units (suspensions) and their ages is a straightforward step. By taking the difference between the analysis date and the date when a unit was put in service, you can determine the age of the surviving units.

What is unknown, however, is the exact usage accumulated by each surviving unit. The key part of the usage-based warranty analysis is the determination of the usage of the surviving units based on their age. Therefore, the analyst needs to have an idea about the usage of the product. This can be obtained, for example, from customer surveys or by designing the products to collect usage data. For example, in automotive applications, engineers often use 12,000 miles/year as the average usage. Based on this average, the usage of an item that has been in the field for 6 months and has not yet failed would be 6,000 miles. So to obtain the usage of a suspension based on an average usage, one could take the time of each suspension and multiply it by this average usage. In this situation, the analysis becomes straightforward. With the usage values and the quantities of the returned units, a failure distribution can be constructed and subsequent warranty analysis becomes possible.
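As a minimal sketch of the constant-usage case, the following Python snippet multiplies the time in service of a suspension by an average usage rate. The 12,000 miles/year figure comes from the text; the function name `suspension_usage` and the choice of months as the time unit are assumptions for illustration.

```python
# Average usage from the text: 12,000 miles per year, expressed per month.
AVERAGE_USAGE_PER_MONTH = 12_000 / 12


def suspension_usage(months_in_service, usage_per_month=AVERAGE_USAGE_PER_MONTH):
    """Estimate the accumulated usage of a surviving unit from its age."""
    return months_in_service * usage_per_month


# A unit that has been in the field for 6 months without failing.
print(suspension_usage(6))
```

The time unit of the age must match the time unit of the usage rate, which is why the yearly figure is converted to a monthly rate before it is applied.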

Alternatively, and more realistically, instead of using an average usage, an actual distribution that reflects the variation in usage and customer behavior can be used. This distribution describes the usage of a unit over a certain time period (e.g., 1 year, 1 month, etc). This probabilistic model can be used to estimate the usage for all surviving components in service and the percentage of users running the product at different usage rates. In the automotive example, for instance, such a distribution can be used to calculate the percentage of customers that drive 0-200 miles/month, 200-400 miles/month, etc. We can take these percentages and multiply them by the number of suspensions to find the number of items that have been accumulating usage values in these ranges.

To proceed with applying a usage distribution, the usage distribution is divided into increments based on a specified interval width denoted as $$Z$$. The usage distribution, $$Q$$, is divided into intervals of  $$0+Z$$ ,  $$Z+Z$$ ,  $$2Z+Z$$ , etc., or  $${{x}_{i}}={{x}_{i-1}}+Z$$ , as shown in the next figure.



The interval width should be selected such that it creates segments that are large enough to contain adequate numbers of suspensions within the intervals. The percentage of suspensions that belong to each usage interval is calculated as follows:


 * $$F({{x}_{i}})=Q({{x}_{i}})-Q({{x}_{i-1}})$$

where:


 * $$Q$$ is the usage distribution cumulative distribution function ( $$cdf$$ ).


 * $$x$$ represents the intervals used in apportioning the suspended population.

A suspension group is a collection of suspensions that have the same age. The percentage of suspensions can be translated to numbers of suspensions within each interval, $${{x}_{i}}$$. This is done by taking each group of suspensions and multiplying it by each $$F({{x}_{i}})$$, or:


 * $$\begin{align} & {{N}_{1,j}}= & F({{x}_{1}})\times N{{S}_{j}} \\ & {{N}_{2,j}}= & F({{x}_{2}})\times N{{S}_{j}} \\ & & ... \\ & {{N}_{n,j}}= & F({{x}_{n}})\times N{{S}_{j}} \end{align}$$

where:


 * $${{N}_{i,j}}$$ (for $$i=1,...,n$$ ) is the number of suspensions from group $$j$$ that belong to interval $${{x}_{i}}$$.


 * $$N{{S}_{j}}$$ is the jth group of suspensions from the data set.

This is repeated for all the groups of suspensions.

The age of the suspensions is calculated by subtracting the Date In-Service ( $$DIS$$ ), which is the date at which the unit started operation, from the end of observation period date or End Date ( $$ED$$ ). This is the Time In-Service ( $$TIS$$ ) value that describes the age of the surviving unit.


 * $$TIS=ED-DIS$$

Note: $$TIS$$  is in the same time units as the period in which the usage distribution is defined.

For each $${{N}_{k,j}}$$, the usage is calculated as:


 * $${{U}_{k,j}}={{x}_{k}}\times TI{{S}_{j}}$$

After this step, the usage of each suspension group is estimated. This data can be combined with the failures data set, and a failure distribution can be fitted to the combined data.
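The apportioning steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the Weibull++ implementation: the usage distribution is taken to be lognormal with hypothetical parameters (a median of 1,000 miles/month and a log-standard deviation of 0.5), the interval width is set to 500 miles/month with six intervals, and the names `usage_cdf` and `apportion_suspensions` are made up for the sketch. The suspension group (8 units with 12 months in service) is likewise only a sample input.

```python
import math
from statistics import NormalDist

# Hypothetical usage distribution Q: lognormal usage per month.
LOG_MEAN = math.log(1000.0)   # assumed median usage of 1,000 miles/month
LOG_STD = 0.5                 # assumed spread on the log scale


def usage_cdf(x):
    """Q(x): fraction of users accumulating at most x miles per month."""
    if x <= 0:
        return 0.0
    return NormalDist(LOG_MEAN, LOG_STD).cdf(math.log(x))


def apportion_suspensions(num_suspended, time_in_service, cdf, width, n_intervals):
    """Split one suspension group across usage intervals.

    Returns (count, usage) pairs, where count = F(x_i) * NS_j and
    usage = x_i * TIS_j, with x_i = i * width.
    """
    rows = []
    for i in range(1, n_intervals + 1):
        x_prev, x_i = (i - 1) * width, i * width
        fraction = cdf(x_i) - cdf(x_prev)          # F(x_i) = Q(x_i) - Q(x_{i-1})
        rows.append((fraction * num_suspended, x_i * time_in_service))
    return rows


rows = apportion_suspensions(8, 12, usage_cdf, width=500, n_intervals=6)
for count, usage in rows:
    print(f"{count:6.3f} suspensions at {usage:7.0f} miles")
```

With a finite number of intervals, the apportioned counts sum to slightly less than the group size because the upper tail of the usage distribution is left out; how to treat that tail is a modeling choice this sketch does not address.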

Example 4:

To illustrate the calculations behind the results of this example, consider the 9 units that went into service in December 2009. One unit from that group failed; therefore, 8 suspensions have survived from December 2009 until the beginning of December 2010, a total of 12 months. The calculations are summarized as follows.



The two columns on the right constitute the calculated suspension data (number of suspensions and their usage) for the group. The calculation is then repeated for each of the remaining groups in the data set. These data are then combined with the data about the failures to form the life data set that is used to estimate the failure distribution model.

Warranty Prediction
Once a life data analysis has been performed on warranty data, this information can be used to predict how many warranty returns there will be in subsequent time periods. This methodology uses the concept of conditional reliability (see Basic Statistical Background) to calculate the probability of failure for the remaining units for each shipment time period. This conditional probability of failure is then multiplied by the number of units at risk from that particular shipment period that are still in the field (i.e. the suspensions) in order to predict the number of failures or warranty returns expected for this time period. The next example illustrates this.

Example 5: Warranty Prediction Calculations

Using the data in the following table, predict the number of warranty returns for October for each of the three shipment periods. Use the following Weibull parameters: beta = 2.4928 and eta = 6.6951.

Solution

Use the Weibull parameter estimates to determine the conditional probability of failure for each shipment time period, and then multiply that probability by the number of units that are at risk for that period. The equation for the conditional probability of failure is given by:


 * $$Q(t|T)=1-R(t|T)=1-\frac{R(T+t)}{R(T)}$$

For the June shipment, there are 89 units that have successfully operated until the end of September ( $$T=3$$ months). The probability of one of these units failing in the next month ( $$t=1$$ month) is then given by:


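 * $$Q(1|3)=1-\frac{R(4)}{R(3)}=1-\frac{{{e}^{-{{\left( \frac{4}{6.6951} \right)}^{2.4928}}}}}{{{e}^{-{{\left( \frac{3}{6.6951} \right)}^{2.4928}}}}}=1-\frac{0.7582}{0.8735}=0.132$$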

Once the probability of failure for an additional month of operation is determined, the expected number of failed units during the next month, from the June shipment, is the product of this probability and the number of units at risk ( $${{S}_{JUN,3}}=89$$ ), or:


 * $${{\widehat{F}}_{JUN,4}}=89\cdot 0.132=11.748\text{, or 12 units}$$

This is then repeated for the July shipment, where there were 134 units operating at the end of September, with an exposure time of two months. The probability of failure in the next month is:


 * $$Q(1|2)=1-\frac{R(3)}{R(2)}=1-\frac{0.8735}{0.9519}=0.0824$$

This value is multiplied by $${{S}_{JUL,2}}=134$$  to determine the number of failures, or:


 * $${{\widehat{F}}_{JUL,3}}=134\cdot 0.0824=11.035\text{, or 11 units}$$

For the August shipment, there were 146 units operating at the end of September, with an exposure time of one month. The probability of failure in the next month is:


 * $$Q(1|1)=1-\frac{R(2)}{R(1)}=1-\frac{0.9519}{0.9913}=0.0397$$

This value is multiplied by $${{S}_{AUG,1}}=146$$  to determine the number of failures, or:


 * $${{\widehat{F}}_{AUG,2}}=146\cdot 0.0397=5.796\text{, or 6 units}$$

Thus, the total expected returns from all shipments for the next month is the sum of the above, or 29 units. This method can easily be repeated for different future sales periods, including projected shipments. If the user lists the number of units that are expected to be sold or shipped during future periods, then these units are added to the number of units at risk whenever they are introduced into the field. The Generate Forecast functionality in the Weibull++ warranty analysis folio can automate this process for you.
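The prediction in this example can be reproduced with a short Python sketch. It assumes the two-parameter Weibull reliability function $$R(t)={{e}^{-{{(t/\eta )}^{\beta }}}}$$ with the parameters given in the example; the function and variable names are illustrative only.

```python
import math

BETA, ETA = 2.4928, 6.6951   # Weibull parameters given in the example


def reliability(t, beta=BETA, eta=ETA):
    """Two-parameter Weibull reliability R(t)."""
    return math.exp(-((t / eta) ** beta))


def conditional_failure_probability(t, age, beta=BETA, eta=ETA):
    """Q(t|T) = 1 - R(T + t) / R(T) for a unit that has survived to age T."""
    return 1.0 - reliability(age + t, beta, eta) / reliability(age, beta, eta)


# Units still at risk at the end of September and their age in months.
units_at_risk = {"June": (89, 3), "July": (134, 2), "August": (146, 1)}

expected_returns = {
    lot: count * conditional_failure_probability(1, age)
    for lot, (count, age) in units_at_risk.items()
}

for lot, value in expected_returns.items():
    print(f"{lot}: {value:.3f} expected returns in October")
print("Total (rounded per lot):", sum(round(v) for v in expected_returns.values()))
```

Because the sketch carries full precision instead of the rounded probabilities shown above, the per-lot values differ slightly in the second decimal place, but they round to the same whole numbers of units.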

Analysis of Non-Homogeneous Warranty Data
In the previous sections and examples, the underlying assumption was that the population was homogeneous. In other words, all sold and returned units were exactly the same (i.e., the same population with no design changes and/or modifications). In many situations, as the product matures, design changes are made to enhance and/or improve the reliability of the product. Obviously, an improved product will exhibit different failure characteristics than its predecessor. To analyze such cases, where the population is non-homogeneous, one needs to extract each homogeneous group, fit a life model to each group and then project the expected returns for each group based on the number of units at risk for each specific group.

Using Subset IDs in Weibull++

Weibull++ includes an optional Subset ID column that allows the user to differentiate between product versions or different designs (lots). Based on the entries, the software will separately analyze (i.e., obtain parameters and failure projections for) each subset of data. Note that the same limitations with regard to the number of failures that are needed also apply here. In other words, distributions can be automatically fitted to lots that have return (failure) data, whereas if no returns have been experienced yet (either because the units are going to be introduced in the future or because no failures have happened yet), the user will be asked to specify the parameters, since they cannot be computed. Consequently, subsequent estimates and predictions related to these lots would be based on the user-specified parameters. The following example illustrates the use of Subset IDs.

Example 6:

Monitoring Warranty Returns Using Statistical Process Control (SPC)
By monitoring and analyzing warranty return data, one can detect specific return periods and/or batches of sales or shipments that may deviate (differ) from the assumed model. This provides the analyst (and the organization) the advantage of early notification of possible deviations in manufacturing, use conditions and/or any other factor that may adversely affect the reliability of the fielded product. Obviously, the motivation for performing such analysis is to allow for faster intervention to avoid increased costs due to increased warranty returns or more serious repercussions. Additionally, this analysis can also be used to uncover different sub-populations that may exist within the population.

Analysis Method
For each sales period  $$i$$  and return period  $$j$$, the prediction error can be calculated as follows:


 * $${{e}_{i,j}}={{\hat{F}}_{i,j}}-{{F}_{i,j}}$$

where $${{\hat{F}}_{i,j}}$$  is the estimated number of failures based on the estimated distribution parameters for the sales period  $$i$$  and the return period  $$j$$, which is calculated using the equation for the conditional probability, and  $${{F}_{i,j}}$$  is the actual number of failures for the sales period  $$i$$  and the return period  $$j$$.

Since we are assuming that the model is accurate, $${{e}_{i,j}}$$  should follow a normal distribution with mean value of zero and a standard deviation  $$s$$, where:


 * $${{\bar{e}}_{i,j}}=\frac{\underset{i}{\mathop{\sum }}\,\underset{j}{\mathop{\sum }}\,{{e}_{i,j}}}{n}=0$$

and $$n$$  is the total number of return data (total number of residuals).

The estimated standard deviation of the prediction errors can then be calculated by:


 * $$s=\sqrt{\frac{1}{n-1}\underset{i}{\mathop \sum }\,\underset{j}{\mathop \sum }\,e_{i,j}^{2}}$$

and $${{e}_{i,j}}$$  can be normalized as follows:


 * $${{z}_{i,j}}=\frac{{{e}_{i,j}}}{s}$$

where $${{z}_{i,j}}$$  is the standardized error. $${{z}_{i,j}}$$ follows a normal distribution with  $$\mu =0$$  and  $$\sigma =1$$.

It is known that the square of a random variable with standard normal distribution follows the $${{\chi }^{2}}$$  (Chi Square) distribution with 1 degree of freedom and that the sum of the squares of  $$m$$  random variables with standard normal distribution follows the  $${{\chi }^{2}}$$  distribution with  $$m$$  degrees of freedom. This can then be used to help detect abnormal returns for a given sales period, return period or just a specific cell (combination of a return and a sales period).


 * For a cell, abnormality is detected if $$z_{i,j}^{2}=\chi _{1}^{2}\ge \chi _{\alpha ,1}^{2}.$$
 * For an entire sales period $$i$$, abnormality is detected if  $$\underset{j}{\mathop{\sum }}\,z_{i,j}^{2}=\chi _{J}^{2}\ge \chi _{\alpha ,J}^{2},$$  where  $$J$$  is the total number of return periods for sales period  $$i$$.
 * For an entire return period $$j$$, abnormality is detected if  $$\underset{i}{\mathop{\sum }}\,z_{i,j}^{2}=\chi _{I}^{2}\ge \chi _{\alpha ,I}^{2},$$  where  $$I$$  is the total number of sales periods for return period  $$j$$.

Here $$\alpha $$  is the significance level used with the  $${{\chi }^{2}}$$  distribution, which can be set as a critical value or a caution value. It describes the level of sensitivity to outliers (returns that deviate significantly from the predictions based on the fitted model). Increasing the value of  $$\alpha $$  increases the power of detection, but this could lead to more false alarms.

Example 7: Statistical Process Control

Using the data from the following table, the expected returns for each sales period can be obtained using conditional reliability concepts, as given in the conditional probability equation.

For example, for the June shipment, the expected number of returns in the month of September is given by:


 * $${{\hat{F}}_{Jun,3}}=(100-6)\cdot \left( 1-\frac{R(3)}{R(2)} \right)=94\cdot 0.08239=7.7447$$

The actual number of returns during this period is five; thus, the prediction error for this period is:


 * $${{e}_{Jun,3}}={{\hat{F}}_{Jun,3}}-{{F}_{Jun,3}}=7.7447-5=2.7447.$$

This can then be repeated for each cell, yielding the following table for $${{e}_{i,j}}$$ :

$$\begin{matrix} {} & {} & RETURNS & {} & {} \\ {} & SHIP & \text{Jul}\text{. 2005} & \text{Aug}\text{. 2005} & \text{Sep}\text{. 2005} \\   \text{Jun}\text{. 2005} & \text{100} & \text{-2}\text{.1297} & \text{0}\text{.8462} & \text{2}\text{.7447} \\ \text{Jul}\text{. 2005} & \text{140} & \text{-} & \text{-0}\text{.7816} & \text{1}\text{.4719} \\ \text{Aug}\text{. 2005} & \text{150} & \text{-} & \text{-} & \text{-2}\text{.6946} \\ \end{matrix}$$
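The table of prediction errors can be reproduced with the following Python sketch. It assumes that the shipment and return quantities and the Weibull parameters (beta = 2.4928, eta = 6.6951) are those of the earlier Nevada chart and warranty prediction examples, which is consistent with the numbers shown in this example; the names used are illustrative.

```python
import math

BETA, ETA = 2.4928, 6.6951   # assumed: parameters from the prediction example


def reliability(t):
    """Two-parameter Weibull reliability R(t); R(0) = 1."""
    return math.exp(-((t / ETA) ** BETA))


# Shipped quantity and actual returns by age in months (1, 2, 3, ...).
lots = {
    "Jun": (100, [3, 3, 5]),
    "Jul": (140, [2, 4]),
    "Aug": (150, [4]),
}


def prediction_errors(shipped, actual_returns):
    """Return e = (expected returns) - (actual returns) for each period."""
    errors = []
    at_risk = shipped
    for age, actual in enumerate(actual_returns, start=1):
        # Conditional probability of failing during this month,
        # given survival to the start of the month.
        q = 1.0 - reliability(age) / reliability(age - 1)
        errors.append(at_risk * q - actual)
        at_risk -= actual   # returned units are no longer at risk
    return errors


errors = {lot: prediction_errors(*data) for lot, data in lots.items()}
for lot, values in errors.items():
    print(lot, [round(e, 4) for e in values])
```

The values may differ from the table in the last decimal place, depending on how intermediate results are rounded.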

Now, for this example, $$n=6$$,  $${{\bar{e}}_{i,j}}=-0.0905$$  and  $$s=2.1365.$$

Thus the $$z_{i,j}$$ values are:

$$\begin{matrix} {} & {} & RETURNS & {} & {} \\ {} & SHIP & \text{Jul}\text{. 2005} & \text{Aug}\text{. 2005} & \text{Sep}\text{. 2005} \\   \text{Jun}\text{. 2005} & \text{100} & \text{-0}\text{.9968} & \text{0}\text{.3960} & \text{1}\text{.2846} \\ \text{Jul}\text{. 2005} & \text{140} & \text{-} & \text{-0}\text{.3658} & \text{0}\text{.6889} \\ \text{Aug}\text{. 2005} & \text{150} & \text{-} & \text{-} & \text{-1}\text{.2612} \\ \end{matrix}$$

The $$z_{i,j}^{2}$$ values, for each cell, are given in the following table.

$$\begin{matrix} {} & {} & RETURNS & {} & {} & {} \\ {} & SHIP & \text{Jul}\text{. 2005} & \text{Aug}\text{. 2005} & \text{Sep}\text{. 2005} & \text{Sum} \\ \text{Jun}\text{. 2005} & \text{100} & \text{0}\text{.9936} & \text{0}\text{.1569} & \text{1}\text{.6505} & 2.8010 \\ \text{Jul}\text{. 2005} & \text{140} & \text{-} & \text{0}\text{.1338} & \text{0}\text{.4747} & 0.6085 \\ \text{Aug}\text{. 2005} & \text{150} & \text{-} & \text{-} & \text{1}\text{.5905} & 1.5905 \\ \text{Sum} & {} & 0.9936 & 0.2907 & 3.7157 & {} \\ \end{matrix}$$

If the critical value is set at $$\alpha =$$  0.01 and the caution value is set at  $$\alpha =$$  0.1, then the critical and caution  $${{\chi }^{2}}$$  values will be:

$$\begin{matrix} {} & {} & \text{Degrees of Freedom} & {} \\ {} & \text{1} & \text{2} & \text{3} \\ {{\chi}^{2}}\text{ Critical} & \text{6.6349} & \text{9.2103} & \text{11.3449} \\ {{\chi}^{2}}\text{ Caution} & \text{2.7055} & \text{4.6052} & \text{6.2514} \\ \end{matrix}$$

If we consider the sales periods as the basis for outlier detection, then after comparing the above table to the sum of $$z_{i,j}^{2}$$ values for each sales period, we find that none of the sales periods exceeds the critical or caution limits. For example, the total $${{\chi }^{2}}$$  value of the sales month of July is 0.6085. It has 2 degrees of freedom, so the corresponding caution and critical values are 4.6052 and 9.2103 respectively. Both values are larger than 0.6085, so the return numbers of the July sales period do not deviate (based on the chosen significance) from the model's predictions.

If we consider return periods as the basis for outlier detection, then after comparing the above table to the sum of $$z_{i,j}^{2}$$ values for each return period, we find that none of the return periods exceeds the critical or caution limits. For example, the total $${{\chi }^{2}}$$  value of the return month of September is 3.7157. It has 3 degrees of freedom, so the corresponding caution and critical values are 6.2514 and 11.3449 respectively. Both values are larger than 3.7157, so the return numbers for the September return period do not deviate from the model's predictions.
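The comparison against the caution and critical limits can be sketched in Python as shown next. The prediction errors and the $${{\chi }^{2}}$$ thresholds are copied from the tables in this example rather than computed, and the standard deviation is obtained with the $$n-1$$ formula given in the Analysis Method section; the function and variable names are illustrative.

```python
import math

# Prediction errors e[i, j] from the example, keyed by (sales month, return month).
errors = {
    ("Jun", "Jul"): -2.1297, ("Jun", "Aug"): 0.8462, ("Jun", "Sep"): 2.7447,
    ("Jul", "Aug"): -0.7816, ("Jul", "Sep"): 1.4719,
    ("Aug", "Sep"): -2.6946,
}

# Chi-squared thresholds copied from the example, indexed by degrees of freedom.
CRITICAL = {1: 6.6349, 2: 9.2103, 3: 11.3449}   # alpha = 0.01
CAUTION = {1: 2.7055, 2: 4.6052, 3: 6.2514}     # alpha = 0.1

n = len(errors)
s = math.sqrt(sum(e * e for e in errors.values()) / (n - 1))
z_squared = {cell: (e / s) ** 2 for cell, e in errors.items()}


def classify(cells):
    """Sum z^2 over the given cells and compare with the thresholds."""
    total = sum(z_squared[cell] for cell in cells)
    dof = len(cells)
    if total >= CRITICAL[dof]:
        status = "critical"
    elif total >= CAUTION[dof]:
        status = "caution"
    else:
        status = "normal"
    return total, dof, status


by_sales = {m: classify([c for c in errors if c[0] == m]) for m in ("Jun", "Jul", "Aug")}
by_returns = {m: classify([c for c in errors if c[1] == m]) for m in ("Jul", "Aug", "Sep")}

for label, groups in (("Sales", by_sales), ("Return", by_returns)):
    for month, (total, dof, status) in groups.items():
        print(f"{label} period {month}: chi2 = {total:.4f}, dof = {dof}, {status}")
```

All six groups come out as normal, in agreement with the discussion above. To use other significance levels, the thresholds would have to be replaced with the corresponding $${{\chi }^{2}}$$ quantiles.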

This analysis can be automatically performed in Weibull++ by entering the alpha values in the Statistical Process Control page of the control panel and selecting which period to color code, as shown next.



To view the table of chi-squared values ( $$z_{i,j}^{2}$$ or  $$\chi _{1}^{2}$$  values), click the Show Results (...) button.



Weibull++ automatically color codes SPC results for easy visualization in the returns data sheet. By default, the green color means that the return number is normal; the yellow color indicates that the return number is larger than the caution threshold but smaller than the critical value; the red color means that the return is abnormal, meaning that the return number is either too big or too small compared to the predicted value.

In this example, all the cells are coded in green for both analyses (i.e., by sales periods or by return periods), indicating that all returns fall within the caution and critical limits (i.e., nothing abnormal). Another way to visualize this is by using a Chi-Squared plot for the sales period and return period, as shown next.





Using Subset IDs with Statistical Process Control
The warranty monitoring methodology explained in this section can also be used to detect different subpopulations in a data set. The different subpopulations can reflect different use conditions, different material, etc. In this methodology, one can use different subset IDs to differentiate between subpopulations, and obtain models that are distinct to each subpopulation. The following example illustrates this concept.

Example 8: