Data & Data Types

Statistical models rely extensively on data to make predictions. In our case, the models are the statistical distributions and the data are the life data, or times-to-failure data, of our product. The accuracy of any prediction depends directly on the quality, accuracy and completeness of the supplied data. Good data, along with the appropriate model choice, usually results in good predictions. Bad or insufficient data will almost always result in bad predictions. In the analysis of life data, we want to use all available data, which is sometimes incomplete or includes uncertainty as to when a failure occurred. To accomplish this, we separate life data into two categories: complete (all information is available) or censored (some of the information is missing). This chapter details these data classification methods.

Data Classification
Most types of non-life data, as well as some life data, are what we term complete data. Complete data means that the value of each sample unit is observed or known. In many cases, however, life data contain uncertainty as to exactly when an event happened (i.e., when the unit failed). Data containing such uncertainty are termed censored data.

Complete Data
Complete data means that the value of each sample unit is observed or known. For example, if we had to compute the average test score for a sample of ten students, complete data would consist of the known score for each student. Likewise in the case of life data analysis, our data set (if complete) would be composed of the times-to-failure of all units in our sample. For example, if we tested five units and they all failed (and their times-to-failure were recorded), we would then have complete information as to the time of each failure in the sample.

Complete data is much easier to work with than censored data. For example, it would be much harder to compute the average test score of the ten students if our data set were not complete, i.e., if we were given scores of 30, 80, 60, 90 and 95, three scores greater than 50, a score less than 70 and a score between 60 and 80.
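To make the difficulty concrete, the sketch below computes the mean of the five exact scores directly and then shows that, once censored scores are included, the mean can only be narrowed to a range. It assumes (this is not stated in the text) that every score lies between 0 and 100, so that "greater than 50" and "less than 70" have finite bounds; the names used are purely illustrative.

```python
# Hypothetical sketch: bounding the average of the ten test scores described above.
# Assumption (not stated in the text): every score lies in the range 0 to 100.
SCORE_MIN, SCORE_MAX = 0, 100

exact_scores = [30, 80, 60, 90, 95]          # complete observations
# Censored observations written as (lower, upper) bounds on the unknown score
censored_scores = (
    [(50, SCORE_MAX)] * 3                     # three scores greater than 50
    + [(SCORE_MIN, 70)]                       # one score less than 70
    + [(60, 80)]                              # one score between 60 and 80
)

def mean_bounds(exact, censored):
    """Return the smallest and largest possible mean given exact values and bounds."""
    n = len(exact) + len(censored)
    low = (sum(exact) + sum(lo for lo, _ in censored)) / n
    high = (sum(exact) + sum(hi for _, hi in censored)) / n
    return low, high

complete_mean = sum(exact_scores) / len(exact_scores)
low, high = mean_bounds(exact_scores, censored_scores)
print(f"Mean of the five exact scores: {complete_mean}")      # 71.0
print(f"Mean of all ten scores lies between {low} and {high}")  # 56.5 and 80.5
```

With complete data the answer is a single number; with censored data, simple arithmetic yields only an interval, which is why the estimation methods discussed later in this chapter are needed.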

Censored Data
In many cases when life data are analyzed, not all of the units in the sample have failed (i.e., the event of interest was not observed), or the exact times-to-failure of all the units are not known. This type of data is commonly called censored data. There are three possible censoring schemes: right censored (also called suspended data), interval censored and left censored.

Right Censored (Suspended)
The most common case of censoring is what is referred to as right censored data, or suspended data. In the case of life data, these data sets are composed of units that did not fail. For example, if we tested five units and only three had failed by the end of the test, we would have suspended data (or right censored data) for the two unfailed units. The term right censored implies that the event of interest (i.e. the time-to-failure) is to the right of our data point. In other words, if the units were to keep on operating, the failure would occur at some time after our data point (or to the right on the time scale).
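The following sketch shows how right censored data arise when a test is stopped at a fixed time. The five lifetimes and the 500-hour test end are hypothetical values chosen so that, as in the example above, three units fail and two are still running when the test ends.

```python
# Hypothetical lifetimes (hours) for five units; in a real test these would be unknown
# for the units that are still running when the test ends.
true_lifetimes = [120, 340, 410, 620, 910]
TEST_END = 500  # assumed test termination time, in hours

def observe_until(lifetimes, end_time):
    """Record a failure time for units that fail by end_time, else a suspension at end_time."""
    observations = []
    for t in lifetimes:
        if t <= end_time:
            observations.append((t, "F"))          # failure observed: complete data point
        else:
            observations.append((end_time, "S"))   # right censored (suspended) at end_time
    return observations

data = observe_until(true_lifetimes, TEST_END)
print(data)
# [(120, 'F'), (340, 'F'), (410, 'F'), (500, 'S'), (500, 'S')]
```

Each suspension records only that the unit survived to 500 hours; its actual time-to-failure lies somewhere to the right of that point on the time scale.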

Interval Censored
The second type of censoring is commonly called interval censored data. Interval censored data reflects uncertainty as to the exact times the units failed within an interval. This type of data frequently comes from tests or situations where the objects of interest are not constantly monitored. If we are running a test on five units and inspecting them every 100 hours, we only know that a unit failed or did not fail between inspections. More specifically, if we inspect a certain unit at 100 hours and find it is operating and then perform another inspection at 200 hours to find that the unit is no longer operating, we know that a failure occurred in the interval between 100 and 200 hours. In other words, the only information we have is that it failed in a certain interval of time. This is also called inspection data by some authors.
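A minimal sketch of how periodic inspection turns an (unobserved) failure time into interval data, assuming inspections every 100 hours as in the example and assuming that a unit failing exactly at an inspection is found failed at that inspection. The function name is hypothetical.

```python
import math

INSPECTION_PERIOD = 100  # hours between inspections, as in the example above

def inspection_interval(failure_time, period=INSPECTION_PERIOD):
    """Return (last inspection at which the unit was running, first at which it had failed)."""
    if failure_time <= 0:
        raise ValueError("failure_time must be positive")
    # Number of the first inspection that finds the unit failed
    k = math.ceil(failure_time / period)
    return ((k - 1) * period, k * period)

print(inspection_interval(137))  # (100, 200)
print(inspection_interval(42))   # (0, 100)
```

Only the returned interval is ever recorded; the exact failure time passed in is what the inspection scheme hides from us.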

Left Censored
The third type of censoring is similar to interval censoring and is called left censored data. In left censored data, a failure time is only known to be before a certain time. For instance, we may know that a certain unit failed sometime before 100 hours but not exactly when. In other words, it could have failed any time between 0 and 100 hours. This is identical to interval censored data in which the starting time for the interval is zero.
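Because left censored data are interval data starting at zero, and right censored data are interval data with no upper limit, all four data types can be stored in one form: an interval [lower, upper] known to contain the time-to-failure. The sketch below uses that representation; it is an illustrative assumption, not a description of how Weibull++ stores data internally, and the class name is hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Observation:
    """A time-to-failure known only to lie in [lower, upper] (hours)."""
    lower: float
    upper: float

    def kind(self) -> str:
        if self.lower == self.upper:
            return "complete"            # zero-width interval: exact failure time
        if math.isinf(self.upper):
            return "right censored"      # failure lies somewhere after `lower`
        if self.lower == 0:
            return "left censored"       # failure lies somewhere before `upper`
        return "interval censored"

observations = [
    Observation(150, 150),        # failure observed at exactly 150 hours
    Observation(500, math.inf),   # still running at 500 hours (suspension)
    Observation(100, 200),        # failed between inspections at 100 and 200 hours
    Observation(0, 100),          # failed some time before 100 hours
]
for obs in observations:
    print(obs, "->", obs.kind())
```

A single list like this can hold every data type at once, which mirrors the idea of a mixed data set.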

Data Types and Weibull++
Weibull++ allows you to use all of the above data types in a single data set. In other words, a data set can contain complete data, right censored data, interval censored data and left censored data. An overview of this is presented in this section.

Grouped Data and Weibull++
All of the previously mentioned data types can also be put into groups. This is simply a way of collecting units with identical failure or censoring times. If ten units were put on test with the first four units failing at 10, 20, 30 and 40 hours respectively, and then the test were terminated after the fourth failure, you can group the last six units as a group of six suspensions at 40 hours. Weibull++ allows you to enter all types of data as groups, as shown in the following figure.

Depending on the analysis method chosen, i.e. regression or maximum likelihood, Weibull++ treats grouped data differently. This was done by design to allow for more options and flexibility. Appendix B describes how Weibull++ treats grouped data.
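Grouping is essentially bookkeeping. Under maximum likelihood, a group of n identical observations contributes n times the log-likelihood term of a single observation, so grouping leaves the likelihood unchanged. The sketch below groups the ten-unit example and checks this, assuming an exponential distribution purely for illustration; how Weibull++ treats groups under regression is the subject of Appendix B and is not reproduced here. Function names are hypothetical.

```python
import math
from collections import Counter

# Ten units on test (hours): four failures, then the test stops at the fourth failure,
# leaving six unfailed units suspended at 40 hours. "F" = failure, "S" = suspension.
records = [(10, "F"), (20, "F"), (30, "F"), (40, "F")] + [(40, "S")] * 6

def group_records(rows):
    """Collapse identical (time, type) rows into (count, time, type) groups, sorted by time."""
    counts = Counter(rows)
    return [(n, t, kind) for (t, kind), n in sorted(counts.items())]

def exp_log_likelihood(lam, groups):
    """Exponential log-likelihood; each group adds count x its one-unit term."""
    total = 0.0
    for n, t, kind in groups:
        if kind == "F":
            total += n * (math.log(lam) - lam * t)   # ln f(t) = ln(lam) - lam*t
        else:
            total += n * (-lam * t)                  # ln R(t) = -lam*t
    return total

groups = group_records(records)
ungrouped = [(1, t, kind) for t, kind in records]

# Closed-form exponential MLE: number of failures / total time on test = 4 / 340
lam = 4 / 340
print(groups)
# [(1, 10, 'F'), (1, 20, 'F'), (1, 30, 'F'), (1, 40, 'F'), (6, 40, 'S')]
print(exp_log_likelihood(lam, groups), exp_log_likelihood(lam, ungrouped))
```

The two printed log-likelihoods agree up to floating-point rounding, so for this method the grouped and ungrouped entries lead to the same estimate.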

Classifying Data in Weibull++
In Weibull++, data classifications are specified using data types. A single data set can contain any or all of the mentioned censoring schemes. Weibull++, through the use of the New Project Wizard and the New Data Sheet Wizard features, simplifies the choice of the appropriate data type for your data. Weibull++ uses the logic tree shown in Fig. 4-1 in deciding which is the appropriate data type for your data.

Analysis & Parameter Estimation Methods for Censored Data
In Chapter 3 we discussed parameter estimation methods for complete data. We will expand on that approach in this section by including estimation methods for the different types of censoring. The basic methods are still based on the same principles covered in Chapter 3, but modified to take into account the fact that some of the data points are censored. For example, assume that you were asked to find the mean (average) of 10, 20, a value that is between 25 and 40, a value that is greater than 30 and a value that is less than 50. In this case, the familiar method of determining the average is no longer applicable and special methods will need to be employed to handle the censored data in this data set.
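As a sketch of what such special methods can look like, the example below treats the five values as times-to-failure in hours and, purely for illustration, assumes an exponential distribution with failure rate lambda. Under maximum likelihood each kind of data point contributes its own term: ln f(t) for an exact time, ln R(t) for a right censored time, ln[F(b) - F(a)] for an interval and ln F(b) for a left censored time. A golden-section search stands in here for a general-purpose optimizer; the function names are hypothetical.

```python
import math

# Values from the example, read as times (hours) under an assumed exponential model.
complete = [10, 20]
right = [30]            # a value greater than 30
interval = [(25, 40)]   # a value between 25 and 40
left = [50]             # a value less than 50

def log_likelihood(lam):
    ll = sum(math.log(lam) - lam * t for t in complete)        # ln f(t)
    ll += sum(-lam * t for t in right)                          # ln R(t)
    ll += sum(math.log(math.exp(-lam * a) - math.exp(-lam * b))
              for a, b in interval)                             # ln[F(b) - F(a)]
    ll += sum(math.log(1 - math.exp(-lam * b)) for b in left)   # ln F(b)
    return ll

def golden_max(f, lo, hi, tol=1e-10):
    """Maximize a unimodal function on [lo, hi] by golden-section search."""
    g = (math.sqrt(5) - 1) / 2
    c, d = hi - g * (hi - lo), lo + g * (hi - lo)
    while hi - lo > tol:
        if f(c) > f(d):          # maximum lies in [lo, d]
            hi, d = d, c
            c = hi - g * (hi - lo)
        else:                    # maximum lies in [c, hi]
            lo, c = c, d
            d = lo + g * (hi - lo)
    return (lo + hi) / 2

lam_hat = golden_max(log_likelihood, 1e-6, 1.0)
print(f"lambda = {lam_hat:.5f} per hour, estimated mean = {1 / lam_hat:.1f} hours")
```

The censored values still influence the estimate, but only through the probability that the time falls in the stated range, rather than as if they were exact numbers.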