Event Log Data: Difference between revisions

Revision as of 01:43, 24 August 2012

Event logs, or maintenance logs, store information about a piece of equipment's failures and repairs. They can help companies achieve their productivity goals by giving insight into the failure modes, frequency of outages, repair duration, uptime/downtime and availability of the equipment. Some event logs contain more information than others, but essentially event logs capture data in a format that includes the type of event, the date/time when the event occurred and the date/time when the system was restored to operation.

The data from event logs can be used to extract failure times and repair times. Once the times-to-failure data and times-to-repair data have been obtained, a life distribution can be fitted to each data set. The principles and theory for fitting a life distribution are presented in detail in Life Distributions. The process of data extraction and model fitting can be automated using the Weibull++ event log folio.

Converting Event Logs to Failure/Repair Data

For n number of failures and repair actions that took place during the event logging period, the times-to-failure of every unique occurrence of an event are obtained by calculating the time between the last repair and the time the new failure occurred, or:

[math]\displaystyle{ \text{Time-to-Failure}_{i}=t_{i}-r_{i-1}\,\! }[/math]

where:

[math]\displaystyle{ i=1,\ldots ,n\,\! }[/math]
[math]\displaystyle{ t_{i}\,\! }[/math] is the date/time of occurrence of [math]\displaystyle{ i\,\! }[/math].
[math]\displaystyle{ r_{i-1}\,\! }[/math] is the date/time of restoration of the previous occurrence [math]\displaystyle{ (i-1)\,\! }[/math].

For systems that were new when the collection of the event log data started, the time to the first occurrence of each unique event is the date/time of the occurrence of the event minus the time the system monitoring started. That is:

[math]\displaystyle{ \text{Time-to-Failure}_{1}=t_{1}-\text{System Start Time}\,\! }[/math]

For systems that were not new when the collection of event log data started, the times to first occurrence of every unique event are considered to be suspensions (right censored) because the system is assumed to have accumulated more hours before the data collection period started (i.e., the time between the start date/time and the first occurrence of an event is not the entire operating time). In this case:

[math]\displaystyle{ \text{Suspension}_{1}=t_{1}-\text{System Start Time}\,\! }[/math]

When monitoring of the system is stopped or when the system is no longer being used, the time accumulated since the last restoration of each event is considered to be a suspension, because the next occurrence of the event has not been observed by this time.

[math]\displaystyle{ \text{Last Suspension}=\text{System End Time}-r_{n}\,\! }[/math]
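The conversion described so far can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the system runs around the clock, the component operates through the failures of other components, and the function name `failure_data`, its parameters and the sample log are hypothetical (the log is not the one used in the example later in this article).

```python
from datetime import datetime

def hours_between(start, end):
    """Elapsed calendar hours between two datetimes."""
    return (end - start).total_seconds() / 3600.0

def failure_data(events, system_start, system_end, new_at_start):
    """Convert (occurred, restored) pairs for one event type, in time order,
    into (hours, flag) rows, where flag is 'F' (failure) or 'S' (suspension)."""
    rows = []
    previous_restore = system_start
    for i, (occurred, restored) in enumerate(events):
        flag = "F"
        if i == 0 and not new_at_start:
            flag = "S"  # age accumulated before logging began is unknown
        rows.append((hours_between(previous_restore, occurred), flag))
        previous_restore = restored
    # Time from the last restoration to the end of monitoring is a suspension.
    rows.append((hours_between(previous_restore, system_end), "S"))
    return rows

# Hypothetical log for a system that was not new when logging started.
log = [
    (datetime(2020, 1, 1, 10, 0), datetime(2020, 1, 1, 12, 0)),
    (datetime(2020, 1, 2, 6, 0), datetime(2020, 1, 2, 7, 30)),
]
rows = failure_data(log, datetime(2020, 1, 1, 0, 0), datetime(2020, 1, 3, 0, 0),
                    new_at_start=False)
print(rows)  # [(10.0, 'S'), (18.0, 'F'), (16.5, 'S')]
```

Passing `new_at_start=True` would turn the first row into a failure time instead of a suspension.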

The four equations given above are valid for cases in which the component operates through the failure of other components. When the component does not operate through those failures, the calculations must exclude the downtime of the system due to the other failures. In other words, the first four equations become:

[math]\displaystyle{ \text{Time-to-Failure}_{i}=t_{i}-r_{i-1}-(\text{System Downtime since}\,r_{i-1})\,\! }[/math]

[math]\displaystyle{ \text{Time-to-Failure}_{1}=t_{1}-\text{System Start Time}-(\text{System Downtime since System Start Time})\,\! }[/math]

[math]\displaystyle{ \text{Suspension}_{1}=t_{1}-\text{System Start Time}-(\text{System Downtime since System Start Time})\,\! }[/math]

[math]\displaystyle{ \text{Last Suspension}=\text{System End Time}-r_{n}-(\text{System Downtime since}\,r_{n})\,\! }[/math]
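As a rough sketch of the downtime adjustment, the Python snippet below subtracts the part of each system downtime interval that falls inside the window of interest. The helper names and the sample dates are hypothetical, and the sketch assumes that the downtime intervals caused by other events do not overlap one another.

```python
from datetime import datetime

def overlap_hours(a_start, a_end, b_start, b_end):
    """Hours during which the intervals [a_start, a_end] and [b_start, b_end] overlap."""
    start, end = max(a_start, b_start), min(a_end, b_end)
    return max((end - start).total_seconds(), 0.0) / 3600.0

def operating_hours(window_start, window_end, other_downtimes):
    """Calendar hours in the window minus system downtime caused by other events."""
    total = (window_end - window_start).total_seconds() / 3600.0
    down = sum(overlap_hours(window_start, window_end, d0, d1)
               for d0, d1 in other_downtimes)
    return total - down

# Hypothetical case: restored at noon, failed again 24 hours later,
# with the system down for 2.5 hours in between because of another component.
r_prev = datetime(2020, 1, 1, 12, 0)
t_i = datetime(2020, 1, 2, 12, 0)
others = [(datetime(2020, 1, 1, 20, 0), datetime(2020, 1, 1, 22, 30))]
print(operating_hours(r_prev, t_i, others))  # 21.5
```

The same function covers the last suspension by passing the last restoration time and the system end time as the window.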

Repair times are obtained by calculating the difference between the date/time of event occurrence and the date/time of restoration, or:

[math]\displaystyle{ \text{Time-to-repair}_{i}=r_{i}-t_{i}\,\! }[/math]

All these equations should also take into consideration the periods when the system is not operating or not in use, as in the case of operations that do not run on a 24/7 basis.

Example

Consider a very simple system composed of only two components, A and B. The system runs from 8 AM to 5 PM, seven days a week. When a failure is observed, the system undergoes repair and the failed component is replaced. The date and time of each failure is recorded in an equipment downtime log, along with an indication of the component that caused the failure. The date and time when the system was restored is also recorded. The downtime log for this simple system is given next.

Note that:

The date and time of each failure is recorded.
The date and time of repair completion for each failure is recorded.
The repair involves replacement of the responsible component.
The responsible component for each failure is recorded.

For this example, we will assume that an engineer began recording these events on January 1, 1997 at 12 PM and stopped recording on March 18, 1997 at 1 PM, at which time the analysis was performed. Information for events prior to January 1 is unknown. The objective of the analysis is to obtain the failure and repair distributions for each component.

Solution

We begin the analysis by looking at component A. The first time that component A is known to have failed is recorded in row 1 of the data sheet; thus, the first age (or time-to-failure) for A is the difference between the time we began recording the data and the time when this failure event happened. Also, the component does not age when the system is down due to the failure of another component. Therefore, this time must be taken into account.

1. The First Time-To-Failure for Component A, TTF_A[1]

The first time-to-failure of component A, TTF_A[1], is the sum of the hours of operation for each day, starting on the start date (and time) and ending with the failure date (and time). This is graphically shown next. The boxes with the green background indicate the operating periods. Thus, TTF_A[1] = 5 + 8 = 13 hours.

2. The First Time-To-Repair for Component A, TTR_A[1]

The time-to-repair for component A for this failure, TTR_A[1], is [Date/Time Restored - Date/Time Occurred] or:

TTR_A[1] = (Jan 02 1997/7:49 PM) - (Jan 02 1997/4:00 PM) = 3:49 = 3.8166 hours

(Note that in the case of repair actions, shifts are not taken into account since it is assumed that repair actions will be performed as needed to bring the system up.)
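The same subtraction can be checked with Python's standard `datetime` module, using the two timestamps quoted above. The variable names are illustrative only.

```python
from datetime import datetime

occurred = datetime(1997, 1, 2, 16, 0)   # Jan 02 1997, 4:00 PM
restored = datetime(1997, 1, 2, 19, 49)  # Jan 02 1997, 7:49 PM

# Repair time is plain elapsed time; shift hours are not considered.
ttr_hours = (restored - occurred).total_seconds() / 3600.0
print(round(ttr_hours, 4))  # 3.8167
```

The result rounds to 3.8167 hours; the value 3.8166 quoted above is the same number truncated to four decimal places.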

3. The Second Time-To-Failure for Component A, TTF_A[2]

Continuing with component A, the second system failure due to component A is found in row 4, on January 12, 1997 at 3:26 PM. Thus, to compute TTF_A[2], you must look at the age the component accumulated from the last repair time, taking shifts into account as before, but with the added complexity of accounting for the times that the system was down due to failures of other components (i.e., component A was not aging when the system was down for repair due to a component B failure).

This is shown graphically next using green to show the operating times of A and orange to show the downtimes of the system for reasons other than the failure of A (to the closest hour).

To illustrate this mathematically, we will use a function, [math]\displaystyle{ \tau }[/math], which, given a range of times (written as start – end), returns the shift hours worked during that period. For this example, [math]\displaystyle{ \tau }[/math](1/1/97 3:00 AM – 1/1/97 6:00 PM) = 9 hours given an 8 AM to 5 PM shift. Furthermore, we will denote the date and time a failure occurred as DTO and the date and time a repair was completed as DTR, with a numerical subscript indicating the row of the downtime log that contains the entry (e.g., DTO₄ for the date and time a failure occurred in row 4).
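One possible implementation of such a function is sketched below in Python. The name `tau` mirrors the symbol used here, but the signature is an assumption, as is the rule that the same 8 AM to 5 PM shift is worked on every calendar day.

```python
from datetime import datetime, time, timedelta

def tau(start, end, shift_start=time(8, 0), shift_end=time(17, 0)):
    """Shift hours worked between start and end, assuming the same shift every day."""
    hours = 0.0
    day = start.date()
    while day <= end.date():
        # Clip the requested range to this day's shift window.
        s = max(start, datetime.combine(day, shift_start))
        e = min(end, datetime.combine(day, shift_end))
        if e > s:
            hours += (e - s).total_seconds() / 3600.0
        day += timedelta(days=1)
    return hours

# The range quoted in the text: 3:00 AM to 6:00 PM on the same day.
print(tau(datetime(1997, 1, 1, 3, 0), datetime(1997, 1, 1, 18, 0)))   # 9.0
# From the start of recording to the first failure of component A.
print(tau(datetime(1997, 1, 1, 12, 0), datetime(1997, 1, 2, 16, 0)))  # 13.0
```

The second call reproduces the 5 + 8 = 13 hours obtained for the first time-to-failure of component A.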

Then the total possible hours (TPH) that component A could have operated from the time it was repaired to the time it failed the second time is:

TPH = [math]\displaystyle{ \tau }[/math](DTR₁ – DTO₄)

TPH = [math]\displaystyle{ \tau }[/math](DTR₁ – DTO₄) = 9 days * 9 hours/day + 7:26 hours = 88:26 hours = 88.433 hours

The time that component A was not operating (NOP) during normal hours of operation is the time that the system was down due to failure of component B, or:

NOP = [math]\displaystyle{ \tau }[/math](DTO₂ – DTR₂) + [math]\displaystyle{ \tau }[/math](DTO₃ – DTR₃)

NOP = [math]\displaystyle{ \tau }[/math](DTO₂ – DTR₂) + [math]\displaystyle{ \tau }[/math](DTO₃ – DTR₃) = 2:13 hours + 7:47 hours = 10:00 hours

Thus, the second time-to-failure for component A, TTF_A[2], is:

TTF_A[2] = TPH - NOP

TTF_A[2] = 88:26 hours – 10:00 hours = 78:26 hours = 78.433 hours
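The hours:minutes arithmetic above can be verified with a short Python sketch that works in whole minutes to avoid rounding. The helper names are illustrative, and the three durations are the ones quoted in this step.

```python
def to_minutes(text):
    """Convert an 'H:MM' duration string to whole minutes."""
    hours, minutes = text.split(":")
    return int(hours) * 60 + int(minutes)

def to_hmm(total_minutes):
    """Format whole minutes back into an 'H:MM' string."""
    return f"{total_minutes // 60}:{total_minutes % 60:02d}"

tph = to_minutes("88:26")                       # total possible hours
nop = to_minutes("2:13") + to_minutes("7:47")   # downtime due to component B
ttf = tph - nop

print(to_hmm(nop), to_hmm(ttf), round(ttf / 60, 3))  # 10:00 78:26 78.433
```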

4. The Second Time-To-Repair for Component A, TTR_A[2]

To compute the time-to-repair for this failure:

TTR_A[2] = DTR₄ – DTO₄ = (3 h, 49 m) = 3.8166 hours

5. Computing the Rest of the Observed Failures

This same process can be repeated for the rest of the observed failures, yielding:

TTF_A[3] = 8.9333

TTF_A[4] = 56.25

TTF_A[5] = 33.05

TTF_A[6] = 100.8433

TTF_A[7] = 35.7

TTF_A[8] = 112.3166

TTF_A[9] = 23.1

TTF_A[10] = 13.9666

TTF_A[11] = 90.5166

and

TTR_A[3] = 0.4166

TTR_A[4] = 29.6166

TTR_A[5] = 0.4833

TTR_A[6] = 4.5166

TTR_A[7] = 17.2833

TTR_A[8] = 0.4833

TTR_A[9] = 0.45

TTR_A[10] = 5.5

TTR_A[11] = 0.4666

6. Creating the Data Sets

When the above computations are complete, we can create the data set needed to obtain the life distributions for the failure and repair times for component A. To accomplish this, modifications will need to be performed on the TTF data, given the original assumptions, as follows:

TTF_A[2] through TTF_A[11] will remain as is and be designated as times-to-failure (F).
TTF_A[1] will be designated as a right censored data point (or suspension, S). This is because when we started collecting data, component A had already been operating for an unknown period of time X, so the true time-to-failure for component A is the operating time observed (in this case, TTF_A[1] = 13 hours) plus the unknown operating time X. Thus, what we know is that the true time-to-failure for A is some time greater than the observed TTF_A[1] (i.e., a right censored data point).
An additional right censored observation (suspension) will be added to the data set to reflect the time that component A operated without failure from its last repair time to the end of the observation period. This is presented next.

Since our analysis time ends on March 18, 1997 at 1:00 PM and component A has operated successfully from the last time it was replaced on March 13, 1997 at 5:13 PM, the additional time of successful operation is:

TPH = [math]\displaystyle{ \tau }[/math](DTR₁₉ – End Time) = (4 days * 9 hours/day + 5:00 hours) = 41:00 hours

NOP = [math]\displaystyle{ \tau }[/math](DTO₂₀ – DTR₂₀) = 7:24 hours

Thus, the remaining time that component A operated without failure is:

TTS = TPH - NOP = 33:36 = 33.6 hours.
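Putting these pieces together, the failure data set for component A can be assembled as in the Python sketch below. The values are the ones listed in this example; the (hours, flag) tuple layout is only an illustrative choice, not the format used by any particular tool.

```python
# Times-to-failure for component A in hours, as listed in this example.
ttf = [13, 78.433, 8.9333, 56.25, 33.05, 100.8433,
       35.7, 112.3166, 23.1, 13.9666, 90.5166]

# TTF_A[1] is a suspension (S); TTF_A[2] through TTF_A[11] are failures (F).
data = [(ttf[0], "S")] + [(t, "F") for t in ttf[1:]]

# Final suspension: TPH - NOP = 41:00 - 7:24, computed in minutes.
tts = (41 * 60 - (7 * 60 + 24)) / 60.0
data.append((tts, "S"))

failures = [t for t, flag in data if flag == "F"]
suspensions = [t for t, flag in data if flag == "S"]
print(len(failures), len(suspensions))  # 10 2
```

A life distribution would then be fitted to the ten failure times and two suspensions.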

The next two tables show component A's failure and repair data. The entire analysis can be repeated to obtain the failure and repair times for component B.

Weibull++ 8 Event Log Folio

The analysis can be automatically performed in the Weibull++ 8 event log folio. Simply enter the data from the equipment downtime log into the folio, as shown next.

Use the Shift Pattern window (Event Log > Action and Settings > Set Shift Pattern) to specify the 8:00 AM to 5:00 PM shifts that occur seven days a week, as shown next.

The utility will automatically convert the equipment downtime log data to time-to-failure and time-to-repair data and fit failure and repair distributions to the data set. To view the failure and repair results, click the Show Analysis Summary (...) button on the control panel. The Results window shows the calculated values.

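More event log folio examples are available. See also:

Factory Equipment Failure Log (http://www.reliasoft.com/Weibull/examples/rc6/index.htm), or watch the video (http://www.reliasoft.tv/weibull/appexamples/weibull_app_ex_6.html).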
