Event Log Data

From ReliaWiki
Revision as of 00:00, 24 August 2012 by Kate Racaza

Event logs, or maintenance logs, store information about a piece of equipment's failures and repairs. They help companies pursue their productivity goals by providing insight into the failure modes, the frequency of outages, the repair durations, the uptime/downtime and the availability of the equipment. Some event logs contain more information than others, but essentially every event log records the type of event, the date/time when the event occurred and the date/time when the system was restored to operation.

The data from event logs can be used to extract times-to-failure and times-to-repair. For n failures and repair actions that took place during the logging period, the time-to-failure of each occurrence of a unique event is obtained by calculating the time between the previous restoration and the new failure, or:

[math]\displaystyle{ \text{Time-to-Failure}_{i}=t_{i}-r_{i-1}\,\! }[/math]
where:
  • [math]\displaystyle{ i=1,...n\,\! }[/math]
  • [math]\displaystyle{ t_{i}\,\! }[/math] is the date/time of occurrence [math]\displaystyle{ i\,\! }[/math].
  • [math]\displaystyle{ r_{i-1}\,\! }[/math] is the date/time of restoration of the previous occurrence [math]\displaystyle{ (i-1)\,\! }[/math].
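The interval equation above can be sketched in Python, assuming each log entry for one unique event has already been parsed into a pair of `datetime` objects (t_i, r_i) sorted by occurrence time. The helper name `times_between_failures` and the sample timestamps are hypothetical, and the result is in calendar hours, i.e., the component is assumed to run around the clock.

```python
from datetime import datetime

def times_between_failures(events):
    """Hours from each restoration r_{i-1} to the next occurrence t_i.

    events: list of (t_i, r_i) datetime pairs for ONE unique event,
    sorted by occurrence time. Only intervals that start at a known
    restoration are returned; the first occurrence is measured from
    the System Start Time instead.
    """
    ttf = []
    for (_, r_prev), (t_i, _) in zip(events, events[1:]):
        ttf.append((t_i - r_prev).total_seconds() / 3600.0)
    return ttf

# Hypothetical log entries for one component (not from the source data)
log = [
    (datetime(2024, 1, 1, 8), datetime(2024, 1, 1, 10)),
    (datetime(2024, 1, 2, 10), datetime(2024, 1, 2, 12)),
    (datetime(2024, 1, 3, 0), datetime(2024, 1, 3, 1)),
]
print(times_between_failures(log))  # [24.0, 12.0]
```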


For systems that were new when the collection of the event log data started, the time to the first occurrence of each unique event is the date/time of that occurrence minus the time system monitoring started. That is:

[math]\displaystyle{ \text{Time-to-Failure}_{1}=t_{1}-\text{System Start Time}\,\! }[/math]


For systems that were not new when the collection of event log data started, the time to the first occurrence of each unique event is treated as a suspension (right censored), because the system is assumed to have accumulated operating hours before the data collection period started (i.e., the time between the start date/time and the first occurrence of an event is not the entire operating time). In this case:

[math]\displaystyle{ \text{Suspension}_{1}=t_{1}-\text{System Start Time}\,\! }[/math]
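The two first-occurrence cases differ only in how the interval is classified, which a small sketch can make explicit. The function name `first_occurrence`, the `system_was_new` flag and the "F"/"S" labels (failure/suspension) are hypothetical conventions chosen for illustration, and the timestamps are made up.

```python
from datetime import datetime

def first_occurrence(system_start, t_1, system_was_new):
    """Return ("F", hours) or ("S", hours) for the first occurrence.

    If the system was new when logging began, the interval is a
    time-to-failure; otherwise it is a right-censored suspension,
    because the component's age before logging started is unknown.
    """
    hours = (t_1 - system_start).total_seconds() / 3600.0
    return ("F" if system_was_new else "S", hours)

start = datetime(2024, 1, 1, 12)  # hypothetical start of logging
t1 = datetime(2024, 1, 2, 15)     # hypothetical first occurrence
print(first_occurrence(start, t1, system_was_new=True))   # ('F', 27.0)
print(first_occurrence(start, t1, system_was_new=False))  # ('S', 27.0)
```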


When monitoring of the system stops, or when the system is no longer used, every event that has not recurred by that time contributes a suspension, measured from its last restoration:

[math]\displaystyle{ \text{Last Suspension}=\text{System End Time}-r_{n}\,\! }[/math]
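A minimal sketch of the last-suspension calculation, assuming the last restoration r_n and the System End Time are available as `datetime` objects; the function name and the sample times are hypothetical. The check guards against a log whose end time precedes its last restoration.

```python
from datetime import datetime

def last_suspension(r_n, system_end):
    """Hours from the last restoration r_n to the System End Time.

    The event has not recurred by the end of monitoring, so this
    interval is a right-censored (suspended) observation.
    """
    if system_end < r_n:
        raise ValueError("system end time precedes the last restoration")
    return (system_end - r_n).total_seconds() / 3600.0

print(last_suspension(datetime(2024, 3, 1, 9), datetime(2024, 3, 2, 13)))  # 28.0
```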


The four equations given above are valid when the component continues to operate (and age) through the failures of other components. When the component does not operate through those failures, the system downtime caused by the other failures must be subtracted. In other words, the four equations become:

[math]\displaystyle{ \text{Time-to-Failure}_{i}=t_{i}-r_{i-1}-(\text{System Downtime since}\,r_{i-1})\,\! }[/math]
[math]\displaystyle{ \text{Time-to-Failure}_{1}=t_{1}-\text{System Start Time}-(\text{System Downtime since System Start Time})\,\! }[/math]
[math]\displaystyle{ \text{Suspension}_{1}=t_{1}-\text{System Start Time}-(\text{System Downtime since System Start Time})\,\! }[/math]
[math]\displaystyle{ \text{Last Suspension}=\text{System End Time}-r_{n}-(\text{System Downtime since}\,r_{n})\,\! }[/math]
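The downtime-adjusted equations all share one step: subtract the portion of the interval during which the system was down because of other components. The sketch below assumes those outages are given as non-overlapping (down, up) `datetime` pairs; the helper names `overlap_hours` and `adjusted_interval` and the sample data are hypothetical.

```python
from datetime import datetime

def hours(a, b):
    return (b - a).total_seconds() / 3600.0

def overlap_hours(start, end, outages):
    """Total hours of the outage intervals that fall inside [start, end].

    Assumes the outages do not overlap one another.
    """
    total = 0.0
    for down, up in outages:
        lo, hi = max(start, down), min(end, up)
        if hi > lo:
            total += hours(lo, hi)
    return total

def adjusted_interval(start, end, other_outages):
    """Interval length minus system downtime caused by OTHER failures.

    start is r_{i-1} (or the System Start Time) and end is t_i (or the
    System End Time), matching the four adjusted equations.
    """
    return hours(start, end) - overlap_hours(start, end, other_outages)

r_prev = datetime(2024, 1, 1, 8)
t_i = datetime(2024, 1, 2, 8)
# Hypothetical outage caused by another component: 3 h of downtime
others = [(datetime(2024, 1, 1, 14), datetime(2024, 1, 1, 17))]
print(adjusted_interval(r_prev, t_i, others))  # 21.0
```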


Repair times are obtained by calculating the difference between the date/time of event occurrence and the date/time of restoration, or:

[math]\displaystyle{ \text{Time-to-repair}_{i}=r_{i}-t_{i}\,\! }[/math]
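The repair-time equation maps directly onto the same (t_i, r_i) pairs used above. A minimal sketch, with a hypothetical function name and made-up timestamps:

```python
from datetime import datetime

def times_to_repair(events):
    """Hours from each occurrence t_i to its restoration r_i."""
    return [(r - t).total_seconds() / 3600.0 for t, r in events]

log = [
    (datetime(2024, 1, 1, 8), datetime(2024, 1, 1, 10, 30)),
    (datetime(2024, 1, 2, 10), datetime(2024, 1, 2, 11)),
]
print(times_to_repair(log))  # [2.5, 1.0]
```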


All of these equations should also exclude the periods when the system is not operating or not in use, as in operations that do not run on a 24/7 basis. The failure and repair data of every component in the event log can then be used to fit failure distributions and repair distributions using life data analysis methods. The process of data extraction and model fitting can be automated using the Weibull++ event log folio.
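Excluding non-operating periods amounts to counting only the hours that fall inside scheduled shifts. The sketch below assumes a Monday to Friday, 8 AM to 5 PM schedule (the one used in the example that follows) and ignores holidays; the function name and constants are hypothetical. In practice, each interval from the equations above would be measured with this function instead of with plain calendar subtraction.

```python
from datetime import datetime, timedelta, time

SHIFT_START = time(8, 0)   # assumed schedule: 08:00 to 17:00
SHIFT_END = time(17, 0)

def operating_hours(start, end):
    """Hours in [start, end] that fall inside Mon-Fri 08:00-17:00 shifts."""
    total = 0.0
    day = start.date()
    while day <= end.date():
        if day.weekday() < 5:  # Monday=0 ... Friday=4
            shift_lo = datetime.combine(day, SHIFT_START)
            shift_hi = datetime.combine(day, SHIFT_END)
            lo, hi = max(start, shift_lo), min(end, shift_hi)
            if hi > lo:
                total += (hi - lo).total_seconds() / 3600.0
        day += timedelta(days=1)
    return total

# Friday 3 PM to Monday 9 AM: 2 h on Friday plus 1 h on Monday
print(operating_hours(datetime(2024, 1, 5, 15), datetime(2024, 1, 8, 9)))  # 3.0
```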


Example

Consider a very simple system composed of only two components, A and B. The system runs from 8 AM to 5 PM, Monday through Friday. When a failure is observed, the system undergoes repair and the failed component is replaced. The date and time of each failure is recorded in an equipment downtime log, along with an indication of the component that caused the failure. The date and time when the system was restored is also recorded. The downtime log for this simple system is given next.

Note that:

  • The date and time of each failure is recorded.
  • The date and time of repair completion for each failure is recorded.
  • The repair involves replacement of the responsible component.
  • The responsible component for each failure is recorded.

For this example, we will assume that an engineer began recording these events on January 1, 1997 at 12 PM and stopped recording on March 18, 1997 at 1 PM, at which time the analysis was performed. Information for events prior to January 1 is unknown.

The objective of the analysis is to obtain the failure and repair distributions for each component. To do this, the times-to-failure and the times-to-repair for each component need to be computed from the data in the table. Once the times-to-failure data and times-to-repair data have been obtained, a life distribution will be fitted to each data set. The principles and theory for fitting a life distribution are presented in detail in Life Distributions.
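As a minimal illustration of fitting a life distribution to data that mixes failures and suspensions, the sketch below uses the exponential distribution, whose maximum likelihood estimate has a closed form: the number of failures divided by the total accumulated time (failure times plus suspension times). This is a swapped-in simplest case, not necessarily the distribution or estimation method Weibull++ would use; the function name and the input hours are hypothetical.

```python
def exponential_mle(failures, suspensions):
    """MLE of the exponential failure rate with right-censored data.

    lambda_hat = (number of failures) / (total time on test), where the
    total time sums both the failure times and the suspension times.
    """
    r = len(failures)
    if r == 0:
        raise ValueError("at least one failure is required")
    total_time = sum(failures) + sum(suspensions)
    return r / total_time

rate = exponential_mle([100.0, 150.0, 250.0], [500.0])
print(f"lambda = {rate:.4f} per hour, MTTF = {1 / rate:.1f} h")
# lambda = 0.0030 per hour, MTTF = 333.3 h
```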


Solution


Obtaining Failure and Repair Times for Component A


We begin the analysis by looking at component A. The first time that component A is known to have failed is recorded in row 1 of the data sheet; thus, the first age (or time-to-failure) of A is the time elapsed between the start of recording and this failure event. In addition, the component does not age while the system is down because of the failure of another component, so any such downtime must be excluded from A's age.


1. The First Time-To-Failure for Component A, TTFA[1]

The first time-to-failure of component A, TTFA[1], is the sum of the hours of operation for each day, starting on the start date (and time) and ending with the failure date (and time). This is shown graphically next. The operating periods are indicated with a green background.
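The day-by-day sum described above can be sketched in Python. The recording start (January 1, 1997 at 12 PM) and the Monday to Friday, 8 AM to 5 PM schedule come from the example; the failure time below is hypothetical and is not taken from the downtime log, so the printed total is only illustrative. The sketch follows the stated schedule literally (no holidays) and does not subtract downtime caused by component B, which would be removed as in the downtime-adjusted equations.

```python
from datetime import datetime, timedelta, time

def daily_operating_hours(start, end, shift=(time(8), time(17))):
    """List (date, hours) for each calendar day in [start, end].

    Only Monday-Friday hours inside the shift window are counted.
    """
    rows = []
    day = start.date()
    while day <= end.date():
        hrs = 0.0
        if day.weekday() < 5:  # Monday=0 ... Friday=4
            lo = max(start, datetime.combine(day, shift[0]))
            hi = min(end, datetime.combine(day, shift[1]))
            hrs = max(0.0, (hi - lo).total_seconds() / 3600.0)
        rows.append((day, hrs))
        day += timedelta(days=1)
    return rows

start = datetime(1997, 1, 1, 12)    # recording start, from the example
failure = datetime(1997, 1, 6, 10)  # HYPOTHETICAL failure time, illustration only
rows = daily_operating_hours(start, failure)
for day, hrs in rows:
    print(day, hrs)
print("TTFA[1] =", sum(h for _, h in rows))  # 25.0 for these inputs
```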