Reliability Phase Diagrams (RPDs)

The term phase diagram is used in many disciplines with different meanings. In physical chemistry, mineralogy and materials science, a phase diagram is a type of graph used to show the equilibrium conditions among thermodynamically distinct phases. In mathematics and physics, a phase diagram is used as a synonym for a phase space. In reliability engineering, we introduce the term phase diagram, or more specifically Reliability Phase Diagram (RPD), as an extension of the reliability block diagram (RBD) approach to graphically describe the sequence of different operational and/or maintenance phases experienced by a system. Whereas an RBD is used to analyze the reliability of a system with a fixed configuration, a phase diagram can be used to represent and analyze a system whose reliability configuration and/or other properties change over time. In other words, during a mission the system may undergo changes in its reliability configuration (RBD), in its available resources, or in the failure, maintenance and/or throughput properties of its individual components. Examples of this include:


 * Systems whose components exhibit different failure distributions depending on changes in the stress on the system.
 * Systems or processes requiring different equipment to function over a cycle, such as start-up, normal production, shut-down, scheduled maintenance, etc.
 * Systems whose RBD configuration changes at different times, such as the RBD of the engine configuration on a four-engine aircraft during taxi, take-off, cruising, and landing.
 * Systems with different types of machinery operating during day and night shifts and with different amounts of throughput during each shift.

To analyze such systems, each stage of the mission can be represented by a phase whose properties are inherited from the RBD corresponding to that phase's reliability configuration, along with any resources associated with the system during that time. A phase diagram is then a series of such phases drawn (connected) in a sequence that signifies their chronological order. To better illustrate this, consider the four-engine aircraft mentioned previously. Assume that when a critical failure (system failure) occurs during taxiing, the airplane does not take off and is sent for maintenance instead. However, when a critical failure occurs during take-off, cruising or landing, the system is assumed to be lost. Furthermore, assume that the taxi phase requires only one engine, the take-off phase requires all four engines, the cruising phase requires any three of the four engines and the landing phase requires any two of the four engines. To model this, each of these cases would require a different k-out-of-n redundancy on the engines and thus a different RBD. Creating an RBD for each phase is trivial. What you also need, however, is a way to transition from one RBD to the next, in a specified sequence, while maintaining the full past history of each component during the transition.

In other words, a new engine would transition to the take-off phase with an age equal to the time it was used during taxi, or an engine that failed while in flight would remain failed in the next phase (i.e., landing). To model this, a block for each phase would be used in the phase diagram, and each phase block would be linked to the appropriate RBD. This is illustrated in the figure below. In this figure, the taxiing, take-off, cruising and landing blocks represent the operational phases and the final block is a maintenance phase. Each of the operational phases in this diagram has two paths leading from it: a success path and a failure path. This graphically illustrates the consequences in each case. For instance, if the first taxiing phase is successful, the airplane will proceed to the take-off phase; if it is unsuccessful, the airplane will be sent for maintenance. The failure paths for the take-off, cruising and landing phases point to stop blocks, which indicate that the simulation of the mission ends. For the final taxiing phase, both the success and failure paths lead to the maintenance phase; the node block allows you to model this type of shared outcome.
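The phase logic of the aircraft example can be sketched in a few lines of Python. This is a minimal, hypothetical illustration rather than BlockSim code: the phase names, the `fly` function and its input format (a mapping from phase name to the set of engine indices that fail during that phase) are assumptions. It shows the two points made above: each phase applies its own k-out-of-4 requirement, and an engine that fails in one phase stays failed in every later phase.

```python
# Hypothetical sketch of the four-engine aircraft phase diagram.
# Each phase requires at least k of the 4 engines to be working.
N_ENGINES = 4
PHASES = [
    ("taxi", 1),
    ("take-off", 4),
    ("cruise", 3),
    ("landing", 2),
    ("final taxi", 1),
]
# Phases whose failure leads to maintenance rather than to a stop block.
MAINTENANCE_ON_FAILURE = {"taxi", "final taxi"}


def fly(failures_by_phase):
    """Walk the phases in order and return the mission outcome.

    failures_by_phase maps a phase name to the set of engine indices
    (0..3) that fail during that phase (an illustrative input format).
    """
    failed = set()  # engine state carries over from phase to phase
    for name, k in PHASES:
        failed |= failures_by_phase.get(name, set())
        working = N_ENGINES - len(failed)
        if working < k:  # k-out-of-n requirement for this phase not met
            if name in MAINTENANCE_ON_FAILURE:
                return f"failure during {name}: sent to maintenance"
            return f"failure during {name}: mission lost (stop block)"
    # Final taxi succeeded; its success path also leads to maintenance.
    return "mission complete: sent to maintenance"


print(fly({"cruise": {0}}))
print(fly({"cruise": {0}, "landing": {1, 2}}))
```

With one engine lost during cruise, the mission still completes, because cruising needs three engines and landing needs two. If two more engines fail during landing, the landing phase fails, since the engine lost in cruise is still counted as failed.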

The next several sections discuss different types of phase blocks: operational phase blocks, maintenance phase blocks, subdiagram phase blocks, node blocks and stop blocks.



= Types of Phases =

Operational Phase
In RPDs, two types of phases are used: operational phases and maintenance phases. An operational phase is used to represent any stage of the system's mission that is not exclusively dedicated to the execution of maintenance tasks. Operational phases are always defined by (linked to) an RBD. Each operational phase has a fixed, predefined time duration.

Maintenance Phase
A maintenance phase represents the portion of a system's mission time during which the system is down and maintenance actions are performed on some or all of its components. For ease of representation, a maintenance phase is defined by (linked to) a maintenance template. This template can be thought of as a list, or collection, of the specific components (blocks) that are designated to undergo inspection, repair or replacement actions during the maintenance phase, along with their maintenance priority order. In other words, if blocks A, B and C are to undergo maintenance during a specific phase, they are placed in a maintenance template in a priority sequence, and the actions are carried out as resources permit. That is, if three repair crews were available along with three spare parts, actions on A, B and C would be carried out simultaneously. However, if only one crew was available, the actions would be carried out one at a time in the priority order defined in the template. Because all aspects of maintenance can be probabilistically defined, the duration of a maintenance phase, unlike that of an operational phase, is not fixed: the phase lasts as long as it takes to complete all the actions specified for it. To illustrate this, consider a race car that competes in two races and then undergoes a major overhaul (i.e., a series of maintenance actions), even though corrective repairs can be performed during each race as needed. For this example, assume that the major subsystems of the car undergoing these maintenance tasks are the engine, the transmission, the suspension system and the tires. The operation of the race car can then be represented as a phase diagram consisting of two operational phases, representing the two races, and one maintenance phase, representing the maintenance activities. The figure below shows such a phase diagram along with the maintenance template.
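The effect of crew availability on the length of a maintenance phase can be sketched as a greedy schedule: actions are taken in template priority order and each one goes to the crew that becomes free first. This is a simplified, hypothetical model, not BlockSim's scheduler: `run_maintenance_phase` is an illustrative name, action durations are assumed fixed, spare parts are assumed always in stock, each action needs exactly one crew, and at least one crew must be available.

```python
import heapq


def run_maintenance_phase(template, durations, crews):
    """Greedy sketch of a maintenance phase.

    template:  block names in priority order (highest priority first)
    durations: block name -> action duration (hours), assumed fixed
    crews:     number of identical crews (>= 1); spares always in stock
    Returns (schedule, phase_duration), where schedule maps each block
    to its (start, end) times measured from the start of the phase.
    """
    free_at = [0.0] * crews  # min-heap of times at which each crew is free
    heapq.heapify(free_at)
    schedule = {}
    for block in template:  # higher-priority actions claim crews first
        start = heapq.heappop(free_at)
        end = start + durations[block]
        schedule[block] = (start, end)
        heapq.heappush(free_at, end)
    # The phase lasts until the last action is finished.
    phase_duration = max((end for _, end in schedule.values()), default=0.0)
    return schedule, phase_duration
```

With three crews, actions on A, B and C all start at time 0 and the phase lasts as long as the longest action; with one crew they run back to back in template order, so the phase lasts for the sum of the durations.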

= The Success and Failure Path =

When a system failure occurs during a phase that has a failure path, the simulation moves along the failure path immediately, at the time of the failure. If no system failure occurs during the phase, the simulation moves along the success path at the end of the phase.
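The path rule can be expressed as a small decision function. This Python sketch is hypothetical: the name `next_step` and its arguments are assumptions, and the time of the first system failure within the phase is taken as given (in a simulation it would be sampled). When a phase has no failure path, the sketch follows the success path at the end of the phase, as in the "continue simulation" rule described in the Node Blocks section.

```python
def next_step(duration, failure_time, success_target, failure_target=None):
    """Decide where the simulation goes after an operational phase.

    duration:       fixed phase duration
    failure_time:   time of the first system failure, measured from the
                    start of the phase, or None if the system does not fail
    success_target: block at the end of the success path
    failure_target: block at the end of the failure path, or None if the
                    phase has no failure path
    Returns (target, time_spent_in_phase).
    """
    # Assumption: a failure exactly at the end of the phase counts as success.
    failed_in_phase = failure_time is not None and failure_time < duration
    if failed_in_phase and failure_target is not None:
        # Failure path: leave the phase immediately at the failure time.
        return failure_target, failure_time
    # Success path (or no failure path defined): the phase runs its full
    # duration and the simulation continues along the success path.
    return success_target, duration
```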

In BlockSim 7, there was only one path from each phase, and the failure outcome for each phase was defined via a drop-down list. The success and failure paths used in BlockSim 8 make it easy to see what will happen upon success or failure for each block. They also allow you to create more complex phase diagrams, in which success and failure may lead to entirely different sets of phases. In addition, they add a new possible outcome of failure. Previously, the outcome of failure could be maintenance, stopping the simulation or continuing the simulation; now failure can also lead to the simulation continuing along a different path.

= Node Blocks and Stop Blocks =

In BlockSim 8, the two possible outcomes of an operational phase block are modeled using success and failure paths. Where previously a failure outcome was defined as part of the operational phase block's properties, it is now graphically represented within the diagram. Node blocks and stop blocks are provided in version 8 to allow you to build configurations that are both accurate and readable.

Node Blocks
The purpose of a node block is simply to enable configurations that would otherwise not be possible due to limitations on connecting blocks. For example, consider a case where maintenance is scheduled to be performed after the operational phase has completed successfully, and the same maintenance is performed upon failure if a failure occurs during the phase. In this case, the operational phase block's success and failure outcomes are identical. However, success paths and failure paths cannot be identical in phase diagrams, so you would model this configuration in one of two ways:


 * If the operational phase stops upon system failure and the simulation moves immediately to the phase that follows the success path, you would use a node block to model this configuration, as shown next.


 * If the operational phase continues for the specified duration despite failure and the simulation then moves to the next phase along the success path, you would simply not create a failure path.
 * If a phase has only a success path, a system failure during that phase is handled by the "continue simulation" rule of BlockSim 7. Under this rule, when a system failure occurs, repairs begin according to the selected repair policy, and the time needed to restore the system is counted as part of the operational phase's time. In other words, the repairs continue in the operational phase until the system is up again; if they are not completed before the phase ends, they continue into the next phase. Thus, under this rule, the duration of an operational phase is not affected by a system failure. As an example, consider a production line operating in two phases: a day shift and a night shift. A failure occurs during the day shift that renders the production line non-operational. Repair begins immediately and continues beyond the end of the day shift, and the production line is back up after midnight. In this case, the repair consumes the remainder of the day shift phase, from the time of the failure to the end of the phase, as well as part of the night shift phase.

Node blocks can have unlimited incoming connections and a single outgoing connection.
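The "continue simulation" rule described above can be illustrated by splitting one outage across contiguous phases whose boundaries do not move. In this hypothetical Python sketch, `downtime_by_phase` is an illustrative name, and the shift times (day shift 08:00 to 20:00, night shift 20:00 to 08:00), the failure time and the restore time are assumed values chosen to match the production line example.

```python
def downtime_by_phase(phases, failure_time, restore_time):
    """Split one system outage across contiguous operational phases.

    phases:       list of (name, start, end) in chronological order
    failure_time: time of the system failure
    restore_time: time needed to restore the system
    Phase boundaries are not moved by the failure; the outage simply
    consumes time from whichever phases it overlaps.
    """
    down_start, down_end = failure_time, failure_time + restore_time
    downtime = {}
    for name, start, end in phases:
        overlap = min(end, down_end) - max(start, down_start)
        if overlap > 0:
            downtime[name] = overlap
    return downtime


# Hypothetical shifts, in hours from midnight of day 1:
# day shift 08:00-20:00, night shift 20:00-08:00 (next day).
shifts = [("day shift", 8.0, 20.0), ("night shift", 20.0, 32.0)]
# Failure at 18:00 with 8 hours to restore: back up at 02:00, after midnight.
print(downtime_by_phase(shifts, 18.0, 8.0))
# {'day shift': 2.0, 'night shift': 6.0}
```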

Stop Blocks
Stop blocks indicate that the simulation of the mission ends. A new simulation may then begin, if applicable. This is useful in situations where maintenance is not possible upon failure.

Stop blocks can have unlimited incoming connections. No outgoing connections can be defined for stop blocks.

When a path leads to a stop block, the effect is the same as the "Start New Simulation" option in BlockSim 7, which halted the simulation and effectively ended the mission if the system failed. Specifically, if a failure path leads to a stop block, the execution of the current operational phase and of all phases that follow it is halted, and the mission is aborted. A stop block can therefore be used to model a system whose failures cannot be repaired during the mission, so that the mission must be aborted if a failure occurs. A good example is the aircraft case discussed previously: a catastrophic failure during cruising would end the mission.

= Subdiagram Phase Blocks =

Subdiagram phase blocks represent other phase diagrams within the project. Using subdiagram phase blocks allows you to incorporate phase diagrams as phases within other phase diagrams. This allows you to break down extremely complex configurations into smaller diagrams, increasing understandability and ease of use and avoiding unnecessary repetition of elements.

Subdiagram phase blocks can have unlimited incoming connections and up to two outgoing connections, which may include one success path and one failure path. The success path and the failure path must be different; if both success and failure of the block actually lead to the same outcome, you can use a node block to model this configuration.
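The outgoing-connection rules for stop, node, operational and subdiagram phase blocks can be written as a small validation check. This Python sketch is hypothetical: `check_block` and its representation of a block (a kind plus optional success and failure targets, with a node block's single outgoing connection passed as `success`) are assumptions for illustration, not part of BlockSim. Incoming connections are unlimited for all of these blocks, so they are not checked.

```python
def check_block(kind, success=None, failure=None):
    """Return a list of connection-rule violations for one block.

    kind: "operational", "subdiagram", "node" or "stop"
    success, failure: names of the blocks the outgoing paths lead to
    (None if the path is not defined). A node block's single outgoing
    connection is passed as `success`.
    """
    problems = []
    if kind == "stop" and (success is not None or failure is not None):
        problems.append("stop blocks cannot have outgoing connections")
    if kind == "node" and failure is not None:
        problems.append("node blocks have a single outgoing connection")
    if kind in ("operational", "subdiagram") and success is not None and success == failure:
        problems.append(
            "success and failure paths must differ; route one of them through a node block"
        )
    return problems
```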

= Cycles and Phase Diagram Execution =

The execution of a phase diagram from its first phase to its last phase is referred to as one cycle. If the simulation end time exceeds the total duration of one cycle of a phase diagram, the simulation continues and the phase diagram is executed multiple times until the simulation end time is reached. Execution of a phase diagram multiple times during a simulation is referred to as cycling. During cycling, the age of components accumulated in the last phase of the previous cycle is carried over to the first phase of the next cycle. The principle of cumulative damage is used to transfer the age across phases for each component (block). For more discussion on this see Age Transfer Across Phases Using Cumulative Damage later on in this chapter. In summary, cycling is used to model the continuous operation of a system involving repetition of the same phases in the same sequence.
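Cycling can be illustrated by locating a simulation time within a repeating sequence of phases. This Python sketch rests on a strong assumption: every phase has a fixed, positive duration, which holds for operational phases but not for maintenance phases, whose length depends on the actions they contain. The name `locate` is hypothetical, and the sketch does not model the cumulative-damage age transfer between cycles.

```python
def locate(t, phases):
    """Return (cycle_number, phase_name, time_into_phase) for time t.

    phases: list of (name, duration) executed in order; every duration
    is assumed fixed and positive.
    """
    cycle_length = sum(d for _, d in phases)
    completed_cycles, offset = divmod(t, cycle_length)
    for name, duration in phases:
        if offset < duration:
            return int(completed_cycles) + 1, name, offset
        offset -= duration
    # Guard against floating-point round-off at the very end of a cycle.
    name, duration = phases[-1]
    return int(completed_cycles) + 1, name, duration


shifts = [("day shift", 12.0), ("night shift", 12.0)]
print(locate(30.0, shifts))  # second cycle, 6 hours into the day shift
```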

= Working with Phase Diagrams =

To allow modeling flexibility, a number of options can be specified for both operational and maintenance phases. The operational phase properties and maintenance phase properties are discussed in the sections that follow. An additional operational phase property, Phase Throughput, is discussed later in the Phase Throughput section.

= Operational Phase Properties =

Diagram
The diagram property is used to associate an RBD with a phase. You can select and associate any existing simulation RBD with a phase. Note that common components across different RBDs are identified by name. In other words, a component with the exact same name in two RBDs is assumed to be the same component working in two different phases.

Phase Duration
The duration of an operational phase is fixed and needs to be specified. However, the time actually spent in the phase may be shorter, depending on which path the simulation follows from it. If a failure has not occurred by the end of the specified phase duration, the simulation will proceed along the success path leading from the phase block. If a failure occurs and a failure path is defined, the simulation will proceed along the failure path as soon as the failure occurs.

Phase Duty Cycle
This property allows you to specify a common duty cycle value for the entire RBD that the phase represents, thereby modeling situations where the actual usage of the RBD during system operation is not identical to the usage for which you have data (either from testing or from the field). This can include situations where the item:


 * Does not operate continuously (e.g., a DVD drive that was tested in continuous operation, but in actual use within a computer accumulates only 18 minutes of usage for every hour the computer operates).
 * Is subjected to loads that are greater than or less than the rated loads (e.g., a motor that is rated to operate at 1,000 rpm but is being used at 800 rpm).
 * Is affected by changes in environmental stress (e.g., a laptop computer that is typically used indoors at room temperature, but is being used outdoors in tropical conditions).

In these cases, continuous operation at the rated load is considered to be a duty cycle of 1. Any other level of usage is expressed as a fraction of the rated load value or operating time. For example, the duty cycle value of the DVD drive mentioned above would be 18 min / 60 min = 0.3. A duty cycle value higher than 1 indicates a load in excess of the rated value.

If a duty cycle is specified for the phase and there are also duty cycles specified for blocks within the RBD, their effects are compounded. For instance, consider the aircraft example given earlier. During the take-off phase, the subsystems experience 1.5 times the normal stress, so you would use a phase duty cycle value of 1.5. We also know that the landing gear is not used continuously during take-off. Assume that the landing gear is actually in use only 30% of the time during take-off. Each landing gear block in the RBD, then, would have a duty cycle value of 0.3. For each block, the effects of the phase duty cycle and the block duty cycle are compounded, yielding an effective duty cycle value of 1.5 x 0.3 = 0.45.
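The duty cycle arithmetic above can be sketched in Python. The function names are hypothetical, and `accumulated_age` encodes an assumption not stated in this section: that a block's age accrues at the effective duty cycle times the elapsed phase time.

```python
def duty_cycle(actual, reference):
    """Usage (time or load) relative to the conditions the data represent."""
    return actual / reference


def effective_duty_cycle(phase_dc, block_dc=1.0):
    """Phase and block duty cycles compound multiplicatively."""
    return phase_dc * block_dc


def accumulated_age(elapsed, dc):
    """Assumption: age accrues at dc times the elapsed phase time."""
    return dc * elapsed


dvd = duty_cycle(18, 60)                       # DVD drive: 0.3
landing_gear = effective_duty_cycle(1.5, 0.3)  # take-off: about 0.45
```

Under the stated assumption, 10 hours of take-off would add about 4.5 hours of age to each landing gear block.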

= Rules and Assumptions =

When Transferring Interrupted Maintenance Tasks Across Phases
Maintenance tasks in progress during one operational phase can be interrupted if that phase ends before the repair is completed. For example, a crew delay or spare parts order may extend the duration of a repair beyond the duration of the phase. As described next, the software handles these interruptions differently, based on the stage in which the repair was interrupted and whether or not the failed block is present in the next contiguous phase.


 * 1. If a phase ends during the repair of a failed block and the block is present in the next contiguous phase:
 * a) If the same task is present in both phases, then the task will continue as-is in the next phase. This is considered an uninterrupted event, and counts as a single unique event at both the block and the system level.
 * b) If the interrupted task is not used in the next phase, then the task is cancelled and new tasks are applied as needed. In this case, all crew calls are cancelled and spare parts are restocked.
 * 1) If the repair has started or the crew is delayed (crew logistic delay), the call will be assumed accepted and the component will be charged for it. If the crew was occupied with another component’s repair, the call will be assumed rejected and hence not charged to the component.
 * 2) If the call for spare parts incurred emergency charges, those are charged to the block; otherwise, there are no other charges to the block.


 * 2. If a phase ends during the repair of a failed block and the block is not present in the next contiguous phase, then the task is cancelled and new tasks are applied as needed. All crew calls are cancelled and spare parts are restocked:
 * a) If the repair has started or the crew is delayed (crew logistic delay), the call will be assumed accepted and the component will be charged for it. If the crew was occupied with another component’s repair, the call will be assumed rejected and hence not charged to the component.
 * b) If the call for spare parts incurred emergency charges, those are charged to the block; otherwise, there are no other charges to the block.
 * c) Discontinuous events are counted as two distinct events at both the block and the system level.
 * d) When the system fails in a phase that has a failure path leading to a stop block, the system will remain down for the remainder of the simulation. From that point on, the blocks that are down are assumed unavailable and the blocks that are up are assumed operational for availability calculations.
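Rules 1 and 2 above can be encoded as a small decision table. This Python sketch is a hypothetical encoding of the rules as listed, not BlockSim's implementation: the class and field names are assumptions, the crew status is reduced to three illustrative values, and the event count is left as `None` for rule 1b because the rules do not state it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Interruption:
    block_in_next_phase: bool
    same_task_in_next_phase: bool
    crew_status: str        # "repairing", "delayed" or "busy" (with another block)
    emergency_spares: bool  # the spare-part call incurred emergency charges


@dataclass
class Outcome:
    action: str                          # "continue" or "cancel"
    crew_charged: Optional[bool] = None
    emergency_charged: Optional[bool] = None
    events: Optional[int] = None         # None where the rules do not say


def resolve(i: Interruption) -> Outcome:
    # Rule 1a: same task in the next contiguous phase, so the repair carries on.
    if i.block_in_next_phase and i.same_task_in_next_phase:
        return Outcome("continue", events=1)
    # Rules 1b and 2: task cancelled, crew calls cancelled, spares restocked.
    # The crew call is charged only if the repair started or the crew was delayed.
    crew_charged = i.crew_status in ("repairing", "delayed")
    # Rule 2c: a block absent from the next phase yields two distinct events.
    events = 2 if not i.block_in_next_phase else None
    return Outcome("cancel", crew_charged, i.emergency_spares, events)
```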

For Stop Blocks
When a system failure occurs in a phase where the failure path points to a stop block, the simulation is aborted. Once this failure occurs, the following assumptions apply to the results:


 * Components that are under repair or maintenance remain down and unavailable for the rest of the simulation.
 * Components that are operating remain up for the rest of the simulation.

For a Maintenance Phase
A system is considered down and unavailable during the execution of a maintenance phase and remains down until all components have been repaired or maintained according to the properties specified for the maintenance phase. A maintenance phase is executed when the simulation reaches the phase while progressing through the phase diagram, either following a success path or a failure path. The following assumptions apply to both cases.


 * 1. When a component enters a maintenance phase in a down state, the following rules apply:
 * a) If a task is in progress for this component, the event will transfer to the maintenance phase provided that the same task is present in the maintenance phase. The rules for interrupted tasks apply as noted above.
 * b) If the component is failed but no corrective maintenance is in progress (either because the component was non-repairable in the phase where it failed or because it had a task scheduled to be executed upon inspection and was waiting for an inspection), a repair is initiated according to the corrective maintenance properties specified for the component in the maintenance phase.
 * c) Failed components are fixed in the order in which they failed.
 * 2. When a component enters a maintenance phase in an operating state, the following rules apply:
 * a) Maintenance will be scheduled as follows:
 * 1) Tasks based on intervals or upon start of a maintenance phase
 * 2) Tasks based on events in a maintenance group, where the triggering event applies to a block
 * 3) Tasks based on system down
 * 4) Tasks based on events in a maintenance group, where the triggering event applies to a subdiagram
 * Within these categories, order is determined according to the priorities specified in the maintenance template (i.e., the higher the task is on the list, the higher the priority).


 * b) An inspection or preventive task may be initiated, if applicable, with inspections taking precedence over preventive tasks. Inspections and/or preventive tasks are initiated if one of the following applies:
 * 1) Upon certain events:
 * a) The task is set to be performed when a maintenance phase starts.
 * b) The policy is set to be performed based on events in a maintenance group, and one of those events occurs within one of the specified maintenance groups. Note that such triggered maintenance does not follow the priorities specified in the maintenance template, but is sent to the end of the queue for repair.
 * c) The task is set to be performed whenever the system is down.
 * 2) At certain intervals:
 * a) The task is set to be performed at a fixed time interval, based on either item age or calendar time, and the maintenance falls within the maintenance threshold specified in the maintenance phase.


 * Even if an inspection task is not set to bring either the item or the system down, the inspection is still considered a downing inspection during a maintenance phase.

Finally, if a block enters a maintenance phase in a failed state:


 * 1. If the block does not have a corrective task in the maintenance phase but does have an on condition task, the preventive portion of the on condition task is triggered immediately in order to restore the block.
 * 2. A maintenance phase will not end until all components are restored. Therefore, if any failed block does not have a task that restores it, the maintenance phase will not end.
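Two of the ordering rules above can be sketched in Python: failed components are repaired in the order in which they failed (rule 1c), and tasks for components that enter in an operating state are ordered first by category and then by template position (rule 2a). The category labels and function names are hypothetical, the race car template is used only as sample data, and the end-of-queue rule for group-triggered maintenance (rule 2b.1.b) is not modeled.

```python
# Category rank for tasks on components that enter a maintenance
# phase in an operating state (rule 2a above).
CATEGORY_RANK = {
    "interval or phase start": 1,
    "group event (block)": 2,
    "system down": 3,
    "group event (subdiagram)": 4,
}


def order_tasks(tasks, template):
    """Sort (block, category) pairs by category, then template position."""
    position = {block: i for i, block in enumerate(template)}
    return sorted(tasks, key=lambda task: (CATEGORY_RANK[task[1]], position[task[0]]))


def order_repairs(failures):
    """Failed components are fixed in the order in which they failed (rule 1c)."""
    return [block for block, _ in sorted(failures, key=lambda f: f[1])]
```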

= Maintenance Phase Properties =

Maintenance Template
This specifies the maintenance template to be used in the currently selected maintenance phase.

Interval Maintenance Threshold
The Interval Maintenance Threshold property adds some flexibility to the timing of scheduled maintenance tasks. Tasks based on system or item age intervals (fixed or dynamic) will be performed early, at the start of the maintenance phase, if the time remaining until the scheduled action is within (1-X) × 100% of the task's interval, where X is the threshold value. This helps optimize the resources allocated to repairing the system during a maintenance phase by performing preventive maintenance actions or inspections while the system is already down. For example, suppose a preventive maintenance action (e.g., an oil change, tire rotation, etc.) is scheduled for a car every 60,000 miles, but a system downing failure of an unrelated component occurs at 55,000 miles. Here the Interval Maintenance Threshold determines whether the preventive maintenance will be performed earlier than scheduled. If the Interval Maintenance Threshold is 0.9, the preventive maintenance will be performed, since the failure occurred after the system accumulated 91.67% of the time to the scheduled maintenance; that is, it is within 8.33% [(60,000-55,000)/60,000 = 8.33%] of the system age at which the preventive maintenance was originally scheduled, which is less than the allowed 10% (1-0.9 = 0.10). If the threshold were 0.95, the preventive maintenance would not be performed at 55,000 miles, since the system failure did not occur within 5% (1-0.95 = 0.05) of the system age at which the preventive maintenance was originally scheduled.
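The threshold test used in this section can be written as a one-line comparison. In this Python sketch, the function name `pull_forward` and its arguments are hypothetical; `start` is the time or age at which the maintenance opportunity begins. The checks reproduce the car example above and the two decisions in the example that follows.

```python
def pull_forward(start, scheduled, interval, threshold):
    """True if a scheduled task is performed early, at `start`.

    The task is pulled forward when the time still remaining until it is
    due, as a fraction of the task's interval, is at most 1 - threshold.
    """
    remaining_fraction = (scheduled - start) / interval
    return remaining_fraction <= 1 - threshold


print(pull_forward(55_000, 60_000, 60_000, 0.90))  # 8.33% <= 10%: performed
print(pull_forward(55_000, 60_000, 60_000, 0.95))  # 8.33% > 5%: not performed
```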

Example
To illustrate the Interval Maintenance Threshold, consider a system that has two components: Block A and Block B. The system undertakes a mission that can be divided into two phases. The first phase is an operational phase with a duration of 1,370 hours, with both components in a parallel configuration. In this phase, Block A fails every 750 hours while Block B fails every 1,300 hours. Corrective maintenance on Block A in this phase requires 100 hours to be completed. A preventive maintenance of 20 hours' duration is also performed on Block A every 500 hours. No maintenance can be carried out on Block B in this phase. The second phase of the mission is a maintenance phase. In this phase, Block A has the same maintenance actions as in the first phase, and a corrective maintenance of 100 hours' duration is defined for Block B. Phase 2 also has a value of 0.70 set for the Interval Maintenance Threshold. All maintenance actions during the entire mission of the system have a type II restoration factor of 1.







The system behavior from 0 to 3500 hours is shown in the plot below and described next.


 * Phase 1 begins at time 0 hours. The duration of this phase is 1,370 hours.
 * At 500 hours, the first of the scheduled PMs for Block A begins. The duration of these maintenance tasks is 20 hours. The scheduled maintenance ends at 520 hours.
 * At 1,000 hours, another PM occurs for Block A based on the set policy. This maintenance ends at 1,020 hours.
 * At 1,300 hours, Block B fails after accumulating an age of 1,300 hours. A system failure does not occur as Block B is in a parallel configuration with Block A in this phase. Repairs for Block B are not defined in this phase. As a result Block B remains in a failed state.
 * At 1,370 hours, phase 1 ends and phase 2 begins. Phase 2 is a maintenance phase. Block B is repairable in this phase and has a CM duration of 100 hours. As a result, repairs on Block B begin and are completed at 1,470 hours. Block A has the next PM scheduled to occur at 1,500 hours.

However, phase 2 has an Interval Maintenance Threshold for Preventive and Inspection Policies of 0.7. The time remaining until the next PM is 130 hours (1,500 - 1,370 = 130 hours), which is 26% of the 500-hour PM interval. This is within the 30% (1 - 0.70 = 0.3) allowed by the threshold value of 0.70. Thus the PM task due at 1,500 hours is carried out in the maintenance phase, from 1,370 hours to 1,390 hours, and no PM occurs at 1,500 hours. All maintenance actions are completed by 1,470 hours, and phase 2 ends at this time. This completes the first cycle of operation for the phase diagram.


 * At 1,470 hours, phase 1 begins in the second cycle.
 * At 2,000 hours, the next PM for Block A begins. This maintenance ends at 2,020 hours.
 * At 2,500 hours, another PM is carried out on Block A and is completed by 2,520 hours.
 * At 2,770 hours, Block B fails in the second cycle of phase 1 after accumulating an age of 1,300 hours. Since no repair is defined for the block in this phase, it remains in a failed state.
 * At 2,840 hours, phase 1 completes its duration of 1,370 hours and ends. Phase 2 begins in the second cycle, and the 100-hour corrective maintenance defined for Block B begins. This repair action ends at 2,940 hours. For Block A, the time remaining until the next PM at 3,000 hours is 160 hours (3,000 - 2,840 = 160 hours), which is 32% of the 500-hour PM interval. This exceeds the 30% allowed by the threshold value of 0.70, so the PM due at 3,000 hours is not considered close enough to the beginning of the maintenance phase and is not carried out in this phase. At 2,940 hours, all maintenance actions in phase 2 are completed and phase 2 ends. This also completes the second cycle of operation for the phase diagram.
 * At 2,940 hours, phase 1 begins in the third cycle.
 * At 3,000 hours, the scheduled maintenance on Block A occurs. This PM ends at 3,020 hours.
 * At 3,500 hours, the simulation ends.
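The walkthrough above can be reproduced with a short deterministic script. This is a hypothetical re-creation under stated assumptions, not BlockSim's simulation engine. Block A's PMs follow a fixed calendar schedule (the walkthrough keeps the 2,000-hour PM after the one pulled forward to 1,370 hours), and separate crews let Block B's repair and Block A's PM run concurrently. Block A's 750-hour failures and its corrective repairs are not modeled, because with a restoration factor of 1 its PMs always renew it before it reaches 750 hours of age.

```python
# Hypothetical re-creation of the Interval Maintenance Threshold example.
PHASE1 = 1370.0    # operational phase duration (h)
PM_EVERY = 500.0   # Block A PM interval (h), assumed calendar-based
PM_TIME = 20.0     # Block A PM duration (h)
B_LIFE = 1300.0    # Block B fails at this age (h)
B_REPAIR = 100.0   # Block B corrective repair, phase 2 only (h)
THRESHOLD = 0.70   # Interval Maintenance Threshold of phase 2
END = 3500.0       # simulation end time (h)


def timeline():
    events = []
    cycle_start = 0.0
    next_pm = PM_EVERY
    b_new_at = 0.0  # restoration factor 1: Block B is as good as new after repair
    while True:
        phase1_end = cycle_start + PHASE1
        horizon = min(phase1_end, END)
        # Phase 1: Block A PMs that fall in the phase; Block B cannot be repaired.
        while next_pm < horizon:
            events.append((next_pm, "A: PM"))
            next_pm += PM_EVERY
        b_failed = b_new_at + B_LIFE < horizon
        if b_failed:
            events.append((b_new_at + B_LIFE, "B: fails, stays down"))
        if phase1_end >= END:
            return events
        # Phase 2: maintenance phase; it ends when every action is done.
        events.append((phase1_end, "phase 2 starts"))
        phase2_end = phase1_end
        if b_failed:
            phase2_end = max(phase2_end, phase1_end + B_REPAIR)
            b_new_at = phase1_end + B_REPAIR
        if (next_pm - phase1_end) / PM_EVERY <= 1 - THRESHOLD:
            events.append((phase1_end, "A: PM pulled into phase 2"))
            phase2_end = max(phase2_end, phase1_end + PM_TIME)
            next_pm += PM_EVERY  # the PM originally due is skipped
        events.append((phase2_end, "phase 2 ends"))
        cycle_start = phase2_end


for t, what in timeline():
    print(f"{t:7.1f} h  {what}")
```

The printed event times match the walkthrough: PMs at 500, 1,000, 2,000, 2,500 and 3,000 hours, the PM pulled into the maintenance phase at 1,370 hours, Block B failures at 1,300 and 2,770 hours, and maintenance phases ending at 1,470 and 2,940 hours.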



= Phase Throughput =

Phase throughput is the maximum number of items that a system can process during a particular phase. It is defined at the phase level as a phase property in an operational phase. For a detailed discussion of throughput at the block level, see Throughput Analysis. Phase throughput can be thought of as the initial throughput that enters the system. For example, imagine a textile factory that receives different quantities of raw materials during different seasons. These seasons could be treated as different phases. In this case a phase may be seen as sending a certain quantity of units to the first component of the system (the textile factory in this case). Depending on the capacity and availability of the factory, these units may be all processed or a backlog may accumulate.

Alternatively, phase throughput can be used as a constraint on the throughput of the system. An example would be the start-up period in a processing plant. After the plant has stopped operating, the equipment requires a warm-up period before reaching its maximum production capacity. In this case, the phase throughput may be used to limit the capacity of the first component, which in turn limits the throughput of the rest of the system. Note that there is no phase-related backlog in this example. In BlockSim, this can be modeled by checking the ignore backlog option (see Throughput Analysis Options - Block Settings) in the block properties window for the first component. Both constant and variable throughput can be used in a phase. The methodology used for variable throughput is discussed next, and examples using both constant and variable throughput follow.
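The two uses of phase throughput described above (units offered to the first component, with or without a backlog) can be sketched per time period. This is a simplified, hypothetical model: `process_phase` is an illustrative name, time is divided into discrete periods, and `ignore_backlog=True` is assumed to mean that units the first component cannot process are dropped rather than carried over.

```python
def process_phase(offered, capacity, ignore_backlog=False):
    """Sketch of phase throughput feeding the first component.

    offered:  units sent by the phase in each period
    capacity: units the first component can process per period
    ignore_backlog: if True, units that cannot be processed are dropped
                    instead of being carried over as backlog
    Returns (processed per period, backlog left at the end).
    """
    backlog = 0.0
    processed = []
    for units in offered:
        available = units + backlog
        done = min(available, capacity)
        processed.append(done)
        # Unprocessed units either wait for the next period or are dropped.
        backlog = 0.0 if ignore_backlog else available - done
    return processed, backlog
```

With 10, 10 and 2 units offered against a capacity of 8 per period, the backlog lets the third period process 6 units instead of 2.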

Variable (Time-Varying) Throughput
Time-varying throughput can be specified at the phase level through the Variable Throughput property of an operational phase. Variable throughput permits modeling of scenarios where the throughput changes over time. Three general models are available in BlockSim for variable throughput; each has two parameters, a and b, which are specified by the user. These models are discussed below:


 * 1. Linear model:

$$\begin{align}
y=ax+b
\end{align}$$

This model describes the change in throughput y as a linear function of time x. Throughput processed between any two points of time x1 and x2 is obtained by integration of the linear function as:

$$\begin{align}
\text{Linearly varying throughput} &= \int_{x_{1}}^{x_{2}}y\,dx \\
&= \int_{x_{1}}^{x_{2}}(ax+b)\,dx \\
&= \frac{a}{2}(x_{2}^{2}-x_{1}^{2})+b(x_{2}-x_{1})
\end{align}$$

 * 2. Exponential model:

$$\begin{align}
y=be^{ax}
\end{align}$$

This model describes the change in throughput y as an exponential function of time x. Throughput processed in a period of time between any two points x1 and x2 is obtained as:

$$\begin{align}
\text{Exponentially varying throughput} &= \int_{x_{1}}^{x_{2}}y\,dx \\
&= \int_{x_{1}}^{x_{2}}be^{ax}\,dx \\
&= \frac{b}{a}(e^{ax_{2}}-e^{ax_{1}})
\end{align}$$

 * 3. Power model:

$$\begin{align}
y=bx^{a}
\end{align}$$

This model describes the change in throughput y as a power function of time x. Throughput processed between two points of time x1 and x2 is obtained as:

$$\begin{align}
\text{Power varying throughput} &= \int_{x_{1}}^{x_{2}}y\,dx \\
&= \int_{x_{1}}^{x_{2}}bx^{a}\,dx \\
&= \frac{b}{a+1}(x_{2}^{a+1}-x_{1}^{a+1})
\end{align}$$
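The closed-form results for the three models can be checked numerically. This Python sketch uses hypothetical function names; the exponential formula requires a ≠ 0 and the power formula requires a ≠ -1, since those are the cases where the antiderivatives above are valid. `midpoint_total` is a plain midpoint-rule integral used only as an independent check.

```python
import math


def linear_total(a, b, x1, x2):
    """Integral of y = a*x + b from x1 to x2."""
    return a / 2 * (x2**2 - x1**2) + b * (x2 - x1)


def exponential_total(a, b, x1, x2):
    """Integral of y = b*exp(a*x) from x1 to x2 (a must be non-zero)."""
    return b / a * (math.exp(a * x2) - math.exp(a * x1))


def power_total(a, b, x1, x2):
    """Integral of y = b*x**a from x1 to x2 (a must not be -1)."""
    return b / (a + 1) * (x2 ** (a + 1) - x1 ** (a + 1))


def midpoint_total(rate, x1, x2, n=100_000):
    """Numerical check: midpoint-rule integral of rate(x) over [x1, x2]."""
    h = (x2 - x1) / n
    return h * sum(rate(x1 + (i + 0.5) * h) for i in range(n))


print(linear_total(2, 1, 0, 3))  # (2/2)(9 - 0) + 1(3 - 0) = 12.0
```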

All of the above models also have a user defined maximum throughput capacity value. Once this maximum throughput capacity value is reached, the throughput per unit time becomes constant and equal in value to the maximum throughput capacity specified by the user. In this situation the variable throughput model would then act as a constant throughput model. The above models may at first glance seem limited, when in fact they do provide ample modeling flexibility. This flexibility is achieved by using these functions as building blocks for more complex functions. As an example, a step model can be easily created by using multiple phases, each with a constant throughput. A ramp model would use phases with linearly increasing functions in conjunction with constant phases, and so forth.
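The maximum throughput capacity and the step-model idea can be sketched as follows. The function names are hypothetical; `capped_total` integrates min(rate(x), cap) with the midpoint rule, so it works for any of the three models without solving for the point where the rate reaches the cap, and `stepped_total` treats a step model as a sequence of constant-throughput phases given as (duration, rate) pairs.

```python
def capped_total(rate, cap, x1, x2, n=10_000):
    """Throughput over [x1, x2] when the rate is limited to `cap`.

    Midpoint-rule integral of min(rate(x), cap): once the model's rate
    reaches the maximum capacity, throughput continues at that constant rate.
    """
    h = (x2 - x1) / n
    return h * sum(min(rate(x1 + (i + 0.5) * h), cap) for i in range(n))


def stepped_total(phases):
    """Step model built from constant-throughput phases: (duration, rate) pairs."""
    return sum(duration * rate for duration, rate in phases)
```

For example, a linear rate y = 2x capped at 10 units per unit time reaches the cap at x = 5, so over [0, 10] it delivers 25 units on the ramp plus 50 units at the cap, for a total of 75.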

= More Examples =