Reliability Growth Planning: Difference between revisions

From ReliaWiki
(Updated for V11)
 
Line 1: Line 1:
{{template:RGA BOOK|4|Reliability Growth Planning}}
{{template:RGA BOOK|4|Reliability Growth Planning}}


In developmental reliability growth testing, the objective is to test a system, find problem failure modes, and at some point, incorporate corrective actions in order to increase the reliability of the system. This process is continued for the duration of the test time. If the corrective actions are effective, then the system mean time between failures (MTBF) will move from an initial low value to a higher value. Typically, the objective of reliability growth testing is not just to increase the MTBF, but to increase it to a particular value. In other words, we want to increase the MTBF to meet a specific goal or requirement. Therefore, determining how much test time is needed for a particular system is generally always of interest in reliability growth testing. This brings us to the topic of growth planning.
==Introduction==


In this chapter, we will address the common questions engineers and managers face at the planning stage of product development, in terms of meeting the design reliability goals. Planning for required test time, setting realistic reliability goals and creating a strategy for meeting those goals are all essential parts of project management. It is therefore necessary to have a tool that can be utilized for such planning purposes. The basis of the tool will be the Duane postulate. A model that incorporates the realities faced during development and generates a plan that can be used as a guideline during developmental testing will be developed.
In developmental reliability growth testing, the objective is to test a system, find problem failure modes, incorporate corrective actions and therefore increase the reliability of the system. This process is continued for the duration of the test time. If the corrective actions are effective then the system mean time between failures (MTBF) or mean trials between failures (MTrBF) will move from an initial low value to a higher value. Typically, the objective of reliability growth testing is not to just increase the MTBF/MTrBF, but to increase it to a particular value called the goal or requirement. Therefore, determining how much test time is needed for a particular system is generally of particular interest in reliability growth testing.


==Background==
The [[Duane Model|Duane postulate]] is based on empirical observations, and it reflects a learning curve pattern for reliability growth. This learning curve pattern forms the basis of the [[Crow-AMSAA (NHPP)|Crow-AMSAA (NHPP) model]]. The Duane postulate is also reflected in the [[Crow Extended]] model in the form of the discovery function <math>h(t)\,\!</math>.
The [[Duane Model|Duane postulate]] is based on empirical observations, and it reflects a learning curve pattern for reliability growth. This identical learning curve pattern forms the basis of the [[Crow-AMSAA (NHPP)|Crow-AMSAA (NHPP) model]]. The Duane postulate is also reflected in the [[Crow Extended|Crow extended]] model in the form of the discovery function <math>h(t)\,\!</math>.


The discovery function is the rate at which new, distinct problems are being discovered during reliability growth development testing. The Crow-AMSAA (NHPP) model is a special case of the discovery function. Suppose that when a new and distinct failure mode is first seen, the testing is stopped and a corrective action is incorporated before the testing is resumed. In addition, suppose that the corrective action is so effective that the failure mode is unlikely to be seen again. In this case, the only failures observed during the reliability growth test are the first occurrences of the failure modes. Therefore, if the Crow-AMSAA (NHPP) model and the Duane postulate are accepted as the pattern for a test-fix-test reliability growth testing program, then the form of the Crow-AMSAA (NHPP) model must be the form for the discovery function, <math>h(t)\,\!</math>.
The discovery function is the rate at which new, distinct problems are being discovered during reliability growth development testing. The Crow-AMSAA (NHPP) model is a special case of the discovery function. Suppose that when a new and distinct failure mode is first seen, the testing is stopped and a corrective action is incorporated before the testing is resumed. In addition, suppose that the corrective action is so effective that the failure mode is unlikely to be seen again. In this case, the only failures observed during the reliability growth test are the first occurrences of the failure modes. Therefore, if the Crow-AMSAA (NHPP) model and the Duane postulate are accepted as the pattern for a test-fix-test reliability growth testing program, then the form of the Crow-AMSAA (NHPP) model must be the form for the discovery function, <math>h(t)\,\!</math>.
Line 12: Line 11:
To be consistent with the Duane postulate and the Crow-AMSAA (NHPP) model, the discovery function must be of the same form. This form of the discovery function is an important property of the Crow extended model and its application in growth planning. As with the Crow-AMSAA (NHPP) model, this form of the discovery function ties the model directly to real-world data and experiences.
To be consistent with the Duane postulate and the Crow-AMSAA (NHPP) model, the discovery function must be of the same form. This form of the discovery function is an important property of the Crow extended model and its application in growth planning. As with the Crow-AMSAA (NHPP) model, this form of the discovery function ties the model directly to real-world data and experiences.


The use of the Duane postulate as a reliability growth planning model poses two significant drawbacks: The first drawback is that the Duane postulate's MTBF is zero at time equal to zero. This was addressed in MIL-HDBK-189 by specifying a time <math>{{T}_{i}}\,\!</math> where growth starts after <math>{{T}_{i}}\,\!</math> and the Duane postulate applies [[RGA_References|[13]]]. However, determining <math>{{T}_{i}}\,\!</math> is subjective and is not a desirable property of the MIL-HDBK-189. The second drawback is that the MTBF for the Duane postulate increases indefinitely to infinity, which is not realistic.
==Growth Planning Models==
There are two types of reliability growth planning models available in RGA:


Therefore, the desirable features of a planning model are:
*[[Continuous Reliability Growth Planning|Continuous Reliability Growth Planning]]


#The discovery function must have the form of the Crow-AMSAA (NHPP) model and the Duane postulate.
*[[Discrete Reliability Growth Planning|Discrete Reliability Growth Planning]]
#The start time <math>{{T}_{i}}\,\!</math> is not required as an input.
#An upper bound on the system MTBF is specified in the model.
All of these desirable features are included in the planning model discussed in this chapter, which is based on the Crow extended model.
 
The Crow extended model for reliability growth planning is a revised and improved version of the MIL-HDBK-189 growth curve [[RGA_References|[13]]]. MIL-HDBK-189 can be considered as the growth curve based on the Crow-AMSAA (NHPP) model. Using MIL-HDBK-189 for reliability growth planning assumes that the corrective actions for the observed failure modes are incorporated during the test and at the specific time of failure. However, in actual practice, fixes may be delayed until after the completion of the test; alternatively, some fixes may be implemented during the test while others are delayed and some modes are not fixed at all. The Crow extended model for reliability growth planning provides additional inputs that account for specific management strategies and delayed fixes with specified effectiveness factors.


==Growth Planning Inputs==
==Growth Planning Inputs==
The following parameters are used in both the continuous and discrete reliability growth models.


===Management Strategy Ratio & Initial Failure Intensity===<!-- THIS SECTION HEADER IS LINKED FROM ANOTHER SECTION IN THIS PAGE. IF YOU RENAME THE SECTION, YOU MUST UPDATE THE LINK(S). -->
===Management Strategy Ratio & Initial Failure Intensity===<!-- THIS SECTION HEADER IS LINKED FROM ANOTHER SECTION IN THIS PAGE. IF YOU RENAME THE SECTION, YOU MUST UPDATE THE LINK(S). -->
When a system is tested and failure modes are observed, management can make one of two possible decisions, either to fix or to not fix the failure mode. Therefore, the management strategy places failure modes into two categories: A modes and B modes. The A modes are all failure modes such that, when seen during the test, no corrective action will be taken. This accounts for all modes for which management determines that it is not economical or otherwise justified to take a corrective action. The B modes are either corrected during the test or the corrective action is delayed to a later time. The management strategy is defined by what portion of the failures will be fixed.
When a system is tested and failure modes are observed, management can make one of two possible decisions, either to fix or to not fix the failure mode. Therefore, the management strategy places failure modes into two categories: A modes and B modes. The A modes are all failure modes such that, when seen during the test, no corrective action will be taken. This accounts for all modes for which management determines that a corrective action is not economical or otherwise justified. The B modes are either corrected during the test or the corrective action is delayed to a later time. The management strategy is defined by what portion of the failures will be fixed.


Let <math>{{\lambda }_{I}}\,\!</math> be the initial failure intensity of the system in test. <math>{{\lambda }_{A}}\,\!</math> is defined as the A mode's initial failure intensity and <math>{{\lambda }_{B}}\,\!</math> is defined as the B mode's initial failure intensity. <math>{{\lambda }_{A}}\,\!</math> is the failure intensity of the system that will not be addressed by corrective actions even if a failure mode is seen during test. <math>{{\lambda }_{B}}\,\!</math> is the failure intensity of the system that will be addressed by corrective actions if a failure mode is seen during testing.
Let <math>{{\lambda }_{I}}\,\!</math> be the initial failure intensity of the system in test. <math>{{\lambda }_{A}}\,\!</math> is defined as the A mode's initial failure intensity and <math>{{\lambda }_{B}}\,\!</math> is defined as the B mode's initial failure intensity. <math>{{\lambda }_{A}}\,\!</math> is the failure intensity of the system that will not be addressed by corrective actions even if a failure mode is seen during testing. <math>{{\lambda }_{B}}\,\!</math> is the failure intensity of the system that will be addressed by corrective actions if a failure mode is seen during testing.


Then, the initial failure intensity of the system is:
Then, the initial failure intensity of the system is:
Line 56: Line 52:
When a delayed corrective action is implemented for a type B failure mode, in other words a BD mode, the failure intensity for that mode is reduced if the corrective action is effective. Once a BD mode is discovered, it is rarely totally eliminated by a corrective action. After a BD mode has been found and fixed, a certain percentage of the failure intensity will be removed, but a certain percentage of the failure intensity will generally remain. The fraction decrease in the BD mode failure intensity due to corrective actions, <math>d\,\!</math>, <math>\left( 0<d<1 \right),\,\!</math> is called the ''effectiveness factor'' (EF).  
When a delayed corrective action is implemented for a type B failure mode, in other words a BD mode, the failure intensity for that mode is reduced if the corrective action is effective. Once a BD mode is discovered, it is rarely totally eliminated by a corrective action. After a BD mode has been found and fixed, a certain percentage of the failure intensity will be removed, but a certain percentage of the failure intensity will generally remain. The fraction decrease in the BD mode failure intensity due to corrective actions, <math>d\,\!</math>, <math>\left( 0<d<1 \right),\,\!</math> is called the ''effectiveness factor'' (EF).  


A study on EFs showed that an average EF, <math>d\,\!</math>, is about 70%. Therefore, about 30%, (i.e., <math>100(1-d)%\,\!</math>), of the BD mode failure intensity will typically remain in the system after all of the corrective actions have been implemented. However, individual EFs for the failure modes may be larger or smaller than the average. This average value of 70% can be used for planning purposes, or if such information is recorded, an average effectiveness factor from a previous reliability growth program can be used.
A study on EFs showed that an average EF, <math>d\,\!</math>, is about 70%. Therefore, about 30% (i.e., <math>100(1-d)%\,\!</math>) of the BD mode failure intensity will typically remain in the system after all of the corrective actions have been implemented. However, individual EFs for the failure modes may be larger or smaller than the average. This average value of 70% can be used for planning purposes, or if such information is recorded, an average effectiveness factor from a previous reliability growth program can be used.


===MTBF Goal===
===MTBF Goal===
When putting together a reliability growth plan, a goal MTBF <math>{{M}_{G}}\,\!</math> (or goal failure intensity <math>{{\lambda }_{G}}\,\!</math> ) is defined as the requirement or target for the product at the end of the growth program.
When putting together a reliability growth plan, a goal MTBF/MTrBF <math>{{M}_{G}}\,\!</math> (or goal failure intensity <math>{{\lambda }_{G}}\,\!</math> ) is defined as the requirement or target for the product at the end of the growth program.


===Growth Potential===
===Growth Potential===
The failure intensity remaining in the system at the end of the test will depend on the management strategy given by the classification of the type A and type B failure modes. The engineering effort applied to the corrective actions determines the effectiveness factors. In addition, the failure intensity depends on <math>h(t)\,\!</math>, which is the rate at which problem failure modes are being discovered during testing. The rate of discovery drives the opportunity to take corrective actions based on the seen failure modes, and it is an important factor in the overall reliability growth rate. The reliability growth potential is the limiting value of the failure intensity as time <math>T\,\!</math> increases. This limit is the maximum MTBF that can be attained with the current management strategy. The maximum MTBF will be attained when all type B modes have been observed and fixed.
The failure intensity remaining in the system at the end of the test will depend on the management strategy given by the classification of the type A and type B failure modes. The engineering effort applied to the corrective actions determines the effectiveness factors. In addition, the failure intensity depends on <math>h(t)\,\!</math>, which is the rate at which problem failure modes are being discovered during testing. The rate of discovery drives the opportunity to take corrective actions based on the seen failure modes, and it is an important factor in the overall reliability growth rate. The reliability growth potential is the limiting value of the failure intensity as time <math>T\,\!</math> increases. This limit is the maximum MTBF that can be attained with the current management strategy. The maximum MTBF/MTrBF will be attained when all type B modes have been observed and fixed.


If all the discovered type B modes are corrected by time <math>T\,\!</math>, that is, no deferred corrective actions at time <math>T\,\!</math>, then the growth potential is the maximum attainable with the type B designation of the failure modes and the corresponding assigned effectiveness factors. This is called the ''nominal growth potential''. In other words, the nominal growth potential is the maximum attainable growth potential assuming corrective actions are implemented for every mode that is planned to be fixed. In reality, some corrective actions might be implemented at a later time due to schedule, budget, engineering, etc.
If all the discovered type B modes are corrected by time <math>T\,\!</math>, that is, no deferred corrective actions at time <math>T\,\!</math>, then the growth potential is the maximum attainable with the type B designation of the failure modes and the corresponding assigned effectiveness factors. This is called the ''nominal growth potential''. In other words, the nominal growth potential is the maximum attainable growth potential assuming corrective actions are implemented for every mode that is planned to be fixed. In reality, some corrective actions might be implemented at a later time due to schedule, budget, engineering, etc.
Line 72: Line 68:
:<math>{{\lambda }_{GP}}={{\lambda }_{A}}+\left( 1-d \right){{\lambda }_{B}}\,\!</math>
:<math>{{\lambda }_{GP}}={{\lambda }_{A}}+\left( 1-d \right){{\lambda }_{B}}\,\!</math>


The growth potential MTBF is:
The growth potential MTBF/MTrBF is:


:<math>{{M}_{GP}}=\frac{1}{{{\lambda }_{GP}}}\,\!</math>
:<math>{{M}_{GP}}=\frac{1}{{{\lambda }_{GP}}}\,\!</math>
Line 81: Line 77:


===Growth Potential Design Margin===
===Growth Potential Design Margin===
The Growth Potential Design Margin ( <math>GPDM\,\!</math> ) can be considered as a safety margin when setting target MTBF values for the reliability growth plan. It is common for systems to degrade in terms of reliability when a prototype product is going into full manufacturing. This is due to variations in materials, processes, etc. Furthermore, the in-house reliability growth testing usually overestimates the actual product reliability because the field usage conditions may not be perfectly simulated during testing. Typical values for the <math>GPDM\,\!</math> are around 1.2. Higher values yield less risk for the program, but require a more rigorous reliability growth test plan. Lower values imply higher program risk, with less safety margin.
The Growth Potential Design Margin ( <math>GPDM\,\!</math> ) can be considered as a safety margin when setting target MTBF/MTrBF values for the reliability growth plan. It is common for systems to degrade in terms of reliability when a prototype product is going into full manufacturing. This is due to variations in materials, processes, etc. Furthermore, the in-house reliability growth testing usually overestimates the actual product reliability because the field usage conditions may not be perfectly simulated during testing. Typical values for the <math>GPDM\,\!</math> are around 1.2. Higher values yield less risk for the program, but require a more rigorous reliability growth test plan. Lower values imply higher program risk, with less safety margin.
   
   
During the planning stage, the growth potential MTBF, <math>{{M}_{GP}},\,\!</math> can be calculated based on the goal MTBF, <math>{{M}_{G}},\,\!</math> and the growth potential design margin, <math>GPDM\,\!</math>.
During the planning stage, the growth potential MTBF/MTrBF, <math>{{M}_{GP}},\,\!</math> can be calculated based on the goal MTBF, <math>{{M}_{G}},\,\!</math> and the growth potential design margin, <math>GPDM\,\!</math>.


:<math>{{M}_{GP}}=GPDM\cdot {{M}_{G}}\,\!</math>
:<math>{{M}_{GP}}=GPDM\cdot {{M}_{G}}\,\!</math>
Line 90: Line 86:


:<math>{{\lambda }_{GP}}=\frac{{{\lambda }_{G}}}{GPDM}\,\!</math>
:<math>{{\lambda }_{GP}}=\frac{{{\lambda }_{G}}}{GPDM}\,\!</math>
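The two routes to the growth potential described above can be compared numerically. The following is a minimal Python sketch, not part of the RGA software; the function names and all input values are illustrative assumptions.

```python
def growth_potential(lambda_a, lambda_b, d):
    """Return (lambda_GP, M_GP) using lambda_GP = lambda_A + (1 - d) * lambda_B."""
    lambda_gp = lambda_a + (1.0 - d) * lambda_b
    return lambda_gp, 1.0 / lambda_gp


def growth_potential_from_goal(mtbf_goal, gpdm):
    """Return (lambda_GP, M_GP) using the planning relation M_GP = GPDM * M_G."""
    m_gp = gpdm * mtbf_goal
    return 1.0 / m_gp, m_gp


# Illustrative inputs only (assumed, not taken from a real program).
lambda_gp, m_gp = growth_potential(lambda_a=0.001, lambda_b=0.019, d=0.7)
plan_lambda_gp, plan_m_gp = growth_potential_from_goal(mtbf_goal=100.0, gpdm=1.2)

print(f"From failure mode classification: lambda_GP = {lambda_gp:.4f}, M_GP = {m_gp:.1f}")
print(f"From goal and design margin:      lambda_GP = {plan_lambda_gp:.5f}, M_GP = {plan_m_gp:.1f}")
```

If the growth potential implied by the failure mode classification falls below the one required by the goal and the design margin, the management strategy or the effectiveness factor would need to be revisited.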
==Nominal Idealized Growth Curve==
During developmental testing, management should expect that certain levels of reliability will be attained at various points in the program in order to have assurance that reliability growth is progressing at a sufficient rate to meet the product reliability requirement. The idealized curve portrays an overall characteristic pattern, which is used to determine and evaluate intermediate levels of reliability and construct the program planned growth curve. Note that growth profiles on previously developed, similar systems provide significant insight into the reliability growth process and are valuable in the construction of idealized growth curves.
The nominal idealized growth curve portrays a general profile for reliability growth throughout system testing. The idealized curve stays at the baseline level set by the initial failure intensity, <math>{{\lambda }_{I}}\,\!</math>, until an initialization time, <math>{{t}_{0}},\,\!</math> when reliability growth begins. From that time until the end of testing, which can consist of a single test phase or, more commonly, multiple test phases, the idealized MTBF increases steadily according to a learning curve pattern until it reaches the final reliability requirement, <math>{{M}_{F}}\,\!</math>. The slope of this curve on a log-log plot is the growth rate of the Crow extended model [[RGA_References|[13]]].
===Nominal Failure Intensity Function===<!-- THIS SECTION HEADER IS LINKED FROM ANOTHER SECTION IN THIS PAGE. IF YOU RENAME THE SECTION, YOU MUST UPDATE THE LINK(S). -->
The nominal idealized growth curve failure intensity as a function of test time <math>t\,\!</math> is:
:<math>{{r}_{NI}}(t)={{\lambda }_{A}}+(1-d){{\lambda }_{B}}+d\lambda \beta {{t}^{\left( \beta -1 \right)}}\text{ for }t\ge {{t}_{0}}\,\!</math>
and:
:<math>{{r}_{NI}}(t)={{\lambda }_{I}}\text{ for }t\le {{t}_{0}}\,\!</math>
where <math>{{\lambda }_{I}}\,\!</math> is the initial system failure intensity, <math>t\,\!</math> is test time and <math>{{t}_{0}}\,\!</math> is the initialization time, which is discussed in the next section.
It can be seen that the first equation for <math>{{r}_{NI}}(t)\,\!</math> is the failure intensity equation of the Crow extended model.
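The piecewise definition of <math>{{r}_{NI}}(t)\,\!</math> can be sketched in Python as follows. This is an illustration under assumed inputs, not the RGA implementation; the function name and the numeric values are hypothetical, and <math>{{t}_{0}}\,\!</math> is computed from the closed form given in the Initialization Time section.

```python
def nominal_failure_intensity(t, lambda_a, lambda_b, d, lam, beta, t0):
    """Nominal idealized failure intensity r_NI(t) at test time t."""
    if t <= t0:
        # Before the initialization time the curve holds at lambda_I.
        return lambda_a + lambda_b
    lambda_gp = lambda_a + (1.0 - d) * lambda_b
    return lambda_gp + d * lam * beta * t ** (beta - 1.0)


# Illustrative inputs only (assumed values).
lambda_a, lambda_b, d, lam, beta = 0.001, 0.019, 0.7, 0.1, 0.7
# Initialization time from the closed form in the Initialization Time section.
t0 = (lambda_b / (lam * beta)) ** (1.0 / (beta - 1.0))

for t in (10.0, t0, 1000.0, 10000.0):
    r = nominal_failure_intensity(t, lambda_a, lambda_b, d, lam, beta, t0)
    print(f"t = {t:9.1f}  r_NI = {r:.5f}  MTBF = {1.0 / r:.1f}")
```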
===Initialization Time===<!-- THIS SECTION HEADER IS LINKED FROM ANOTHER SECTION IN THIS PAGE. IF YOU RENAME THE SECTION, YOU MUST UPDATE THE LINK(S). -->
Reliability growth can only begin after a type B failure mode occurs, which cannot be at a time equal to zero. Therefore, there is a need to define an initialization time that is different from zero. The nominal idealized growth curve failure intensity is initially set to be equal to the initial failure intensity, <math>{{\lambda }_{I}},\,\!</math> until the initialization time, <math>{{t}_{0}}\,\!</math> :
:<math>{{r}_{NI}}({{t}_{0}})={{\lambda }_{A}}+(1-d){{\lambda }_{B}}+d\lambda \beta t_{0}^{(\beta -1)}\,\!</math>
Therefore:
:<math>{{\lambda }_{I}}={{\lambda }_{A}}+(1-d){{\lambda }_{B}}+d\lambda \beta t_{0}^{(\beta -1)}\,\!</math>
Then:
:<math>{{t}_{0}}={{\left[ \frac{{{\lambda }_{I}}-{{\lambda }_{A}}-(1-d){{\lambda }_{B}}}{d\lambda \beta } \right]}^{\tfrac{1}{\beta -1}}}\,\!</math>
Using the equation for initial failure intensity:
:<math>\lambda_{I}=\lambda_{A} + \lambda_{B}\,\!</math>
we substitute <math>{{\lambda }_{I}}\,\!</math> to get:
:<math>{{t}_{0}}={{\left[ \frac{{{\lambda }_{A}}+{{\lambda }_{B}}-{{\lambda }_{A}}-(1-d){{\lambda }_{B}}}{d\cdot \lambda \cdot \beta } \right]}^{\tfrac{1}{\beta -1}}}\,\!</math>
Then:
:<math>{{t}_{0}}={{\left( \frac{{{\lambda }_{B}}}{\lambda \cdot \beta } \right)}^{\tfrac{1}{\beta -1}}}\,\!</math>
The initialization time, <math>{{t}_{0}},\,\!</math> allows for growth to start after a type B failure mode has occurred.
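As a quick numerical check of the derivation, the sketch below computes <math>{{t}_{0}}\,\!</math> from the simplified expression and confirms that substituting it back into the growth branch of the nominal curve returns <math>{{\lambda }_{I}}\,\!</math>. The function name and input values are assumptions chosen for illustration.

```python
def initialization_time(lambda_b, lam, beta):
    """t0 = (lambda_B / (lam * beta)) ** (1 / (beta - 1))."""
    return (lambda_b / (lam * beta)) ** (1.0 / (beta - 1.0))


# Illustrative inputs only (assumed values).
lambda_a, lambda_b, d, lam, beta = 0.001, 0.019, 0.7, 0.1, 0.7

t0 = initialization_time(lambda_b, lam, beta)
# Substituting t0 into the growth branch should give back lambda_I = lambda_A + lambda_B.
r_at_t0 = lambda_a + (1.0 - d) * lambda_b + d * lam * beta * t0 ** (beta - 1.0)

print(f"t0 = {t0:.2f} test hours")
print(f"r_NI(t0) = {r_at_t0:.6f}, lambda_I = {lambda_a + lambda_b:.6f}")
```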
===Nominal Time to Reach Goal===<!-- THIS SECTION HEADER IS LINKED FROM ANOTHER SECTION IN THIS PAGE. IF YOU RENAME THE SECTION, YOU MUST UPDATE THE LINK(S). -->
Assuming that we have a target MTBF or failure intensity goal, we can solve the equation for the [[Reliability_Growth_Planning#Nominal_Failure_Intensity_Function|nominal failure intensity]] to find out how much test time, <math>{{t}_{N,G}}\,\!</math>, is required (based on the Crow extended model and the nominal idealized growth curve) to reach that goal:
:<math>{{t}_{N,G}}={{\left[ \frac{{{r}_{G}}-{{\lambda }_{A}}-(1-d){{\lambda }_{B}}}{d\cdot \lambda \cdot \beta } \right]}^{\tfrac{1}{\beta -1}}}\,\!</math>
Note that when <math>{{\lambda }_{I}}<{{r}_{G}}\,\!</math> or, in other words, the initial failure intensity is lower than the goal failure intensity, then there is no need to solve for the nominal time to reach the goal because the goal is already met. In this case, no further reliability growth testing is needed.
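The nominal time to reach the goal can be sketched as a small Python function. The function name and inputs are illustrative assumptions. The check that rejects a goal at or below the growth potential failure intensity is also an assumption added here: in that case the bracketed term is not positive and the expression has no real solution.

```python
def nominal_time_to_goal(r_goal, lambda_a, lambda_b, d, lam, beta):
    """Test time t_{N,G} at which the nominal idealized curve reaches r_goal."""
    lambda_i = lambda_a + lambda_b
    lambda_gp = lambda_a + (1.0 - d) * lambda_b
    if r_goal >= lambda_i:
        # The goal is already met at the start; no growth testing is needed.
        return 0.0
    if r_goal <= lambda_gp:
        # Assumed guard: the goal lies at or beyond the growth potential.
        raise ValueError("goal failure intensity is not attainable with this strategy")
    return ((r_goal - lambda_gp) / (d * lam * beta)) ** (1.0 / (beta - 1.0))


# Illustrative inputs only (assumed values); the goal is an MTBF of 100 hours.
lambda_a, lambda_b, d, lam, beta = 0.001, 0.019, 0.7, 0.1, 0.7
t_goal = nominal_time_to_goal(1.0 / 100.0, lambda_a, lambda_b, d, lam, beta)
print(f"Nominal test time to reach the goal: {t_goal:.0f} hours")
```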
===Growth Rate for Nominal Idealized Curve===
The growth rate for the nominal idealized curve is defined in the same context as the growth rate for the Duane postulate [[RGA_References|[8]]]. The nominal idealized curve has the same functional form for the growth rate as the Duane postulate and the Crow-AMSAA (NHPP) model.
For both the Duane postulate and the Crow-AMSAA (NHPP) model, the average failure intensity is given by:
:<math>C(t)=\frac{\lambda {{t}^{\beta }}}{t}=\lambda {{t}^{(\beta -1)}}\,\!</math>
Also, for both the Duane postulate and the Crow-AMSAA (NHPP) model, the instantaneous failure intensity is given by:
:<math>\begin{align}
r(t)=\lambda \beta {{t}^{(\beta -1)}}
\end{align}\,\!</math>
Taking the difference, <math>D(t),\,\!</math> between the average failure intensity, <math>C(t)\,\!</math> and the instantaneous failure intensity, <math>r(t)\,\!</math>, yields:
:<math>\begin{align}
D(t)=\lambda {{t}^{(\beta -1)}}-\lambda \beta {{t}^{(\beta -1)}}
\end{align}\,\!</math>
Then:
:<math>\begin{align}
D(t)=\lambda {{t}^{(\beta -1)}}[1-\beta ]
\end{align}\,\!</math>
For reliability growth to occur, <math>D(t)\,\!</math> must be decreasing.
The growth rate for both the Duane postulate and the Crow-AMSAA (NHPP) model is the negative of the slope of <math>\log (D(t))\,\!</math> as a function of <math>\log (t)\,\!</math> :
:<math>\begin{align}
{{\log }_{e}}(D(t))=\text{constant}-(1-\beta ){{\log }_{e}}(t)
\end{align}\,\!</math>
The slope is negative under reliability growth and equals:
:<math>\begin{align}
\text{slope}=-(1-\beta )
\end{align}\,\!</math>
The growth rate for both the Duane postulate and the Crow-AMSAA (NHPP) model is equal to the negative of this slope:
:<math>\begin{align}
\text{Growth Rate}=(1-\beta )
\end{align}\,\!</math>
The instantaneous failure intensity for the nominal idealized curve is:
:<math>\begin{align}
{{r}_{NI}}(t)={{\lambda }_{A}}+(1-d){{\lambda }_{B}}+d\lambda \beta {{(t)}^{(\beta -1)}}
\end{align}\,\!</math>
The cumulative (average) failure intensity for the nominal idealized curve is:
:<math>\begin{align}
{{C}_{NI}}(t)={{\lambda }_{A}}+(1-d){{\lambda }_{B}}+d\lambda {{(t)}^{(\beta -1)}}
\end{align}\,\!</math>
therefore:
:<math>\begin{align}
{{D}_{NI}}(t)=[{{C}_{NI}}(t)-{{r}_{NI}}(t)]=d\lambda {{t}^{(\beta -1)}}[1-\beta ]
\end{align}\,\!</math>
and:
:<math>\begin{align}
{{\log }_{e}}({{D}_{NI}}(t))=\text{constant}-(1-\beta ){{\log }_{e}}(t)
\end{align}\,\!</math>
Therefore, in accordance with the Duane postulate and the Crow-AMSAA (NHPP) model, <math>a=1-\beta \,\!</math> is the growth rate for the reliability growth plan.
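The growth rate result can be checked numerically: the constant terms <math>{{\lambda }_{A}}+(1-d){{\lambda }_{B}}\,\!</math> cancel in the difference, so <math>{{D}_{NI}}(t)\,\!</math> is proportional to <math>{{t}^{(\beta -1)}}\,\!</math> and its log-log slope is <math>-(1-\beta )\,\!</math>. The Python sketch below uses assumed input values and a hypothetical function name.

```python
import math


def intensity_gap(t, lambda_a, lambda_b, d, lam, beta):
    """D_NI(t) = C_NI(t) - r_NI(t) for the nominal idealized curve."""
    lambda_gp = lambda_a + (1.0 - d) * lambda_b
    c_ni = lambda_gp + d * lam * t ** (beta - 1.0)         # cumulative intensity
    r_ni = lambda_gp + d * lam * beta * t ** (beta - 1.0)  # instantaneous intensity
    return c_ni - r_ni


# Illustrative inputs only (assumed values).
params = dict(lambda_a=0.001, lambda_b=0.019, d=0.7, lam=0.1, beta=0.7)
t1, t2 = 100.0, 10000.0

# Slope of log(D_NI) against log(t) between two test times.
slope = (math.log(intensity_gap(t2, **params)) - math.log(intensity_gap(t1, **params))) / (
    math.log(t2) - math.log(t1)
)
growth_rate = -slope
print(f"growth rate = {growth_rate:.4f}, 1 - beta = {1.0 - params['beta']:.4f}")
```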
===Lambda - Beta Parameter Relationship===
Under the Crow-AMSAA (NHPP) model, the time to first failure is a Weibull random variable. The MTTF of a Weibull distributed random variable with parameters <math>\beta \,\!</math> and <math>\eta \,\!</math> is:
:<math>MTTF=\eta \cdot \Gamma \left( 1+\frac{1}{\beta } \right)\,\!</math>
The parameter lambda is defined as:
:<math>\lambda =\frac{1}{{{\eta }^{\beta }}}\,\!</math>
Using the equation for lambda in the MTTF relationship, we have:
:<math>MTB{{F}_{B}}=\frac{\Gamma \left( 1+\tfrac{1}{\beta } \right)}{{{\lambda }^{\left( \tfrac{1}{\beta } \right)}}}\,\!</math>
or, in terms of failure intensity:
:<math>{{\lambda }_{B}}=\frac{{{\lambda }^{\left( \tfrac{1}{\beta } \right)}}}{\Gamma \left( 1+\tfrac{1}{\beta } \right)}\,\!</math>
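The relationship can be used in either direction. The sketch below, with hypothetical function names and assumed inputs, computes <math>{{\lambda }_{B}}\,\!</math> from <math>\lambda \,\!</math> and <math>\beta \,\!</math> using the equation above, and inverts it algebraically to obtain <math>\lambda ={{\left[ {{\lambda }_{B}}\Gamma \left( 1+\tfrac{1}{\beta } \right) \right]}^{\beta }}\,\!</math>.

```python
import math


def lambda_b_from_lambda(lam, beta):
    """lambda_B = lam ** (1 / beta) / Gamma(1 + 1 / beta)."""
    return lam ** (1.0 / beta) / math.gamma(1.0 + 1.0 / beta)


def lambda_from_lambda_b(lambda_b, beta):
    """Algebraic inverse: lam = (lambda_B * Gamma(1 + 1 / beta)) ** beta."""
    return (lambda_b * math.gamma(1.0 + 1.0 / beta)) ** beta


# Illustrative inputs only (assumed values).
lambda_b, beta = 0.019, 0.7
lam = lambda_from_lambda_b(lambda_b, beta)
print(f"lambda = {lam:.5f} for lambda_B = {lambda_b}, beta = {beta}")
print(f"round trip lambda_B = {lambda_b_from_lambda(lam, beta):.5f}")
```

When <math>\beta =1\,\!</math>, the gamma term equals one and <math>\lambda ={{\lambda }_{B}}\,\!</math>, which corresponds to a constant failure intensity.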
==Actual Idealized Growth Curve==
The actual idealized growth curve differs from the nominal idealized curve in that it takes into account the average fix delay that might occur in each test phase. The actual idealized growth curve is continuous and goes through each of the test phase target MTBFs.
===Fix Delays and Test Phase Target MTBF===
Fix delays reflect how long it takes from the time a problem failure mode is discovered in testing, to the time the corrective action is incorporated into the system and reliability growth is realized. The consideration of the fix delay is often in terms of how much calendar time it takes to incorporate a corrective action fix after the problem is first seen. However, the impact of the delay on reliability growth is reflected in the average test time it takes between finding a problem failure mode and incorporating a corrective action. The fix delay is reflected in the actual idealized growth curve in terms of test time.
In other words, the average fix delay is calendar time converted to test hours. For example, say that we expect an average fix delay of two weeks. If in two weeks the total test time is 1,000 hours, then the average fix delay is 1,000 hours. If in the same two weeks the total test time is 2,000 hours (maybe there are more units available or more shifts), then the average fix delay is 2,000 hours.
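The conversion in this example is simple arithmetic, sketched below in Python. The function and parameter names are hypothetical, and the weekly test hours are derived from the totals quoted in the example.

```python
def fix_delay_in_test_hours(calendar_weeks, test_hours_per_week):
    """Convert an average calendar fix delay into accumulated test hours."""
    return calendar_weeks * test_hours_per_week


# Two-week delay with 1,000 total test hours accumulated (500 hours per week).
delay_a = fix_delay_in_test_hours(calendar_weeks=2, test_hours_per_week=500.0)
# Same two weeks with more units or shifts: 2,000 total test hours.
delay_b = fix_delay_in_test_hours(calendar_weeks=2, test_hours_per_week=1000.0)
print(delay_a, delay_b)
```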
There can be a constant fix delay across all test phases or, as a practical matter, each test phase can have a different fix delay time. In practice, the fix delay will generally be constant over a particular test phase. <math>{{L}_{i}}\,\!</math> denotes the fix delay for phase <math>i=1,...,P,\,\!</math> where <math>P\,\!</math> is the total number of phases in the test. The RGA software allows for a maximum of ten test phases.
===Actual Failure Intensity Function===
Consider a test plan consisting of <math>P\,\!</math> phases. Taking into account the fix delay within each phase, we expect the actual failure intensity to be different (i.e., shifted) from the nominal failure intensity. This is because fixes are not incorporated instantaneously; thus, growth is realized at a later time compared to the nominal case.
Specifically, the actual failure intensity will be estimated as follows:
'''Test Phase 1'''
For the first phase of a test plan, the actual idealized curve failure intensity, <math>{{r}_{AI}}(t)\,\!</math>, is :
:<math>{{r}_{AI}}(t)={{\lambda }_{A}}+(1-d){{\lambda }_{B}}+d\lambda \beta {{\left[ \left( \frac{{{T}_{1}}-{{L}_{1}}}{{{T}_{1}}} \right)t \right]}^{(\beta -1)}}\text{ for }0<t\le {{T}_{1}}\,\!</math>
Note that the end time of Phase 1, <math>{{T}_{1}},\,\!</math> must be greater than <math>{{L}_{1}}+{{t}_{0}}\,\!</math>. That is, <math>{{T}_{1}}>{{L}_{1}}+{{t}_{0}}\,\!</math>.
The actual idealized curve initialization time for Phase 1, <math>T_{0}^{AIC},\,\!</math> is calculated from:
:<math>{{r}_{AI}}(T_{0}^{AIC})={{\lambda }_{A}}+(1-d){{\lambda }_{B}}+d\lambda \beta {{\left[ \left( \frac{{{T}_{1}}-{{L}_{1}}}{{{T}_{1}}} \right)T_{0}^{AIC} \right]}^{(\beta -1)}}\,\!</math>
where <math>{{r}_{AI}}(T_{0}^{AIC})={{\lambda }_{I}}\,\!</math> is the initial failure intensity.
Therefore, using the equation for the [[Reliability_Growth_Planning#Initialization_Time|initialization time]], we have:
:<math>{{\lambda }_{A}}+(1-d){{\lambda }_{B}}+d\lambda \beta {{\left[ \left( \frac{{{T}_{1}}-{{L}_{1}}}{{{T}_{1}}} \right)T_{0}^{AIC} \right]}^{(\beta -1)}}={{\lambda }_{A}}+(1-d){{\lambda }_{B}}+d\lambda \beta t_{0}^{(\beta -1)}\,\!</math>
Solving for <math>T_{0}^{AIC}\,\!</math>, we get:
:<math>T_{0}^{AIC}=\frac{{{t}_{0}}}{\left( \tfrac{{{T}_{1}}-{{L}_{1}}}{{{T}_{1}}} \right)}\,\!</math>
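The Phase 1 initialization time can be sketched as follows. The function name and input values are illustrative assumptions; the validity check mirrors the requirement <math>{{T}_{1}}>{{L}_{1}}+{{t}_{0}}\,\!</math> stated above.

```python
def actual_initialization_time(t0, t1_end, l1):
    """T_0^AIC = t0 / ((T_1 - L_1) / T_1) for the first test phase."""
    if t1_end <= l1 + t0:
        raise ValueError("Phase 1 end time must exceed L_1 + t_0")
    return t0 / ((t1_end - l1) / t1_end)


# Illustrative inputs only (assumed values, in test hours).
t0, t1_end, l1 = 77.0, 1000.0, 200.0
t0_aic = actual_initialization_time(t0, t1_end, l1)
print(f"T_0^AIC = {t0_aic:.2f} hours (nominal t0 = {t0} hours)")
```

Because the scaling factor <math>({{T}_{1}}-{{L}_{1}})/{{T}_{1}}\,\!</math> is less than one whenever there is a fix delay, growth on the actual idealized curve starts later than on the nominal curve.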
'''Test Phase <math>i\,\!</math>'''
For any test phase <math>i\,\!</math>, the actual idealized curve failure intensity is given by:
:<math>{{r}_{AI}}(t)={{\lambda }_{A}}+(1-d){{\lambda }_{B}}+d\lambda \beta {{\left[ {{T}_{i-1}}-{{L}_{i-1}}+\left( \frac{{{T}_{i}}-{{L}_{i}}-{{T}_{i-1}}+{{L}_{i-1}}}{{{T}_{i}}-{{T}_{i-1}}} \right)(t-{{T}_{i-1}}) \right]}^{(\beta -1)}}\,\!</math>
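where <math>{{T}_{i-1}}\le t\le {{T}_{i}}\,\!</math>, <math>{{T}_{i}}\,\!</math> is the cumulative test time at the end of phase <math>i\,\!</math> and <math>{{L}_{i}}\,\!</math> is the fix delay for that phase. With <math>{{T}_{0}}=0\,\!</math> and <math>{{L}_{0}}=0\,\!</math>, this expression reduces to the Phase 1 equation.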
The actual idealized curve MTBF is:
:<math>{{M}_{AI}}(t)=\frac{1}{{{r}_{AI}}(t)}\,\!</math>
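As a rough illustration, the piecewise definition above can be evaluated with a short Python sketch. The function names, argument names and sample values are assumptions made for this example. The sketch treats the phase end times as cumulative and sets the time and fix delay before Phase 1 to zero, which makes the general phase expression coincide with the Phase 1 equation.

```python
from bisect import bisect_left


def actual_idealized_intensity(t, T, L, lam_A, lam_B, d, lam, beta):
    """Actual idealized curve failure intensity r_AI(t).

    T: cumulative end times of the phases, in increasing order
    L: fix delays of the phases (same length as T)
    lam_A, lam_B: initial type A and type B failure intensities
    d: effectiveness factor
    lam, beta: the parameters that appear in the equations above
    """
    if not 0.0 < t <= T[-1]:
        raise ValueError("t must satisfy 0 < t <= end time of the last phase")
    i = bisect_left(T, t)                  # phase index with T[i-1] < t <= T[i]
    T_prev = T[i - 1] if i > 0 else 0.0    # assumption: T_0 = 0
    L_prev = L[i - 1] if i > 0 else 0.0    # assumption: L_0 = 0
    slope = (T[i] - L[i] - T_prev + L_prev) / (T[i] - T_prev)
    shifted = T_prev - L_prev + slope * (t - T_prev)
    return lam_A + (1.0 - d) * lam_B + d * lam * beta * shifted ** (beta - 1.0)


def actual_idealized_mtbf(t, T, L, lam_A, lam_B, d, lam, beta):
    """Actual idealized curve MTBF, the reciprocal of r_AI(t)."""
    return 1.0 / actual_idealized_intensity(t, T, L, lam_A, lam_B, d, lam, beta)


# Illustrative two-phase plan (hypothetical numbers).
T = [1000.0, 3000.0]
L = [200.0, 500.0]
params = dict(lam_A=0.002, lam_B=0.018, d=0.7, lam=0.5, beta=0.7)
mtbf_end = actual_idealized_mtbf(3000.0, T, L, **params)
```

At each phase boundary the two neighboring expressions give the same value, so the curve is continuous. At the end of the last phase the bracketed term equals the last end time minus the last fix delay.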
===Actual Time to Reach Goal===<!-- THIS SECTION HEADER IS LINKED FROM ANOTHER SECTION IN THIS
PAGE. IF YOU RENAME THE SECTION, YOU MUST UPDATE THE LINK(S). -->
The actual time to reach the target MTBF or failure intensity goal, <math>{{t}_{AC,G}},\,\!</math> can be found by solving for the actual idealized curve failure intensity:
:<math>\begin{align}
{{r}_{AI}}({{t}_{AC,G}})={{\lambda }_{A}}+(1-d){{\lambda }_{B}}+d\lambda \beta {{\left[ {{T}_{i-1}}-{{L}_{i-1}}+\left( \frac{{{T}_{i}}-{{L}_{i}}-{{T}_{i-1}}+{{L}_{i-1}}}{{{T}_{i}}-{{T}_{i-1}}} \right)({{t}_{AC,G}}-{{T}_{i-1}}) \right]}^{(\beta -1)}} 
\end{align}\,\!</math>
Since the actual idealized growth curve depends on the phase durations and average fix delays, three cases must be treated separately in order to determine the actual time to reach the MTBF goal. The cases differ in the phase during which the actual MTBF, given the specified phase durations and fix delays, becomes equal to the MTBF goal. To find that phase, evaluate the actual idealized curve failure intensity for phases <math>1\,\!</math> through <math>i\,\!</math>, convert each result to the actual idealized curve MTBF, and identify the phase during which the actual MTBF reaches the goal MTBF. The three cases are presented next.
'''Case 1: MTBF goal is met during the last phase'''
If <math>{{T}_{F}}\,\!</math> indicates the cumulative end phase time for the last phase, and <math>{{L}_{F}}\,\!</math> indicates the fix delay for the last phase, then we have:
:<math>\begin{align}
{{r}_{G}}= & {{\lambda }_{A}}+(1-d){{\lambda }_{B}} \\
& +d\lambda \beta {{\left[ {{T}_{F-1}}-{{L}_{F-1}}+\left( \frac{{{T}_{F}}-{{L}_{F}}-{{T}_{F-1}}+{{L}_{F-1}}}{{{T}_{F}}-{{T}_{F-1}}} \right)({{t}_{AC,G}}-{{T}_{F-1}}) \right]}^{(\beta -1)}} 
\end{align}\,\!</math>
Starting to solve for <math>{{t}_{AC,G}}\,\!</math> yields:
:<math>{{\left[ \frac{{{r}_{G}}-{{\lambda }_{A}}-(1-d){{\lambda }_{B}}}{d\lambda \beta } \right]}^{\tfrac{1}{\beta -1}}}={{T}_{F-1}}-{{L}_{F-1}}+\left( \frac{{{T}_{F}}-{{L}_{F}}-{{T}_{F-1}}+{{L}_{F-1}}}{{{T}_{F}}-{{T}_{F-1}}} \right)({{t}_{AC,G}}-{{T}_{F-1}})\,\!</math>
We can substitute the left term by solving for the [[Reliability_Growth_Planning#Nominal_Time_to_Reach_Goal|nominal time to reach the goal]]; thus, we have:
:<math>{{t}_{N,G}}={{T}_{F-1}}-{{L}_{F-1}}+\left( \frac{{{T}_{F}}-{{L}_{F}}-{{T}_{F-1}}+{{L}_{F-1}}}{{{T}_{F}}-{{T}_{F-1}}} \right)({{t}_{AC,G}}-{{T}_{F-1}})\,\!</math>
therefore:
:<math>{{t}_{AC,G}}=\frac{{{t}_{N,G}}-{{T}_{F-1}}+{{L}_{F-1}}}{\left( \tfrac{{{T}_{F}}-{{L}_{F}}-{{T}_{F-1}}+{{L}_{F-1}}}{{{T}_{F}}-{{T}_{F-1}}} \right)}+{{T}_{F-1}}\,\!</math>
'''Case 2: MTBF goal is met before the last phase'''
The equation for <math>{{t}_{AC,G}}\,\!</math> that was derived for case 1 still applies, but in this case <math>{{T}_{F}}\,\!</math> and <math>{{L}_{F}}\,\!</math> are the time and fix delay of the phase during which the goal is met.
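The calculation for cases 1 and 2 can be sketched in Python as follows. This is a minimal illustration, not the procedure used by any particular software: the function names and sample numbers are hypothetical, the phase end times are taken as cumulative, and the time and fix delay before Phase 1 are assumed to be zero. The sketch returns <code>None</code> when no planned phase reaches the goal.

```python
def nominal_time_to_goal(r_G, lam_A, lam_B, d, lam, beta):
    """Nominal time t_NG at which the nominal failure intensity equals r_G."""
    base = (r_G - lam_A - (1.0 - d) * lam_B) / (d * lam * beta)
    if base <= 0.0:
        raise ValueError("goal intensity must exceed lam_A + (1 - d) * lam_B")
    return base ** (1.0 / (beta - 1.0))


def actual_time_to_goal(t_NG, T, L):
    """Actual time t_ACG to reach the goal, or None if no phase reaches it.

    T: cumulative end times of the phases, in increasing order
    L: fix delays of the phases (same length as T)
    """
    T_prev, L_prev = 0.0, 0.0          # assumption: T_0 = 0 and L_0 = 0
    for T_i, L_i in zip(T, L):
        # The goal is met in this phase once the shifted time at the
        # end of the phase, T_i - L_i, reaches the nominal time t_NG.
        if T_i - L_i >= t_NG:
            slope = (T_i - L_i - T_prev + L_prev) / (T_i - T_prev)
            return (t_NG - T_prev + L_prev) / slope + T_prev
        T_prev, L_prev = T_i, L_i
    return None                        # goal not reached in the planned phases


# Illustrative two-phase plan (hypothetical numbers).
T = [1000.0, 3000.0]
L = [200.0, 500.0]
t_acg = actual_time_to_goal(2000.0, T, L)
```

The phase search relies on the failure intensity decreasing with test time, which holds when <math>\beta <1\,\!</math>. For the sample plan, a nominal time of 2000 falls in the second phase and maps to an actual time of about 2412.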
'''Case 3: MTBF goal is met after the final phase'''
If the goal MTBF, <math>{{M}_{G}},\,\!</math> is met after the final test phase, then the actual time to reach the goal is not calculated, since additional phases with specific durations and fix delays would have to be added. The reliability growth program needs to be re-evaluated with the following options:
*Add more phase(s) to the program.
*Re-examine the phase duration of the existing phases.
*Investigate whether there are potential process improvements in the program that can reduce the average fix delay for the phases.
Other alternative routes for consideration would be to investigate the rest of the inputs in the model:
*Change the management strategy.
*Consider if further program risk can be acceptable, and if so, reduce the growth potential design margin.
*Consider if it is feasible to increase the effectiveness factors of the delayed fixes by using more robust engineering redesign methods.
Note that each change of input variables into the model can significantly influence the results.
With that in mind, any alteration in the input parameters should be justified by actionable decisions that will influence the reliability growth program. For example, increasing the average effectiveness factor value should be done only when there is proof that the program will pursue a different, more effective path in terms of addressing fixes.
==Reliability Growth Planning Examples==
===Growth Plan for 3 Phases===
{{:Growth_Plan_for_Three_Phases}}
===Growth Plan for 7 Phases===
{{:Growth_Plan_for_Seven_Phases}}
===Growth Plan for 4 Phases===
{{:Growth_Plan_for_Four_Phases}}

Latest revision as of 21:24, 31 January 2017

New format available! This reference is now available in a new format that offers faster page load, improved display for calculations and images, more targeted search and the latest content available as a PDF. As of September 2023, this Reliawiki page will not continue to be updated. Please update all links and bookmarks to the latest reference at help.reliasoft.com/reference/reliability_growth_and_repairable_system_analysis

Chapter 4: Reliability Growth Planning



Introduction

In developmental reliability growth testing, the objective is to test a system, find problem failure modes, incorporate corrective actions and thereby increase the reliability of the system. This process is continued for the duration of the test time. If the corrective actions are effective, then the system mean time between failures (MTBF) or mean trials between failures (MTrBF) will move from an initial low value to a higher value. Typically, the objective of reliability growth testing is not just to increase the MTBF/MTrBF, but to increase it to a particular value called the goal or requirement. Therefore, determining how much test time is needed for a particular system is generally of particular interest in reliability growth testing.

The Duane postulate is based on empirical observations, and it reflects a learning curve pattern for reliability growth. This learning curve pattern forms the basis of the Crow-AMSAA (NHPP) model. The Duane postulate is also reflected in the Crow Extended model in the form of the discovery function [math]\displaystyle{ h(t)\,\! }[/math].

The discovery function is the rate at which new, distinct problems are being discovered during reliability growth development testing. The Crow-AMSAA (NHPP) model is a special case of the discovery function. Consider that when a new and distinct failure mode is first seen, the testing is stopped and a corrective action is incorporated before the testing is resumed. In addition, suppose that the corrective action is so effective that the failure mode is unlikely to be seen again. In this case, the only failures observed during the reliability growth test are the first occurrences of the failure modes. Therefore, if the Crow-AMSAA (NHPP) model and the Duane postulate are accepted as the pattern for a test-fix-test reliability growth testing program, then the form of the Crow-AMSAA (NHPP) model must be the form for the discovery function, [math]\displaystyle{ h(t)\,\! }[/math].

To be consistent with the Duane postulate and the Crow-AMSAA (NHPP) model, the discovery function must be of the same form. This form of the discovery function is an important property of the Crow extended model and its application in growth planning. As with the Crow-AMSAA (NHPP) model, this form of the discovery function ties the model directly to real-world data and experiences.

Growth Planning Models

There are two types of reliability growth planning models available in RGA: continuous and discrete.

Growth Planning Inputs

The following parameters are used in both the continuous and discrete reliability growth models.

Management Strategy Ratio & Initial Failure Intensity

When a system is tested and failure modes are observed, management can make one of two possible decisions: to fix or not to fix the failure mode. Therefore, the management strategy places failure modes into two categories: A modes and B modes. The A modes are all failure modes for which no corrective action will be taken when they are seen during the test. This accounts for all modes for which management determines that a corrective action is not economical or otherwise justified. The B modes are either corrected during the test or the corrective action is delayed to a later time. The management strategy is defined by what portion of the failures will be fixed.

Let [math]\displaystyle{ {{\lambda }_{I}}\,\! }[/math] be the initial failure intensity of the system in test. [math]\displaystyle{ {{\lambda }_{A}}\,\! }[/math] is defined as the A mode's initial failure intensity and [math]\displaystyle{ {{\lambda }_{B}}\,\! }[/math] is defined as the B mode's initial failure intensity. [math]\displaystyle{ {{\lambda }_{A}}\,\! }[/math] is the failure intensity of the system that will not be addressed by corrective actions even if a failure mode is seen during testing. [math]\displaystyle{ {{\lambda }_{B}}\,\! }[/math] is the failure intensity of the system that will be addressed by corrective actions if a failure mode is seen during testing.

Then, the initial failure intensity of the system is:

[math]\displaystyle{ \begin{align} {{\lambda }_{I}}={{\lambda }_{A}}+{{\lambda }_{B}} \end{align}\,\! }[/math]

The initial system MTBF is:

[math]\displaystyle{ {{M}_{I}}=\frac{1}{{{\lambda }_{I}}}\,\! }[/math]

Based on the initial failure intensity definitions, the management strategy ratio is defined as:

[math]\displaystyle{ msr=\frac{{{\lambda }_{B}}}{{{\lambda }_{A}}+{{\lambda }_{B}}}\,\! }[/math]

The [math]\displaystyle{ msr\,\! }[/math] is the portion of the initial system failure intensity that will be addressed by corrective actions, if seen during the test.

The failure mode intensities of the type A and type B modes are:

[math]\displaystyle{ \begin{align} {{\lambda }_{A}}= & \left( 1-msr \right)\cdot {{\lambda }_{I}} \\ {{\lambda }_{B}}= & msr\cdot {{\lambda }_{I}} \end{align}\,\! }[/math]
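The split of the initial failure intensity into type A and type B parts can be illustrated with a small Python sketch. The function name and the sample values (an initial MTBF of 50 and a management strategy ratio of 0.9) are hypothetical and used only to show the arithmetic.

```python
def split_initial_intensity(M_I, msr):
    """Return (lam_I, lam_A, lam_B) for an initial MTBF and an msr value."""
    if M_I <= 0.0:
        raise ValueError("initial MTBF must be positive")
    if not 0.0 <= msr <= 1.0:
        raise ValueError("msr must lie between 0 and 1")
    lam_I = 1.0 / M_I               # initial system failure intensity
    lam_A = (1.0 - msr) * lam_I     # portion that will not be corrected
    lam_B = msr * lam_I             # portion addressed by corrective actions
    return lam_I, lam_A, lam_B


# Illustrative values only.
lam_I, lam_A, lam_B = split_initial_intensity(M_I=50.0, msr=0.9)
```

For these inputs the initial failure intensity is 0.02, of which 0.002 belongs to type A modes and 0.018 to type B modes, and the two parts add back up to the initial value.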

Effectiveness Factor

When a delayed corrective action is implemented for a type B failure mode, in other words a BD mode, the failure intensity for that mode is reduced if the corrective action is effective. Once a BD mode is discovered, it is rarely totally eliminated by a corrective action. After a BD mode has been found and fixed, a certain percentage of the failure intensity will be removed, but a certain percentage of the failure intensity will generally remain. The fractional decrease in the BD mode failure intensity due to corrective actions, [math]\displaystyle{ d\,\! }[/math], [math]\displaystyle{ \left( 0\lt d\lt 1 \right),\,\! }[/math] is called the effectiveness factor (EF).

A study on EFs showed that an average EF, [math]\displaystyle{ d\,\! }[/math], is about 70%. Therefore, about 30% (i.e., [math]\displaystyle{ 100(1-d)\%\,\! }[/math]) of the BD mode failure intensity will typically remain in the system after all of the corrective actions have been implemented. However, individual EFs for the failure modes may be larger or smaller than the average. This average value of 70% can be used for planning purposes, or if such information is recorded, an average effectiveness factor from a previous reliability growth program can be used.

MTBF Goal

When putting together a reliability growth plan, a goal MTBF/MTrBF [math]\displaystyle{ {{M}_{G}}\,\! }[/math] (or goal failure intensity [math]\displaystyle{ {{\lambda }_{G}}\,\! }[/math] ) is defined as the requirement or target for the product at the end of the growth program.

Growth Potential

The failure intensity remaining in the system at the end of the test will depend on the management strategy given by the classification of the type A and type B failure modes. The engineering effort applied to the corrective actions determines the effectiveness factors. In addition, the failure intensity depends on [math]\displaystyle{ h(t)\,\! }[/math], which is the rate at which problem failure modes are being discovered during testing. The rate of discovery drives the opportunity to take corrective actions based on the seen failure modes, and it is an important factor in the overall reliability growth rate. The reliability growth potential is the limiting value of the failure intensity as time [math]\displaystyle{ T\,\! }[/math] increases. This limit corresponds to the maximum MTBF that can be attained with the current management strategy. The maximum MTBF/MTrBF will be attained when all type B modes have been observed and fixed.

If all the discovered type B modes are corrected by time [math]\displaystyle{ T\,\! }[/math], that is, no deferred corrective actions at time [math]\displaystyle{ T\,\! }[/math], then the growth potential is the maximum attainable with the type B designation of the failure modes and the corresponding assigned effectiveness factors. This is called the nominal growth potential. In other words, the nominal growth potential is the maximum attainable growth potential assuming corrective actions are implemented for every mode that is planned to be fixed. In reality, some corrective actions might be implemented at a later time due to schedule, budget, engineering, etc.

If some of the discovered type B modes are not corrected at the end of the current test phase, then the prevailing growth potential is below the maximum attainable with the type B designation of the failure modes and the corresponding assigned effectiveness factors.

If all type B failure modes are discovered and corrected with an average effectiveness factor, [math]\displaystyle{ d\,\! }[/math], then the initial system failure intensity is reduced by the maximum possible amount, and the failure intensity that remains is the growth potential failure intensity:

[math]\displaystyle{ {{\lambda }_{GP}}={{\lambda }_{A}}+\left( 1-d \right){{\lambda }_{B}}\,\! }[/math]

The growth potential MTBF/MTrBF is:

[math]\displaystyle{ {{M}_{GP}}=\frac{1}{{{\lambda }_{GP}}}\,\! }[/math]

Note that based on the equations for the initial failure intensity and the management strategy ratio (given in the Management Strategy Ratio & Initial Failure Intensity section), the initial failure intensity is equal to:

[math]\displaystyle{ {{\lambda }_{I}}=\frac{{{\lambda }_{GP}}}{1-d\cdot msr}\,\! }[/math]
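The intermediate step can be written out explicitly. Substituting the type A and type B failure intensities, expressed in terms of [math]\displaystyle{ msr\,\! }[/math] and [math]\displaystyle{ {{\lambda }_{I}}\,\! }[/math], into the growth potential failure intensity gives:

[math]\displaystyle{ \begin{align} {{\lambda }_{GP}}= & \left( 1-msr \right)\cdot {{\lambda }_{I}}+\left( 1-d \right)\cdot msr\cdot {{\lambda }_{I}} \\ = & {{\lambda }_{I}}\left( 1-d\cdot msr \right) \end{align}\,\! }[/math]

Dividing both sides by [math]\displaystyle{ 1-d\cdot msr\,\! }[/math] yields the expression for [math]\displaystyle{ {{\lambda }_{I}}\,\! }[/math].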

Growth Potential Design Margin

The Growth Potential Design Margin ( [math]\displaystyle{ GPDM\,\! }[/math] ) can be considered as a safety margin when setting target MTBF/MTrBF values for the reliability growth plan. It is common for systems to degrade in terms of reliability when a prototype product is going into full manufacturing. This is due to variations in materials, processes, etc. Furthermore, the in-house reliability growth testing usually overestimates the actual product reliability because the field usage conditions may not be perfectly simulated during testing. Typical values for the [math]\displaystyle{ GPDM\,\! }[/math] are around 1.2. Higher values yield less risk for the program, but require a more rigorous reliability growth test plan. Lower values imply higher program risk, with less safety margin.

During the planning stage, the growth potential MTBF/MTrBF, [math]\displaystyle{ {{M}_{GP}},\,\! }[/math] can be calculated based on the goal MTBF, [math]\displaystyle{ {{M}_{G}},\,\! }[/math] and the growth potential design margin, [math]\displaystyle{ GPDM\,\! }[/math].

[math]\displaystyle{ {{M}_{GP}}=GPDM\cdot {{M}_{G}}\,\! }[/math]

or in terms of failure intensity:

[math]\displaystyle{ {{\lambda }_{GP}}=\frac{{{\lambda }_{G}}}{GPDM}\,\! }[/math]
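The planning inputs introduced in this chapter can be chained together in a short Python sketch. The function name, the dictionary keys and the sample values (a goal MTBF of 100, a design margin of 1.2, an effectiveness factor of 0.7 and a management strategy ratio of 0.95) are hypothetical; only the formulas come from the text.

```python
def planning_inputs_from_goal(M_G, gpdm, d, msr):
    """Derive growth potential and initial values from the goal MTBF."""
    if M_G <= 0.0 or gpdm <= 0.0:
        raise ValueError("goal MTBF and design margin must be positive")
    if not 0.0 < d < 1.0:
        raise ValueError("effectiveness factor must satisfy 0 < d < 1")
    if not 0.0 <= msr <= 1.0:
        raise ValueError("msr must lie between 0 and 1")
    M_GP = gpdm * M_G                     # growth potential MTBF
    lam_GP = 1.0 / M_GP                   # growth potential failure intensity
    lam_I = lam_GP / (1.0 - d * msr)      # initial failure intensity
    lam_A = (1.0 - msr) * lam_I           # type A initial failure intensity
    lam_B = msr * lam_I                   # type B initial failure intensity
    return {
        "M_GP": M_GP,
        "lam_GP": lam_GP,
        "lam_I": lam_I,
        "M_I": 1.0 / lam_I,
        "lam_A": lam_A,
        "lam_B": lam_B,
    }


# Illustrative values only.
plan = planning_inputs_from_goal(M_G=100.0, gpdm=1.2, d=0.7, msr=0.95)
```

For these inputs the growth potential MTBF is 120 and the implied initial MTBF is 40.2. Recomputing the growth potential failure intensity from the returned type A and type B values reproduces the value obtained from the design margin, which is a quick consistency check on a set of inputs.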