Operational Mission Profile Testing
It is common practice for systems to be subjected to operational testing during a development program. The objective of this testing is to evaluate the performance of the system, including reliability, under conditions that represent actual use. Because of budget, resource, schedule and other constraints, these operational tests rarely match the actual use conditions exactly. Usually, stated mission profile conditions are used for operational testing. These mission profile conditions are typically general statements that guide testing on an average basis. For example, a copier might be required to print 3,000 pages by time T=10 days and 5,000 pages by time T=15 days. In addition, the copier is required to scan 200 documents by time T=10 days, 250 documents by time T=15 days, etc.
Because of practical constraints, these full mission profile conditions are typically not repeated one after the other during testing. Instead, the elements that make up the mission profile conditions are tested under varying schedules with the intent that, on average, the mission profile conditions are met. In practice, reliability corrective actions are generally incorporated into the system as a result of this type of testing.
Because of a lack of structure for managing the elements that make up the mission profile, it is difficult to have an agreed upon methodology for estimating the system's reliability. Many systems fail operational testing because key assessments such as growth potential and projections cannot be made in a straightforward manner so that management can take appropriate action. The RGA software addresses this issue by incorporating a systematic mission profile methodology for operational reliability testing and reliability growth assessments.
Operational testing is an attempt to subject the system to conditions close to the actual environment that is expected under customer use. Often this is an extension of reliability growth testing where operation-induced failure modes and corrective actions are of prime interest. Sometimes the stated intent is a demonstration test where corrective actions are not the prime objective. However, it is not unusual for a system to fail the demonstration test, and the management issue is then what to do next. In both cases, valid key parameters are needed to properly assess the situation and make cost-effective and timely decisions. This is often difficult in practice.
For example, a system may be required to:
- Conduct a specific task a fixed number of times for each hour of operation (task 1).
- Move a fixed number of miles under a specific operating condition for each hour of operation (task 2).
- Move a fixed number of miles under another operating condition for each hour of operation (task 3).
During operational testing, these guidelines are met individually as averages. For example, the actual as-tested profile for task 1 may not be uniform relative to the stated mission guidelines during the testing. Often, some of the tasks (for example, task 1) are operated below the stated guidelines, which can mask a major reliability problem. In other cases, tasks 1, 2 and 3 might never meet their stated averages during testing, except perhaps at the end of the test. This becomes an issue because an important aspect of effective reliability risk management is to not wait until the end of the test to have an assessment of the reliability performance.
Because the elements of the mission profile during the testing will rarely, if ever, balance continuously to the stated averages, a common analysis method is to piece the reliability assessments together by evaluating each element of the profile separately. This is not a well-defined methodology and does not account for improvement during the testing. It is therefore not unusual for two separate organizations (e.g., the customer and the developer) to analyze the same data and obtain different MTBF numbers. In addition, this method does not address the delayed corrective actions that are to be incorporated at the end of the test, nor does it estimate growth potential or interaction effects. To reduce this risk, there is a need for a rigorous methodology for reliability during operational testing that does not rely on piecewise analysis and avoids the issues noted above.
The RGA software incorporates a new methodology to manage system reliability during operational mission profile testing. This methodology draws information from particular plots of the operational test data and inserts key information into a growth model. The improved methodology does not piece the analysis together, but gives a direct MTBF mission profile estimate of the system's reliability that is directly compared to the MTBF requirement. The methodology will reflect any reliability growth improvement during the test, and will also give management a higher projected MTBF for the system mission profile reliability after delayed corrective actions are incorporated at the end of the test. The methodology also gives an estimate of the system's growth potential, and provides management metrics to evaluate whether changes in the program need to be made. A key advantage is that the methodology is well-defined, so all organizations will arrive at the same reliability assessment with the same data.
The methodology described here will use the Crow extended model for data analysis. In order to have valid Crow extended model assessments, it is required that the operational mission profile be conducted in a structured manner. Therefore, this testing methodology involves convergence and stopping points during the testing. A stopping point is when the testing is stopped for the express purpose of incorporating the type BD delayed corrective actions. There may be more than one stopping point during a particular testing phase. For simplicity, the methodology with only one stopping point will be described; however, the methodology can be extended to the case of more than one stopping point. A convergence point is a time during the test when all the operational mission profile tasks meet their expected averages or fall within an acceptable range. At least three convergence points are required for a well-balanced test. The end of the test, time [math]T\,\![/math], must be a convergence point. The test times between the convergence points do not have to be the same.
The objective of having the convergence points is to be able to apply the Crow extended model directly in such a way that the projection and other key reliability growth parameters can be estimated in a valid fashion. To do this, the grouped data methodology is applied. Note that the methodology can also be used with the Crow-AMSAA (NHPP) model for a simpler analysis, without the ability to estimate projected and growth potential reliability. See the Grouped Data sections for the Crow-AMSAA (NHPP) model or for the Crow extended model.
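For the simpler Crow-AMSAA (NHPP) case, the grouped data calculation can be sketched in a few lines. The sketch below uses the standard grouped-data maximum likelihood equations for the power law NHPP and solves for beta by bisection; it is an illustration only and is not taken from the RGA software. The function names, the bisection bracket and the failure counts in the usage lines are assumptions made for this example.

```python
import math


def _t_pow_log(t, beta):
    """Return t**beta * ln(t), using the limiting value 0 at t = 0."""
    return 0.0 if t == 0 else (t ** beta) * math.log(t)


def beta_score(beta, endpoints, counts):
    """Derivative of the grouped-data log-likelihood with respect to beta.

    endpoints: interval end times T_1 < ... < T_k (the first interval starts at 0).
    counts:    number of failures observed in each interval.
    """
    t_end = endpoints[-1]
    total = 0.0
    prev = 0.0
    for t, n in zip(endpoints, counts):
        num = _t_pow_log(t, beta) - _t_pow_log(prev, beta)
        den = t ** beta - prev ** beta
        total += n * (num / den - math.log(t_end))
        prev = t
    return total


def fit_crow_amsaa_grouped(endpoints, counts, lo=0.01, hi=10.0, iters=200):
    """Return (lambda, beta) for grouped data; the bracket [lo, hi] is an assumption."""
    if beta_score(lo, endpoints, counts) * beta_score(hi, endpoints, counts) > 0:
        raise ValueError("beta is not bracketed by [lo, hi]")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        # The score is positive below the maximum likelihood estimate of beta.
        if beta_score(mid, endpoints, counts) > 0:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    lam = sum(counts) / endpoints[-1] ** beta
    return lam, beta


def instantaneous_mtbf(lam, beta, t):
    """Instantaneous MTBF of the power law model at time t."""
    return 1.0 / (lam * beta * t ** (beta - 1.0))


# Hypothetical failure counts between four convergence points (illustration only).
lam, beta = fit_crow_amsaa_grouped([100, 250, 320, 400], [12, 9, 3, 4])
print(lam, beta, instantaneous_mtbf(lam, beta, 400))
```

Only the interval end times and the failure counts per interval enter the calculation, which is why the intervals can be defined by the convergence points rather than by individual failure times.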
Example - Mission Profile Testing
Consider the test-fix-find-test data set that was introduced in the Crow Extended model chapter and is shown again in the table below. The total test time for this test is 400 hours. Note that for this example we assume one stopping point at the end of the test for the incorporation of the delayed fixes. Also, suppose that the data set represents a military system with:
- Task 1 = firing a gun.
- Task 2 = moving under environment E1.
- Task 3 = moving under environment E2.
For every hour of operation, the operational profile states that the system operates in the E1 environment for 70% of the time and in the E2 environment for 30% of the time. In addition, for each hour of operation, the gun must be fired 10 times.
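The stated profile translates directly into expected cumulative values for each task as a function of test hours. The following is a minimal sketch of that arithmetic; the function and key names are hypothetical, and the times printed are the convergence points used later in this example.

```python
# Stated operational profile from the example (per hour of operation).
FIRINGS_PER_HOUR = 10
E1_FRACTION = 0.70
E2_FRACTION = 0.30


def expected_profile(hours):
    """Return the expected cumulative value of each task after `hours` of testing."""
    if hours < 0:
        raise ValueError("hours must be non-negative")
    return {
        "gun_firings": FIRINGS_PER_HOUR * hours,
        "e1_hours": E1_FRACTION * hours,
        "e2_hours": E2_FRACTION * hours,
    }


for t in (100, 250, 320, 400):
    print(t, expected_profile(t))
```

At the end of the 400-hour test, for instance, the profile calls for 4,000 gun firings, 280 hours in E1 and 120 hours in E2.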
In general, it is difficult to manage an operational test so that these operational profiles are continuously met throughout the test. However, the operational mission profile methodology requires that these conditions be met on average at the convergence points. In practice, this can almost always be done with proper program and test management. The convergence points are set for the testing, often at interim assessment points. The process for controlling the convergence at these points involves monitoring a graph for each of the tasks.
The following table shows the expected and actual results for each of the operational mission profiles.
|Expected and Actual Results for Profiles 1, 2, 3|
|Profile 1 (gun firings)||Profile 2 (E1)||Profile 3 (E2)|
The next figure shows a portion of the expected and actual results for mission profile 1, as entered in the RGA software.
A graph exists for each of the three tasks in this example. Each graph has a line with the expected average as a function of hours, and the corresponding actual value. When the actual value for a task meets the expected value, the task has converged. A convergence point occurs when all of the tasks converge at the same time. At least three convergence points are required, one of which is the stopping point [math]T\,\![/math]. In our example, the total test time is 400 hours. The convergence points were chosen to be at 100, 250, 320 and 400 hours. The next figure shows the data sheet that contains the convergence points in the RGA software.
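The convergence rule can be expressed as a simple check: a time is a convergence point only if every task is at, or acceptably close to, its expected value at that time. The sketch below assumes the acceptable range is a relative tolerance of 5% around the expected value; the text does not quantify the range, so that figure, the function names and the actual values in the usage lines are all hypothetical.

```python
def task_converges(expected, actual, rel_tol=0.05):
    """True if the actual value is within rel_tol (a fraction) of the expected value."""
    return abs(actual - expected) <= rel_tol * expected


def is_convergence_point(expected, actual, rel_tol=0.05):
    """True only if every task in `expected` converges at the same time.

    expected, actual: dicts mapping task name -> cumulative value at the time checked.
    """
    return all(
        task_converges(expected[task], actual[task], rel_tol) for task in expected
    )


# Expected values at 100 hours under the stated profile (10 firings/hour, 70% E1, 30% E2).
expected_100 = {"gun_firings": 1000, "e1_hours": 70, "e2_hours": 30}

# Hypothetical as-tested values at 100 hours (illustration only).
actual_100 = {"gun_firings": 990, "e1_hours": 71, "e2_hours": 29}

print(is_convergence_point(expected_100, actual_100))
```

Requiring all tasks to pass at the same time is the essential point; a task that converges on its own while another lags does not make that time a convergence point.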
The testing profiles are managed so that at these times the actual operational test profile equals the expected values for the three tasks or falls within an acceptable range. The next graph shows the expected and actual gun firings.
The next two graphs show the expected and actual time in environments E1 and E2, respectively.
The objective of having the convergence points is to be able to apply the Crow extended model directly in such a way that the projection and other key reliability growth parameters can be estimated in a valid fashion. To do this, grouped data is applied using the Crow extended model. For reliability growth assessments using grouped data, only the information between time points in the testing is used. In our application, these time points are the convergence points 100, 250, 320, and 400. The next figure shows all three mission profiles plotted in the same graph, together with the convergence points.
The following table gives the grouped data input corresponding to the original data set.
|Grouped Data at Convergence Points 100, 250, 320 and 400 Hours|
|Number at Event||Time to Event||Classification||Mode||Number at Event||Time to Event||Classification||Mode|
The parameters of the Crow extended model for grouped data are then estimated, as explained in the Grouped Data section of the Crow Extended chapter. The following table shows the effectiveness factors (EFs) for the BD modes.
|Effectiveness Factors for Delayed Fixes|
Using the failure times data sheet shown next, we can analyze this data set based on a specified mission profile. This will group the failure times data into groups based on the convergence points that have already been specified when constructing the mission profile.
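The grouping step itself amounts to assigning each failure to the first convergence point at or after its failure time, then counting failures per interval, classification and mode. The following is a rough sketch of that step under those assumptions, not the RGA software's implementation; the function name and the failure records in the usage lines are hypothetical.

```python
from bisect import bisect_left
from collections import Counter


def group_failures(failures, convergence_points):
    """Group failure records by the convergence point that closes their interval.

    failures:           iterable of (time, classification, mode) tuples,
                        with mode given as a string ("" if not applicable).
    convergence_points: sorted interval end times, e.g. [100, 250, 320, 400].
    Returns rows of (number at event, time to event, classification, mode).
    """
    counts = Counter()
    for time, classification, mode in failures:
        if not 0 < time <= convergence_points[-1]:
            raise ValueError(f"failure time {time} is outside the test duration")
        # A failure exactly at a convergence point belongs to the interval it closes.
        idx = bisect_left(convergence_points, time)
        counts[(convergence_points[idx], classification, mode)] += 1
    return [(n, end, cls, mode) for (end, cls, mode), n in sorted(counts.items())]


# Hypothetical failure records (illustration only, not the data set of this example).
records = [
    (15.0, "BC", "17"),
    (25.3, "BD", "1"),
    (130.0, "BD", "1"),
    (180.5, "A", ""),
    (260.0, "BD", "1"),
    (399.0, "A", ""),
]
for row in group_failures(records, [100, 250, 320, 400]):
    print(row)
```

Each output row mirrors the columns of the grouped data table: number at event, time to event (the convergence point), classification and mode.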
A new data sheet with the grouped data is created, as shown in the figure below, and the calculated results based on the grouped data are as follows:
The following plot shows the instantaneous, demonstrated, projected and growth potential MTBF for the grouped data, based on the mission profile grouping with intervals at the specified convergence points of the mission profile.