The authoritative guide to the effective design and production of reliable technology products, revised and updated
While most manufacturers have mastered the process of producing quality products, product reliability, software quality and software security have lagged behind. The revised second edition of Improving Product Reliability and Software Quality offers a comprehensive and detailed guide to implementing a hardware reliability and software quality process for technology products. The authors - noted experts in the field - provide useful tools, forms and spreadsheets for executing an effective product reliability and software quality development process and explore proven software quality and product reliability concepts.
The authors discuss why so many companies fail after attempting to implement or improve their product reliability and software quality program. They outline the critical steps for implementing a successful program. Success hinges on establishing a reliability lab, hiring the right people and implementing a reliability and software quality process whose parts do the right things well and work well together. Designed to be accessible, the book contains a decision matrix for small, medium and large companies. Throughout the book, the authors describe the hardware reliability and software quality process as well as the tools and techniques needed for putting it in place. The concepts, ideas and material presented are appropriate for any organization. This updated second edition:
* Contains new chapters on software tools, software quality process and software security.
* Expands the FMEA section to include software fault trees and software FMEAs.
* Includes two new reliability tools to accelerate design maturity and reduce the risk of premature wearout.
* Contains new material on preventative maintenance, predictive maintenance and Prognostics and Health Management (PHM) to better manage repair cost and unscheduled downtime.
* Presents updated information on reliability modeling and hiring reliability and software engineers.
* Includes a comprehensive review of the reliability process from a multi-disciplinary viewpoint, including new material on uprating and counterfeit components.
* Discusses aspects of competition and key quality and reliability concepts, and presents the tools for implementation.
Written for engineers, managers and consultants lacking a background in product reliability and software quality theory and statistics, the updated second edition of Improving Product Reliability and Software Quality explores all phases of the product life cycle.
MARK A. LEVIN is the Reliability Manager for Product Development at Teradyne, Inc., USA. He has over 36 years of electronics experience working in manufacturing, design and research.
TED T. KALAL is a retired Reliability Manager. He has held many positions as a contract engineer and consultant where he focused on design, quality and reliability tasks.
JONATHAN RODIN is a Software Engineering Manager at Teradyne, Inc., USA. Jon has 39 years of experience developing software, either as a programmer or as a manager of software development projects.
About the Authors
List of Figures
List of Tables
Series Editor's Foreword
Series Foreword Second Edition
Series Foreword First Edition
Foreword First Edition
Preface Second Edition
Preface First Edition
Acknowledgments
Glossary
Part I Reliability and Software Quality - It's a Matter of Survival
1 The Need for a New Paradigm for Hardware Reliability and Software Quality
1.1 Rapidly Shifting Challenges for Hardware Reliability and Software Quality
1.2 Gaining Competitive Advantage
1.3 Competing in the Next Decade - Winners Will Compete on Reliability
1.4 Concurrent Engineering
1.5 Reducing the Number of Engineering Change Orders at Product Release
1.6 Time-to-Market Advantage
1.7 Accelerating Product Development
1.8 Identifying and Managing Risks
1.9 ICM, a Process to Mitigate Risk
1.10 Software Quality Overview
References
Further Reading
2 Barriers to Implementing Hardware Reliability and Software Quality
2.1 Lack of Understanding
2.2 Internal Barriers
2.3 Implementing Change and Change Agents
2.4 Building Credibility
2.5 Perceived External Barriers
2.6 Time to Gain Acceptance
2.7 External Barrier
2.8 Barriers to Software Process Improvement
3 Understanding Why Products Fail
3.1 Why Things Fail
3.2 Parts Have Improved, Everyone Can Build Quality Products
3.3 Hardware Reliability and Software Quality - The New Paradigm
3.4 Reliability vs. Quality Escapes
3.5 Why Software Quality Improvement Programs Are Unsuccessful
Further Reading
4 Alternative Approaches to Implementing Reliability
4.1 Hiring Consultants for HALT Testing
4.2 Outsourcing Reliability Testing
4.3 Using Consultants to Develop and Implement a Reliability Program
4.4 Hiring Reliability Engineers
Part II Unraveling the Mystery
5 The Product Life Cycle
5.1 Six Phases of the Product Life Cycle
5.2 Risk Mitigation
5.3 The ICM Process for a Small Company
5.4 Design Guidelines
5.5 Warranty
Further Reading
Reliability Process
DFM
6 Reliability Concepts
6.1 The Bathtub Curve
6.2 Mean Time between Failure
6.3 Warranty Costs
6.4 Availability
6.5 Reliability Growth
6.6 Reliability Demonstration Testing
6.7 Maintenance and Availability
6.8 Component Derating
6.9 Component Uprating
Reference
Further Reading
Reliability Growth
Reliability Demonstration
Prognostics and Health Management
7 FMEA
7.1 Benefits of FMEA
7.2 Components of FMEA
7.3 Preparing for the FMEA
7.4 Barriers to the FMEA Process
7.5 FMEA Ground Rules
7.6 Using Macros to Improve FMEA Efficiency and Effectiveness
7.7 Software FMEA
7.8 Software Fault Tree Analysis (SFTA)
7.9 Process FMEAs
7.10 FMMEA
8 The Reliability Toolbox
8.1 The HALT Process
8.2 Highly Accelerated Stress Screening (HASS)
8.3 HALT and HASS Test Chambers
8.4 Accelerated Reliability Growth (ARG)
8.5 Accelerated Early Life Test (ELT)
8.6 SPC Tool
8.7 FIFO Tool
References
Further Reading
FMEA
HALT
HASS
Quality
Burn-in
ESS
Up Rating
9 Software Quality Goals and Metrics
9.1 Setting Software Quality Goals
9.2 Software Metrics
9.3 Lines of Code (LOC)
9.4 Defect Density
9.5 Defect Models
9.6 Defect Run Chart
9.7 Escaped Defect Rate
9.8 Code Coverage
References
Further Reading
10 Software Quality Analysis Techniques
10.1 Root Cause Analysis
10.2 The 5 Whys
10.3 Cause and Effect Diagrams
10.4 Pareto Charts
10.5 Defect Prevention, Defect Detection, and Defensive Programming
10.6 Effort Estimation
Reference
Further Reading
11 Software Life Cycles
11.1 Waterfall
11.2 Agile
11.3 CMMI
11.4 How to Choose a Software Life Cycle
Reference
Further Reading
12 Software Procedures and Techniques
12.1 Gathering Requirements
12.2 Documenting Requirements
12.3 Documentation
12.4 Code Comments
12.5 Reviews and Inspections
12.6 Traceability
12.7 Defect Tracking
12.8 Software and Hardware Integration
References
Further Reading
13 Why Hardware Reliability and Software Quality Improvement Efforts Fail
13.1 Lack of Commitment to the Reliability Process
13.2 Inability to Embrace and Mitigate Technologies Risk Issues
13.3 Choosing the Wrong People for the Job
13.4 Inadequate Funding
13.5 Inadequate Resources
13.6 MIL-HDBK 217 - Why It Is Obsolete
13.7 Finding But Not Fixing Problems
13.8 Nondynamic Testing
13.9 Vibration Testing Too Difficult to Implement
13.10 The Impact of Late Hardware or Late Software Delivery
13.11 Supplier Reliability
Reference
Further Reading
14 Supplier Management
14.1 Purchasing Interface
14.2 Identifying Your Critical Suppliers
14.3 Develop a Thorough Supplier Audit Process
14.4 Develop Rapid Nonconformance Feedback
14.5 Develop a Materials Review Board (MRB)
14.6 Counterfeit Parts and Materials
Part III Steps to Successful Implementation
15 Establishing a Reliability Lab
15.1 Staffing for Reliability
15.2 The Reliability Lab
15.3 Facility Requirements
15.4 Liquid Nitrogen Requirements
15.5 Air Compressor Requirements
15.6 Selecting a Reliability Lab Location
15.7 Selecting a HALT Test Chamber
Reference
16 Hiring and Staffing the Right People
16.1 Staffing for Reliability
16.2 Staffing for Software Engineers
16.3 Choosing the Wrong People for the Job
17 Implementing the Reliability Process
17.1 Reliability Is Everyone's Job
17.2 Formalizing the Reliability Process
17.3 Implementing the Reliability Process
17.4 Rolling Out the Reliability Process
17.5 Developing a Reliability Culture
17.6 Setting Reliability Goals
17.7 Training
17.8 Product Life Cycle Defined
17.9 Proactive and Reactive Reliability Activities
Further Reading
Reliability Process
Part IV Reliability and Quality Process for Product Development
18 Product Concept Phase
18.1 Reliability Activities in the Product Concept Phase
18.2 Establish the Reliability Organization
18.3 Define the Reliability Process
18.4 Define the Product Reliability Requirements
18.5 Capture and Apply Lessons Learned
18.6 Mitigate Risk
19 Design Concept Phase
19.1 Reliability Activities in the Design Concept Phase
19.2 Set Reliability Requirements and Budgets
19.3 Define Reliability Design Guidelines
19.4 Revise Risk Mitigation
19.5 Schedule Reliability Activities and Capital Budgets
19.6 Decide Risk Mitigation Sign-off Day
19.7 Reflect on What Worked Well
20 Product Design Phase
20.1 Product Design Phase
20.2 Reliability Estimates
20.3 Implementing Risk Mitigation Plans
20.4 Design for Reliability Guidelines (DFR)
20.5 Design FMEA
20.6 Installing a Failure Reporting Analysis and Corrective Action System
20.7 HALT Planning
20.8 HALT Test Development
20.9 Risk Mitigation Meeting
Further Reading
FMEA
HALT
21 Design Validation Phase
21.1 Design Validation
21.2 Using HALT to Precipitate Failures
21.3 Proof of Screen (POS)
21.4 Highly Accelerated Stress Screen (HASS)
21.5 Operate FRACAS
21.6 Design FMEA
21.7 Closure of Risk Issues
Further Reading
FMEA
Acceleration Methods
ESS
HALT
22 Software Testing and Debugging
22.1 Unit Tests
22.2 Integration Tests
22.3 System Tests
22.4 Regression Tests
22.5 Security Tests
22.6 Guidelines for Creating Test Cases
22.7 Test Plans
22.8 Defect Isolation Techniques
22.9 Instrumentation and Logging
Further Reading
23 Applying Software Quality Procedures
23.1 Using Defect Model to Create Defect Run Chart
23.2 Using Defect Run Chart to Know When You Have Achieved the Quality Target
23.3 Using Root Cause Analysis on Defects to Improve Organizational Quality Delivery
23.4 Continuous Integration and Test
Further Reading
24 Production Phase
24.1 Accelerating Design Maturity
24.2 Reliability Growth
24.3 Design and Process FMEA
Further Reading
FMEA
Quality
Reliability Growth
Burn-In
HASS
25 End-of-Life Phase
25.1 Managing Obsolescence
25.2 Product Termination
25.3 Project Assessment
Further Reading
26 Field Service
26.1 Design for Ease of Access
26.2 Identify High Replacement Assemblies (FRUs)
26.3 Wearout Replacement
26.4 Preemptive Servicing
26.5 Servicing Tools
26.6 Service Loops
26.7 Availability or Repair Time Turnaround
26.8 Avoid System Failure Through Redundancy
26.9 Random versus Wearout Failures
Further Reading
Appendix A
A.1 Reliability Consultants
A.2 Graduate Reliability Engineering Programs and Reliability Certification Programs
A.3 Reliability Professional Organizations and Societies
A.4 Reliability Training Classes
A.5 Environmental Testing Services
A.6 HALT Test Chambers
A.7 Reliability Websites
A.8 Reliability Software
A.9 Reliability Seminars and Conferences
A.10 Reliability Journals
Appendix B
B.1 MTBF, FIT, and PPM Conversions
B.2 Mean Time Between Failure (MTBF)
B.3 Estimating Field Failures
B.3.1 Comparing Repairable to Nonrepairable Systems
Index