Q&A: Why did my car break? It was working the other day just fine!
- Tyler Betthauser
- Jun 2
Customers often say, in one form or another, that "the car just stopped working" and "it was fine just the other day." I remember thinking something similar about cars I've owned. While failures appear nearly random, there is a valid statistical explanation for why a vehicle develops an issue when it does. Reliability engineering is the branch of engineering that tests components to their limits to define how often they break, when they break, and how those failures affect the vehicle. This is not planned obsolescence. It means that, for a given set of assumptions and test criteria, a component is expected to fail according to a probability distribution over time. As consumers, we hope the tests reflect typical usage and operating environments. How well a reliability curve predicts real-world failures depends on how closely those tests match real-world usage.
In this article, we will talk about the statistics of failure and how you can reason about why things go wrong with a car.
The Model
First, we define what goes into our reliability curve model. The point of the model is not precision but to show why vehicle failures can appear random. We construct a standard bathtub curve by calculating a total hazard rate over time, which we get by summing three independent functions that represent different phases of a component's lifecycle.
The infant mortality phase is modeled as an exponentially decaying function. It assumes a high initial defect rate that diminishes rapidly as units with manufacturing flaws fail early. Mathematically, it is a base rate multiplied by an exponential decay factor in time:

$$\lambda_{\text{infant}}(t) = A\,e^{-kt}$$

where $A$ is the initial defect rate and $k$ is the decay constant.
The random failures phase is modeled as a constant. It assumes a baseline risk of failure that is independent of vehicle age and operating conditions. In this model, it is set to 0.05 per year.
The wear-out phase uses an exponential growth function modified by user-defined environmental covariates. The model calculates a stress factor by measuring how far the climate deviates from a temperate baseline, scaling that deviation by usage intensity, and dividing the result by a maintenance factor.
This stress factor is then placed in the exponent of the wear-out equation. As a result, poor maintenance or extreme climates do not merely increase wear linearly; they accelerate it exponentially over time.
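The three phases can be sketched as a single hazard function. This is a minimal Python sketch, not the article's actual model: every constant other than the 0.05 random-failure rate is a hypothetical value chosen only to produce a bathtub shape, and the exact form of the stress factor (normalizing the climate deviation and adding 1 so a temperate climate still wears out) is an assumption.

```python
import math

# Hypothetical constants for illustration only; the article does not publish
# its infant-mortality or wear-out parameters. All rates are per year.
INFANT_BASE_RATE = 0.5    # assumed initial defect hazard
INFANT_DECAY = 2.0        # assumed decay constant
RANDOM_RATE = 0.05        # constant random-failure hazard (from the article)
WEAR_BASE_RATE = 0.001    # assumed wear-out hazard at time zero
WEAR_GROWTH = 0.3         # assumed wear-out growth constant
TEMPERATE_C = 20.0        # assumed temperate baseline climate, deg C


def stress_factor(climate_c: float, usage: float, maintenance: float) -> float:
    """Climate deviation from temperate, scaled by usage, divided by maintenance.

    usage and maintenance are hypothetical multipliers where 1.0 is typical.
    Adding 1 to the normalized deviation (an assumption) keeps a temperate
    climate at a stress of 1.0 instead of 0.
    """
    if usage <= 0 or maintenance <= 0:
        raise ValueError("usage and maintenance must be positive")
    deviation = abs(climate_c - TEMPERATE_C) / TEMPERATE_C
    return (1.0 + deviation) * usage / maintenance


def total_hazard(years: float, stress: float = 1.0) -> float:
    """Sum of the infant-mortality, random, and wear-out hazards at `years`."""
    infant = INFANT_BASE_RATE * math.exp(-INFANT_DECAY * years)
    wear_out = WEAR_BASE_RATE * math.exp(WEAR_GROWTH * stress * years)
    return infant + RANDOM_RATE + wear_out


typical = stress_factor(climate_c=20.0, usage=1.0, maintenance=1.0)
neglected = stress_factor(climate_c=20.0, usage=1.0, maintenance=0.5)
for year in (0, 3, 10, 15):
    print(year, round(total_hazard(year, typical), 4),
          round(total_hazard(year, neglected), 4))
```

Halving the maintenance factor doubles the stress, which doubles the growth rate inside the exponent, so the gap between the two columns widens with age instead of staying fixed.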
To translate the combined continuous hazard rate into a daily probability of failure, the model uses the cumulative distribution function of the exponential distribution. It divides the annualized total hazard rate $\lambda(t)$ by the number of days in a year and calculates the probability of at least one failure occurring within that one-day interval, $P_{\text{day}}(t) = 1 - e^{-\lambda(t)/365}$.
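The conversion from an annualized hazard to a daily probability is a single line of arithmetic. A minimal sketch, assuming the hazard is roughly constant across one day and a 365-day year; the function name is hypothetical.

```python
import math

DAYS_PER_YEAR = 365.0


def daily_failure_probability(annual_hazard: float) -> float:
    """P(at least one failure in one day) from an annualized hazard rate.

    Uses the exponential CDF F(dt) = 1 - exp(-rate * dt) with dt = 1 day.
    math.expm1 keeps precision for the very small rates seen here.
    """
    if annual_hazard < 0:
        raise ValueError("hazard rate cannot be negative")
    daily_rate = annual_hazard / DAYS_PER_YEAR
    return -math.expm1(-daily_rate)


# The constant random-failure hazard of 0.05 per year, on its own:
print(f"{daily_failure_probability(0.05):.3e}")  # 1.370e-04
```

For rates this small, the result is almost exactly the hazard divided by 365; the exponential form matters only as the hazard grows late in the wear-out phase.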
This model demonstrates the conceptual relationship between environmental stressors and system reliability. Additive hazard models are a standard educational tool for visualizing the bathtub curve, and scaling wear exponentially with operational stress aligns conceptually with established physical degradation principles. For most readers, these basics get the point across.
There are limitations to using arbitrary constants rather than empirically derived parameters. The decay rates and wear acceleration variables in this model are essentially theoretical. In physical reliability engineering, modeling wear-out requires specific operational data, typically fitted to a Weibull distribution rather than a generalized exponential growth function. The formula also assumes that usage and maintenance interact in a purely linear, proportional way to define the stress factor, which greatly simplifies the physical realities of mechanical shear and thermal degradation.
The Reliability Curve
For the uninitiated, a reliability curve is often called "the bathtub curve" because of its general shape: a high initial failure rate that declines quickly, a long flat period, and a final, more exponential rise. The Y-axis is the probability of failure or failure rate, and the X-axis is time. Time can be measured in cycles, years, days, hours, or whatever unit applies.

Manufacturing defects tend to show up early, whereas later failures tend to be related to aging or to usage conditions such as adherence to maintenance schedules, climate, and usage type. Reliability curves can also measure different kinds of things. In this case, the vehicle is treated as the thing being evaluated for reliability. In reality, a vehicle is made up of thousands of components, each with its own estimated reliability curve. The vehicle's reliability curve is therefore a combination of all of those component probabilities at any given time. It might seem intuitive to treat each failure as a discrete, one-time event that is independent of everything else. That is simply not the case: even an obscure factor that goes completely unnoticed can shift the reliability curve.
At most OEMs, reliability curves are measured after a product launches. This is the wrong order. A better approach is to develop reliability data before launch, based on simulations or many thousands of test cycles, and then compare the results against post-launch telematics data to confirm whether real-world reliability matches the original curves. Software can be tested on production-intent hardware using benches that cycle through use cases such as ignition on and off, audio, and resource utilization. Given the expected number of vehicles and an assumed average number of ignition cycles per day, these bench results can establish a time to failure for software systems across many parts and vehicles. Hardware can be tested in similar ways, but in real vehicles and environmental chambers. Suppliers have environmental chambers that simulate temperature swings, vibration, and other factors, all of which can degrade functionality over time.
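Turning bench cycles into a time to failure is mostly unit conversion. A minimal sketch, assuming a constant per-cycle failure rate; `BenchResult` and the helper names are hypothetical, and the cycle counts, ignition cycles per day, and fleet size are made-up inputs.

```python
from dataclasses import dataclass


@dataclass
class BenchResult:
    """Aggregated results from hypothetical ignition-cycle test benches."""
    total_cycles: int  # ignition on/off cycles accumulated across all benches
    failures: int      # failures observed during those cycles


def mean_cycles_between_failures(result: BenchResult) -> float:
    # Point estimate under a constant per-cycle failure rate (an assumption).
    if result.failures == 0:
        raise ValueError("no failures observed; use a confidence bound instead")
    return result.total_cycles / result.failures


def mean_days_to_failure(result: BenchResult, cycles_per_day: float) -> float:
    """Convert bench cycles to calendar days for a single vehicle."""
    return mean_cycles_between_failures(result) / cycles_per_day


def fleet_failures_per_day(result: BenchResult, cycles_per_day: float,
                           fleet_size: int) -> float:
    """Expected failures per day across the whole fleet."""
    per_cycle_rate = result.failures / result.total_cycles
    return per_cycle_rate * cycles_per_day * fleet_size


bench = BenchResult(total_cycles=200_000, failures=4)
print(mean_days_to_failure(bench, cycles_per_day=4))  # 12500.0
print(fleet_failures_per_day(bench, 4, fleet_size=100_000))
```

A failure that a single owner would expect only once in about 34 years still shows up several times a day across a large fleet, which is why post-launch telematics is a useful check on the pre-launch curve. With zero bench failures, a point estimate is undefined, and engineers typically report a lower confidence bound instead.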
Reliability curves can also change over time, which changes the probability of failure over time. Our mock reliability curve demonstrates how simple behaviors can make a seemingly reliable car fail unexpectedly, even when you do not intend to influence the failure. The manufacturer is not always at fault, though some objective measures suggest quality is truly suffering, as discussed in Ford's ITM Recall is a Troubling Sign for the Future of Software Defined Vehicles. When a driver pushes maintenance to its limits (often missing oil changes by a few thousand miles, not flushing the cooling system, or not changing brake fluid), issues tend to show up sooner in the ownership cycle. Our model has a maintenance slider that ranges from 0 to 100%: '0' means the driver never performs proper maintenance, '100' is a highly proactive maintainer, and '50' meets typical OEM standards. Even at a setting of '30', the probability of failure starts to increase exponentially after only 5 years of ownership. This is a key problem for buyers of previously leased vehicles with a poor maintenance history.

Conversely, even a moderate increase in proactive maintenance can produce a large risk reduction later in the vehicle's life. The model makes plenty of assumptions; it is meant to prove a point, not to be precise for every make and model. The point is that the next time you consider skipping necessary maintenance, you are making a seemingly random failure years later more probable.
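The maintenance slider can be sketched by integrating the hazard in closed form, which gives the probability of at least one failure by a given year. This is a minimal sketch, not the article's plotted model: the linear slider mapping (where '50', the OEM schedule, corresponds to a maintenance factor of 1.0), the temperate climate, and all constants are hypothetical. The numbers it prints show the trend, not the article's curve.

```python
import math

# Hypothetical constants (per year); not the article's actual parameters.
INFANT_BASE_RATE, INFANT_DECAY = 0.5, 2.0
RANDOM_RATE = 0.05
WEAR_BASE_RATE, WEAR_GROWTH = 0.001, 0.3


def maintenance_factor(slider_pct: float) -> float:
    """Hypothetical mapping of the 0-100 slider: 50 (OEM schedule) -> 1.0."""
    if not 0 <= slider_pct <= 100:
        raise ValueError("slider must be between 0 and 100")
    return 0.5 + slider_pct / 100.0


def cumulative_hazard(years: float, stress: float) -> float:
    """Closed-form integral of the three-phase hazard from 0 to `years`."""
    infant = INFANT_BASE_RATE / INFANT_DECAY * (1.0 - math.exp(-INFANT_DECAY * years))
    random_part = RANDOM_RATE * years
    growth = WEAR_GROWTH * stress
    wear_out = WEAR_BASE_RATE / growth * math.expm1(growth * years)
    return infant + random_part + wear_out


def failure_probability_by(years: float, slider_pct: float, usage: float = 1.0) -> float:
    """P(at least one failure by `years`) = 1 - exp(-cumulative hazard)."""
    if usage <= 0:
        raise ValueError("usage must be positive")
    stress = usage / maintenance_factor(slider_pct)
    return -math.expm1(-cumulative_hazard(years, stress))


for slider in (30, 50, 70):
    print(slider, [round(failure_probability_by(y, slider), 3) for y in (5, 10, 15)])
```

Only the wear-out term depends on the slider, so the three rows start out nearly identical and separate as the years pass.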

Probabilities shift again when we compound them with climate (very hot, very cold, or temperate) and with usage such as towing or off-roading. Even moderate increases in heavy use and in the number of hot days per year raise the risk of problems earlier in the lifecycle, even with increased maintenance.
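A minimal sketch of that compounding, assuming the same hypothetical stress-factor form as above ((1 + normalized climate deviation) × usage ÷ maintenance) and made-up scenario values: usage 1.5 stands in for regular towing or off-roading, and maintenance 1.2 stands in for a schedule somewhat better than the OEM baseline.

```python
import math

TEMPERATE_C = 20.0                        # assumed temperate baseline, deg C
WEAR_BASE_RATE, WEAR_GROWTH = 0.001, 0.3  # hypothetical, per year


def stress_factor(climate_c: float, usage: float, maintenance: float) -> float:
    """(1 + normalized climate deviation) * usage / maintenance (assumed form)."""
    if usage <= 0 or maintenance <= 0:
        raise ValueError("usage and maintenance must be positive")
    deviation = abs(climate_c - TEMPERATE_C) / TEMPERATE_C
    return (1.0 + deviation) * usage / maintenance


def wear_out_hazard(years: float, stress: float) -> float:
    """Wear-out hazard with the stress factor inside the exponent."""
    return WEAR_BASE_RATE * math.exp(WEAR_GROWTH * stress * years)


# Hypothetical scenarios for illustration.
scenarios = {
    "temperate, normal use, OEM maintenance": stress_factor(20.0, 1.0, 1.0),
    "hot, towing, extra maintenance": stress_factor(35.0, 1.5, 1.2),
    "cold, towing, extra maintenance": stress_factor(-10.0, 1.5, 1.2),
}
baseline = scenarios["temperate, normal use, OEM maintenance"]
for name, stress in scenarios.items():
    ratio = wear_out_hazard(8, stress) / wear_out_hazard(8, baseline)
    print(f"{name}: stress={stress:.2f}, year-8 wear-out hazard x{ratio:.1f}")
```

Because the stress sits in the exponent, 20% more maintenance cannot offset the combined climate and towing multipliers: the hot-climate towing scenario still has more than double the baseline stress, and its year-8 wear-out hazard comes out roughly 17 times higher under these assumed constants.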

Every Time You Start the Car, the Dice Are Rolled
Whenever a vehicle is started, driven, turned off, or even serviced, there is some probability of a failure. Luckily, most failures are remote possibilities, but they do happen. Unfortunately, many of those rare cases end up on social media, where bad news tends to congregate.
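The dice analogy is the standard "at least one failure in n independent trials" calculation, 1 − (1 − p)^n. A minimal sketch with hypothetical numbers: a one-in-a-million chance of failure per drive cycle, 4 cycles a day for 10 years, across a fleet of one million vehicles. It assumes every cycle carries the same, independent risk, which the bathtub curve says is only approximately true.

```python
import math


def prob_at_least_one_failure(p_per_cycle: float, cycles: int) -> float:
    """1 - (1 - p)^n, computed with log1p/expm1 to stay accurate for tiny p."""
    if not 0.0 <= p_per_cycle <= 1.0:
        raise ValueError("p_per_cycle must be a probability")
    if p_per_cycle == 1.0:
        return 1.0 if cycles > 0 else 0.0
    return -math.expm1(cycles * math.log1p(-p_per_cycle))


# Hypothetical numbers: a one-in-a-million chance per drive cycle,
# 4 cycles a day for 10 years, across a fleet of one million vehicles.
cycles = 4 * 365 * 10
p_owner = prob_at_least_one_failure(1e-6, cycles)
print(f"one owner over 10 years: {p_owner:.2%}")
print(f"expected affected vehicles: {p_owner * 1_000_000:,.0f}")
```

A single owner faces roughly a 1.45% chance, yet about 14,500 vehicles in the fleet would be affected: enough unlucky owners to fill a social media feed.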

These probabilities are not specific to any make or model, but they demonstrate that in reliability engineering there is no such thing as a faultless system. Sometimes cars and customers are simply unlucky.
A Vehicle Is Only as Reliable as Its Least Reliable Parts
Mainstream reliability engineering evaluates a vehicle as a combination of individual component reliabilities. In a pure series system, the overall reliability is the product of the reliabilities of the individual components (More, Shubham). Mathematically, this is expressed as:

$$R_{\text{sys}} = \prod_{i=1}^{n} R_i$$

where $R_i$ is the reliability of component $i$ and $n$ is the number of components.
If a critical system has 50 interdependent components in series, and each has a 99 percent probability of lasting ten years, the overall system reliability is not 99 percent. It is 0.99 raised to the 50th power, which yields a system reliability of roughly 60.5 percent over that period. This multiplicative effect explains why a vehicle can fail despite being built from highly reliable individual parts. Every component added to a series system inherently lowers the total reliability of that system (Gajjal, Priya, n.d.).
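The 50-component arithmetic is easy to check. A minimal sketch, assuming identical, independent components in a pure series arrangement; `required_component_reliability` is a hypothetical helper that runs the same formula backwards.

```python
import math
from typing import Iterable


def series_reliability(component_reliabilities: Iterable[float]) -> float:
    """R_sys = product of R_i: a series system survives only if every part does."""
    return math.prod(component_reliabilities)


def required_component_reliability(target: float, n_components: int) -> float:
    """Per-part reliability needed for n identical series parts to hit `target`."""
    return target ** (1.0 / n_components)


print(f"{series_reliability([0.99] * 50):.3f}")            # 0.605
print(f"{required_component_reliability(0.90, 50):.4f}")  # 0.9979
```

Flipping the question shows how demanding series systems are: to keep 50 parts at 90 percent system reliability, each part needs roughly 99.79 percent reliability.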
When accounting for environmental factors such as vibration, thermal extremes, and shearing forces, the series reliability formula is updated by recalculating each component's baseline probability of failure before computing the system total. Mainstream reliability engineering does this with covariate models, most notably the Proportional Hazards Model. Instead of assuming a static failure rate, this model treats environmental stressors as multipliers on a baseline hazard rate.
Mathematically, the system reliability formula expands to include a vector of environmental covariates Z and their respective weighting coefficients β:

$$R_{\text{sys}}(t) = \prod_{i=1}^{n} \exp\left(-\int_{0}^{t} \lambda_{0i}(\tau)\,\exp(\beta_i Z_i)\,d\tau\right)$$

In this formula, $\lambda_{0i}(\tau)$ is the baseline failure rate of component $i$ at time $\tau$, integrated from 0 to $t$, while the factor $\exp(\beta_i Z_i)$ adjusts that failure rate for the specific environmental stressors acting on the component. Different physical forces are modeled with specific mathematical relationships. The Arrhenius equation is widely used to model the exponential increase in failure rates due to thermal stress, whereas the Inverse Power Law is frequently applied to mechanical wear from vibration and shearing forces. Incorporating these variables directly into the time-to-failure distribution lets the system reliability calculation better reflect the operational environment of the hardware (Mazzuchi et al., 2008).
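A sketch of the series formula with covariates, assuming each component's baseline hazard is constant so the integral reduces to hazard × t. The β weights, covariate values, and component names are hypothetical. The Arrhenius and inverse power law helpers use their standard textbook forms (Boltzmann constant in eV/K, temperatures converted to kelvin). In proportional hazards terms, an Arrhenius relationship corresponds to using 1/T as a covariate with β = −Ea/k.

```python
import math
from dataclasses import dataclass
from typing import Sequence

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant, eV/K


@dataclass
class Component:
    baseline_hazard: float       # lambda_0i, assumed constant here (per year)
    betas: Sequence[float]       # beta_i, hypothetical covariate weights
    covariates: Sequence[float]  # Z_i, stressor values acting on this part

    def hazard(self) -> float:
        """Proportional hazards: lambda_0i * exp(beta_i . Z_i)."""
        if len(self.betas) != len(self.covariates):
            raise ValueError("betas and covariates must have the same length")
        linear = sum(b * z for b, z in zip(self.betas, self.covariates))
        return self.baseline_hazard * math.exp(linear)


def system_reliability(components: Sequence[Component], t: float) -> float:
    """Series system: product of exp(-hazard_i * t) = exp(-sum of hazard_i * t).

    With a constant baseline, the integral over [0, t] reduces to hazard * t.
    """
    return math.exp(-sum(c.hazard() * t for c in components))


def arrhenius_acceleration(ea_ev: float, use_c: float, stress_c: float) -> float:
    """Arrhenius acceleration factor between a stress and a use temperature."""
    t_use, t_stress = use_c + 273.15, stress_c + 273.15
    return math.exp(ea_ev / BOLTZMANN_EV_PER_K * (1.0 / t_use - 1.0 / t_stress))


def inverse_power_law_acceleration(stress: float, use: float, exponent: float) -> float:
    """Inverse power law acceleration factor, e.g. for vibration amplitude."""
    return (stress / use) ** exponent


# Hypothetical parts; covariates = (vibration g-RMS, temperature excess per 10 degC).
pump = Component(0.01, betas=[0.4, 0.2], covariates=[1.5, 2.0])
sensor = Component(0.005, betas=[0.1, 0.6], covariates=[1.5, 2.0])
print(f"{system_reliability([pump, sensor], t=10):.3f}")
```

Setting every covariate to zero recovers the baseline series calculation, which is a quick sanity check on any fitted model.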
Many would question the practical validity of mathematically compounding multiple environmental factors into a single predictive model. Accurately modeling accelerated life testing requires establishing highly precise regression relationships between the parameters of a distribution and the accelerating variables (Escobar & Meeker, 2006). Finding the correct weights for temperature, vibration, and age combined requires massive amounts of historical failure data across every conceivable environmental permutation. Because accelerated testing forces components to fail at higher than usual stresses, the data relies on heavy extrapolation to predict normal use, so a slight model mis-specification or an incorrect initial parameter can produce entirely nonsensical reliability predictions (Escobar & Meeker, 2006). Furthermore, combining multiple extreme covariates often produces theoretical environmental scenarios that rarely occur in real-world usage. While the computational challenges are large, expanding cloud compute capacity should make these types of models much easier to develop.
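A small calculation shows why mis-specification is so costly. A minimal sketch assuming a pure Arrhenius relationship, a hypothetical 1,000-hour failure-free test at 125 °C extrapolated to 40 °C use, and two hypothetical activation energies of 0.6 and 0.7 eV. A 0.1 eV difference in that single parameter changes the inferred use life by a factor of about 2.2.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant, eV/K


def arrhenius_acceleration(ea_ev: float, use_c: float, stress_c: float) -> float:
    """How many hours of use one hour at `stress_c` is assumed to represent."""
    t_use, t_stress = use_c + 273.15, stress_c + 273.15
    return math.exp(ea_ev / BOLTZMANN_EV_PER_K * (1.0 / t_use - 1.0 / t_stress))


# Hypothetical test: 1,000 failure-free hours at 125 degC, extrapolated to 40 degC use.
test_hours = 1_000
for ea in (0.6, 0.7):  # two hypothetical activation energies, in eV
    af = arrhenius_acceleration(ea, use_c=40.0, stress_c=125.0)
    print(f"Ea={ea} eV: AF={af:.0f}, equivalent use hours={test_hours * af:,.0f}")

ratio = arrhenius_acceleration(0.7, 40.0, 125.0) / arrhenius_acceleration(0.6, 40.0, 125.0)
print(f"a 0.1 eV change scales the extrapolated life by {ratio:.1f}x")
```

Stacking a second uncertain acceleration model (for vibration, say) multiplies these errors rather than adding them.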
Recent research has shifted away from static statistical formulas and toward telemetry. Rather than calculating a theoretical reliability curve from historical baseline assumptions about environmental covariates, researchers are using improved proportional covariate models combined with real-time sensor data (Chen et al., 2022). Continuously feeding live vibration, torque, and thermal telemetry from equipment sensors into the covariate function keeps the system hazard rate constantly updated. This approach turns reliability engineering from a predictive design exercise into a live assessment, allowing a system's reliability curve to adjust in near real time as the physical state of the machine degrades (Chen et al., 2022). However, data protection laws are severely limiting telemetry transmission. Development fleets also continue to shrink as their budgets shift from vehicle testing to simulation testing.
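A sketch of the telemetry-driven idea, using a plain proportional hazards update rather than the improved proportional covariate model from Chen et al.: each telemetry interval adds λ0·exp(β·z)·Δt to a running cumulative hazard H, and reliability is exp(−H). The class name, sensor channels, weights, and baseline hazard are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class LiveReliabilityTracker:
    """Accumulates a proportional-hazards cumulative hazard from sensor samples."""
    baseline_hazard: float   # lambda_0, assumed constant (per hour)
    betas: Sequence[float]   # hypothetical weights, one per sensor channel
    cumulative_hazard: float = 0.0

    def update(self, covariates: Sequence[float], dt_hours: float) -> float:
        """Add one telemetry interval and return the updated reliability."""
        if len(covariates) != len(self.betas):
            raise ValueError("one covariate per beta is required")
        if dt_hours < 0:
            raise ValueError("dt_hours cannot be negative")
        linear = sum(b * z for b, z in zip(self.betas, covariates))
        self.cumulative_hazard += self.baseline_hazard * math.exp(linear) * dt_hours
        return self.reliability()

    def reliability(self) -> float:
        """R = exp(-H), where H is the hazard accumulated so far."""
        return math.exp(-self.cumulative_hazard)


# Hypothetical channels: vibration (g RMS) and coolant temperature above normal (per 10 degC).
tracker = LiveReliabilityTracker(baseline_hazard=1e-4, betas=[0.8, 0.3])
for vibration, temp_excess in [(0.2, 0.0), (0.2, 0.5), (1.5, 2.0)]:
    r = tracker.update([vibration, temp_excess], dt_hours=1.0)
    print(f"vibration={vibration}, temp_excess={temp_excess}: R={r:.6f}")
```

The harsh final interval removes far more reliability than the two calm ones combined, which is the behavior a live assessment is meant to capture.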
Ultimately, the next time a vehicle seemingly breaks down without warning, you can recognize that a complex mathematical reality is operating beneath the surface. Failures are rarely random events. They are the culmination of manufacturing baselines, environmental stressors, and compounding probabilities across thousands of interdependent components. While a driver cannot control the baseline engineering of their vehicle, they possess direct control over the environmental and maintenance variables that exponentially accelerate the wear out phase.
This underlying statistical reality also highlights the growing bifurcation in the automotive industry between parts swappers and true technicians. Approaching a repair by simply guessing at a symptom ignores the conditional probabilities and environmental factors that caused the failure in the first place. Whether dealing with a traditional mechanical breakdown or a cascading network failure in a modern software defined vehicle, understanding these reliability curves is what allows a diagnostic process to accurately resolve the root cause. As vehicle architectures become increasingly centralized and complex, prioritizing proactive maintenance remains the most mathematically sound strategy to keep a car operating on the favorable end of the probability distribution.
Sources
SEBoK (Systems Engineering Body of Knowledge). System Reliability, Availability, and Maintainability. Discusses system reliability definitions, availability, and the mathematical differences between series and parallel reliability models.
Gajjal, P. (n.d.). System Reliability, Series, Parallel, Both. Details the foundational formulas for series system reliability and the multiplicative effect of component failure rates.
Chen, B., Chen, Z., Chen, F., Xiao, W., Xiao, N., Fu, W., & Li, G. (2022). Reliability Assessment Method Based on Condition Information by Using Improved Proportional Covariate Model. Machines, 10(5), 337. https://doi.org/10.3390/machines10050337
Escobar, L. A., & Meeker, W. Q. (2006). A Review of Accelerated Test Models. Statistical Science, 21. https://doi.org/10.1214/088342306000000321
Mazzuchi, T. A., Linzey, W. G., & Bruning, A. (2008). A paired comparison experiment for gathering expert judgment for an aircraft wiring risk assessment. Reliability Engineering & System Safety, 93, 722–731. https://doi.org/10.1016/j.ress.2007.03.011


