Vehicle Data Hysteria: What you should know as a Driver with a Connected Vehicle
- Tyler Betthauser
- Jan 26
In our modern, internet-connected society, nearly everyone has been impacted by, or is at least aware of, a major data breach. High-profile organizations like Equifax, UnitedHealth, Petco, JP Morgan Chase, and Microsoft (SharePoint) have all admitted to significant compromises of their data stores by malicious actors. Beyond these external raids, some organizations have sought ways to monetize the consumer data generated by their devices. Unfortunately, these stories have parallels in the automotive industry. Automotive companies have faced scrutiny over the last decade due to advances in vehicle telemetry, which gives them access to sensitive data that requires protection. Notable examples include the sale of data to LexisNexis by General Motors (2024), the Texas AG v. Allstate and Arity case, and Philip Siefke v. Toyota and Progressive (Siefke v. Toyota, 2024). While the negative press and sentiment regarding poor internal judgment at these OEMs is certainly well deserved, there is some nuance to these stories that should be considered.
Traditional and automotive media outlets often sound the alarm with bold, unsubstantiated statements regarding intent and risk. I want to offer an insider’s perspective to temper those hardline stances. Having spent nearly a decade in various roles at OEMs (including Quality, Software Engineering, and Data Engineering), I have seen that reporters and journalists often overlook details about how vehicle telemetry is actually used at these companies.
Vehicles do collect data, creating a potential vector for leaking personal information, but the surrounding fearmongering is often overstated and leads to hyperbolic generalizations. To understand why, we need to look at how this data is actually handled.
We'll talk about the following:
How data is generated in the car and the typical architecture in the vehicle
Where does the data typically get sent and stored?
Typical quality of the data
What happens with the Data?
What doesn't Happen?
How to protect data in your new or used vehicle
The Typical Connected Vehicle Data Architecture
Some Key Terms:
Before we start on this next part, it will be helpful to understand a few of the terms ahead of time:
Electronic Control Unit (ECU) or Microcontroller Unit (MCU): these are the small computers in the vehicle which crunch the numbers and control the various aspects of the vehicle
Local Interconnect Network (LIN): a small network within the broader network of the vehicle. These tend to connect smaller subsets of components that would otherwise be too expensive to route through the vehicle's main communication system
Back Office (BO): The data centers or infrastructure that stores all of the data and communicates with vehicles. This may be in the 'cloud' (Amazon Web Services or Microsoft Azure) or even on data center hardware that the OEM owns themselves
Main Bus: This is the electrical highway system of the car. It carries signals over wires/cables to different ECUs and MCUs
CAN Data frames: simply, data (bits and bytes) that is transmitted over the main electrical bus to various ECUs and MCUs
First, there is a fairly straightforward way data is transmitted from the vehicle to the OEM. Our representation will be of the domain architecture. In the domain architecture, an Electronic Control Unit (ECU) or Microcontroller Unit (MCU) typically ingests sensor data over a Local Interconnect Network (LIN). The ECU or MCU interprets the data, maybe operates on it, then transmits that engineering data back onto the Main Bus. The Main Bus could be anything from High-Speed CAN to Ethernet. Data is broadcast over the Main Bus to its destination (usually another ECU/MCU). ECUs/MCUs that are capable of connecting over the internet use the Telematics Module to send that data from the vehicle to the "back office" (BO). Raw CAN Data frames are serialized and sent via some internet protocol to the BO, where they are decoded and stored in some sort of database.

The model above shows how temperature data moves through the vehicle to the BO where an engineer could attempt to monitor this variable over time for a particular vehicle. In reality, however, there are thousands of different data elements being sent at any given time.
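To make the decoding step at the BO a little more concrete, here is a minimal Python sketch of turning one raw CAN payload into an engineering value. The arbitration ID, byte position, scale, and offset below are hypothetical values chosen for illustration; real signal definitions are OEM-specific and are not given in this article.

```python
import struct
from typing import Optional

# Hypothetical signal definition (illustrative only, not from any real vehicle).
COOLANT_TEMP_SIGNAL = {
    "can_id": 0x3E8,   # assumed arbitration ID of the frame carrying the signal
    "start_byte": 0,   # the raw value occupies payload bytes 0 and 1
    "scale": 0.5,      # degrees C per raw count
    "offset": -40.0,   # degrees C added after scaling
}


def decode_coolant_temp(can_id: int, payload: bytes) -> Optional[float]:
    """Return the temperature in degrees C, or None if this is not our frame."""
    sig = COOLANT_TEMP_SIGNAL
    if can_id != sig["can_id"] or len(payload) < sig["start_byte"] + 2:
        return None
    # Read a big-endian unsigned 16-bit raw value out of the payload.
    (raw,) = struct.unpack_from(">H", payload, sig["start_byte"])
    return raw * sig["scale"] + sig["offset"]


# Raw hex as it might arrive at the back office: raw count 0x00FA = 250.
frame_payload = bytes.fromhex("00FA000000000000")
temperature_c = decode_coolant_temp(0x3E8, frame_payload)  # 250 * 0.5 - 40 = 85.0
```

Multiply this by thousands of signals, each with its own definition, and the amount of bookkeeping needed before the data is even readable becomes clearer.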
The approach shown above is known as the Domain Architecture (often used by General Motors, Ford, and FCA), but there are other approaches such as the Zonal Architecture (which is used by Rivian, Tesla, and a few others). Zonal Architectures have enabled OEMs to create more powerful ECUs with more memory, which enables greater data storage in the vehicle and faster transmission of data to the BO. The more data elements you can store on the vehicle, the more complete a picture you can create of the system at any given point. Because the Domain Architecture relies on a larger number of smaller, simpler computers running highly specific functions, it has been more difficult to capture data and transmit it efficiently to the cloud.
You could say that newer generations of vehicles are built to be more 'data centric'.
Where does Vehicle Data end up?
Once data enters storage, it exists in a variety of forms and locations. Some OEMs have migrated their software and storage to cloud providers to reduce infrastructure costs. However, many have adopted a hybrid approach where specific applications run on-premises while others utilize Azure or AWS.
In many cases, after the raw hex data is decoded and stored, OEMs do nothing with it. It often simply sits in storage. To put the scale in perspective, a single vehicle can generate nearly 4 terabytes of data per day, the equivalent of roughly 80-150 4K Ultra HD movies. While OEMs are not transmitting that entire volume daily for each vehicle, the amount remains immense across the fleet. Even less of it gets used in key decision-making. Vendors bid frequently, and multi-million dollar contracts are awarded every few years, to help these companies leverage the data; however, the size of the data outstrips the skills and resources available to make use of it. After all, data is just the raw material for information. Skilled people with expertise in automotive customers and engineering need to be freed up to work on it, and they often aren't.
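As a quick sanity check on the movie comparison, the short Python sketch below backs out the per-movie file size implied by the 4 terabyte and 80-150 movie figures. It assumes decimal units (1 TB = 1000 GB); the article does not state a movie size, so the result is only what the quoted range implies.

```python
TB_IN_GB = 1000  # decimal units, an assumption

daily_gb_per_vehicle = 4 * TB_IN_GB


def implied_movie_size_gb(movies_per_day: int) -> float:
    """File size per movie (GB) implied by a given movies-per-day equivalent."""
    return daily_gb_per_vehicle / movies_per_day


low = implied_movie_size_gb(150)   # about 26.7 GB
high = implied_movie_size_gb(80)   # 50.0 GB
print(f"{low:.1f} GB to {high:.1f} GB per movie")
```

In other words, the comparison holds if a 4K movie is taken to be somewhere between roughly 27 and 50 GB.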
While there is a genuinely good point to make that all this data is in and of itself a risk to customers because it could be obtained through hacking, there is an even more practical one: there are simply not enough resources, political will, or incentive structures in place to make vehicle data a genuine risk to driver safety.
The sheer volume of data acts as its own form of protection. When dealing with petabytes of information across a fleet, the noise often drowns out the signal. For an OEM to build a truly intrusive profile on every driver, the cost of compute power and storage would often outweigh the potential revenue generated from selling that specific insight.
The Old Adage: "Garbage in, Garbage Out"
The value of any insight is strictly limited by the quality of the data being analyzed. Contrary to popular belief, telemetry is not an automatic goldmine. It functions more like crude oil that requires heavy, expensive refinement to become useful, a process that is not always successful or profitable. Data quality is paramount, and while strategies for remediation exist, they are often difficult to implement effectively.
The prevailing assumption is that thousands of data scientists, statisticians, and software engineers are waiting in Silicon Valley to develop digital twins of every car and driver to sell secrets to the highest bidder. While litigation indicates that companies have the capability for regrettable behavior, including the sale of data that risks being used improperly by third parties, these are often individual cases of misguided decision-making. If there were a truly systemic, nefarious trend, the volume of legal cases would far exceed the handful of examples currently highlighted by the media.
Your vehicle might not even be capable of transmitting data
OEMs do not have visibility into 100% of the vehicles they produce. Based on industry data and field experience, I estimate that upwards of 30% of vehicles become incapable of transmitting data as intended within the first three years of ownership. This failure rate climbs significantly as vehicles age, change owners, or undergo major part replacements.
Vehicles are not cell phones. While both use cellular modems, the similarities end there. For a vehicle to transmit data to the "back office" (BO), the Subscriber Identity Module (SIM) must be configured through a series of complex, error-prone handshakes between the car and the network. When a phone misbehaves, you can take it to the AT&T store for an immediate fix. If an automotive SIM fails to provision after it leaves the dealership, the OEM loses control. Fixing it would require an expensive technical service bulletin (TSB) and a trip back to the service bay. Each telematics module replaced by a dealer or independent shop carries a risk of a provisioning issue, further degrading collection capabilities.
The Used Car Ownership Gap
Beyond the initial ownership window, data collection becomes even more difficult for the OEM. Vehicle ownership is managed at the state level, making it challenging for manufacturers to track transfers accurately. While some data is accessible, a vehicle may continue sending data without the OEM having a verified way to know who is behind the wheel.
Furthermore, data collection in the United States currently follows an opt-out model rather than the European opt-in standard. This choice can be made at the time of sale, via the infotainment screen, or through formal correspondence. However, just because an initial owner accepted the terms doesn't mean the second or third owner will. This results in the purposeful deactivation of services, leaving many used vehicles effectively dark to the OEM.
Connectivity to a network is not absolute
Just like your phone, a car's signal strength ebbs and flows. Modern vehicles are dynamic systems generating data faster than they can transmit it. While traditional CAN 2.0 buses top out at 1 Mbps, modern Automotive Ethernet can reach 10 Gbps, providing 10,000x more bandwidth for high-speed sensor data like LiDAR and cameras.
Despite these advances, vehicles are not data centers. They cannot persist or transmit massive datasets indefinitely. Packet loss during a lost connection can be extensive, and if a vehicle is parked in a garage with poor reception for extended periods, persisted data is often simply deleted to make room for new sensor readings. These large gaps in data make temporal analysis and individualized surveillance incredibly difficult to execute reliably. For example, when looking at charging and discharging sessions for EVs (which are used to inform the battery algorithms that estimate capacity), it was not uncommon for large numbers of sessions to be dropped because events were missing or occurred in an unexpected order.
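The "deleted to make room for new sensor readings" behavior can be pictured as a fixed-size buffer. The Python sketch below is a simplified illustration under that assumption; the class name and capacity are hypothetical, and real telematics modules will have their own retention logic.

```python
from collections import deque


class TelemetryBuffer:
    """Fixed-capacity store that silently discards the oldest reading when full."""

    def __init__(self, capacity: int) -> None:
        self._frames = deque(maxlen=capacity)
        self.dropped = 0  # readings lost before they could be transmitted

    def record(self, frame) -> None:
        # A full deque with maxlen evicts its oldest item on append.
        if len(self._frames) == self._frames.maxlen:
            self.dropped += 1
        self._frames.append(frame)

    def flush(self) -> list:
        """Called when connectivity returns: hand over whatever survived."""
        frames = list(self._frames)
        self._frames.clear()
        return frames


# Car parked with no signal: five readings arrive, but only three fit.
buffer = TelemetryBuffer(capacity=3)
for reading in [1, 2, 3, 4, 5]:
    buffer.record(reading)

survivors = buffer.flush()  # [3, 4, 5]; readings 1 and 2 are gone for good
```

From the back office's point of view, the two lost readings never existed, which is exactly the kind of gap that makes time-series analysis unreliable.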
Sometimes, data is just dead wrong
Data is only as accurate as the humans who defined the parameters. Mistakes in software logic lead to errors all the time. It is not uncommon to see vehicles reporting telematics data dated 1999 or 2066 simply because an internal clock was configured incorrectly. These time-traveling data points further degrade the quality of the datasets that executives are supposedly using to predict consumer behavior.
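One common way to deal with these records is a simple plausibility check on the timestamp. The Python sketch below is a hypothetical filter, not any OEM's actual rule: it assumes the vehicle's build date and the server's receipt time are known, and the five-minute tolerance is an arbitrary illustrative choice.

```python
from datetime import datetime, timedelta, timezone


def is_plausible(
    reported_at: datetime,
    build_date: datetime,
    received_at: datetime,
    tolerance: timedelta = timedelta(minutes=5),
) -> bool:
    """A reading cannot predate the vehicle or postdate its own arrival."""
    return build_date <= reported_at <= received_at + tolerance


build = datetime(2022, 3, 1, tzinfo=timezone.utc)
received = datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc)

readings = [
    datetime(1999, 1, 1, tzinfo=timezone.utc),         # clock reset to a default
    datetime(2024, 6, 1, 11, 58, tzinfo=timezone.utc),  # looks fine
    datetime(2066, 1, 1, tzinfo=timezone.utc),         # clock configured wrong
]

usable = [ts for ts in readings if is_plausible(ts, build, received)]
```

Only one of the three readings survives here, and every discarded record is another hole in the dataset.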
What Vehicle Data is used for at these OEMs
Once data successfully makes the journey to the BO, it is typically used for more practical and beneficial purposes than just packaging for sale. A significant majority of this data is funneled into mobile and fleet applications. These ubiquitous apps allow you to remote start your vehicle, check battery charge, and schedule maintenance. For fleet managers—like plumbers, painters, or other skilled tradespeople—this telemetry provides critical details regarding the health of commercial vehicles to keep their businesses running smoothly.
A key industry-wide application is the monitoring of Diagnostic Trouble Codes (DTCs). Telematics systems allow manufacturers to monitor these "internal alerts" in real-time. Often, a DTC might trigger before a driver observes any symptoms or a light on the dash. These little breadcrumbs allow engineers to establish failure trends and deploy Over-the-Air (OTA) updates or technical service bulletins before a widespread mechanical failure occurs.
Quality Management
Vehicle data is incredibly important for managing quality. Yes, warranty claims, social media posts, and third-party survey results are also used to assess products. But telemetry gives the best representation of what is happening in the system at any given snapshot in time, whereas social media and warranty data are (usually) much farther removed from the actual event. Without telemetry, engineers have to make some very broad assumptions about a failure based on secondhand evidence from the technician and service writer.
A key, industry-wide example is the monitoring of Diagnostic Trouble Codes (DTCs). Most vehicles report instances of DTCs to OEMs; they are foundational to the service notifications you get from products like OnStar from General Motors and Service Connect from Toyota. DTCs are faults that may or may not display a light on the dash, and they function as the internal alert system within the vehicle. That means there are times when faults are occurring without obvious, or even observable, impacts to the vehicle. These little breadcrumbs are important to monitor because they can be useful in establishing failure trends before customers even begin to submit warranty claims. Over-the-Air (OTA) updates and regular software flashes at the dealer occur in part because of this data.
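As a rough illustration of what "establishing failure trends" can look like, the Python sketch below counts how many distinct vehicles reported each DTC per ISO week. The event layout, vehicle identifiers, and sample data are made up for the example; it is a simplified sketch rather than a description of any OEM's actual pipeline.

```python
from collections import Counter
from datetime import date


def vehicles_per_week(events):
    """Count distinct vehicles reporting each DTC in each ISO week.

    `events` is an iterable of (vehicle_id, dtc_code, day) tuples.
    Returns a Counter keyed by (dtc_code, iso_year, iso_week).
    """
    seen = set()
    for vehicle_id, code, day in events:
        iso = day.isocalendar()
        # A vehicle that repeats the same fault in one week counts once.
        seen.add((code, iso[0], iso[1], vehicle_id))
    return Counter((code, year, week) for code, year, week, _vehicle in seen)


# Hypothetical fleet data for a single trouble code.
events = [
    ("VIN_A", "P0128", date(2024, 1, 1)),
    ("VIN_A", "P0128", date(2024, 1, 2)),  # same vehicle, same week
    ("VIN_B", "P0128", date(2024, 1, 3)),
    ("VIN_C", "P0128", date(2024, 1, 8)),
]

counts = vehicles_per_week(events)
# counts[("P0128", 2024, 1)] is 2 and counts[("P0128", 2024, 2)] is 1
```

A week-over-week rise in the number of affected vehicles is the kind of signal that can prompt an investigation before warranty claims start to arrive.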
Product Improvement
Vehicle data isn't just for tracking issues. Product managers, design teams, and engineering teams use it to improve the products themselves by updating requirements, test cases, processes, and designs. These improvements can sometimes take years to implement because of how long some development cycles last. Sometimes changes can be made on the fly (usually within software), but changes to hard parts impact molds, supply chains, part costs, and much more.
Furnishing Data to the Government
While the media fixates on corporate sales, the United States Federal Government (NHTSA) and California (CARB) are active and often mandatory recipients of automotive data. These regulatory bodies have transitioned from physical inspections to data-driven in-use monitoring, and OEMs are frequently unwilling participants in these expensive data-collection campaigns.
The NHTSA primarily uses telematics for safety oversight through Standing General Order (SGO) 2021-01, which mandates the reporting of crashes involving Level 2 ADAS or higher (NHTSA, 2025). If a crash occurs and an advanced system was engaged within 30 seconds of the event, the OEM must submit a report:
Mandatory Reporting: Manufacturers of vehicles equipped with SAE Level 2 ADAS (Advanced Driver Assistance Systems) or Levels 3-5 ADS (Automated Driving Systems) must report crashes to NHTSA.
Telematics Role: The SGO explicitly recognizes telematics as a primary source of this data. If a crash occurs and the system was engaged at any time within 30 seconds of the incident, the OEM must submit a report.
AV STEP: In 2024 and 2025, NHTSA proposed and refined the AV STEP (ADS-equipped Vehicle Safety, Transparency, and Evaluation Program). This is a voluntary framework where participants agree to provide periodic and event-triggered reporting on the operational safety of ADS vehicles in exchange for certain regulatory flexibilities.
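The 30-second engagement window mentioned in the SGO items above is easy to express in code. The Python sketch below covers only that timing test as described here; the order itself contains further conditions that are not modeled, the function and variable names are hypothetical, and none of this should be read as compliance guidance.

```python
from datetime import datetime, timedelta

ENGAGEMENT_WINDOW = timedelta(seconds=30)


def engaged_within_window(crash_time: datetime, engagement_intervals) -> bool:
    """True if any (start, end) engagement interval overlaps the 30 seconds
    leading up to the crash."""
    window_start = crash_time - ENGAGEMENT_WINDOW
    return any(
        start <= crash_time and end >= window_start
        for start, end in engagement_intervals
    )


crash = datetime(2024, 5, 1, 12, 0, 0)

# System disengaged 20 seconds before the crash: inside the window.
recent = [(datetime(2024, 5, 1, 11, 58, 0), datetime(2024, 5, 1, 11, 59, 40))]

# System disengaged a full minute before the crash: outside the window.
stale = [(datetime(2024, 5, 1, 11, 50, 0), datetime(2024, 5, 1, 11, 59, 0))]

recent_hit = engaged_within_window(crash, recent)  # True
stale_hit = engaged_within_window(crash, stale)    # False
```

Answering this question at all requires timestamped engagement data from the vehicle, which is why telematics ends up being a primary source for these reports.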
Similarly, CARB has transitioned to programs like REAL (Real Emissions Assessment Logging), requiring heavy-duty diesel vehicles to store and transmit NOx and CO2 data (CARB, 2024):
REAL (Real Emissions Assessment Logging): This regulation requires OBD systems to collect and store NOx and CO2 data for heavy-duty diesel vehicles (starting with the 2022 model year). While this data can be retrieved via scan tools, CARB’s newer programs favor remote retrieval.
Clean Truck Check (2025): Effective January 1, 2025, this program requires periodic emissions testing for heavy-duty trucks. It explicitly allows for continuously connected telematics solutions (such as those from Geotab) to transmit data directly to CARB, bypassing the need for manual inspections.
Purchasing OEM Datasets: CARB has historically acquired or received large-scale telematics datasets from General Motors, Ford, Toyota, and Honda.
eVMT Studies: During the Advanced Clean Cars (ACC) Midterm Review, CARB used trip-level telematics data from these OEMs to study electric vehicle miles traveled (eVMT) and charging habits of plug-in hybrids.
Inventory Modeling: CARB used telematics data from transport refrigeration units (TRUs) to update its EMFAC (EMission FACtor) inventory models, which estimate total state emissions based on real-world duty cycles.
If there is any entity building massive, clandestine datasets on American citizens, the evidence points more toward regulatory mandates than corporate greed. Manufacturers are often legally required to provide this data, sometimes making specific engineering decisions solely to satisfy these regulations.
What isn't happening at OEMs
The shadow cast by a few high-profile irresponsible actions has led to an assumption that all data collection is inherently malevolent. In reality, there is an abundance of utility that outweighs many of the drawbacks. There is no cabal of statisticians intent on spying on your every move to sell you useless products. The logistical hurdles—the cost of storage, the unreliability of the signals, and the complexity of the garbage data—make large-scale, individualized surveillance a poor business model for a car company.
How to Protect Yourself
While there is no silver bullet to stop all data generation, limiting your exposure is the best prophylactic. You can significantly reduce the amount of data harvested from your vehicle by taking a few basic steps.
Decline the Terms and Conditions
You have several avenues to opt out of the infotainment and telematics systems. This can be done in writing to the OEM, by contacting customer call centers, or by declining the terms during delivery. For vehicles running Android Automotive, you can even remove specific Google permissions. Remember that the U.S. currently follows an opt-out model, meaning the burden of initiative is on you to signal that you do not consent to data collection.
Keep good data hygiene
Treat your vehicle like a public computer. If you use a rental car or plan to sell your current vehicle, reset the infotainment system to its factory settings. Avoid allowing the car to sync your contacts or recent call lists via Bluetooth or USB unless necessary. These small breadcrumbs are often the most sensitive data points stored locally on the vehicle's hardware.
Lock your Credit at the Bureaus
If your primary concern is identity theft resulting from a corporate data breach, the most effective defense is to lock your credit reports by default. This prevents unauthorized parties from opening new accounts in your name, regardless of how they obtained your personal information.


