How Tesla’s flash storage fail may lead to expensive repair bills


For a company that prides itself on its computer expertise, Tesla’s rookie mistake on flash storage is hard to fathom. But not impossible. Here’s how they screwed up.

Early Tesla cars are running into a serious problem: the big control screens are beginning to freeze up and go black. Worse, that is keeping affected cars from charging. Short of replacing a $1,800 printed circuit board, there’s no easy fix.

Storage stops the car from charging?

According to the folks at InsideEVs, the problem begins with the massive amount of system logging enabled on the car’s Media Control Unit (MCU), a single-board computer running Linux. Soldered on the MCU is an 8GB eMMC (embedded MultiMediaCard) flash storage chip.

The eMMC is a small solid-state drive (SSD) with an on-board controller. The eMMC has a standardized interface that makes it easy for board designers to add storage without worrying about all the details required to keep flash working well.

The major worry with flash is that it wears out. Write and erase a cell enough times and it will stop reliably accepting new data. The controller spreads writes across the chip’s blocks in a round-robin fashion instead of rewriting the same ones over and over, a technique known as wear leveling.
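To make the round-robin idea concrete, here is a minimal sketch in Python. It is a toy model, not how any real eMMC controller works: the block count, the write count, and the function name `simulate_writes` are all assumptions made for illustration.

```python
def simulate_writes(num_blocks, num_writes, wear_leveling):
    """Return per-block erase counts after repeatedly rewriting one logical block.

    Toy model only: real controllers use far more elaborate mapping schemes.
    """
    erase_counts = [0] * num_blocks
    next_block = 0
    for _ in range(num_writes):
        if wear_leveling:
            # Rotate through the physical blocks so wear is spread evenly.
            target = next_block
            next_block = (next_block + 1) % num_blocks
        else:
            # No leveling: the same physical block takes every write.
            target = 0
        erase_counts[target] += 1
    return erase_counts


without = simulate_writes(8, 8000, wear_leveling=False)
with_wl = simulate_writes(8, 8000, wear_leveling=True)
print(max(without))  # 8000: one block absorbs all the wear
print(max(with_wl))  # 1000: the same wear is shared by 8 blocks
```

The total number of writes is the same in both runs. Wear leveling only changes where they land, which is why it delays wear-out but cannot prevent it.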

What happened?

Someone at Tesla enabled logging on the MCU for no good reason. The constant writing of logs to the eMMC, data that is rarely needed, means the eMMC eventually wears out, and firmware that the eMMC stores is no longer readable, so the MCU fails.

Bummer.

The logging isn’t the only problem. The firmware stored on the eMMC has grown over the years from about 30MB to 1GB. That leaves less spare capacity for the log writes to be spread across, and firmware updates themselves require a writable eMMC.
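A back-of-the-envelope estimate shows why both factors matter. The sketch below is a rough model under stated assumptions: the endurance rating (3,000 program/erase cycles), the daily log volume (5GB), and the write amplification factor (2) are hypothetical figures chosen for illustration, not Tesla’s numbers, and the model assumes the firmware’s space is simply unavailable for leveling log writes.

```python
def years_until_wear_out(writable_gb, pe_cycles, gb_written_per_day,
                         write_amplification=1.0):
    """Estimate flash lifetime in years for a constant daily write load."""
    # Total data the writable region can absorb before its cells wear out.
    total_endurance_gb = writable_gb * pe_cycles
    # The controller physically writes more than the host asks it to.
    physical_gb_per_day = gb_written_per_day * write_amplification
    return total_endurance_gb / physical_gb_per_day / 365


# Hypothetical: nearly all 8GB free for logs vs. 1GB taken by firmware.
print(round(years_until_wear_out(8, 3000, 5, 2.0), 1))  # 6.6
print(round(years_until_wear_out(7, 3000, 5, 2.0), 1))  # 5.8
```

Whatever the real figures are, the shape of the result holds: lifetime scales directly with the space available for leveling and inversely with how much gets written each day.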

Thanks to wear leveling and other techniques, flash wear-out isn’t a problem today unless you do something stupid. Like logging unneeded data year after year.

The fix

Third parties can replace the chip on the MCU, which is a fiddly process but cheaper than replacing the entire board, which is what Tesla does.

The third parties write the syslog data to a RAM disk, which won’t wear out, but is volatile. But why is Tesla writing the data at all?
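The same keep-it-in-RAM idea can also be applied inside an application. The Python sketch below is not the third parties’ fix and not Tesla’s code; it is a related technique using the standard library’s `logging.handlers.MemoryHandler`, which holds routine records in memory and passes them to persistent storage only when an error occurs or the buffer fills. The helper name `make_buffered_logger` and the buffer size are assumptions.

```python
import logging
from logging.handlers import MemoryHandler


def make_buffered_logger(name, persistent_handler, capacity=1000):
    """Return (logger, buffer) where records stay in RAM until an ERROR.

    persistent_handler is whatever finally writes to flash, for example
    a logging.FileHandler. It is only invoked when the buffer flushes.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # keep records away from the root logger
    buffer = MemoryHandler(
        capacity=capacity,          # flush when this many records pile up
        flushLevel=logging.ERROR,   # ...or as soon as an ERROR is logged
        target=persistent_handler,
        flushOnClose=False,         # drop routine records at shutdown
    )
    logger.addHandler(buffer)
    return logger, buffer
```

The trade-off is the one the RAM disk has: records still in the buffer are lost if power is cut. In return, the flash only sees writes when something has gone wrong, plus a periodic flush whenever the buffer fills.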

The Storage Bits take

Non-volatile RAM will eventually solve this problem, but it isn’t quite ready for prime time.

I suspect the real problem here is that the software guys didn’t think about the impact of logging on the onboard storage, since that’s hardware and who cares? It wouldn’t be the first time.

Since this is clearly a manufacturer problem, one that customers have no control over, Tesla should handle all repairs under warranty. It’s the right thing to do.

Courteous comments welcome, of course.


