If Robotaxi is to be completed on anything remotely approaching Elon's timeframe, the team had better be furiously working right now on ALL aspects of the problem: poor visibility as well as driving correctness. In fact, the former is arguably more urgent, because it directly informs the sensor suite requirements for Robotaxi, and that decision must be made at least a year or two before the control aspect of the software would need to be market-ready. (Even if it turns out that pure vision with 8 cameras is sufficient, they will need to prove this by example.) The last thing Tesla can afford to do is to put Robotaxi into production with a sensor suite that isn't up to the task. That's also why all other autonomy-focused manufacturers are overbuilding their sensor suites, rather than underbuilding them. Software is much more easily retrofittable than hardware.
Fortunately, with the E2E approach, solving for weather (or at least understanding the sensor suite's limitations) amounts to gathering more training data. Synthetic data may not work; I'm not sure photorealistic water-blurred camera feeds could be synthesized accurately enough to model how it "really looks", and there are endless ways for real-world image quality to be compromised. But if it turns out that e.g. A-pillar cameras are essential for L4 superhuman driving ability, let alone lidar/radar, Tesla will first need to build those sensors into a large fleet (e.g. HW5), then gather real-world training data from cars so equipped, in order to solve poor-weather issues for similarly equipped Robotaxis. That's why I think Robotaxi is still several iterations and several years away.
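To make concrete the kind of synthesis I'm skeptical of: here's a toy sketch (my own illustration, nothing to do with Tesla's actual pipeline; the function and parameters are invented for the example) that fakes droplets on a lens by pasting blurred, slightly shifted circular patches onto an image. It produces something droplet-ish, but nothing about it captures real refraction, wiper smearing, or droplets merging and running, which is exactly my worry.

```python
# Toy "rain on the lens" synthesis sketch (hypothetical, for illustration only).
import cv2
import numpy as np

def add_fake_droplets(image, n_droplets=40, seed=None):
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    out = image.copy()
    # A heavily blurred copy stands in for light scattered through water.
    blurred = cv2.GaussianBlur(image, (31, 31), 0)
    for _ in range(n_droplets):
        cx, cy = int(rng.integers(0, w)), int(rng.integers(0, h))
        r = int(rng.integers(4, 20))
        mask = np.zeros((h, w), dtype=np.uint8)
        cv2.circle(mask, (cx, cy), r, 255, thickness=-1)
        # Crude "refraction": shift the blurred content a few pixels inside the droplet.
        dx, dy = int(rng.integers(-3, 4)), int(rng.integers(-3, 4))
        shifted = np.roll(blurred, shift=(dy, dx), axis=(0, 1))
        out[mask > 0] = shifted[mask > 0]
    return out
```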
FWIW, we humans handle poor visibility by sitting far back from the windshield and moving our heads, to minimize the impact of any particular raindrop or dirt splotch. Fixed cameras pressed up against the glass have neither of those advantages. That's a big part of why I think a different approach may ultimately be needed to match skilled human driving adaptability, let alone exceed it.
I think I’m going to disagree with you, here. Image recognition with impairments is more like detecting a signal in the presence of random noise than it is, I dunno, calculating the precise impairment of droplets of water, then subtracting that impairment out of the image to get the underlying image.
As it happens, I’m very aware that NNs are inherently a tool that extracts underlying image information from a noisy image exceedingly well. Kind of like how hidden Markov modeling of noisy auditory data allows for speaker-independent voice recognition; the differences between speakers (different vocal tracts, sizes, ages, pitches) are all modeled as noise, and out pop the diphthongs!
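To put the signal-in-noise point in code, here’s a minimal sketch (my own framing, not anyone’s production stack; the architecture and noise level are arbitrary): train a small network to map corrupted images back to clean ones. Nothing in it models the corruption explicitly; the network just learns what the underlying signal tends to look like.

```python
# Minimal denoising-network sketch (hypothetical, for illustration only).
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, x):
        return self.net(x)

model = TinyDenoiser()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def train_step(clean_batch: torch.Tensor) -> float:
    # Corrupt the input with additive noise; the target is the clean image itself.
    noisy = clean_batch + 0.2 * torch.randn_like(clean_batch)
    opt.zero_grad()
    loss = loss_fn(model(noisy), clean_batch)
    loss.backward()
    opt.step()
    return loss.item()
```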
Famously, NNs trained to find giraffes by being shown pictures of giraffes right side up, upside down, backwards, forwards, facing left, and facing right are known for then finding giraffes 90% obscured by brush, trees, high grass, and what-all, without specific training on the obscuring vegetation. The right tool for the right job; this is what NNs are good at.
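Here’s roughly what that kind of orientation-heavy augmentation looks like in practice (a generic sketch using torchvision, not any specific giraffe project; the jitter values are arbitrary). Note that nothing in the pipeline simulates occluding vegetation; the robustness to occlusion tends to fall out of training on many poses.

```python
# Generic orientation-heavy augmentation pipeline (illustrative only).
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),   # facing left vs. right
    transforms.RandomVerticalFlip(p=0.5),     # upside down
    transforms.RandomRotation(degrees=180),   # arbitrary orientation
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])
```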
I strongly suspect that whatever vision recognition stack Tesla has in there now didn’t need much work to get to its current level. As I said, rather than continuing to polish that, the engineers probably set it aside to focus on the more critical stuff, like not killing cyclists and handling unprotected left turns. Once driving in clear weather is under control, my guess is they’ll circle back to improve obscured, noisy vision.
Asking for everything ready Right This Second is a stance, I suppose.