Tesla is identifying objects with neural networks and driving with rules *right now*.
Yes, that's the existing Karpathy solution on the perception side. It is trained against a large database of labelled video with human-derived interpretations of 'objects'; a rules-and-optimizer-based policy planner is then run on those features to drive the car.
That gives what we see today.
End-to-end training with generative video is an entirely different architecture.
There’s no reason they can’t be running any of those networks in parallel with an e2e network.
Computational budget and power. They were already at the limits and gave up on double-redundant processing, so each processor now does independent computation.
And even if you did, what would you do with the results of the two systems? Like how do you 'merge' one planner with another?
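To make the merge question concrete, here is one minimal sketch of what arbitration between two planners *could* look like: trust the e2e plan when the two agree, fall back to the more conservative plan when they diverge. Everything here (function names, the 0.5 m threshold, the "slower is safer" rule) is a hypothetical illustration, not anything Tesla is known to do:

```python
# Hypothetical sketch of arbitrating between two planner outputs.
# Thresholds and fallback policy are invented for illustration.

def disagreement(traj_a, traj_b):
    """Max pointwise distance (meters) between two (x, y) trajectories."""
    return max(
        ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5
        for (ax, ay), (bx, by) in zip(traj_a, traj_b)
    )

def arbitrate(e2e_traj, rules_traj, e2e_speed, rules_speed, max_dev=0.5):
    """If the planners agree within max_dev meters, use the e2e plan;
    otherwise fall back to whichever plan commands the lower speed."""
    if disagreement(e2e_traj, rules_traj) <= max_dev:
        return e2e_traj, e2e_speed
    if rules_speed <= e2e_speed:
        return rules_traj, rules_speed
    return e2e_traj, e2e_speed

# Example: the planners nearly agree, so the e2e plan wins.
a = [(0, 0), (1, 0.1), (2, 0.2)]
b = [(0, 0), (1, 0.0), (2, 0.1)]
print(arbitrate(a, b, e2e_speed=12.0, rules_speed=10.0)[1])  # 12.0
```

Even this toy version shows the problem: the arbiter itself is a third policy that someone has to design, validate, and pay for in compute.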
I’m curious how you think you know what the computational budget is, and how close Tesla is to the limit of that budget?
They are computationally limited now even after they do heavy sparsification and quantization.
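For anyone unfamiliar with the terms: quantization trades numeric precision for compute and memory by storing weights in fewer bits. A minimal pure-Python sketch of per-tensor symmetric int8 quantization (illustrative only, not Tesla's actual scheme):

```python
# Minimal sketch of symmetric int8 quantization: map float weights
# into [-127, 127] using a single per-tensor scale. Illustrative only.

def quantize(weights):
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.02, -0.5, 0.31, 1.27, -1.0]
q, scale = quantize(weights)
print(q)  # [2, -50, 31, 127, -100]
# Round-trip error is bounded by half the quantization step (scale / 2).
```

The point of doing this on the car is that int8 math is several times cheaper than fp32 on the same silicon, which is exactly the kind of squeezing you do when you're already at the compute limit.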
How many clock cycles were freed up by eliminating all that C++ code?
The policy side likely consumed much less computation than anything neural.
How many other networks do not need to run *in series* in an end-to-end solution? Do you have a working e2e network yourself? If not, you're just making stuff up.
E2E is far from production form; it's an entirely new approach and a major architectural change.
Tesla has done things to reduce latency. One example is bypassing the signal processing of the image data coming from the camera sensors, but that is to reduce the overall delay from photons in to processing and to remove delays that neural networks don't need. Signal processing also adds noise relative to the raw sensor input, so there's a benefit there too.
OK that's fine, but it also shows that they are budgeting microseconds.
But you have zero evidence whatsoever that they will be at the limit of their computational budget and will be unable to have any layers running on top of the e2e network.
Whatever the 'it' is, it is years away. They will always try to max out the computational ability of any hardware: you can always scale each net to fit a given budget, but bigger nets have better performance and reliability. GPT-2 sucks at writing compared to GPT-4. Rats suck at quantum mechanics compared to humans.
Maybe the system would perform beautifully with 10x the computation budget? Doing more things at once (like two simultaneous architectures) will require each one to be cut down, lowering its performance.
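The budget argument can be made concrete with back-of-envelope numbers. All figures below are hypothetical illustrations (an assumed 36 Hz camera frame rate), not actual FSD hardware specs: at a fixed frame rate the per-frame latency budget is fixed, and every architecture sharing the chip divides it.

```python
# Back-of-envelope latency budget. Numbers are assumptions for
# illustration, not actual FSD hardware specs.

frame_rate_hz = 36                   # assumed camera frame rate
budget_ms = 1000.0 / frame_rate_hz   # per-frame compute budget
print(round(budget_ms, 1))           # 27.8 ms per frame

# One big e2e net vs. two architectures sharing the same chip:
# each gets roughly half the budget, so each must shrink.
for n_nets in (1, 2):
    print(n_nets, round(budget_ms / n_nets, 1))
```

Since bigger nets perform better at a fixed architecture, halving each net's share of the budget is a real performance cost, not a free lunch.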