ZeApelido said:
Doesn't backpropagation have the effect of turning all involved layers into a monolith? I'm seeing this as a case of a 'perception' network being nothing more than a starting point for the overall system's training. That is, after its initial training, nothing is going to require that the perception network deliver stop signs and lane lines. Those signals will be tweaked and tuned by backpropagation until the ultimate output - control - is optimized.
Tesla could:
1) Build everything from scratch in a totally new architecture.
2) Build an end-to-end architecture that incorporates some of the previous module architectures, but retrain the weights from scratch.
3) Build an end-to-end architecture that incorporates some of the previous module architectures, and initialize training from their existing weights.
My guess is they are doing #3, at least with respect to the perception stack. They will lift the core layers of that network into their V12 architecture. They will probably cut off a few of the last layers (which are usually used for the final classification / regression outputs), so they *won't* have explicit outputs or backprop on things like object detection, segmentation, kinematics, etc., but they will start with the same weights for the layers that do the heavy lifting of perception.
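To make that concrete, here's a minimal PyTorch sketch of the kind of surgery I mean. The class names, layer sizes, and checkpoint file are all made up for illustration (nothing here is Tesla's actual architecture); the point is just dropping the task-specific head and reusing the backbone and its weights.

```python
import torch
import torch.nn as nn

# Toy stand-in for a pretrained perception network. Architecture, sizes,
# and names are illustrative assumptions only.
class PerceptionNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(              # core feature layers
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.detect_head = nn.Linear(64, 10)        # explicit detection outputs

class EndToEndNet(nn.Module):
    def __init__(self, pretrained: PerceptionNet):
        super().__init__()
        self.backbone = pretrained.backbone         # keep the heavy-lifting layers
        # detect_head is dropped: no explicit detection output, no backprop on it
        self.control_head = nn.Linear(64, 2)        # new head, e.g. steer + accel

    def forward(self, x):
        return self.control_head(self.backbone(x))

perception = PerceptionNet()
# perception.load_state_dict(torch.load("perception_v11.pt"))  # hypothetical checkpoint
model = EndToEndNet(perception)
print(model(torch.randn(1, 3, 64, 64)).shape)       # torch.Size([1, 2])
```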
When they train, they can either allow those weights to keep updating or freeze them. I have no idea what they are doing now, but ultimately I would think you would want to unfreeze them so the model can learn more robust representations of all the nuances found in real data.
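Continuing the toy model above, the freeze/unfreeze choice is just a flag on the parameters. A common recipe (my assumption, not anything confirmed about their pipeline) is to train the new head with the backbone frozen first, then unfreeze everything at a lower learning rate:

```python
# Phase 1: freeze the pretrained backbone, train only the new control head.
for p in model.backbone.parameters():
    p.requires_grad = False
optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3)

# ... train the head for a while ...

# Phase 2: unfreeze so end-to-end backprop can fine-tune the perception
# layers too, with a lower learning rate to protect the pretrained features.
for p in model.backbone.parameters():
    p.requires_grad = True
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```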
So working up to V12 via V9, V10, and V11 is not a lost cause. Some of those models can be incorporated into V12 through weight initialization, which can dramatically speed up training. And if not that, they are still useful as debugging outputs during training, like the visualizations.
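For the weight-initialization part, PyTorch's load_state_dict(strict=False) handles the bookkeeping: it copies every tensor whose name and shape match and reports the rest. Again using the toy classes above (the "old checkpoint" here is faked from a fresh PerceptionNet for the sake of a runnable example):

```python
# Warm-start the new end-to-end model from the old perception weights.
v12 = EndToEndNet(PerceptionNet())
old_weights = PerceptionNet().state_dict()   # stand-in for a saved V11 checkpoint

# Keep only the shared backbone tensors; strict=False tolerates the new head.
backbone_only = {k: v for k, v in old_weights.items() if k.startswith("backbone.")}
missing, unexpected = v12.load_state_dict(backbone_only, strict=False)
print(missing)     # ['control_head.weight', 'control_head.bias'] - left at random init
print(unexpected)  # [] - the old detect_head weights were filtered out above
```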