End-to-end might "just" need more training on exceptions, then on exceptions to those exceptions, and so on. This reminds me of neural network chess engines learning that it's good to capture enemy pieces, except when it would leave your own pieces open to attack, except when it's okay to lose those pieces because you can deliver checkmate. Somehow it has enough training to know it's OK to go around a stopped car, but not enough to cover all the exceptions to that rule.
Hopefully the process of detecting disengagements to further train 12.x will provide enough ongoing signal to learn the correct behavior. Although this might produce cycles of fixes and regressions: the network initially does the right thing for the wrong reason, then learns not to do it in a similar but different situation, then has to relearn why the original behavior was actually correct, this time for the right reason.