Are you referring to Ashok Elluswamy's CVPR keynote from June last year - Foundation Models for Autonomy? Tesla has been generating video with models trained on its fleet's 8-camera footage for a while now. Presumably a general world model that has to understand things like vehicle physics and how drivers behave around traffic controls in order to predict video accurately should also allow for end-to-end control. If it's truly a monolith, then I'm fairly astonished.