Could the low probabilities that background image features accumulate for an action during training stack up into a higher probability when the high-probability objects aren't present?
For example, an image with a slowing lead car (the car got bigger in the next frame) has a 90% correlation with why the human driver (in the training data) slowed down. Around the car, hundreds of other pixel combinations get labeled with a 0-10% probability for why the human slowed down.
Cruising along on an open road, there's a constant stream of 0-10% (IDK, some low-threshold number) pixel combinations flashing all over the images. Could there, by chance, be some random repeating pixel combination that coincidentally occurred alongside a 90% event (an actual lead-car slowdown), ended up above a trigger threshold, but was masked by the 90% event? Remove the 90% event, and some of that background noise ends up looking like a 40% event, and the AI driver slows a bit just in case? Phantom slowing, or all sorts of other undesirable behavior?
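To make the worry concrete, here's a toy sketch of the idea (entirely my own construction, not how any real driving stack works): a linear "slow down" score with one strong weight for the lead-car cue, plus many background features that each picked up a tiny spurious weight during training because they happened to co-occur with real slowdowns. The weights, threshold, and feature counts below are all made-up numbers for illustration.

```python
import random

random.seed(0)

# One strong cue (lead car getting bigger) plus many weak,
# spuriously-correlated background features.
N_BACKGROUND = 200
strong_weight = 0.9  # the ~90% lead-car cue
bg_weights = [random.uniform(0.0, 0.004) for _ in range(N_BACKGROUND)]

def slow_score(lead_car_present, background):
    """Linear 'should I slow down?' score: strong cue + background noise."""
    score = strong_weight * (1.0 if lead_car_present else 0.0)
    score += sum(w * x for w, x in zip(bg_weights, background))
    return score

# A scene where many of the spurious background patterns happen to fire:
background = [1.0] * N_BACKGROUND

with_lead = slow_score(True, background)     # strong cue dominates
without_lead = slow_score(False, background) # only the background noise

print(f"with lead car:    {with_lead:.2f}")
print(f"without lead car: {without_lead:.2f}")
```

With these made-up numbers, the 200 tiny weights sum to roughly 0.4 on their own, so even with no lead car the score can clear a modest trigger threshold, which is exactly the masked-noise scenario described above. Whether anything structurally similar happens inside a real vision network is a much harder question, but the arithmetic of "many small weights summing past a threshold" is at least plausible on its face.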
Apologies for the inexpert language and understanding, just a thought I had. No idea if anything like this is really going on under the hood.