I wonder what ‘raw photon count’ actually means. The claim is that downscaling might not work because they are feeding the neural net the raw photon count, not a processed image.
To my knowledge, no mainstream image sensor literally counts photons.
My guesstimate is that it means the training happens on the output of the sensor, i.e. on the raw sensor data before demosaicing (see Bayer filter - Wikipedia for an illustration). Image-recognition neural networks are typically trained on demosaiced RGB images, i.e. on tensors of dimension (w,h,3), where the 3-dimension holds the demosaiced RGB values. A performance-optimised neural network could instead be trained directly on the raw data, i.e. on tensors of dimension (w/2,h/2,4), where the 4-dimension holds the raw RGGB values of each 2×2 Bayer cell (see the sensor image in the Bayer filter link), skipping the demosaicing algorithm entirely. Demosaicing is only needed to get pretty pictures for human consumption.
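To make the tensor shapes concrete, here is a minimal NumPy sketch of packing a raw Bayer mosaic into a 4-channel tensor, assuming the common RGGB layout (even rows alternate R,G; odd rows alternate G,B). The array contents are made-up stand-in values, not real sensor data.

```python
import numpy as np

# Hypothetical raw Bayer frame: one value per photosite, RGGB pattern.
h, w = 4, 6
raw = np.arange(h * w, dtype=np.uint16).reshape(h, w)

# Pack each 2x2 Bayer cell into one 4-channel pixel: (h/2, w/2, 4).
packed = np.stack(
    [raw[0::2, 0::2],   # R  (even row, even col)
     raw[0::2, 1::2],   # G1 (even row, odd col)
     raw[1::2, 0::2],   # G2 (odd row, even col)
     raw[1::2, 1::2]],  # B  (odd row, odd col)
    axis=-1)

print(packed.shape)  # (2, 3, 4)
```

Note that the packed tensor has half the spatial resolution of the mosaic but four channels, so no information is discarded; a network can learn directly from it without any interpolation step.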
Since it’s video, training on raw data also eliminates the video compression and decompression (image recognition works on the uncompressed frame buffer).
Video compression is only needed to lower bandwidth requirements, but that’s not relevant if you have a fixed wired network with enough bandwidth, as in the wiring of a car.
By eliminating video compression and decompression, you also eliminate the latency of compression (you don’t need to buffer several subsequent frames to do the temporal compression).
So yes, it makes sense to train on the raw sensor output. That doesn’t mean it’s impossible to scale the raw sensor output to a lower resolution for compatibility with HW3.