Speaking of the vector space bird's eye view, here is a good article that describes how it might work:
Tesla Bird's Eye View Explained, Tesla To Offer Bird's Eye View In FSD Package - VehicleSuggest
Have you seen this? 3D Packing for Self-Supervised Monocular Depth Estimation [CVPR 2020]
Tesla is likely going for a high-fidelity 3D vector-space representation of the world similar to that. It seems like the only viable approach for a bird's-eye view that's useful for parking, since the view would need to accurately show what the car sees or the feature would be useless. The vector-space representation currently shown on the screen is pretty basic and shows none of that at the moment.
It's worth noting that the high-fidelity 3D representation was trained on video data (explained here in their CVPR video, different from the one above), which we know is what Tesla aims to do / is doing. Also noteworthy: they did this with data from only one camera. Tesla has eight cameras to draw depth data from, so it wouldn't surprise me if its 3D model ended up even more detailed and accurate than what is seen in the 3D Packing video (which, again, it would have to be if it's to be useful for parking and other tight maneuvers).
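For anyone curious how you train depth from video with no depth labels at all, here's a rough sketch of the core idea behind papers like 3D Packing / Monodepth2 (my own simplification in PyTorch, not Tesla's or the paper's actual code): predict depth for frame t and the ego-motion to frame t+1, warp frame t+1 into frame t's viewpoint, and penalize the photometric difference. The video supervises itself.

```python
import torch
import torch.nn.functional as F

def warp_to_target(src_img, depth, pose, K):
    """Reproject src_img into the target view using predicted depth and ego-motion.

    src_img: (B, 3, H, W) source frame (e.g., t+1)
    depth:   (B, 1, H, W) predicted depth for the target frame (t)
    pose:    (B, 4, 4) predicted target->source camera transform
    K:       (B, 3, 3) camera intrinsics
    """
    B, _, H, W = src_img.shape
    # Pixel grid of the target frame, in homogeneous coordinates
    ys, xs = torch.meshgrid(torch.arange(H, device=src_img.device),
                            torch.arange(W, device=src_img.device),
                            indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)]).float()   # (3, H, W)
    pix = pix.view(1, 3, -1).expand(B, -1, -1)                 # (B, 3, H*W)
    # Back-project each pixel to a 3D point using the predicted depth
    cam = torch.linalg.inv(K) @ pix * depth.view(B, 1, -1)     # (B, 3, H*W)
    cam = torch.cat([cam, torch.ones_like(cam[:, :1])], dim=1) # homogeneous
    # Move the points into the source frame and project back to pixels
    src = (pose @ cam)[:, :3]
    src = K @ src
    src = src[:, :2] / src[:, 2:].clamp(min=1e-6)
    # Normalize pixel coordinates to [-1, 1] for grid_sample
    u = 2 * src[:, 0] / (W - 1) - 1
    v = 2 * src[:, 1] / (H - 1) - 1
    grid = torch.stack([u, v], dim=2).view(B, H, W, 2)
    return F.grid_sample(src_img, grid, align_corners=True)

def photometric_loss(target_img, warped_src):
    # Plain L1 for brevity; the real papers add SSIM and auto-masking.
    return (target_img - warped_src).abs().mean()
```

If the predicted depth and pose are wrong, the warped frame won't line up with the target frame and the loss goes up, so the networks learn geometry purely from raw video.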
The Autopilot visualizations today are already vector-space renderings of selected neural network outputs, which is what lets the driver move the virtual camera around with the touchscreen. I believe green found the Godot Engine on MCU2, which could be what renders these visualizations.
The bird's-eye viewpoint already happens when the vehicle is placed in Park (i.e., the virtual camera moves above the vehicle instead of behind it), so I would guess the FSD rewrite, with the neural network processing all cameras together, allows for more consistent outputs of parking lines, curbs, parking barriers, adjacent vehicles, etc.
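To illustrate why this is cheap once you're rendering from vector space: switching viewpoints is just swapping the virtual camera's view matrix. A minimal sketch with a standard look-at transform (my own illustration with made-up positions, not Tesla's renderer):

```python
import numpy as np

def look_at(eye, target, up=np.array([0.0, 0.0, 1.0])):
    """Build a 4x4 view matrix with the camera at `eye` looking at `target`."""
    f = target - eye
    f = f / np.linalg.norm(f)            # forward
    r = np.cross(f, up)
    r = r / np.linalg.norm(r)            # right
    u = np.cross(r, f)                   # true up
    view = np.eye(4)
    view[0, :3], view[1, :3], view[2, :3] = r, u, -f
    view[:3, 3] = -view[:3, :3] @ eye    # translate world into camera space
    return view

car = np.array([0.0, 0.0, 0.0])
# Normal driving view: camera behind and above the car
chase_cam = look_at(eye=np.array([-8.0, 0.0, 3.0]), target=car)
# Park: same scene, camera straight overhead, car's heading as screen "up"
birds_eye = look_at(eye=np.array([0.0, 0.0, 15.0]), target=car,
                    up=np.array([1.0, 0.0, 0.0]))
```

The scene geometry never changes; only the matrix fed to the renderer does, which is presumably why the UI can animate between the two viewpoints so freely.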
There are a few advantages of vector space over traditional real-time camera feeds for a bird's-eye view:
- avoids camera inconsistency, e.g., color, calibration, distortions
- dynamic camera positions, e.g., zooming to wheels close to curbs
- permanence of static objects, e.g., estimating obscured barrier distance (see the toy sketch after this list)
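Here's a toy sketch of that last point (purely illustrative, not Tesla's actual data structures): once a static object like a parking barrier is placed in the world-frame vector space, its distance can still be reported after it leaves every camera's view, e.g., when it slips under the nose of the car.

```python
import math

class VectorSpace:
    def __init__(self):
        self.static_objects = {}   # name -> (x, y) position in the world frame

    def observe(self, name, world_xy):
        # Called while the cameras can still see the object.
        self.static_objects[name] = world_xy

    def distance_to(self, name, car_xy):
        # Works even when no camera currently sees the object,
        # because the world-frame estimate persists.
        ox, oy = self.static_objects[name]
        cx, cy = car_xy
        return math.hypot(ox - cx, oy - cy)

vs = VectorSpace()
vs.observe("parking_barrier", (10.0, 0.0))   # seen while approaching
# ...barrier drops out of every camera's field of view...
print(vs.distance_to("parking_barrier", car_xy=(9.4, 0.0)))  # -> 0.6 (meters)
```

A raw stitched-camera bird's-eye view can't do this: once the barrier is out of frame, it's simply gone from the image.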
You're correct, although the current bird's-eye-view vector-space representation is pretty basic and lacking in detail. What I've linked above is exactly what you're describing: a highly detailed 3D vector-space representation of the world (no camera inconsistencies, dynamic camera positioning, object permanence).