and then the audio cut out for a good chunk of time...
He answered some more questions afterwards, perhaps because the audio had been cutting out.
Q: Do you think the long tail can be "solved", or do you (Tesla) more just aim to be e.g. 100x better than humans? How might you otherwise "prove" safety?
AK: There's no "solved", it's a never-ending march of 9s. I'd be very happy to reach 100x better than humans; that would have a huge impact on saving lives and on the economy broadly. You can't "prove" safety, but you can produce a LOT of statistical data, which we can uniquely do, I believe. I think it has to be a lot of data and telemetry. Without the feature: some % rate of an event. With the feature: some % rate of that event. And this over 1B miles. It has to look something like that.
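(To make the "with/without feature, over 1B miles" idea concrete, here is a minimal sketch of comparing event rates from fleet telemetry with an exact Poisson confidence interval. The event counts, mileages, and the Poisson model itself are my own illustration, not anything Tesla has described.)

```python
# Hypothetical sketch: compare an event rate with vs. without a feature
# over fleet miles, with an exact Poisson confidence interval.
from scipy.stats import chi2

def poisson_rate_ci(events: int, miles: float, confidence: float = 0.95):
    """Exact Poisson CI for an event rate, in events per million miles."""
    alpha = 1.0 - confidence
    lower = chi2.ppf(alpha / 2, 2 * events) / 2 if events > 0 else 0.0
    upper = chi2.ppf(1 - alpha / 2, 2 * (events + 1)) / 2
    scale = 1e6 / miles
    return events * scale, lower * scale, upper * scale

# e.g. 120 events over 500M miles without the feature,
#      35 events over 600M miles with it (fabricated numbers)
for label, k, m in [("without", 120, 500e6), ("with", 35, 600e6)]:
    rate, lo, hi = poisson_rate_ci(k, m)
    print(f"{label}: {rate:.3f} per 1M miles (95% CI {lo:.3f}-{hi:.3f})")
```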
Q: When and how will you (or Elon) tell Tesla customers that L4 & L5 autonomy won't be possible with Tesla's current sensor setup?
AK: You can add arbitrarily many arbitrarily expensive sensors. You could pepper your entire car with them, in all the wavelengths. Where you draw the line is the important call to make. Our position is that lidar is a LOT of extra cost for not a lot of extra gain, compared to pixels + state-of-the-art neural nets, and that this is a good place to draw it. I do not think this will be controversial in a few years.
Q: Are there apparent limitations in Tesla's depth estimation algorithm, say vs. lidar?
AK: There are no fundamental limitations to depth estimation from images, as imo also proven by the fact that people effectively drive cars using stereo vision. So the signal is clearly there to support L5 functionality. This by itself definitely doesn't mean you're done - you still have to write the algorithm (or rather, train a model, collect your dataset, resolve all the potential issues), and generally make it work well.
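(For illustration, a minimal sketch of recovering metric depth from a rectified stereo pair via classical block matching in OpenCV. Tesla's actual approach is neural-net based; the camera parameters and file names below are placeholders. The point is just that the geometric depth signal is present in the pixels.)

```python
# Minimal sketch: metric depth from a rectified stereo pair via disparity.
# Camera parameters and image paths are placeholders, not Tesla's setup.
import cv2
import numpy as np

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Classical block matching; a learned model would replace this step,
# but the underlying geometry is the same.
matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
disparity = matcher.compute(left, right).astype(np.float32) / 16.0  # fixed-point -> px

focal_px = 700.0    # focal length in pixels (placeholder)
baseline_m = 0.54   # distance between the two cameras in meters (placeholder)

# depth = f * B / d; mask out invalid (non-positive) disparities
depth_m = np.where(disparity > 0, focal_px * baseline_m / disparity, np.inf)
```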
Q: I was interested in the BEV network. I see the difficulty of the pixel-to-world conversion, especially near the horizon, due to loss of 3D information. My question is: how do you get ground truth for these networks, for example for curbs?
AK: Good question. Clean ground truth at scale is definitely the hard part, but I won't go into full detail here. Also we do worry about the architectures quite a bit because you're right - a lot of information is only at the vanishing line, and 1/3 of the image is clouds and 1/3 is road.
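(One common, non-Tesla-specific way to relate road pixels to a bird's-eye-view grid is a flat-ground homography, i.e. inverse perspective mapping. The sketch below uses fabricated correspondence points; a real homography comes from camera calibration. It also shows why ground truth near the horizon is hard: rows near the vanishing line stretch across many BEV cells.)

```python
# Sketch: flat-ground inverse perspective mapping (image -> BEV grid).
# The correspondence points here are fabricated; in practice they come
# from camera calibration.
import cv2
import numpy as np

image = cv2.imread("front_camera.png")  # placeholder path

# Four image points on the road plane and where they land in a
# top-down grid (all pixel coordinates are placeholders).
src = np.float32([[420, 560], [860, 560], [1180, 720], [100, 720]])
dst = np.float32([[300, 0], [500, 0], [500, 800], [300, 800]])

H = cv2.getPerspectiveTransform(src, dst)
bev = cv2.warpPerspective(image, H, (800, 800))

# Note: image rows near the horizon smear across many BEV cells, which
# is exactly why depth and label quality degrade near the vanishing line.
```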
Q: Would you say that current Tesla Autopilot drives somewhat more conservatively in dense traffic than a human driver?
AK: Interesting - no very strong opinion on this point. We typically try to drive as a human would, but do become more cautious in some conditions, e.g. at stops, or if we detect people near the road who aren't paying attention to us, etc.
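(A toy sketch of what such context-dependent caution could look like in a planner; the conditions and scaling factors below are entirely invented, not Tesla's logic.)

```python
# Toy sketch: scale a target speed down when the scene warrants caution.
# Conditions and factors are invented for illustration.
def speed_cap(base_mps: float, near_stop: bool, inattentive_pedestrian: bool) -> float:
    factor = 1.0
    if near_stop:
        factor = min(factor, 0.6)               # slow through stops
    if inattentive_pedestrian:
        factor = min(factor, 0.5)               # extra margin for people
    return base_mps * factor
```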
Q: Do you use open source software for ontology management?
AK: In Tesla fashion we develop everything in-house and like it that way a lot. Everything is fully customized to take full advantage of our fleet, its telemetry, etc.
Q: Can you touch more on how your team approaches prediction/planning?
AK: We're in a similar camp to everyone else. We have an explicit thing that works OK but has challenges everywhere, and we aspire to use machine learning instead.
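(By "explicit thing" I take him to mean a hand-engineered planner. A toy sketch of one common pattern, scoring candidate trajectories with a weighted cost and picking the cheapest; all cost terms and weights here are invented, not Tesla's.)

```python
# Toy sketch of an explicit (hand-designed) planner: score candidate
# trajectories with a weighted cost and pick the cheapest.
from dataclasses import dataclass

@dataclass
class Trajectory:
    min_obstacle_gap_m: float   # closest approach to any obstacle
    max_lateral_jerk: float     # comfort proxy
    progress_m: float           # distance gained toward the goal

def cost(t: Trajectory) -> float:
    safety = 1.0 / max(t.min_obstacle_gap_m, 0.1)  # penalize close calls
    comfort = t.max_lateral_jerk
    return 10.0 * safety + 1.0 * comfort - 0.5 * t.progress_m

candidates = [
    Trajectory(2.5, 0.3, 30.0),
    Trajectory(0.8, 0.1, 35.0),   # fast but cuts it close
    Trajectory(4.0, 0.6, 25.0),
]
best = min(candidates, key=cost)
```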
Q: Do you have an estimate of how many stop-sign-like features are required for L4-L5?
AK: Haha, good question. All I know is that exotic clips showing me things I've never seen keep coming back even months later.
Q: With all of the collected data in the past few years, do you feel that the perception part of AV is close to being solved?
AK: It's flawless 99% of the time, so it only has to improve about 10,000X.
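(The 10,000X follows from the march-of-9s arithmetic: "flawless 99% of the time" means a 1e-2 failure rate, and reaching a 1e-6 failure rate, i.e. 99.9999% - a target I'm assuming just to match the quip - is a 10,000x reduction.)

```python
# March-of-9s arithmetic: 99% reliable -> 1e-2 failure rate;
# a hypothetical 99.9999% target -> 1e-6 failure rate.
current_failure = 1 - 0.99      # 1e-2
target_failure = 1 - 0.999999   # ~1e-6
print(f"required improvement: {current_failure / target_failure:.0f}x")  # ~10000x
```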