
CVPR 2020 Workshop on "Scalability in Autonomous Driving" LIVE DISCUSSION

The intersection-prediction videos Karpathy showed can still roughly predict the type of intersection even when the network is not fully certain. The videos feel similar to how a human would implicitly visualize the world in their head.

I personally haven't seen this particular visualization style in NN research.
 
He did repeat the silliness about HD maps
After re-watching his segment, he clearly explains the reasons they do not view HD maps as helping to solve the FSD problem: scalability and cost.
HD maps would require a ton of maintenance, and that would cost far too much to be feasible as a general FSD solution. (He covered this again in the first question asked after the presentation.)

Important note/observation: Just because his view and approach do not line up with yours does not make him silly at all; in fact, it makes all your postings on the subject of FSD look silly, given his contributions to solving the FSD problem compared to your contributions to date.

A few interesting timestamps from the Karpathy section:
  • [~7:01:00 timestamp] Tesla has very few segmentation tasks (pixel-level classification) because you need to make 3D sense of what you're seeing.
  • [~7:01:48 timestamp] A lot of the heavy lifting is done by NNs: parked cars, cut-ins, etc. are all outputs of NNs, not based on "brittle rules on highly abstract representations".
  • [~7:08:00 timestamp] Defense of the iterative (aka Tesla) approach to deploying features:
    • "You are allowed to know that you don't know"
    • Real-life situations "can get out of hand quickly" (his image of a crazy roundabout with multiple smaller roundabouts inside the bigger one)
    • Because of the complexity of the real world, he thinks this is the likely route: when you can't handle a situation, you route around it until you can handle it fully.
  • [~7:11:20 timestamp] Labeling costs: a clean and varied data set is a "sure way" to get an NN to perform well; other approaches are still hit or miss.
  • [~7:13:00 timestamp] The correct way - Karpathy (not Elon) says the iterative approach to FSD is the "correct way":
    • The active frontier is too big to implement in a big-bang approach - or as Karpathy put it, you "can't just in a binary fashion develop it and ship it".
    • And then the audio cut out for a good chunk of time... :(
 
Karpathy showed off some new video of birds-eye-view lane-line and divider predictions. Here are a few images:
[Five attached screenshots of the birds-eye-view predictions]

Looks like the intersection prediction is already pretty good even when there isn't a clear view from the cameras.

It's *REALLY* bad right now in the released versions, and has been forever. Every single day I have to override in an intersection near my place because it blows it completely and tries to swerve across lanes when going "straight" through.
 
After re-watching his segment, he clearly explains the reasons they do not view HD maps as helping to solve the FSD problem: scalability and cost.
HD maps would require a ton of maintenance, and that would cost far too much to be feasible as a general FSD solution. (He covered this again in the first question asked after the presentation.)

You are taking me out of context. Here is my entire quote:

"He did repeat the silliness about HD maps knowing the precise position of leaves on a tree. Again, that is not a detail that you need to put in HD maps."

So my reference to silliness was about mapping each leaf on a tree. My comment had nothing to do with scalability or cost of HD maps.

But I would point out that every FSD company out there has demonstrated successes with HD maps.

Important note/observation: Just because his view and approach do not line up with yours does not make him silly at all; in fact, it makes all your postings on the subject of FSD look silly, given his contributions to solving the FSD problem compared to your contributions to date.

Again, I never called Karpathy silly because I disagreed with Tesla's FSD approach. I called his remark about mapping individual leaves on a tree silly. Saying HD maps record each individual leaf on a tree - that's what is silly, not Tesla's approach.

And just because you like Tesla's approach, does not automatically make it the right one. There are many experts and companies that have demonstrated real successes in FSD with different approaches.

And you can criticize Cruise or Waymo for not being practical or scalable, but the fact remains that they have demonstrated reliable L4 FSD with their approach.

In fact, I would argue that the more companies try different approaches, the more likely we will find the best approach, and the quicker we will get to L5 autonomy.
 
Every single day I have to override in an intersection near my place because it blows it completely and tries to swerve across lanes when going "straight" through.
Oh indeed it's bad right now. I have a nearby intersection that should be very easy to test once the rewrite/birds-eye-view network with these intersection outputs is released:
[Attached image: oncoming intersection.jpg]


This one is particularly difficult, but there are plenty more examples for larger intersections where Autopilot incorrectly shifts into an adjacent lane or worse into oncoming traffic.
 
What Karpathy is showing is not on any released version of Autopilot.
That is the birds-eye view of all 8 camera feeds stitched together - this is the rewrite that Elon was talking about.

That's fine, I'm just noting that it's an area where it'll be nice to see some actual improvement if any of this stuff actually gets released at some point.
 
I wonder why Karpathy doesn't talk about using satellite maps to augment the intersection training. I'm assuming this is a bad idea because some intersections are under bridges and the satellite representation would be incorrect.

You can also do HD mapping with cameras in the car, which gets around the problem of bridges obstructing the satellite view. Heck, some companies are using off-the-shelf dash cams to do HD mapping successfully.

This link might be informative: Toyota and its Partners Are Building Maps for Autonomous Vehicles Using Satellite Images & Dash Cam Footage
 
You can also do HD mapping with cameras in the car, which gets around the problem of bridges obstructing the satellite view. Heck, some companies are using off-the-shelf dash cams to do HD mapping successfully.

I would have automatically agreed that HD maps aren't as scalable, at least with a lidar-based approach on dedicated mapping vehicles. That assumption needs to be re-examined given MobilEye's approach of using Vidar on data from MobilEye-equipped consumer cars to generate HD maps. I'd guess MobilEye has patented this approach, so it may be a no-go for Tesla, though.
 
I would have automatically agreed that HD maps aren't as scalable, at least with a lidar-based approach on dedicated mapping vehicles. That assumption needs to be re-examined given MobilEye's approach of using Vidar on data from MobilEye-equipped consumer cars to generate HD maps. I'd guess MobilEye has patented this approach, so it may be a no-go for Tesla, though.

The idea that HD maps are not scalable comes from the old idea of driving around in a car with a big lidar on the roof and manually mapping every road. That's the old way of doing it, but things are changing. Mobileye is crowdsourcing HD mapping automatically from cars with cameras. Toyota is doing HD mapping with satellites. Nvidia has fully automated end-to-end mapping. There are new methods now that make HD maps cheaper and more scalable.
 
The idea that HD maps are not scalable comes from the old idea
No! The idea of HD maps being not scalable comes from a "new" Zoom presentation given by Karpathy on Monday, June 15th, 2020 (less than 24 hours before this post).

He clearly stated that Tesla maps hold a lot of useful information, but HD maps - and trying to keep them accurate at all times for ALL roads that ANY Tesla ANYWHERE can/will drive - would not be scalable and would be cost-prohibitive to manage.
 
and then the audio cut out for a good chunk of time... :(
He answered some more questions afterwards maybe because the audio was cutting out.


Q: Do you think the long tail can be "solved", or do you (Tesla) more just aim to be e.g. 100x better than humans? How might you otherwise "prove" safety?

AK: There's no "solved", it's a never-ending march of 9s. I'd be very happy to reach 100x better than humans; that would be a huge impact on saving lives and the economy broadly. You can't "prove" safety, but you can produce a LOT of statistical data, which we can uniquely do, I believe. I think it has to be a lot of data and telemetry. Without feature: %rate of something. With feature: %rate of something. And this is over 1B miles. It has to look something like that.
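Karpathy's with/without-feature framing can be sketched as a quick back-of-envelope rate comparison. This is my own illustration, not anything Tesla has published: the event counts and mileage are made-up placeholders, and the confidence interval is a simple Poisson normal approximation.

```python
import math

def event_rate_ci(events, miles, z=1.96):
    """Per-mile event rate with an approximate 95% CI (Poisson normal approx)."""
    rate = events / miles
    half = z * math.sqrt(events) / miles
    return rate, max(rate - half, 0.0), rate + half

# Hypothetical telemetry over ~1B miles, with vs. without some feature
wo_rate, wo_lo, wo_hi = event_rate_ci(events=900, miles=1e9)  # without feature
w_rate, w_lo, w_hi = event_rate_ci(events=600, miles=1e9)     # with feature

print(f"without: {wo_rate:.2e}/mile (95% CI {wo_lo:.2e}..{wo_hi:.2e})")
print(f"with:    {w_rate:.2e}/mile (95% CI {w_lo:.2e}..{w_hi:.2e})")
# Non-overlapping intervals at this mileage suggest a real reduction
print("CIs overlap:", w_hi > wo_lo)  # → False
```

This also shows why the "over 1B miles" part matters: at a few million miles the event counts would be tiny and the intervals far too wide to separate.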


Q: When and how will you (or Elon) tell Tesla customers that L4 & L5 autonomy won't be possible with Tesla's current sensor setup?

AK: You can add arbitrarily many arbitrarily expensive sensors. You could pepper your entire car with them in all the wavelengths. Where you draw the line is the important call to make. Our position is that lidar is a LOT of extra cost for not a lot of extra gain, compared to pixels + state of the art neural nets, and that this is a good place to draw it. I do not think this will be controversial in a few years.


Q: Are there apparent limitations in Tesla's depth estimation algorithm say vs Lidar?

AK: There are no fundamental limitations to depth estimation from images, as imo also proven by people driving cars using stereo, effectively. So the signal is clearly there to support L5 functionality. This by itself definitely doesn't mean you're done - you still have to write the algorithm (or rather, train a model, collect your dataset, resolve all the potential issues), and generally make it work well.
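The "signal is clearly there" claim rests on the standard rectified-stereo geometry, depth = f·B/d. A tiny sketch (the focal length and baseline are made-up numbers, not Tesla's rig) also shows why depth gets noisy near the horizon, where disparity is small:

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Classic rectified-stereo relation: depth = f * B / d."""
    return focal_px * baseline_m / disparity_px

# Hypothetical rig: 1000 px focal length, 0.12 m baseline
for d in (60.0, 12.0, 2.0):
    z = depth_from_disparity(d, focal_px=1000.0, baseline_m=0.12)
    # A 1 px disparity error costs more depth accuracy as d shrinks
    err = depth_from_disparity(d - 1.0, 1000.0, 0.12) - z
    print(f"disparity {d:5.1f} px -> depth {z:6.2f} m (+{err:.2f} m per px of error)")
```

The same geometry is why, as he says, the signal supports the task but the hard part is the model, the dataset, and making it robust.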


Q: I was interested in BEV network. I see the difficulty of pixel-world conversion especially near horizon due to loss of 3D information. My ques is how do you get Ground Truth for these networks, for example for curbs?

AK: Good question. Clean ground truth at scale is definitely the hard part :) , but I won't go into full detail here. Also we do worry about the architectures quite a bit because you're right - a lot of information is only at the vanishing line, and 1/3 of the image is clouds and 1/3 is road.


Q: Would you say that current Tesla Autopilot drives somewhat more conservatively in dense traffic than a human driver?

AK: Interesting, no very strong opinions on this point. We typically try to drive as a human would, but do become more cautious in some conditions, e.g. in Stops, or if we detect people near the road and not paying attention to us, etc.


Q: Do you use open source software for ontology management?

AK: In Tesla fashion we develop everything in-house and like it that way a lot. Everything is fully customized to take full advantage of our fleet, its telemetry, etc


Q: Can you touch more on how your team approaches prediction/planning?

AK: We're in a similar camp to everyone else. We have an explicit thing that works ok, and has challenges everywhere, and aspire to use machine learning instead :)


Q: Do you have an estimate of how many stop-sign like features are required for L4-5?

AK: Haha, good question. All I know is that exotic clips showing me things I've never seen keep coming back even months later.


Q: With all of the collected data in the past few years, do you feel that the perception part of AV is close to being solved?

AK: It's flawless 99% of the time, so it only has to improve about 10,000X :)
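Taking the quip literally (my arithmetic, not Karpathy's): 99% flawless means a 1% failure rate, and a 10,000x improvement lands at one failure per million, i.e. "six nines" of reliability:

```python
current_failure = 1 - 0.99            # "flawless 99% of the time"
target_failure = current_failure / 10_000
print(f"current failure rate:  {current_failure:.0%}")      # → 1%
print(f"after 10,000x better:  {target_failure:.4%}")       # → 0.0001%
print(f"required reliability:  {1 - target_failure:.4%}")   # → 99.9999%
```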
 
No! The idea of HD maps being not scalable comes from a "new" Zoom presentation given by Karpathy on Monday, June 15th, 2020 (less than 24 hours before this post).

He clearly stated that Tesla maps hold a lot of useful information, but HD maps - and trying to keep them accurate at all times for ALL roads that ANY Tesla ANYWHERE can/will drive - would not be scalable and would be cost-prohibitive to manage.

Yes, that is Tesla's position. But I think it ignores the evidence that HD maps are scalable:
End-to-end HD Mapping for Self-Driving Cars from NVIDIA Automotive
Toyota and its Partners Are Building Maps for Autonomous Vehicles Using Satellite Images & Dash Cam Footage
HERE and Mobileye: crowd-sourced HD mapping for autonomous cars
 
I think it ignores the evidence that HD maps are scalable
They don't ignore the evidence. They have a 1-million-car fleet; I am pretty sure they have had at least one person look at the costs (data, storage, and maintenance) needed to keep up with all that data and make sure the maps stay up to date!

No one, NO ONE else has access to this much data.
So all your pretty links to pretty presentations and shiny videos cannot answer the question of scalability until you actually have data to prove it.
Yet again, I am here because I agree with the Tesla approach and want that accelerated.
 
@mspisars It is obvious you are a big believer in Tesla's approach. You think Tesla's approach is scalable and Tesla will achieve generalized FSD that works everywhere. And you think Waymo and Cruise's approach is not scalable. Let's see if you are right. If Tesla gives me a software update where the car can drive me anywhere without me needing to pay attention, then I will agree that their approach was the right approach. But that has not happened yet. So far all I have is driver assist where I need to pay attention and that's not FSD.
 
Let's see if you are right. If Tesla gives me a software update where the car can drive me anywhere without me needing to pay attention, then I will agree that their approach was the right approach. But that has not happened yet.

Perfect, with one caveat: ALL parties that claim they offer FSD have to be graded by the same standard, specifically "generalized FSD that works everywhere" AND available at any time to the end user (i.e., not a presentation, a dedicated route, or specific times).
You think Tesla's approach is scalable and Tesla will achieve generalized FSD that works everywhere.
Yes, that is exactly the camp that I threw my money in by purchasing Tesla cars with Autopilot.
 
Perfect, with one caveat: ALL parties that claim they have "solved" FSD have to be graded by the same standard, specifically "generalized FSD that works everywhere" AND available at any time to the end user (i.e., not a presentation, a dedicated route, or specific times).

Yes, that is the same standard that I hold everybody to.