
Andrej Karpathy - AI for Full-Self Driving (2020)

Due to LIDAR limitations, vision must be solved for L5.

You must have standalone vision working to multiple nines on a LIDAR vehicle. If the LIDAR knows that a light exists but does not know the state then vision must be able to do that all by itself. If LIDAR knows there’s a square block of stucco at an intersection but cannot tell if a stop sign has been painted on it then vision must be solved for L5 to be deployed.

The inverse is not true. You do not need LIDAR deployed to solve for vision.

So LIDAR clearly provides a “cheat” to propel self driving forward maybe 60% of the way, but vision must be solved for the system to be put into production for regular end users.

You need vision to be at 99.99999% accuracy on ONE TASK when using a lidar/radar-only car. That is different from needing 99.99999% accuracy on 100+ TASKS when you have a vision-only system. It's currently impossible for a camera-only system to achieve the 99.99999% accuracy that's needed. Definitely not in autonomous driving, and not in other areas of deep learning either. It's about levels of accuracy and rate of failure. How many miles can you go without a serious perception failure? You need to be able to go millions of miles.
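To make that last point concrete, here is a back-of-envelope sketch (my own illustration; the 100 decisions-per-mile figure is an assumed number, not one from this thread) of how per-decision accuracy translates into miles between serious failures:

```python
def miles_between_failures(accuracy: float, decisions_per_mile: int = 100) -> float:
    """Expected miles before one failure, assuming each perception
    decision independently succeeds with probability `accuracy`."""
    failures_per_mile = (1.0 - accuracy) * decisions_per_mile
    return 1.0 / failures_per_mile

# A "99% accurate" system fails roughly every mile at 100 decisions/mile,
# while seven nines stretches that to roughly 100,000 miles.
print(miles_between_failures(0.99))
print(miles_between_failures(0.9999999))
```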

If vision is solved to multiple nines then LIDAR is just an extra expensive wart. Another tool, of course, but not particularly useful when combined with a solved vision system.

Our latest efforts in deep learning are not close. Take, for example, the fact that there are billions of Android phones and hundreds of millions of Google Homes, and every time you use their voice apps, Google gets all of the raw audio data (versus below ~0.1% in Tesla's case). Yet we haven't gotten past 99% in any category of machine learning, and we are in year eight. Take voice recognition, voice transcription, or translation: they are still subpar in the real world, even when we speak very slowly and calmly, and even with basically a trillion voice samples. Google still hasn't solved voice recognition, and we are nowhere close to solving computer vision. The same goes for Amazon's Alexa (hundreds of millions of devices), and it still falls short.

So the noise that "all you need is a lot of data" is simply misleading.

This is why having different sensor modalities that fail and excel in different ways, so that they complement each other, is the key. It solves the accuracy problem because radar doesn't fail in the same scenarios as cameras and lidar, lidar doesn't fail in the same scenarios as cameras and radar, and vice versa.

So instead of needing ~99.99999% for one sensor suite, you now only need ~99.99% for each modality. It's not lidar vs. camera: if someone had a lidar-only system, they would also need to reach ~99.99999% accuracy with just lidar, which is also not currently possible.
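The arithmetic behind that claim can be sketched as follows, under the (optimistic) assumption that the modalities fail independently; real sensor failures are partly correlated, so treat this as an upper bound rather than a guarantee:

```python
def combined_availability(*modality_accuracies: float) -> float:
    """Probability that at least one modality still works, assuming
    independent failures across modalities."""
    p_all_fail = 1.0
    for acc in modality_accuracies:
        p_all_fail *= (1.0 - acc)
    return 1.0 - p_all_fail

# Three modalities at ~four nines each combine to ~twelve nines,
# because all three must fail in the same scenario at once.
print(combined_availability(0.9999, 0.9999, 0.9999))
```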


I really think Karpathy's talk shed a lot of light on this. They've made tremendous progress, even though it's hard to see right now from outside their labs.

The talk simply highlighted what others are doing in state-of-the-art computer vision research. If you have been following the field, you would know that Tesla is just reaping the benefits; they are not inventing anything. In fact, the models, the training data, and the architecture code of the networks below are freely available to download.

 
If vision goes out, can you drive on lidar+radar alone in the city?
The answer is a definitive "NO".
If you have no lidar, can you drive in the city with vision+radar alone?
The answer is "YES"

So, yes lidar is useless.

Yes, you can drive on lidar/radar alone in the city and also on the highway, which you purposely ignored.

Again, it's not about getting a system to work 99% of the time. It's about getting the system to function with 99.99999% accuracy: going millions of miles without a failure.

It's also interesting how your logic, when applied, contradicts itself.

1) You said lidar is useless because you can't drive in the city with it alone plus radar (even though you can). Yet you omitted the highway, which implies that you CAN drive on the highway with it. So if you can drive on the highway with it, how is it useless?

2) You said you can drive with vision/radar alone. But why include radar? You can drive with vision alone. So why didn't you proclaim radar to be useless?
 
Teslas aren't blowing through intersections. Knowing an intersection is there is not a problem; this has been clear in all of the demos for some time. Indeed, your original assertion that I am responding to says Tesla is inferior at "common tasks like turning at intersections, responding to traffic lights and stop signs" as a result of relying on vision. Hence my reply asking you to "please explain how Waymo knows how to proceed at these intersections" better because it has lidar (or HD maps).

So you keep missing the point: stopping at the intersection isn't the problem. Tesla's vision clearly demonstrates that, and Karpathy already explained that they have maps that can identify that there's an intersection "with a stoplight somewhere in it". It's the ability to proceed at that point that is the issue here.

So, how does something other than vision help? You say "Waymo has excellent camera vision that is reliable at reading the stop lights." Once you have that ability, you already have the ability to identify a stoplight. So tell me again how vision is insufficient?

Again, it's all about accuracy and rate of failure. It's already impossible to get 99.99999% in computer vision, let alone on 100+ tasks. Worse, false positives and false negatives scale with inclement weather (noise): the more obstruction (fog, snow, blizzard, torrential rain, wet roads at night, direct sunlight, dust storms, road mist, smoke), the worse the accuracy.

So the accuracy you get under a clear daytime sky is not the same as when you are in a blizzard with zero visibility. If you build an HD map with your vision software under a clear daytime sky, it will help your system when it's struggling in a blizzard, zero-visibility fog, heavy rain at night, etc.

The road layout can change in half a second in fog. You can, for example, have a lane that suddenly ends and curves onto another road in 0.5 seconds. If you relied on vision alone, that would be a very uncomfortable, if not unsafe, ride.

[image: lane ending abruptly in fog]


You can have a traffic light appear out of the fog in one second, forcing you to emergency brake. In this instance, the car behind the driver popped out of the fog and actually ran the red light because it couldn't brake in time.

[image: traffic light emerging from fog]


Imagine you want to make a left and the oncoming lane is completely obscured, degrading the accuracy of your vision system. If you were turning onto a road with multiple lanes, you wouldn't even see the road edge; you would essentially be turning blind. Say you were making a U-turn: the same problem exists, as cars will pop out of nowhere, and since you rely on vision alone, it would be an accident waiting to happen.

[image: intersection with the turn lanes obscured by fog]


What do HD maps do?
  • Provide foresight for comfort control
  • Provide sensor redundancy for:
  1. When the scene is obscured.
  2. When visibility is low, for example when lane markings or road boundaries are unclear.
  3. When road signs are affected by outside influences (e.g., a stop sign knocked over or a sign obstructed by a parked truck), etc.
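As a hypothetical sketch of how that redundancy might be wired in (this is my own illustration, not Waymo's or anyone's actual code; `Detection`, `resolve_sign`, and the 0.9 threshold are all invented for the example): trust the live detector when it is confident, and fall back to the mapped annotation when it is not.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Detection:
    label: str         # e.g. "stop_sign"
    confidence: float  # 0..1 score from the vision network

def resolve_sign(vision: Optional[Detection], map_label: Optional[str],
                 min_conf: float = 0.9) -> Optional[str]:
    """Use the live detection when confident, else the HD-map annotation."""
    if vision is not None and vision.confidence >= min_conf:
        return vision.label
    return map_label

print(resolve_sign(Detection("stop_sign", 0.97), "stop_sign"))  # clear day: vision wins
print(resolve_sign(Detection("unknown", 0.31), "stop_sign"))    # blizzard: map fills in
```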
 
Again:

1) If you can't determine with vision what the state of the light is, then HD maps (or lidar) buy you nothing other than sitting at the intersection.

2) If you can do #1, then you don't need HD maps or lidar (or really even SD maps, but they are there for your foggy corner cases).
 
Do you get why we are confused about what you are saying? Take this example:

Is it always the same cm-level path? That is what tracing (vs. creating) a precise (as opposed to accurate) path ahead of time (as opposed to while maneuvering and reacting) implies.
"Waymo has centimetre level accurate maps and localization which allow it to trace a very precise path for the car to take when proceeding through an intersection or making a turn at an intersection"

I would think it is the same cm level path.

In it you say it takes "the same cm level path" provided by the HD maps every time it goes through the intersection. But then you change your mind:

No. Trains are constrained to stay on a single path and can't maneuver on their own. Waymo cars can maneuver independently; they are not constrained to stay on a single path every time. The path on the HD map simply helps the car make the maneuver more precisely. For example, when the Waymo car approaches an intersection, it does not have to "guess" with vision alone how to make the turn; it has map data to know how to make the turn accurately.

and say that it doesn't use the same path every time.
 
yes you can drive on lidar/radar alone in the city and also highway which you purposely ignored.

Really? o_O How?
  1. It approaches a traffic light. How does it know if it is green, yellow, or red?
  2. It approaches a construction area and a flagger is holding up a sign. How can it use LiDAR to determine if the sign says stop, slow, to turn, or something else?
 
Do you get why we are confused about what you are saying? Take this example:

In it you say it takes "the same cm level path" provided by the HD maps every time it goes through the intersection. But then you change your mind:

and say that it doesn't use the same path every time.

Nothing confusing about it. HD maps provide a path to help the car navigate an intersection, for example. But those paths are guides; the car is not locked onto them. They are not rails that the car is stuck on. It can deviate to avoid obstacles, for example.

This is an HD map that Waymo uses. You can see the paths in green.

[image: Waymo HD map with traced paths shown in green]


You will note that the HD map has road edges, lane lines, crosswalks, traffic lights, and paths traced out to help the car navigate this complex intersection. My point is that the car can use these paths on the HD map to help it make the turns successfully.

But the cars are not stuck on a predetermined path no matter what. They are able to determine a new path to go around obstacles like construction or cyclists. Here is a video of Google self-driving cars navigating around obstacles like construction and cyclists. You will see that the cars are able to make a new path to go around obstacles. Keep in mind that this is from 2014, so this is what Google self-driving cars were already able to do back then.


Note the green path in the video which is the car's planner for what path it will take. If there are no obstacles, the path can be the same as the HD map's path, but the planner can alter the car's planned path to avoid obstacles.
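That "guide, not rail" behavior can be sketched in a few lines (purely illustrative; the clearance and nudge values are invented, and a real planner does far more than shift waypoints sideways):

```python
def plan_path(map_path, obstacles, clearance=2.0, nudge=2.5):
    """Follow the HD map's waypoints, but shift any waypoint sideways
    when an obstacle sits within `clearance` meters of it."""
    planned = []
    for x, y in map_path:
        blocked = any(abs(x - ox) < clearance and abs(y - oy) < clearance
                      for ox, oy in obstacles)
        planned.append((x, y + nudge) if blocked else (x, y))
    return planned

mapped = [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0)]
print(plan_path(mapped, obstacles=[]))            # no obstacles: the mapped path as-is
print(plan_path(mapped, obstacles=[(5.0, 0.0)]))  # only the blocked waypoint deviates
```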
 
Sorry but that makes no sense. There are plenty of systems that use a particular sensor or have redundancy. It's part of what makes the system work effectively.
I said: "If the car can't do it on its own [without maps], it [maps] is not extra reliability, it is the only 'reliability'."

For example, airplanes have radar but we don't call it a weakness because the pilots should be able to fly with just their human vision.

Airplanes fly perfectly fine without radar under visual flight rules.
In the United States, instruments required for IFR flight in addition to those that are required for VFR flight are: heading indicator, sensitive altimeter adjustable for barometric pressure, clock with a sweep-second pointer or digital equivalent, attitude indicator, radios and suitable avionics for the route to be flown, alternator or generator, gyroscopic rate-of-turn indicator that is either a turn coordinator or the turn and bank indicator.
Instrument flight rules - Wikipedia
IFR allows safe flight when eyeballs aren't enough. However, it is not required on clear days.

Surgeons use tools but we don't say that surgeons, with their 15 years of medical school, really should do the surgery without that special tool. No, we are glad that the tool makes the surgery easier for the doctor and reduces the chances of complications.

A surgeon cannot cut tissue well without a scalpel; the scalpel is required. They cannot perform a heart transplant without a heart-lung machine; the HLM is required. If the car cannot maneuver without maps, the maps are required, not additional.

Same with autonomous driving. If HD maps or lidar make autonomous driving safer and more reliable, that's a good thing. It's not a weakness to use a tool that makes the job better.
If the maps are truly additional, yes, I agree, which is what I said: "If the car can't do it on its own [without maps], it [maps] is not extra reliability, it is the only 'reliability'." Nothing has been presented that indicates Waymo's maps are optional.

It's a weakness in the original system if it requires the maps to function. And if not a "weakness," then at the very least you cannot say the maps increase reliability. Wait, I take that back: if the system can't function without them, then it starts at zero reliability. They add no redundancy, no extra safety. They are a requirement in that case.
 
Let's not forget Tesla abandoned this technology, not because it didn't work but because it was too expensive.
Incorrect. Mobileye dropped Tesla after the Joshua Brown fatality. Tesla had been planning for AP2 to have both the ME chip and Tesla NN hardware.
It's On: Tesla, Mobileye Spar Bitterly Over What Triggered Autopilot-Related Split

Tesla parting ways with Mobileye on Autopilot/Self-driving development following fatal crash [Updated] - Electrek

Edit: Eventually, Tesla would have dropped ME, but not at the time they split.
 
If the maps are truly additional, yes, I agree, which is what I said: "If the car can't do it on its own [without maps], it [maps] is not extra reliability, it is the only 'reliability'." Nothing has been presented that indicates Waymo's maps are optional.

It's a weakness in the original system if it requires the maps to function. And if not a "weakness," then at the very least you cannot say the maps increase reliability. Wait, I take that back: if the system can't function without them, then it starts at zero reliability. They add no redundancy, no extra safety. They are a requirement in that case.

I feel like we are talking in circles. I guess you are trying to say that if the HD map is required to perform a task (like turning at an intersection), you consider that a design flaw. I am saying the system can do the task without HD maps, but HD maps make the task more reliable. And since high reliability is the goal, HD maps are a necessity. They are not a weakness, since they are critical to getting the desired reliability. It is not about performing the task but about performing the task reliably. I am sure Waymo could do autonomous turns at intersections without HD maps if they really wanted to, but they don't, because they know it would not be reliable enough for deployment. I guess it is only a "weakness" if you start with the assumption that the goal is to do autonomous driving reliably with just camera vision.

It depends what your goal is.

1) Can you do basic autonomous driving without HD maps? Yes.

2) Can you do safe and reliable (99.99999%) autonomous driving without HD maps? No, I don't believe you can. At least, I see no evidence that you can. Therefore, if your goal is safe and reliable autonomous driving, you need HD maps. It is not a weakness to add something that you need in order to achieve your desired goal.
 
Vision alone can have many false positives.
(not sure why I'm doing this but...)

Because for every one of these that you gave as an example that might currently fool vision:
[image: example scene that might fool vision]


There's one of these that will be missed by HD maps (and likely by lidar, as the shape it returns will be weird):
[image: temporary stop sign]


So, regardless of detecting light status, reading signs, finding road lines/edges, etc., you have to solve for vision. This makes sense: humans drive based on vision, and the roads are built for those humans to drive on.

Once you solve vision reliably enough to make up for HD maps and lidar deficiency, tell me again why you need either of the latter?

Yes, humans also use maps for navigation, but we don't use them to know how to proceed through an intersection or to avoid driving off the road. We use them to get to a general destination; the intervening details are based on our primary sensors: our eyes.
 
So, regardless of detecting light status, reading signs, finding road lines/edges, etc., you have to solve for vision. This makes sense: humans drive based on vision, and the roads are built for those humans to drive on.

Once you solve vision reliably enough to make up for HD maps and lidar deficiency, tell me again why you need either of the latter?

Yes, humans also use maps for navigation, but we don't use them to know how to proceed through an intersection or to avoid driving off the road. We use them to get to a general destination; the intervening details are based on our primary sensors: our eyes.

First of all, we know how to identify shapes from lidar, so I am not sure why you think it would be a weird shape.

Second, yes, camera vision is necessary for autonomous driving. But as Blader tried to explain, current camera vision is not reliable enough. That's what you seem to be missing. You say "once we solve vision reliably." Sure. But solving vision reliably means solving vision to 99.99999% for every single one of the 100+ tasks that camera vision needs to do. With current tech, we can't do that. So we cannot "solve vision reliably" yet.

We need HD maps and lidar because camera vision is not solved reliably enough!
 
Yes, camera vision is necessary for autonomous driving. But as Blader tried to explain, current camera vision is not reliable enough. That's what you seem to be missing. You say "once we solve vision reliably." Sure. But solving vision reliably means solving vision to 99.99999% for every single one of the 100+ tasks that camera vision needs to do. With current tech, we can't do that. So we cannot "solve vision reliably" yet.

We need HD maps and lidar because camera vision is not solved reliably enough!

Yet they don't fix the issues pointed out multiple times here, and you keep asserting that's what allows you to go through intersections better, etc.

No, they don't.
 
Yet they don't fix the issues pointed out multiple times here, and you keep asserting that's what allows you to go through intersections better, etc.

No, they don't.

Yes HD Maps do fix the issues I mentioned. That's the whole point. HD maps help solve those cases that camera vision can't do reliably.
 
First of all, we know how to identify shapes from lidar so I am not sure why you think it would be a weird shape.

You don't find a rectangular stop sign weird?
[image: rectangular stop sign]

But solving vision reliably means solving vision to 99.99999% for every single one of the 100+ tasks that camera vision needs to do. With current tech, we can't do that. So we cannot "solve vision reliably" yet.

We need HD maps and lidar because camera vision is not solved reliably enough!

What is the reliability of an HD map over time? Do the maps stop you from hitting a non-stationary object?

Yes HD Maps do fix the issues I mentioned. That's the whole point. HD maps help solve those cases that camera vision can't do reliably.

No, they hard-code around the vision issues for the scene as it was at the time the map was made. That is not solved beyond a specific case.

Can you do safe and reliable (99.99999%) autonomous driving without HD maps? No, I don't believe you can. At least, I see no evidence that you can. Therefore, if your goal is safe and reliable autonomous driving, you need HD maps. It is not a weakness to add something that you need in order to achieve your desired goal.

If the world can change, the maps are not reliable. If the maps are not reliable, then your system is no safer than vision only. In fact, it's worse, because you spent resources on mapping instead of improving vision.

Blindly following a path is not safe.

Note: augmented maps, like blind-driveway or stop-ahead signs, provide data that is not otherwise able to be collected and do improve on even 100% accurate vision. Of course, if vehicles only traveled at a speed where they could stop within half the visible distance, that would be safer and would handle any obstruction.
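That last rule of thumb can be turned into numbers (my own sketch; the 5 m/s² deceleration and 1 s reaction time are assumed values, not from the thread): solving v·t_r + v²/(2a) ≤ d_visible/2 for v gives the largest safe speed at a given visibility.

```python
import math

def max_safe_speed(visible_m: float, decel: float = 5.0,
                   reaction_s: float = 1.0) -> float:
    """Largest speed (m/s) that lets the car stop within half the visible
    distance, including reaction time: solve v*t + v^2/(2a) = d/2 for v."""
    d = visible_m / 2.0
    # Quadratic in v; take the positive root.
    v = decel * (-reaction_s + math.sqrt(reaction_s**2 + 2.0 * d / decel))
    return max(v, 0.0)

for vis in (200.0, 50.0, 20.0):
    print(f"{vis:5.0f} m visibility -> {max_safe_speed(vis) * 3.6:5.1f} km/h")
```

As visibility drops from 200 m to 20 m, the safe speed under these assumptions falls from roughly highway speed to roughly parking-lot speed, which is the commenter's point.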
 
You don't find a rectangular stop sign weird?
[image: rectangular stop sign]


What is the reliability of an HD map over time? Do the maps stop you from hitting a non-stationary object?



No, they hard-code around the vision issues for the scene as it was at the time the map was made. That is not solved beyond a specific case.



If the world can change, the maps are not reliable. If the maps are not reliable, then your system is no safer than vision only. In fact, it's worse, because you spent resources on mapping instead of improving vision.

Blindly following a path is not safe.

Note: augmented maps, like blind-driveway or stop-ahead signs, provide data that is not otherwise able to be collected and do improve on even 100% accurate vision. Of course, if vehicles only traveled at a speed where they could stop within half the visible distance, that would be safer and handle any obstruction.

You keep saying that we need to "solve vision". But I am telling you we have not "solved vision".
 
You keep saying that we need to "solve vision". But I am telling you we have not "solved vision".
I feel like folks are arguing two points that converge but are different. Yes, HD maps are nice to have, but someone has to do the mapping first, and it has to be done often; otherwise you will have to rely on vision for ground truth, which makes people ask why HD maps are needed at all.
 
Really? o_O How?
  1. It approaches a traffic light. How does it know if it is green, yellow, or red?
  2. It approaches a construction area and a flagger is holding up a sign. How can it use LiDAR to determine if the sign says stop, slow, to turn, or something else?

yes explain

So, regardless of detecting light status, reading signs, finding road lines/edges, etc., you have to solve for vision. This makes sense: humans drive based on vision, and the roads are built for those humans to drive on.

Do you guys not know that lidar can detect and classify lane lines, road markings, crosswalks, road edges, curbs, road signs and traffic signs, and possibly light status, etc?

Lidar has always been able to see lane lines, road markings, crosswalks, etc., because it measures light reflectivity. Additionally, it has always been able to capture geometric shape, which includes stop signs, yield signs, anything, as you see below.
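The reflectivity point can be illustrated with a toy example (the intensity values and the 120 threshold are invented for illustration; real pipelines calibrate per sensor): lane paint returns much stronger intensity than bare asphalt, so even a simple threshold separates the two.

```python
# Each return: (x_m, y_m, intensity 0..255). Values are made up.
points = [
    (1.0, 0.0,  12),   # asphalt
    (1.0, 1.8, 210),   # painted lane line
    (2.0, 0.1,  15),   # asphalt
    (2.0, 1.8, 198),   # painted lane line
]

LANE_PAINT_MIN_INTENSITY = 120  # assumed calibration constant

lane_points = [(x, y) for x, y, i in points if i >= LANE_PAINT_MIN_INTENSITY]
print(lane_points)  # only the high-reflectivity (painted) returns remain
```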

With that alone you can drive ANYWHERE without having to go through a traffic light. Why? Because there is this thing called right turn on red, which allows you to turn right at any intersection with a traffic light. In addition, you can also turn left away from an intersection onto another road or side street. There is no destination I couldn't get to while avoiding traffic lights. For example, I can get to my work, which is 20 miles away, without going through a traffic light. The same is the case for every spot I usually visit.

So can you drive with lidar ALONE? The answer is an absolute yes.

A random, cheap, off-the-shelf automotive-grade lidar ($800):
[image: output of an off-the-shelf automotive lidar]



You don't find a rectangular stop sign weird?
[image: rectangular stop sign]

Now you say: what about construction? What about a stop sign shaped the same as a slow sign, or a circular stop sign, etc.?
Well, this is 2020, not 2010. Some lidars can read the ambient lighting, which allows them to get imaging information. Waymo's lidar can do that, Ouster's as well, etc. So it allows them to read road signs and traffic signs, including traffic light status, although that last one is the only question mark. But since light status isn't determined by the color of the light but by the position of the illumination, even that may be within reach.

So can you drive on lidar only? Again, the answer is yes.

Waymo's lidar, which can read road and traffic signs, not just capture their geometric shape:

[image: Waymo lidar imagery showing road and traffic signs]


An Ouster lidar, which can also read road and traffic signs and not just capture their geometric shape.
 
Airplanes fly perfectly fine without radar under visual flight rules.
In the United States, instruments required for IFR flight in addition to those that are required for VFR flight are: heading indicator, sensitive altimeter adjustable for barometric pressure, clock with a sweep-second pointer or digital equivalent, attitude indicator, radios and suitable avionics for the route to be flown, alternator or generator, gyroscopic rate-of-turn indicator that is either a turn coordinator or the turn and bank indicator.
Instrument flight rules - Wikipedia
IFR allows safe flight when eyeballs aren't enough. However, it is not required on clear days.

Oh, it's not needed because you can fly on pristine clear days? Seriously? That's your logic? You literally make up new logic as you go (different from what you had in the past).

Not only are all commercial jets required to fly under IFR, but the next time you fly, why don't you have the pilots rip out three of the four identical flight control systems? And tell the copilot to go home, because the other pilot can fly just fine without him.

While you are at it, have them rip out one of the gyros. Why do they have duplicates of that? They're not even needed. Rip out two of the airspeed and altimeter sensors. Why do they have duplicates, triples, and quadruples of every sensor? Rip them out!

Heck, demand that they rip it all out or you won't fly, because that is a poor design and a very weak system.

If the maps are truly additional, yes, I agree, which is what I said: "If the car can't do it on its own [without maps], it [maps] is not extra reliability, it is the only 'reliability'." Nothing has been presented that indicates Waymo's maps are optional.

It's a weakness in the original system if it requires the maps to function. And if not a "weakness," then at the very least you cannot say the maps increase reliability. Wait, I take that back: if the system can't function without them, then it starts at zero reliability. They add no redundancy, no extra safety. They are a requirement in that case.

Stop contradicting yourself every other week. You previously said that anyone who uses a map for anything other than routing is not general, i.e., is a weak system. Tesla's AP features use and NEED maps to function. Therefore they're weak. Apply your flawed logic universally.