
What a self driving car "sees" (video from nVidia, from CES 2016)


Interesting NVIDIA video. Looks like they have some more work to do. Their path planner shows their car as free to switch lanes right up until the point another car is passing it, which would result in an accident.

Not sure how the lidar helps them, other than maybe giving more confidence in their camera-based vehicle recognition. It would be interesting to see how big the lidar units are.

Also interesting is the implied dig at Tesla by showing how inaccurate GPS is, since Elon has said that Tesla uses GPS. Actually, I've always been wondering about that. GPS is indeed inaccurate. I can see how Tesla can create a high resolution map by averaging millions of trip miles, but how do you use such a map given that an individual car's GPS is inaccurate?
 
I think he said they also use GPS. I took that to mean they get their place on the earth - the road map - from GPS and place the car in it. That is what Tesla does as well. They then use the tighter data from the local sensors to more precisely place the car in the big wide world. I think.
 
Seems like NVIDIA has lots of work to do. Cars are literally popping in and out of detection right next to their own car.
And I don't see how their system with 100x more power consumption can compete with EyeQ4.
Mobileye's free space algorithm seems much more accurate. They also explain that Lidar isn't useful at that accuracy level.
 
Also interesting is the implied dig at Tesla by showing how inaccurate GPS is, since Elon has said that Tesla uses GPS. Actually, I've always been wondering about that. GPS is indeed inaccurate. I can see how Tesla can create a high resolution map by averaging millions of trip miles, but how do you use such a map given that an individual car's GPS is inaccurate?

To answer my own question: the latest Mobileye video a few posts above answers this. A "high resolution map" is a map that records the position of key "landmarks" such as traffic signs. The system knows where it is at all times by estimating distance from, say, a specific freeway overpass sign. Since signs occur frequently, even on desert highways, and they generally don't move, it's a great way of localizing yourself to within 10 cm or so. So GPS might be used to localize yourself within 10m of somewhere, and then you fix a position within 10cm by observing where the signs are or were as you drive past them. Very clever, and I can see how this would work well.
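
To make that two-step idea concrete, here's a toy sketch (my own illustration, not Mobileye's actual method, with made-up landmark positions and noise levels): start from a coarse GPS guess, then refine it with a least-squares fit against measured distances to a few signs whose positions are stored in the map.

# Toy landmark localization: coarse GPS guess refined against distances
# to mapped signs. Positions are in meters (east, north); all numbers invented.
import numpy as np
from scipy.optimize import least_squares

landmarks = np.array([[0.0, 50.0],      # e.g., overpass sign
                      [80.0, 45.0],     # speed limit sign
                      [150.0, 55.0]])   # exit sign

true_position = np.array([60.0, 2.0])              # where the car actually is
gps_guess = true_position + np.array([7.0, -5.0])  # GPS fix, several meters off

# Distances to each landmark as the vision system might estimate them,
# with roughly 5 cm of measurement noise.
rng = np.random.default_rng(0)
measured = np.linalg.norm(landmarks - true_position, axis=1) + rng.normal(0, 0.05, 3)

def residuals(pos):
    # Predicted minus measured distance to each landmark.
    return np.linalg.norm(landmarks - pos, axis=1) - measured

fix = least_squares(residuals, gps_guess).x
print("GPS error:     %.2f m" % np.linalg.norm(gps_guess - true_position))
print("refined error: %.2f m" % np.linalg.norm(fix - true_position))

The coarse GPS fix only has to be good enough to know which stretch of road (and which signs) you're looking at; the centimeter-level position comes from the landmarks themselves.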
 
My quick-and-dirty:

1. Mobileye's camera-heavy approach seems a superior design... in good weather conditions.

2. Nvidia's integration of LIDAR seems to provide a fundamentally needed edge when rain, fog, or snow - particularly snow-covered roads that obscure lane markings - makes camera reliance untenable.

I'm glad we have a significant equity position in both companies.
 
Humans rely only on optics in bad weather as well. There's no reason why a computer vision system can't do as well as a human. Does it mean driving really slowly, like a human does in heavy fog? Sure, but frankly, I wouldn't want a car zipping down the road in heavy fog anyway!

Mobileye's deep learning (aka neural networks) and high resolution maps should eliminate the need for lidar in adverse weather conditions such as snow-covered roads. Positioning to within 10 cm is pretty good!
 
Fair enough. I wonder whether a well-functioning autonomous car can - eventually - perform better than the best human driver does under the truly wretched conditions one can find in a normal Alaskan January: whip-blowing snow with visibility from 2 to 20 feet; determination of road surface based on combinations of feeling the outer edge of the pavement/gravel interface; the tracks - incuse and obtuse - of prior vehicles; extrapolation of the road ahead based upon remembrance of the most recent oncoming vehicle's headlights; recollection that after that latest dip the road curves to the right for 30 yards, then bumps to the left....


....at the very least, those systems won't get as fatigued as any human does after some number of hours of such driving.
 
As someone somewhat involved in the deep learning field, I am biased. But in my mind, there is little question that we're nearing a point where an ensemble of models will be able to handily beat a human at the tasks you mention, Audie. I think they'll lag behind the "easy" drive, but it'll happen. The nice thing, as was mentioned above, is that we have the ability to develop sensors that see what we don't. Imagine we redesign lane paint and include a substance that's easily detected through snow, ice, etc., and doesn't require light. If we have the correct sensor, that's a huge addition to the existing self-driving models. Or, more simply, sense the solid road (for your example). Map a real-time topographical model of the road ahead and avoid potholes, edges, etc.

The thing I wish that video showed was what the computer really sees, which is a series of matrices containing pixel colors, intensities, etc. for the entire image. When learning about image recognition, that was a big "aha" moment for me. Show a picture of a dog, then flip to what it looks like as a numeric matrix of RGB values. Then it's not so easy to decipher.
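
To make that concrete, here's a tiny NumPy sketch (my own toy, not anything from the video): even a trivially small RGB image is just a height x width x 3 grid of integers, and a single channel printed on its own already looks nothing like the scene it came from.

# What the classifier actually gets: not a "picture", just numbers.
# Toy 4x4 RGB image; values invented for illustration.
import numpy as np

image = np.zeros((4, 4, 3), dtype=np.uint8)   # height x width x RGB
image[:, :2] = [200, 180, 150]                # left half: tan, vaguely dog-colored
image[:, 2:] = [30, 120, 40]                  # right half: grass green

print(image[:, :, 0])   # just the red channel - try spotting the "dog" in this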
 
Fair enough. I wonder whether a well-functioning autonomous car can - eventually - perform better than the best human driver does under the truly wretched conditions one can find in a normal Alaskan January: whip-blowing snow with visibility from 2 to 20 feet; determination of road surface based on combinations of feeling the outer edge of the pavement/gravel interface; the tracks - incuse and obtuse - of prior vehicles; extrapolation of the road ahead based upon remembrance of the most recent oncoming vehicle's headlights; recollection that after that latest dip the road curves to the right for 30 yards, then bumps to the left....


....at the very least, those systems won't get as fatigued as any human does after some number of hours of such driving.

Based on the Mobileye talk, using the landmarks that will still be recognizable even when it's snowing (signs, poles, etc.), the car should know exactly where on the road it is, even if it can't see the road at all. Based on that information, the car should be able to keep you exactly in your lane, ±3.5 cm.
 
The thing I wish that video showed was what the computer really sees, which is a series of matrices containing pixel colors, intensities, etc. for the entire image. When learning about image recognition, that was a big "aha" moment for me. Show a picture of a dog, then flip to what it looks like as a numeric matrix of RGB values. Then it's not so easy to decipher.

A true visual neural net does not have a pixel-per-pixel visual field layer (at least beyond the very first camera sensor input layer). The input is immediately convolved into detectors for edges, contrast, lines, and primitive shapes, and then gets more and more abstract as the layers go up. So there is no great way of seeing what the computer sees, other than at the final semantic layer, which is described, but not shown, in the latest Mobileye talk (this is the front of a car, the back of a car, sign type xyz, a person, a curb, roadway, etc.).
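
If it helps to see that structure in code, here's a minimal toy conv stack (my own illustration in PyTorch, nothing to do with EyeQ internals): after a couple of convolution/pooling stages the "image" is just a stack of feature maps, which is why there's no clean picture of "what the computer sees" to show.

import torch
import torch.nn as nn

# Toy feature extractor: grayscale frame in, increasingly abstract maps out.
features = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),   # 8 edge/contrast-like detectors
    nn.ReLU(),
    nn.MaxPool2d(2),                             # downsample: detail traded for abstraction
    nn.Conv2d(8, 16, kernel_size=3, padding=1),  # combinations of edges -> primitive shapes
    nn.ReLU(),
    nn.MaxPool2d(2),
)

x = torch.randn(1, 1, 96, 96)   # stand-in for one camera frame
for layer in features:
    x = layer(x)
    print(type(layer).__name__, tuple(x.shape))
# Ends as a 16-channel 24x24 grid of activations - already closer to
# "is there a car-ish thing here?" than to anything resembling pixels.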

The human visual cortex is the same way. You might think there is a low level brain layer that is a pixel for pixel representation of what the eye picks up, but there isn't really, other than the rods and cones themselves that are light activated.

Realizing that even humans don't have a pixel map in their brains was an aha moment for me...

Btw, don't forget that Mobileye technologies rely on grayscale cameras only, no color. I don't know if they ever plan on using color information...
 
Based on the Mobileye talk, using the landmarks that will still be recognizable even when it's snowing (signs, poles, etc.), the car should know exactly where on the road it is, even if it can't see the road at all. Based on that information, the car should be able to keep you exactly in your lane, ±3.5 cm.

Yes, and then fuse in other information like the car's speed and where the other cars are, and you'll have a very good idea indeed of where you are and where the road should be.

I remember driving once on a highway at night when the fog got so thick that my passenger and I had to roll down our windows and look directly down at the road for lane markings. It was very slow going until a car came up behind us to illuminate the road better with their headlights. And then when we hit oncoming traffic, we could sense the road much better just by following the snake of lights coming at us.

It'll be a while until auto drive is as good as a human in these extreme weather conditions, but given the computer's ability to fuse GPS, high resolution landmark-based maps, and dead reckoning based on car speed and heading, I am confident that the car will eventually outperform humans in adverse weather.
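
Just as a back-of-the-envelope illustration of that fusion (all numbers invented, and a crude blend rather than the Kalman-style filters a real system would use): dead reckoning from the speedometer drifts steadily, but an occasional landmark fix pulls the estimate back.

import random

random.seed(1)

dt = 0.1                 # seconds per step
reported_speed = 25.0    # m/s from the odometry (slightly biased)
true_speed = 24.6        # what the car is actually doing
landmark_every = 200.0   # a recognizable sign roughly every 200 m

true_pos = 0.0
dead_reckoning = 0.0     # speed integration only
fused = 0.0              # speed integration plus landmark corrections
next_landmark = landmark_every

for _ in range(3000):
    true_pos += true_speed * dt
    dead_reckoning += reported_speed * dt
    fused += reported_speed * dt

    if true_pos >= next_landmark:            # vision spots a sign with a known map position
        observed = next_landmark + random.gauss(0, 0.1)   # ~10 cm ranging noise
        fused = 0.2 * fused + 0.8 * observed              # simple blend toward the fix
        next_landmark += landmark_every

print("dead reckoning error: %.1f m" % abs(dead_reckoning - true_pos))
print("fused estimate error: %.1f m" % abs(fused - true_pos))

The point is only that periodic landmark sightings keep the drift bounded; without them, the speed-integration error just grows with distance.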
 
A true visual neural net does not have a pixel-per-pixel visual field layer (at least beyond the very first camera sensor input layer). The input is immediately convolved into detectors for edges, contrast, lines, and primitive shapes, and then gets more and more abstract as the layers go up. So there is no great way of seeing what the computer sees, other than at the final semantic layer, which is described, but not shown, in the latest Mobileye talk (this is the front of a car, the back of a car, sign type xyz, a person, a curb, roadway, etc.).

The human visual cortex is the same way. You might think there is a low level brain layer that is a pixel for pixel representation of what the eye picks up, but there isn't really, other than the rods and cones themselves that are light activated.

I suppose that's one way to look at it - that I'm comparing apples to oranges. But it really is a false comparison in both senses, because the input layer of the CNN is also a broken-down representation of our interpretation. And talking about the visual representations of the hidden layers is sort of doing the same thing - intermediate values. I guess your argument above is that what the computer "sees" is what the CNN outputs at the final layer.

Btw, don't forget that Mobileye technologies rely on grayscale cameras only, no color. I don't know if they ever plan on using color information...

I did forget that, thanks for reminding me. I imagine color will eventually enter the mix. ImageNet isn't a greyscale competition, and Google doesn't convert YouTube videos to greyscale when doing labeling.
 
Early feedback on the Tesla Summon technology is that it will miss narrow objects. This is not an intelligent system; it is what is called an expert system. The main reason for this distinction is that although it will do more with more data, it won't genuinely create new definitions on its own. Let's say in two years' time, crashing drones become a problem on freeways. A learning system would not require programmers to update it, but would respond to input and learn. The first time a car sees a drone fall on the freeway, then crashes into it, causing an impact (pretend the cars have impact sensors), it remembers what happened, classifies it as "Above-01," and knows to avoid it next time. Or, more simply, the first time a learning system observes a ball bouncing into a road and avoids the ball, but creams the child chasing after it, it remembers, because of the impact sensors, to come to a full stop instead of simply avoiding the ball.
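
Purely as a hypothetical toy (the "Above-01" label is borrowed from the example above; no real system works this simply), the distinction might look something like this in code: an expert system's responses are fixed at design time, while a learning system updates its own behavior after a bad outcome.

class NaiveLearner:
    def __init__(self):
        # Seeded, designed-in knowledge: object -> response.
        self.responses = {"car": "avoid", "pedestrian": "stop", "ball": "avoid"}
        self.new_classes = 0

    def react(self, detected):
        return self.responses.get(detected, "ignore")   # unknown object -> drive on

    def record_impact(self, detected):
        # Negative outcome: whatever we just hit becomes its own avoid-class.
        if self.responses.get(detected) != "avoid":
            self.new_classes += 1
            print("learned new class Above-%02d for '%s'" % (self.new_classes, detected))
            self.responses[detected] = "avoid"

car = NaiveLearner()
print(car.react("fallen drone"))   # 'ignore' -> the crash happens
car.record_impact("fallen drone")  # impact sensor fires, behavior updates
print(car.react("fallen drone"))   # 'avoid' from now on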

There are significant ups and downs with this technology. If you want to car-jack someone, you just throw a ball into the road in front of the car, which has learned to stop and wait. Easy prey. I know the movies were subject to disdain, but Star Wars Episodes 1-3, with the combat droids, were realistic in this respect: artificial intelligence is good only at what it is designed to handle, and cannot learn outside that. Jedi are very, very confusing.

I think the Tesla technology, Mobileye, and the rest are still exciting developments, but what I haven't seen to date is good stereoscopic shape recognition, which is basically what human eyes do. As drivers, we rarely process anything outside of vision and, occasionally, sound. The DARPA Grand Challenge of 2005 paved the way, but unfortunately, sensor-heavy technology seems to be the focus of today's engineers, instead of true AI or better machine vision.