Karpathy talk today at CVPR 2021

Well, without data to show FSD is better than humans, I think getting regulatory approval will be arduous.

This continues to be an imaginary problem.

They have regulatory approval right now in a bunch of US states.

Anybody with a working L4 or L5 system can turn it on [B]today[/B] under current regulation in those states.

They need not "prove" anything at all about safety, nor wait for "regulators" to check their work.

If Tesla- or anybody- ever tells you THAT is the only thing keeping them at L2, they're lying.
 
Oh, if I could only tell some of my war stories. Suffice it to say, it happens all the time. 'We' clean up for vendors all the time.

I'll just drop this one word here and those who know, will grin.

"watchdog"
Translation: The chip stopped responding, so reboot the whole device and piss off the user.

If sensor fusion is so WORKABLE, why do they still not offer L4 service outside one tiny area of a perfect-weather, simple-road city?
Spoken like someone who has never driven in Phoenix during monsoon season. :)

It doesn't feel plausible to me that upgrading from 720p cameras to 4K cameras would be a net improvement. The difference between 720p and 4K seems negligible for the purposes of computer vision or even human vision. And unless you just compressed those 4K images down to 720p, those extra pixels would just eat up precious compute.
Well, that's basically 3x the resolution in both directions. So for distant objects, you'd potentially have better speed and distance measurement, better object identification, etc. Whether it is necessary or not is another question.

BTW, the Tesla cameras aren't 720p, strictly speaking. If the compute hardware were fast enough to need 60 fps, they could (with some reduction in coverage) switch to 720p mode (16:9 aspect ratio), where you get fewer pixels at a faster frame rate, but currently, the cameras run at 1280 x 960 at 36 fps (4:3 aspect ratio). That said, the dashcam modes may crop them to 720p. :)
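To put the "extra pixels eat up precious compute" concern in rough numbers, here's a quick back-of-envelope sketch of the pixel throughput for the modes mentioned above; the 4K mode is a hypothetical upgrade for comparison, not an actual camera spec:

[CODE]
# Back-of-envelope pixel throughput for the camera modes discussed above.
# The 1280x960 @ 36 fps and 720p @ 60 fps figures come from the post;
# the 4K mode is a hypothetical upgrade for comparison, not an actual spec.

modes = {
    "current 4:3 mode": (1280, 960, 36),
    "720p 16:9 mode": (1280, 720, 60),
    "hypothetical 4K mode": (3840, 2160, 36),
}

for name, (w, h, fps) in modes.items():
    print(f"{name:22s}: {w * h / 1e6:5.2f} MP/frame, {w * h * fps / 1e6:6.1f} MP/s")

# 4K at the same 36 fps pushes roughly 6.75x the pixels per second of the
# current mode, which is the "extra pixels eat up precious compute" concern.
[/CODE]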
 
The obvious circle that surrounds the vehicle is not from occlusion by other objects, but simply from the sensor not covering that area (the 360 sensor has a limited vertical FOV, so it typically isn't used to cover the near field even if it were possible to do so). This circle is present for basically any of these roof-mounted sensor solutions.

Some use ultrasonics to cover that blind spot, but it appears Waymo uses other perimeter lidar to cover it.
https://www.eetimes.eu/abandoned-technology-maps-lidar-blind-spot/
That's not a blind spot. To say that's a blind spot is to say that Tesla's trifocal camera has a blind spot because it can't see to the side or behind.

It's not meant to do that, just like the 360 dome lidar isn't meant for perimeter coverage. They have perimeter lidars for that. Their 5th-generation lidar is orders of magnitude better than their 4th-generation lidar, and they are not just replacing the dome lidar but all their lidar.

I literally don't see your point. Lastly, no one uses ultrasonics for autonomous driving other than Tesla.
[Image: Waymo car]


In gen 5, it seems they added cameras to cover all those spots:
[Image: Waymo 5th-gen Jaguar I-Pace sensor callout diagram]


So basically the opposite of Tesla: Tesla relied heavily on vision from the start, while Waymo is now adding more and more vision to their suite.

No, it's not the opposite of Tesla. They are not relying more on vision. They already relied on vision equally, and arguably more, in their 4th-gen car. Heck, they had more cameras than Tesla had. They added perimeter cameras where perimeter lidar/radar already were, to get full 1:1 redundancy.

Tesla doesn't even have perimeter cameras. Their cameras legitimately have a blind spot in the front and low on the sides.
 
Well, that's basically 3x the resolution in both directions. So for distant objects, you'd potentially have better speed and distance measurement, better object identification, etc. Whether it is necessary or not is another question.

BTW, the Tesla cameras aren't 720p, strictly speaking. If the compute hardware were fast enough to need 60 fps, they could (with some reduction in coverage) switch to 720p mode (16:9 aspect ratio), where you get fewer pixels at a faster frame rate, but currently, the cameras run at 1280 x 960 at 36 fps (4:3 aspect ratio). That said, the dashcam modes may crop them to 720p. :)
There is a trade-off between resolution and light gathering. The higher the resolution, the more light you need because each photosite gets smaller, or you need a bigger, more expensive sensor and lens. We tend to get sucked into the bigger-numbers-are-better marketing game, here in the US anyway, but higher frame rates and higher resolutions are not going to be your friend if you are trying to drive at night. On newer iPhones there is a video setting that will automatically drop the frame rate to 24 fps for video in low-light conditions, because you can gather more light by exposing longer.
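As a rough illustration of that trade-off, assuming the sensor die size and exposure stay fixed and light per photosite scales with its area (the sensor dimensions below are illustrative assumptions, not the actual part's specs):

[CODE]
# Illustration of the resolution-vs-light trade-off: with die size and
# exposure held fixed, each photosite's area (and so the light it collects)
# shrinks as the pixel count grows. The sensor dimensions below are
# illustrative assumptions, not the specs of the actual camera module.

sensor_w_mm, sensor_h_mm = 4.8, 3.6   # assumed 1/3"-class sensor, 4:3

def photosite_area_um2(h_pixels, v_pixels):
    pitch_w_um = sensor_w_mm * 1000 / h_pixels
    pitch_h_um = sensor_h_mm * 1000 / v_pixels
    return pitch_w_um * pitch_h_um

low_res = photosite_area_um2(1280, 960)    # ~1.2 MP
high_res = photosite_area_um2(3840, 2880)  # same 4:3 ratio, 9x the pixels

print(f"1.2 MP photosite area: {low_res:.2f} um^2")
print(f"11 MP photosite area:  {high_res:.2f} um^2")
print(f"light per photosite:   {high_res / low_res:.2f}x of the low-res case")
[/CODE]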

Tesla solved the forward distance-vision problem by having 3 cameras at different focal lengths. This would be a bit trickier on the sides, because you really don't know what direction to point the distance camera on the sides, though I could see that happening in the rear, especially for cross-traffic backup detection. They also need to find a mount point in the back that isn't covered in water anytime it rains.
 
That's not a blind spot. To say that's a blind spot is to say that Tesla's trifocal camera has a blind spot because it can't see to the side or behind.

It's not meant to do that, just like the 360 dome lidar isn't meant for perimeter coverage. They have perimeter lidars for that. Their 5th-generation lidar is orders of magnitude better than their 4th-generation lidar, and they are not just replacing the dome lidar but all their lidar.
It's a blind spot that other sensors cover, as mentioned in the article linked.
I literally don't see your point. Lastly, no one uses ultrasonics for autonomous driving other than Tesla.
One of the biggest players in autonomous driving (which I see all the time here in SF) uses 10 ultrasonic sensors in their car: Cruise. That's very far from "no one". Zoox's early prototype also uses them, although they no longer provide those details on their latest vehicles (only mention lidar, radar, cameras, and everything else falls under "proprietary").
No, it's not the opposite of Tesla. They are not relying more on vision. They already relied on vision equally, and arguably more, in their 4th-gen car. Heck, they had more cameras than Tesla had. They added perimeter cameras where perimeter lidar/radar already were, to get full 1:1 redundancy.

Tesla doesn't even have perimeter cameras. Their cameras legitimately have a blind spot in the front and low on the sides.
You can see right in the diagrams linked and with a bit of research that Waymo didn't have a matching camera suite in gen 4. They had the 360 camera module on the roof, which is comprised of 8 cameras of the same FOV. However, they didn't have a long range camera and they didn't have any cameras below roof level. That's only something they added in gen 5.
 
I think 10x safer is going to be extremely difficult
Well, seeing as how AP is (arguably) already at 8.6x, is 10x that hard to imagine?

In the 1st quarter, we registered one accident for every 4.19 million miles driven in which drivers had Autopilot engaged. For those driving without Autopilot but with our active safety features, we registered one accident for every 2.05 million miles driven. For those driving without Autopilot and without our active safety features, we registered one accident for every 978 thousand miles driven. By comparison, NHTSA’s most recent data shows that in the United States there is an automobile crash every 484,000 miles.
source
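For what it's worth, here's where the "(arguably) 8.6x" figure falls out of the quoted Q1 numbers; it's just the ratio of the reported miles-per-accident figures, with all the usual caveats about road type and driver mix:

[CODE]
# Where the ~8.6x figure comes from, using the quoted Q1 numbers.
miles_per_accident_ap   = 4_190_000   # Autopilot engaged
miles_per_accident_none =   978_000   # no Autopilot, no active safety features
miles_per_crash_nhtsa   =   484_000   # NHTSA US average

print(f"AP vs NHTSA average:         {miles_per_accident_ap / miles_per_crash_nhtsa:.2f}x")
print(f"AP vs Tesla, nothing active: {miles_per_accident_ap / miles_per_accident_none:.2f}x")
# ~8.66x against the NHTSA baseline, ~4.28x against Teslas with nothing
# engaged -- with no correction for road type (AP miles skew toward highways)
# or for driver/vehicle mix.
[/CODE]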
 
Well, seeing as how AP is (arguably) already at 8.6x, is 10x that hard to imagine?

In the 1st quarter, we registered one accident for every 4.19 million miles driven in which drivers had Autopilot engaged. For those driving without Autopilot but with our active safety features, we registered one accident for every 2.05 million miles driven. For those driving without Autopilot and without our active safety features, we registered one accident for every 978 thousand miles driven. By comparison, NHTSA’s most recent data shows that in the United States there is an automobile crash every 484,000 miles.
source
I think the combination of human and machine driving can achieve very high safety. I think achieving that without human oversight will be very difficult. Humans and machines have different strengths.
 
Yeah, Tesla's AP safety data I think absolutely tells us it's safer using a good ADAS than driving without it... but it tells us relatively little about how safe any Tesla-designed L3+ system is, and even less about how safe an L5 system would be...

(You could probably make the case it'd be most relevant for, say, an L3 highway-only version of FSD, where more than a few people honestly use it kind of like one anyway... but beyond that you're WAY out in left field thinking those safety numbers tell you much once you get into wider ODDs or none at all.)
 
Yeah, Tesla's AP safety data I think absolutely tells us it's safer using a good ADAS than driving without it

I am also of the opinion that this is true (using a good ADAS designed for the ODD and combining the strengths of humans and machines is safer than either alone), but I always struggle to conclude from Tesla's data that it is. Which is disappointing, since they probably have the necessary data to show us that it definitely is, but they never publish that data. Arrggh.

I want it to be true, and I think it's true, but I for sure can't quantify how much safer it is, nor can I even see any data which shows that it is safer.
 
FWIW, every single safety-critical system has a watchdog.

Nothing wrong with it. It's standard ASIL-D practice in the auto industry, and it will not be seen by the user if the system catches the fault and recovers fast enough.

Yeah, I know. But lots of other minor crap has watchdogs, too, and far too often, watchdogs + buggy drivers + poorly designed hardware = unholiness. :D
 
There is a trade-off between resolution and light gathering. The higher the resolution, the more light you need because each photosite gets smaller, or you need a bigger, more expensive sensor and lens. We tend to get sucked into the bigger-numbers-are-better marketing game, here in the US anyway, but higher frame rates and higher resolutions are not going to be your friend if you are trying to drive at night. On newer iPhones there is a video setting that will automatically drop the frame rate to 24 fps for video in low-light conditions, because you can gather more light by exposing longer.

Yes and no. Phones use a slower readout (longer exposure per frame) in low light so they can lower the gain, which gets better SNR. But that's mostly unrelated to resolution.

Historically, higher-resolution sensors had huge light loss because of the wiring being in front of the photosites, but modern BSI sensors put the wiring behind the photosites, which largely eliminates that, and they typically use microlenses to further reduce light loss. So with modern sensors, the loss from higher resolution is much less. Also, as sensor resolution increases, they make other changes in the designs to further reduce the losses, which narrows the gap further. With binning, the benefits of a lower-resolution sensor become even less significant.
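As a toy illustration of the binning point, assuming photon shot noise dominates and ignoring read noise (a simplification; the numbers below are made up for illustration):

[CODE]
import math

# Toy shot-noise model for the binning point above: if read noise is ignored
# (a simplification), SNR is sqrt(signal), so a 2x2 bin of four small
# photosites collects the same total light as one photosite with 4x the area
# and lands at essentially the same SNR.

photons_big_pixel = 4000      # illustrative signal for one large photosite
photons_small_pixel = 1000    # each of four small photosites sees about 1/4

snr_big = photons_big_pixel / math.sqrt(photons_big_pixel)
binned_signal = 4 * photons_small_pixel
snr_binned = binned_signal / math.sqrt(binned_signal)

print(f"SNR, one large photosite:         {snr_big:.1f}")
print(f"SNR, 2x2 bin of small photosites: {snr_binned:.1f}")
# In practice read noise (added at each readout) tips this slightly toward
# the larger photosite, which is why the gap narrows but never fully closes.
[/CODE]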

And of course, the per-pixel SNR isn't necessarily the most interesting metric. Having more pixels, even with higher per-pixel noise, still gives you more overall information.


Tesla solved the forward distance-vision problem by having 3 cameras at different focal lengths. This would be a bit trickier on the sides, because you really don't know what direction to point the distance camera on the sides, though I could see that happening in the rear, especially for cross-traffic backup detection. They also need to find a mount point in the back that isn't covered in water anytime it rains.

There's no parallax on the side, so this is all mostly uninteresting there. Then again, not much interesting happens from the sides, so it doesn't mean much. Where higher resolution would be potentially useful is on the wide field camera, giving better rates of detection and speed/distance estimation for cross traffic, which the other cameras potentially can't see.

I'm not saying it is useful there, mind you, but it might be useful. I doubt it would be particularly useful for any of the other cameras. :)
 
I wonder if replacing the forward-looking cameras with higher-resolution, equally sensitive ones would be a reasonably affordable approach to increase the distance FSD can see at speed. Not a limitation today, necessarily, but likely a limitation in the future IMO.
Given how many generations behind we're talking about, I think it's safe to say that the SNR of modern 4K chips is better.

Also, the part that they're using is marked as obsolete by the manufacturer. They're going to have to change them out before long, or else the cost of getting new manufacturing runs is going to start creeping up and up.

The real question is whether they can find a 4K part with a global shutter. I'm assuming that's why they're still using this ancient part.
 
Historically, higher-resolution sensors had huge light loss because of the wiring being in front of the photosites, but modern BSI sensors put the wiring behind the photosites, which largely eliminates that, and they typically use microlenses to further reduce light loss. So with modern sensors, the loss from higher resolution is much less. Also, as sensor resolution increases, they make other changes in the designs to further reduce the losses, which narrows the gap further. With binning, the benefits of a lower-resolution sensor become even less significant.
Much of what you say about modern high-resolution sensors is true (especially in typical recording conditions), but the last bit is a core issue. You don't want to devote processor cycles to downsampling the output if the resolution ends up unnecessary. The NNs in use today don't tend to require much resolution. The best way to downsample is to oversample, but that is the most processor-intensive. The on-chip methods that rely on pixel skipping or line skipping are the least intensive (or free), but result in much worse quality (and more noise). Binning is a good compromise but has many of the same moire and AA issues as pixel/line skipping, plus you gain none of the resolution advantages of having a higher-res sensor.
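Here's a minimal sketch of the skipping-vs-binning difference on a plain grayscale array; a real sensor downsamples within a Bayer mosaic, which is where the colour and moire headaches come from, so this understates the problem:

[CODE]
import numpy as np

# Minimal sketch of the two on-chip downsampling families on a plain
# grayscale array. A real sensor downsamples within a Bayer mosaic, which
# is where the colour/moire headaches come from, so this understates them.

rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(960, 1280)).astype(np.float32)

# Pixel/line skipping: keep every 2nd pixel, discard the rest. Cheapest,
# but it aliases badly and throws away 3/4 of the collected signal.
skipped = frame[::2, ::2]

# 2x2 binning: average each 2x2 block. Same output size, but every input
# pixel contributes, so noise is averaged down instead of discarded.
binned = frame.reshape(480, 2, 640, 2).mean(axis=(1, 3))

print(skipped.shape, binned.shape)                # both (480, 640)
print(frame.std(), skipped.std(), binned.std())   # binning lowers the noise
[/CODE]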

That said, you will find that at the extreme ISOs (which I have no doubt Tesla is using, given that the cameras can still display an image even when it's almost pitch black to the human eye), even in large sensors (like full-frame sensors), there is still an advantage to lower-resolution sensors (even comparing to the downsampled image). For example, if you compare the Sony A7S III (12MP) to the A7R III (42MP), A7R IV (61MP), or A1 (50MP), you will see that once you get past ISO 12800, the difference in noise between the sensors becomes very noticeable.
 
Much of what you say about modern high resolution sensors is true (especially in typical recording conditions), but the last bit is a core issue. You don't want to devote processor cycles to downsampling the output if the resolution ends up unnecessary.

Yeah, but most of the time, that all happens on-die. The camera itself sends out a scaled-down signal at the appropriate resolution. By the time the computer sees it, it is at the lower resolution.


That said, you will find that at the extreme ISOs (which I have no doubt Tesla is using, given that the cameras can still display an image even when it's almost pitch black to the human eye), even in large sensors (like full-frame sensors), there is still an advantage to lower-resolution sensors (even comparing to the downsampled image). For example, if you compare the Sony A7S III (12MP) to the A7R III (42MP), A7R IV (61MP), or A1 (50MP), you will see that once you get past ISO 12800, the difference in noise between the sensors becomes very noticeable.

Only if you're comparing chips of similar generations. Canon's 5D Mark IV looks better at 12,800 than the Mark II at 1600 despite having almost 3x as many pixels.
 
Given how many generations behind we're talking about, I think it's safe to say that the SNR of modern 4K chips is better.
Actually it's not that many generations behind. Image sensor improvements move very slowly (much slower than processor improvements). The current AR0132AT in use by Tesla was first announced in 2012. The successor didn't come until 2016 with the BSI AR0136AT with relatively modest improvements (20% R, 10% G, 3% B, which sounds like a lot, but given you are aware of image sensors, in terms of stops, it's really not much).
Why the new AR0136 image sensor performs better in Automotive and Security than its predecessor

They did introduce the AR0138AT in 2018, which bumped up the sensor size a bit, but max SNR (single frame) actually went down based on the page below (I can't find a detailed datasheet or analysis), so I'm not sure what happened there.
Avnet: Quality Electronic Components & Services

Most of the significant improvements across the different generations are in readout speed (which is another subject: high-res sensors tend to have slower readout speeds than lower-res sensors of the same generation).

And also, we don't know the lens specs for the cameras Tesla uses or the DOF envelope they are working with. A lot of modern sensors are achieving higher res and better SNR by increasing sensor size rather than via tech improvements (which are approaching diminishing returns even with BSI). For example, Aptina's AR0820AT 4K sensor (introduced with the AR0138AT) is a 1/2" sensor. Sony's competing 7.42MP sensor (IMX324/IMX424) is even bigger at 1/1.7" (a bump up from the 1/3" IMX224, which is the AR0132AT's analog).
However, given these sensors are used in fixed-focus cameras, there is a minimum DOF that is required, meaning for a larger sensor, a lens with a smaller aperture (larger f-stop) would be used, putting you back to square one for SNR improvement. On the flip side, you can also get the same noise improvement by using a larger-aperture lens and the same sensor.
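A quick sketch of that DOF argument using the standard hyperfocal-distance approximation; the focal lengths, f-numbers, and circles of confusion below are illustrative assumptions, not any actual module's specs:

[CODE]
# Hyperfocal-distance sketch for the fixed-focus DOF point above.
# All numbers are illustrative assumptions, not any actual module's specs.

def hyperfocal_m(focal_mm, f_number, coc_mm):
    """H ~= f^2 / (N * c) + f, returned in metres."""
    return (focal_mm ** 2 / (f_number * coc_mm) + focal_mm) / 1000.0

# Small sensor: short lens, small circle of confusion.
small = hyperfocal_m(focal_mm=6.0, f_number=2.0, coc_mm=0.004)
# A sensor ~1.5x larger with the same field of view: focal length and CoC
# both scale by 1.5x, so the hyperfocal distance grows by ~1.5x as well ...
large_f2 = hyperfocal_m(focal_mm=9.0, f_number=2.0, coc_mm=0.006)
# ... unless the lens is stopped down by the same factor, which restores the
# near-field depth of field but gives back the light the bigger sensor gained.
large_f3 = hyperfocal_m(focal_mm=9.0, f_number=3.0, coc_mm=0.006)

for name, h in [("small sensor, f/2", small),
                ("1.5x sensor, f/2 ", large_f2),
                ("1.5x sensor, f/3 ", large_f3)]:
    print(f"{name}: sharp from ~{h / 2:.1f} m to infinity when focused at H")
[/CODE]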
Also, the part that they're using is marked as obsolete by the manufacturer. They're going to have to change them out before long, or else the cost of getting new manufacturing runs is going to start creeping up and up.
Some variants of it are marked as obsolete, but the AR0132AT is very much still an active product.
Image Sensors
Products - ON Semiconductor
It's actually its successor that's marked as completely obsolete (I guess too much overlap with the current AR0138AT):
Products - ON Semiconductor
The real question is whether they can find a 4K part with a global shutter. I'm assuming that's why they're still using this ancient part.
I doubt they will choose a global shutter sensor. I don't think the rolling shutter is really an issue here in this application. Global shutter sensors usually come with significant compromises in terms of SNR (for example the same generation AR0134CS is significantly worse).
 
Yeah, but most of the time, that all happens on-die. The camera itself sends out a scaled-down signal at the appropriate resolution. By the time the computer sees it, it is at the lower resolution.
The on-die methods are either line skipping, pixel skipping, or binning (usually 2x2), with the latter being the best choice, but all of these introduce moire and artifacts, due either to missing pixels (in the skipping case) or to having to bin a pixel over to get the right color (as in 2x2 binning), plus you throw away the advantage of the higher res.

Ideally with enough processing power (and data bandwidth) you get to oversample, but that's a lot of data to deal with.

Anyways, these compromises are probably worth it if you can find some usefulness for the higher resolution (like longer range or for cropping certain features, like for example for road sign recognition), but if not, it's a waste of resources.
Only if you're comparing chips of similar generations. Canon's 5D Mark IV looks better at 12,800 than the Mark II at 1600 despite having almost 3x as many pixels.
Canon is not really a good comparison, given Canon has long been far behind in sensor development, and the Mark IV was their "catch-up" product, so it resulted in a huge leap in sensor quality relative to their previous cameras but doesn't follow the general pace of sensor improvements. If you look at other companies with steady improvements, like Sony or Aptina/ON Semi, you don't find that kind of leap between generations. For example, you can throw the first-gen A7S into my previous comparison and it'll still be cleaner than the latest high-res Sony cameras at ISO 12800 (although not as much as the A7S III).
 
It's a blind spot that other sensors cover, as mentioned in the article linked.
So Tesla's trifocal camera has blind spots?
One of the biggest players in autonomous driving (which I see all the time here in SF) uses 10 ultrasonic sensors in their car: Cruise. That's very far from "no one".
Cruise doesn't use ultrasonics and never has.

[Image: Cruise AV hardware systems slide]


Zoox's early prototype also uses them, although they no longer provide those details on their latest vehicles (only mention lidar, radar, cameras, and everything else falls under "proprietary").

Zoox has never used ultrasonics either; you need to provide an actual source from Zoox, because journalists get things wrong ALL THE TIME. They call radar ultrasonic and ultrasonic radar. They do the same with lidar. They also confuse the automotive sensors that came with the test cars with the sensors the AV is actually using.


You can see right in the diagrams linked and with a bit of research that Waymo didn't have a matching camera suite in gen 4. They had the 360 camera module on the roof, which is comprised of 8 cameras of the same FOV. However, they didn't have a long range camera and they didn't have any cameras below roof level. That's only something they added in gen 5.

Tesla's cameras are webcam grade compared to what Waymo is using. Waymo Gen 4 doesn't use 8 cameras. They use 9 vision modules with multiple camera sensors each. All cameras are 8 megapixels except for the long-range super-HD forward camera, which is something like 20 megapixels or some crazy number like that.

Waymo Gen 5 has 29 cameras, and by process of elimination we know that Gen 4 had 21-25 cameras.

Tesla, on the other hand, uses 8x 1.2-megapixel cameras that are already obsolete.

"So, our custom vision system — which allows us to see things like traffic lights and stop signs — is comprised of 8 vision modules each using multiple sensors, plus an additional, forward-facing, super high resolution multi-sensor module, enabling 360-degree vision. With this resolution, we can detect small objects like construction cones far away even when we’re cruising down a road at high speed. And with a wide dynamic range we can see in a dark parking lot, or out in the blazing sun — or any condition in between."
 
Actually it's not that many generations behind. Image sensor improvements move very slowly (much slower than processor improvements). The current AR0132AT in use by Tesla was first announced in 2012. The successor didn't come until 2016 with the BSI AR0136AT with relatively modest improvements (20% R, 10% G, 3% B, which sounds like a lot, but given you are aware of image sensors, in terms of stops, it's really not much).
Why the new AR0136 image sensor performs better in Automotive and Security than its predecessor

I'm not saying that Aptina/ON Semiconductor's sensors have improved dramatically in nine years (though they probably have). I have no idea what that particular company's chips have done quality-wise. But the BSI concept that they introduced has been broadly deployed by everyone in the years since, and lots of companies have really pushed the envelope with other technologies that improve image quality.

In those nine years, Sony, Samsung, etc. have gone through several major jumps in image quality. ISOCELL in particular was a huge win (30% dynamic range improvement in the first generation alone), because it dramatically reduces pixel bleed that historically contributed to washed out images in a lens flare situation. In self-driving cars, that's probably a much bigger win than in cell phones. And antiglare coatings have probably also improved in that time.


They did introduce the AR0138AT in 2018, which bumped up the sensor size a bit, but max SNR (single frame) actually went down based on the page below (I can't find a detailed datasheet or analysis), so I'm not sure what happened there.
Avnet: Quality Electronic Components & Services

Same color pattern? Same bit depth? Same ability to control gain on separate color channels (with split green)? Yeah, I can't find any details, either. No idea. Probably a different measurement approach.

And also, we don't know the lens specs for the cameras Tesla uses or the DOF envelope they are working with. A lot of modern sensors are achieving higher res and better SNR by increasing sensor size rather than via tech improvements (which are approaching diminishing returns even with BSI). For example, Aptina's AR0820AT 4K sensor (introduced with the AR0138AT) is a 1/2" sensor. Sony's competing 7.42MP sensor (IMX324/IMX424) is even bigger at 1/1.7" (a bump up from the 1/3" IMX224, which is the AR0132AT's analog).

That may be true for Aptina/ON Semiconductor, but I don't think they're at the forefront of image sensor technology anymore.

The amount of money that companies can devote to R&D tends to be proportional to the number of chips that they sell, and I don't think they're even in the top 5.

In terms of smartphone chips:

Sony: 46%
Samsung: 29%
Omnivision: 10%
All other companies combined: 15%

And LG makes all of the recent iPhone sensors, which should mean that they're most of that 15%.

So On Semiconductor probably builds only a fraction of a percent of the smartphone chips out there. Even if they made every chip for every semi-autonomous vehicle out there, they'd still be a minor player. And if nobody is buying their chips, the only way they can improve is by building bigger chips, because they're sure not going to be able to move to newer process technology and take advantage of the benefits thereof (bigger full well size because of smaller wiring, etc.).



However, given these sensors are used in fixed-focus cameras, there is a minimum DOF that is required, meaning for a larger sensor, a lens with a smaller aperture (larger f-stop) would be used, putting you back to square one for SNR improvement. On the flip side, you can also get the same noise improvement by using a larger-aperture lens and the same sensor.

Agreed. Moving to a significantly larger sensor is a non-starter, both for depth of field reasons and for physical dimension reasons. But there have been lots of other process improvements in the last decade that should result in better sensors at similar resolution or similar quality sensors at higher resolution.


I doubt they will choose a global shutter sensor. I don't think the rolling shutter is really an issue here in this application. Global shutter sensors usually come with significant compromises in terms of SNR (for example the same generation AR0134CS is significantly worse).

Ah. I was looking at the wrong data sheet, and thought they were already using a global shutter camera. With that bit of info, there should be a lot of better options out there. :)