Suppose you use the TensorFlow model that Intel demoed early last year (90% accuracy). You would still have a 99.999997% chance of correctly detecting at least one light (on average) within a tenth of a second at 30 fps, assuming an average of 2.5 lights per direction; that is, 1 - ((.1 ^ 2.5) ^ 3).
So unless Intel's algorithm is just way too slow for real-time use, chances are the 98% accuracy is *not* over several seconds.
If we assume instead that 98% is per-frame, per-light, then if they identify 98% of them, assuming an average of 2.5 traffic lights per intersection, you'd have about a 99.994% chance of detecting it correctly in any given frame, and a 99.999999999982% chance of detecting one within a tenth of a second at only 30 fps. At 60 fps, well, Google's calculator can't work with numbers that small, though the odds of failure are one in almost 31 septillion. You are 10 billion times more likely to win a billion dollars or more in both PowerBall AND Mega Millions than for that to fail.
So a 98% detection rate is probably good enough. Not necessarily, but probably.
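Both calculations above can be checked in a few lines. This is just the arithmetic from the thread, under its own assumptions (2.5 lights per intersection on average, 30 fps, detections treated as independent per light and per frame):

```python
# Checking the thread's arithmetic. Assumptions: an average of 2.5 lights
# per direction, 30 fps (so 3 frames per tenth of a second), and detections
# treated as independent per light and per frame.

def p_detect_at_least_one(per_light_accuracy, lights=2.5, frames=3):
    per_frame_miss = (1 - per_light_accuracy) ** lights  # miss every light in a frame
    return 1 - per_frame_miss ** frames                  # detect in at least one frame

# Intel's demoed model at 90% per-light accuracy:
print(p_detect_at_least_one(0.90))   # ≈ 0.99999997

# 98% per-light, per-frame:
print(1 - (1 - 0.98) ** 2.5)         # ≈ 0.99994 in a single frame
print(p_detect_at_least_one(0.98))   # ≈ 0.99999999999982 within a tenth of a second
```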
The most straightforward way to interpret Elon’s comment is that when a dev mode Tesla drives through an intersection, it has a 98% chance of correctly recognizing whether the traffic light governing its lane is red, yellow, or green.
If it were already at 99.99999%+ accuracy per intersection (less than 1 error per 10 million intersections), Elon’s comment wouldn’t make sense. I think he would just say traffic light recognition is solved.
There are publicly available models that have 98% detection per image floating around on GitHub. If they aren't doing at least as well as those, then I'm amazed these cars are even staying on the road.
At only 98% chance per intersection, none of this makes any sense. I mean, let's say that you have a five-second window to detect it. At 30 fps, that's 150 frames. If you have a 2.5% chance of detecting it in a frame, you have a 97.5% chance of not detecting it. So your chance of not detecting it in two frames is 97.5% of 97.5% percent. Your chance of not detecting it in 150 frames, then, is .975 ^ 150, or about 2.2%. So to have only a 98% chance per intersection, you would have to successfully detect a traffic light with only about 2.5% accuracy.
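The same arithmetic as a quick sketch (the 2.5% figure is the hypothetical per-frame rate being argued against here, not a real measurement):

```python
p_frame = 0.025                     # hypothetical per-frame detection rate
frames = 150                        # 5 seconds at 30 fps
p_miss_all = (1 - p_frame) ** frames
print(p_miss_all)                   # ≈ 0.022
print(1 - p_miss_all)               # ≈ 0.978, i.e. roughly the quoted 98%
```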
I'm pretty sure I could do significantly better than a 2.5% detection rate per frame with a simple brute force, non-neural-net algorithm that just looks for yellow areas with three areas of red, yellow, green, or black inside them. In a hundred lines of code or less.
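To be clear, that's a claim, not tested code, but the lit-lamp-classification half of such a brute-force approach really is just color thresholding. A toy sketch (synthetic data, made-up thresholds, nowhere near production-grade vision, and it skips the housing-detection step entirely):

```python
# Toy brute-force traffic light classifier: no neural net, just counting
# strongly saturated pixels. The image is a list of rows of (r, g, b)
# tuples, 0-255. Thresholds are made up for illustration.

def classify_pixel(p):
    r, g, b = p
    if r > 200 and g < 100 and b < 100:
        return "red"
    if r > 200 and g > 200 and b < 100:
        return "yellow"
    if r < 100 and g > 200 and b < 100:
        return "green"
    return "other"

def detect_light_state(image):
    """Return which lamp appears lit, by counting saturated pixels."""
    counts = {"red": 0, "yellow": 0, "green": 0}
    for row in image:
        for p in row:
            c = classify_pixel(p)
            if c in counts:
                counts[c] += 1
    lit = max(counts, key=counts.get)
    return lit if counts[lit] > 0 else None

# Tiny synthetic "image": a lit red lamp above dark, unlit lamps.
image = [[(255, 0, 0)] * 4] * 2 + [[(20, 20, 20)] * 4] * 4
print(detect_light_state(image))  # red
```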
No, that has to be per-frame, and probably per light, per-frame. That's the only way such a low number can possibly make sense.
I mean, perhaps it really does have only a 98% chance of correctly detecting the color of the signal light, but if that's true, then maybe it wasn't a good idea to use greyscale + red cameras instead of proper full-color cameras.
Also note that with the exception of sideways lights, if you can't tell the color of the light, you're doing something very wrong, because they are always in the same order.
I think these probability calculations assume the detection events are independent. Given that each frame is quite similar to the previous one (and recognition is highly dependent on location/surroundings), it is more likely that they are dependent events, so you cannot just multiply the failure rates together and subtract from one to get the pass rate.
Flipping a coin: independent.
Registering the correct stop light when it:
- Aligns with the light at the next intersection
- Is mounted to the underside of a reflective skywalk
- Is blown half a lane to the side due to winds
- Is actually a reflection off a window

These are cases that don't generally improve just by giving it more frames; you need to train and test the NN for them.
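A quick way to see why the independence assumption matters: a toy simulation (with made-up numbers) where a failure tends to persist across frames, e.g. because of glare or occlusion, rather than resolving by the next frame.

```python
import random

random.seed(0)

def miss_rate(frames, trials, p_fail=0.02, p_stay_failed=0.9):
    """Fraction of trials in which detection fails in EVERY frame.
    Failures persist: a fresh failure happens 2% of the time, but once
    failed, the next frame fails with 90% probability (made-up numbers)."""
    misses = 0
    for _ in range(trials):
        failed = random.random() < p_fail
        all_failed = failed
        for _ in range(frames - 1):
            threshold = p_stay_failed if failed else p_fail
            failed = random.random() < threshold
            all_failed = all_failed and failed
        if all_failed:
            misses += 1
    return misses / trials

print(miss_rate(3, 200_000))   # ≈ 0.016 with correlated failures
print(0.02 ** 3)               # ≈ 8e-06 if the frames were independent
```

With correlation, missing all three frames is roughly two thousand times more likely than the independent-frames math predicts, even though the fresh per-frame failure rate is the same 2%.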
Yeah, but those NNs reached 98% when trained on only 600 images.
So really, I can't see how they could only be getting 98% accuracy still, even for single images, much less for entire intersections. That number just seems way too low to be plausible unless they're literally just taking the existing NN with its 600-image training data and using it as-is.
I just don’t see how your interpretation can be consistent with what Elon said.
Let’s say that 2 seconds before reaching the light, the neural network detects the light and correctly classifies it as red. But then just as it reaches the light, the network either fails to detect the light or misclassifies it as yellow or green. The car will then run the red light. So it’s not enough to classify a red light in a single frame among the hundreds of frames leading up to an intersection. The network has to consistently classify the light correctly.
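One standard mitigation for exactly this failure mode (a sketch of a common technique, not a claim about what Tesla actually does) is to smooth the per-frame classifications over a short window instead of acting on any single frame:

```python
from collections import Counter, deque

class LightStateFilter:
    """Majority vote over the last N per-frame classifications, so a single
    misclassified or missed frame can't flip the decision on its own."""

    def __init__(self, window=5):
        self.history = deque(maxlen=window)

    def update(self, frame_label):
        # frame_label is e.g. "red", "green", or None for a missed detection.
        if frame_label is not None:
            self.history.append(frame_label)
        if not self.history:
            return None
        return Counter(self.history).most_common(1)[0][0]

f = LightStateFilter(window=5)
stream = ["red", "red", "green", "red", None, "red"]  # one bad frame, one miss
smoothed = [f.update(s) for s in stream]
print(smoothed)  # ['red', 'red', 'red', 'red', 'red', 'red']
```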
Do you have a link to it? I recall he said that 98% line on Twitter, but I can't seem to find it.
A Google search for "Traffic light detection is at 98% elon musk twitter" gives me nothing, and neither do variations on it.
I would expect the neural net to only need to correctly identify that a traffic light facing the camera exists and provide its approximate bounding box. Everything after that can and probably should be done in procedural code, because that approach would be more testable.
> At only 98% chance per intersection, none of this makes any sense. I mean, let's say that you have a five-second window to detect it. At 30 fps, that's 150 frames. If you have a 2.5% chance of detecting it in a frame, you have a 97.5% chance of not detecting it. So your chance of not detecting it in two frames is 97.5% of 97.5% percent. Your chance of not detecting it in 150 frames, then, is .975 ^ 150, or about 2.2%. So to have only a 98% chance per intersection, you would have to successfully detect a traffic light with only about 2.5% accuracy.
First, the math is way off. It’s p = 1 - (1 - .975)^150. Also, the errors are not random; if you fail to detect it in one frame, it is not random whether you will detect it in the next frame, and the likelihood of failing again increases greatly.
What if it correctly identifies it in one frame, and can’t identify in the next 59 frames?
After training a deep neural network on thousands and thousands of these images with arbitrary textures, we found that it actually acquired a shape bias instead of a preference for textures! A cat with elephant skin is now perceived as a cat by this new shape-based network. Moreover, there were a number of emergent benefits. The network suddenly got better than its normally-trained counterpart at both recognizing standard images and at locating objects in images; highlighting how useful human-like, shape-based representations can be. Our most surprising finding, however, was that it learned how to cope with noisy images (in the real world, this could be objects behind a layer of rain or snow) — without ever seeing any of these noise patterns before! Simply by focusing on object shapes instead of easily distorted textures, this shape-based network is the first deep neural network to approach general, human-level noise robustness.
I'm pretty sure you're doing that upside down. If the odds of correctly detecting something in any given frame are the same for each frame, then the odds of correctly detecting it in at least one of two frames must be greater than the odds of finding it in a single frame. With your math, the odds go the opposite direction, towards being less able to detect it in multiple frames than in one.
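Under the independence assumption the earlier calculations use, the chance of at least one detection in n frames is 1 - (1 - p)^n, which can only grow as n increases; more frames never hurt:

```python
# With independent frames at per-frame detection probability p, the chance
# of detecting in at least one of n frames is 1 - (1 - p)**n. This is
# monotonically increasing in n.
p = 0.5
for n in (1, 2, 3, 10):
    print(n, 1 - (1 - p) ** n)
# 1 0.5
# 2 0.75
# 3 0.875
# 10 0.9990234375
```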
This video seems to confirm that this is the right approach:

Hi,
I found this presentation by Andrej Karpathy (Director of AI at Tesla). He describes his work at Tesla and the challenges.
Building the Software 2.0 Stack by Andrej Karpathy from Tesla