The down res might not work because they are feeding the neural net the raw photon count, not a processed image.
I wonder what ‘raw photon count’ actually means.
To my knowledge, there are no (mainstream) image sensors literally doing photon counts.
My guesstimate is that it means the training happens on the output of the sensor, i.e. on the raw sensor data, before demosaicing (see Bayer filter - Wikipedia for an illustration). While image recognition neural networks are typically trained on a demosaiced RGB image (i.e. on tensors of dimension (w,h,3), where the 3 dimension holds the demosaiced RGB data), a performance-optimised neural network could be trained directly on the raw data, i.e. on tensors of dimension (w,h,4), where the 4 dimension holds the raw RGGB data (see the sensor image in the Bayer filter link), skipping the demosaicing algorithm (it's only needed to get pretty pictures for human consumption).
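To make that concrete, here's a minimal NumPy sketch (my own illustration, not Tesla's actual pipeline) of packing a raw RGGB Bayer mosaic into a 4-channel tensor instead of demosaicing it to RGB; note that packing each 2x2 Bayer cell halves the per-channel spatial resolution:

```python
import numpy as np

# Hypothetical raw Bayer frame (RGGB pattern), e.g. 12-bit values in a uint16 buffer.
H, W = 960, 1280
raw = np.random.randint(0, 4096, size=(H, W), dtype=np.uint16)

# Demosaicing would interpolate this into an (H, W, 3) RGB image for human viewing.
# For a network we can instead pack each 2x2 Bayer cell into 4 channels,
# giving an (H/2, W/2, 4) tensor with the R, G, G, B samples kept untouched.
packed = np.stack(
    [
        raw[0::2, 0::2],  # R
        raw[0::2, 1::2],  # G (first row of the cell)
        raw[1::2, 0::2],  # G (second row of the cell)
        raw[1::2, 1::2],  # B
    ],
    axis=-1,
)

print(packed.shape)  # (480, 640, 4)
```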
Since it’s video, it also eliminates the video compression and decompression (image recognition works on the uncompressed frame buffer).
The video compression is only needed to lower the bandwidth requirements, but that's not relevant if you have a fixed wired network with enough bandwidth, like the wiring inside a car.
By eliminating the video compression and decompression, you also eliminate the latency involved in video compression (you don’t need to wait until you have a couple of subsequent frames to do the temporal compression).
So yes, it makes sense to train on the raw sensor output. That doesn't mean it's impossible to scale the raw sensor output to a lower resolution for compatibility with HW3.
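On the down-res question, one simple (hypothetical) way to scale raw data without demosaicing is to average 2x2 blocks within each of the four packed Bayer channels; whether Tesla would do it this way is pure speculation on my part:

```python
import numpy as np

# Hypothetical packed Bayer tensor (H/2, W/2, 4), as in the earlier sketch.
packed = np.random.randint(0, 4096, size=(480, 640, 4), dtype=np.uint16)

# Average 2x2 blocks within each channel: halves the resolution again while
# keeping the raw RGGB structure intact (illustrative only).
h, w, c = packed.shape
binned = packed.reshape(h // 2, 2, w // 2, 2, c).astype(np.float32).mean(axis=(1, 3))

print(binned.shape)  # (240, 320, 4)
```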
 
If they are not de-facto counting photons (inside some short interval) ... what are they doing then?
Counting is an inherently digital process. There may be experimental sensors that actually count photons, but the mainstream cells are actually analog cells that convert a charge or voltage via an AD-converter to a number. I don’t consider that ‘counting’ (but in both cases you end up with a number).
The whole point may be moot. I found the original tweet and it's not about counting photons but about using the 16-bit raw values (or whatever the output of the AD converters on their hardware is, probably 12 or 14 bits) instead of the 8-bit values after JPEG decoding for training their networks: https://x.com/yuntatsai1/status/1695639492122792240?s=12
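As a rough illustration of why the extra bit depth matters (my own toy example, not their pipeline): 12-bit raw distinguishes 4096 levels, 8-bit only 256, so raw values that differ in a dark region can collapse to the same 8-bit value before the network ever sees them:

```python
import numpy as np

# Two hypothetical 12-bit raw pixel values in a dark part of the scene.
raw_a, raw_b = 40, 47

# Naive linear scaling from 12-bit (0..4095) down to 8-bit (0..255).
def to_8bit(v):
    return np.uint8(v * 255 // 4095)

print(to_8bit(raw_a), to_8bit(raw_b))  # both become 2 -> the difference is gone
```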
 
Seriously...that's actually quite interesting. Right now we are in the early stages of transition...mostly human drivers with a few robots, so the robots need to mimic the best humans. Over time, if eventually 100% of cars on the road are self driving, then driving patterns and rules should change. 100% robot drivers acting like perfect human drivers is good (and better than today), but not optimized. Robotaxis with surround cameras and high speed computing will have better awareness and reaction times, and be capable of much more: higher speeds, closer spacing, tight merging, passing through narrow gaps in cross traffic, etc. But how can we flip the switch to get there unless all the cars (at least in a given area) upgrade at the same time and/or know and trust each other's capabilities...

Related to Tesla and TSLA specifically, I think this means that for a truly great future, 100% of those future 100% robot cars need to be running Tesla's FSD.

....Bullish!
Yup. Once human examples are no longer good enough, superhuman attributes could be incorporated into the training via AI-generated video in simulation.
 
Yup. Once human examples are no longer good enough, superhuman attributes could be incorporated into the training via AI-generated video in simulation.

Once human input is no longer available, AI will search for better ways to drive the same roads.
Imagine it starts blinking its lights to signal its intentions and requests to other cars. To a parked car: please make space for me to park behind you. Etc.
 
For context, the current attach rate for FSD is very low, so the news that FSD is not yet supported on HW4-equipped cars is going to matter to very few people buying vehicles.
True, but it matters a lot to the thousands of owners who have lost FSD during the FSD transfer to a new Tesla.
 
And the number of CTs being sold between now and spring 2024 is going to be a small percentage of Tesla sales (well under 100k), and I doubt many early CT customers are going to turn down a CT because FSDb isn't available on it right away, and instead go to the back of the queue and wait a couple of years for the next delivery window.
I doubt this, because the lower FSD price is locked in, so it's a substantial savings for a bit of a wait: $7,000 for early reservations vs. $15,000.
 
True, but it matters a lot to the thousands of owners who have lost FSD during the FSD transfer to a new Tesla.
Sure, but HW4 car buyers should not be prioritized over HW3 vehicle owners without ultrasonic sensors (like me), who were promised, since October 1st 2022, all the Enhanced Autopilot features (Autopark/Summon/Smart Summon), coming "soon" once Tesla Vision achieves parity with USS-equipped cars.

HW4 buyers will have FSD, just not the latest and greatest.

European HW3 no-USS FSD buyers have had nothing for almost a year. The only added value from my FSD purchase (which I bought to support the company as an investor) right now is traffic light recognition. Meanwhile WholeMars is doing hands-free city driving. :rolleyes:

TL;DR: no line cutting, please :)
 
Once human input is no longer available, AI will search for better ways to drive the same roads.
Perhaps, but why would it care? It could turn out that one of the safest places for humans is inside vehicles, so extending time in vehicles via delays enhances human safety.

V11 has hooks for human input for optimization. It seems V12 sacrifices such human-built code. A short-term win at the expense of long-term optimization, perhaps.

It could be that EM is thinking more about AGI than a human-centric, profitable FSD. We (Tesla investors) should be given some sort of progress metric at some point. Right now things are looking like a billion-dollar science experiment without any goal or method of assessing success.

It sort of seems the entire justification for Dojo (auto-labeling) is gone. Rejiggering the purpose-built Dojo for a new task is going to take time.

It would not be too cynical to observe that xAI is really the beneficiary of the V12 experiments. We need a clear-eyed assessment of V11 vs V12 on some basis. V11 is currently superior IMO.

V12 seems an enormous open-ended risk lacking any way to assess progress along any predictable timeline. All we are given is that more data is the singular tool required for progress. “More data” is not a useful metric.

I can understand the market value of a vehicle with a million-mile drive unit. I cannot assess whether V12 will match V11 in 10 months or in 10 years. I don't think anyone else can either, so risk/uncertainty may have gone up for investors.

I have to wonder whether V12 is a negative for the stock value. I guess we will see how the market reacts, but my expectation is not hopeful until we have less hand-waving and some metrics-based plan to help investors make sense of decision points going forward.

One possibility would be something like perfecting Summon. If the V12 approach can perfect (not improve, but perfect) Summon in all kinds of weather and conditions, then that could be confirmation that there is clear market value in the V12 approach.

It might be that the V12 approach is great at solving vision but elusive in reaching high levels of consistency without decades of driving data. Comments?
 
It might be that the V12 approach is great at solving vision but elusive in reaching high levels of consistency without decades of driving data. Comments?
My personal opinion is that many set too high of a bar for the definition of a "successful" V12 stack.

When reading the FSD dedicated threads the bearish reply to the progress of E2E V12 is "it won't work everywhere in all situations", i.e. sometimes a crash might occur.

However, the first bar to be met is driving "as good as a human". Humans make mistakes. The AI will make mistakes. Think GPT-3: it makes mistakes but shows great potential.

If the AI gets confused whilst driving in a blizzard, I don't see how this is different than a human not being able to drive through a snowstorm.

So yeah, I think V12 shows great potential, and just like LLMs it will improve by leaps and bounds as the model grows larger (more nodes) and is trained with more/better data.

The march of nines has begun.
 
The AI will make mistakes.
I think it would be reasonable to have a design requirement that FSD not cause a crash. It might be ok to be involved in some crashes where FSD is not at fault.

Training FSD on driving behavior that includes deeply flawed human examples seems problematic. This seems to be the root of the “stop sign” problem: training data teaches FSD to adopt bad behavior.
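To sketch what avoiding that could look like (hypothetical clip schema and threshold, not Tesla's actual curation), rolling-stop examples could be filtered out before they ever reach the training set:

```python
# Hypothetical curation step: drop clips where the human rolled a stop sign,
# so the network never sees that behavior as a positive example.
# The clip fields and the 0.5 m/s threshold are illustrative assumptions.
clips = [
    {"id": "a", "has_stop_sign": True,  "min_speed_at_sign_mps": 0.0},   # full stop
    {"id": "b", "has_stop_sign": True,  "min_speed_at_sign_mps": 2.3},   # rolling stop
    {"id": "c", "has_stop_sign": False, "min_speed_at_sign_mps": None},  # no stop sign
]

def is_clean(clip, full_stop_threshold_mps=0.5):
    if not clip["has_stop_sign"]:
        return True
    return clip["min_speed_at_sign_mps"] <= full_stop_threshold_mps

training_set = [c for c in clips if is_clean(c)]
print([c["id"] for c in training_set])  # ['a', 'c']
```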
 
I think it would be reasonable to have a design requirement that FSD not cause a crash. It might be ok to be involved in some crashes where FSD is not at fault.

Training FSD on driving behavior that includes deeply flawed human examples seems problematic. This seems to be the root of the “stop sign” problem: training data teaches FSD to adopt bad behavior.
Maybe it would be better if the car refused to drive in such conditions. Humans insist on doing things in conditions they should not be doing them.
 
Sure, but HW4 car buyers should not be prioritized over HW3 vehicle owners without ultrasonic sensors (like me), who were promised, since October 1st 2022, all the Enhanced Autopilot features (Autopark/Summon/Smart Summon), coming "soon" once Tesla Vision achieves parity with USS-equipped cars.

HW4 buyers will have FSD, just not the latest and greatest.

European HW3 no-USS FSD buyers have had nothing for almost a year. The only added value from my FSD purchase (which I bought to support the company as an investor) right now is traffic light recognition. Meanwhile WholeMars is doing hands-free city driving. :rolleyes:

TL;DR: no line cutting, please :)
They'll be running shadow mode for a feedback loop in their training. Makes sense to go with the HW that has the most units in the wild.