Is this a broadly agreed-upon community consensus?
I really thought they were trying to train one single end-to-end neural network that takes in video (and other sensor inputs) and spits out a path and velocity plan.
A previous post makes the point that if they're reusing the existing perception layer, they could be building V12 more incrementally.
Another downside I see is that separate layers for perception and planning imply a defined interface between them in the middle (think "public API" in software engineering terminology). I think people have used the term "Vector Space" for the output of the perception stack: locations of vehicles, locations of VRUs, lane lines, drivable space, signs, signals, metadata about these objects, etc.
The downside I see is that these are all human-picked and curated concepts. By building planning on top of them, the planner can only know about the concepts that humans decided to build into the perception layer. Combining all of it into one neural net, end to end, eliminates the need to even pick what is represented in the intermediary "Vector Space", or how it's represented.
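To make the "public API" analogy concrete, here's a toy sketch of what such an interface might look like. All names and fields here are hypothetical illustrations of the idea, not anything from Tesla's actual stack: the point is just that the planner can only reason about whatever fields humans chose to put in the schema.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical "Vector Space" interface between perception and planning.
# Every field is a human-chosen concept; the planner sees nothing else.
# (All names here are illustrative, not Tesla's.)

@dataclass
class TrackedObject:
    kind: str                      # e.g. "vehicle", "pedestrian" (a VRU)
    position_m: Tuple[float, float]   # (x, y) in the ego frame, meters
    velocity_mps: Tuple[float, float] # (vx, vy), meters per second

@dataclass
class VectorSpace:
    objects: List[TrackedObject] = field(default_factory=list)
    lane_lines: List[list] = field(default_factory=list)       # polylines
    drivable_space: List[tuple] = field(default_factory=list)  # polygon vertices
    traffic_signals: List[str] = field(default_factory=list)   # e.g. "red"

def plan(scene: VectorSpace) -> dict:
    """Toy planner: it can only consume what the schema above defines.
    Anything perception never encoded (say, a hand wave from a traffic
    cop, or eye contact from a pedestrian) is invisible at this layer."""
    stop = "red" in scene.traffic_signals or any(
        obj.kind == "pedestrian" for obj in scene.objects
    )
    return {"target_speed_mps": 0.0 if stop else 15.0}

scene = VectorSpace(
    objects=[TrackedObject("pedestrian", (4.0, 1.0), (0.0, 0.0))],
    traffic_signals=["green"],
)
print(plan(scene))  # {'target_speed_mps': 0.0}
```

An end-to-end network dissolves this boundary: nothing forces the intermediate activations to correspond to a human-readable schema like the one above, so the network is free to carry whatever cues help it plan.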