Imo Tesla will go deep into LLMs. I was thinking about the latest
GPT-4 paper and the image-input example on page 9:
Then it hit me that this is not too far from what Optimus will be doing: take the camera images -> compress them into an embedding vector -> feed that to the model as input.
Replace the user prompt with:
User: Grab me that can of coke, open it and pour it into a glass
Replace the GPT-4 output with:
At XYZ1 there is a can of coke that the user is pointing at
At XYZ2 there is the shelf with glasses
Execute list of tasks:
1. Move closer to shelf
2. Select glass
3. Grab glass
...
That's it.
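The whole loop above can be sketched in a few lines. To be clear, everything here is hypothetical: the function names are made up, and the LLM call is mocked with the example plan from this post.

```python
# Hypothetical sketch of the perceive -> plan loop described above.
# None of these names are real Tesla APIs; the LLM call is mocked.

def encode_scene(image):
    """Stand-in for the vision encoder: compress camera frames into
    a scene description (here text, in reality likely vectors)."""
    return ["XYZ1: can of coke (user is pointing at it)",
            "XYZ2: shelf with glasses"]

def plan(user_request, scene):
    """Stand-in for the LLM: scene description + request in, task list out."""
    prompt = "Scene:\n" + "\n".join(scene) + "\nUser: " + user_request
    # A real system would send `prompt` to the model here; we just
    # return the example plan from the post instead.
    return ["Move closer to shelf", "Select glass", "Grab glass"]

tasks = plan("Grab me that can of coke, open it and pour it into a glass",
             encode_scene(image=None))
for i, task in enumerate(tasks, 1):
    print(f"{i}. {task}")
```

The point is just how thin the glue is: once a model can map (scene, request) to a task list, the rest is executing each step.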
And to interact with the user they need good speech recognition. OpenAI has released Whisper, which runs pretty fast:
"We’ve trained and are open-sourcing a neural net called Whisper that approaches human level robustness and accuracy on English speech recognition." (openai.com)
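As a minimal sketch of that speech front-end: the real call (using the open-source openai-whisper package) is left commented out so this runs without the model weights, and the stubbed return value is just the example command from above.

```python
def transcribe(audio_path: str) -> str:
    """Turn a microphone recording into a text command for the planner."""
    # Real call, using the open-source openai-whisper package:
    #   import whisper
    #   model = whisper.load_model("base")
    #   return model.transcribe(audio_path)["text"]
    # Stubbed here so the sketch runs without any model weights:
    return "Grab me that can of coke, open it and pour it into a glass"

command = transcribe("kitchen_mic.wav")  # hypothetical file name
print("User said:", command)
```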
You don't need crazy offline compute for this; you can run some of these LLMs on a modern laptop (lawrencecchen.com):
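Quick back-of-envelope math on why a laptop is enough. The 7B parameter count and byte widths are illustrative assumptions, not measurements of any particular model:

```python
def weights_gib(params_billions: float, bytes_per_param: float) -> float:
    """Approximate memory footprint of the model weights alone."""
    return params_billions * 1e9 * bytes_per_param / 2**30

# A hypothetical 7B-parameter model:
fp16 = weights_gib(7, 2.0)   # full 16-bit weights
int4 = weights_gib(7, 0.5)   # 4-bit quantized
print(f"fp16: {fp16:.1f} GiB, 4-bit: {int4:.1f} GiB")
```

At 4-bit quantization the weights drop to roughly a quarter of the fp16 size, which is what makes laptop (and eventually robot) inference plausible.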
So what does this mean? I think Tesla has made upgrades to Dojo to better handle LLMs in the future.
They will need that capacity to train their models: both the massive offline models and the distilled online models, to process huge numbers of user interactions, and to quickly iterate on these billion/trillion-parameter models. And I think HW5 will have some more LLM-specific architecture, not just vision at batch size 1.
Imo Tesla needs to get onto this soon or OpenAI will do to them what they did to Google with ChatGPT:
Ilya commented on this a while ago:
Basically it's all about being willing to bet big and get to scale. Before, they were not ready and digital-only was easier, but now they are getting ready. Elon understands this and is crazy enough to try. And as Elon said, if anyone should get to AGI it's probably best if it's Tesla, as they are a public company and he doesn't trust the rest: