So why have a system that processes in-car voice commands on a remote server - is it to leave max processing power for driving & navigation?
In theory, it's a good way of doing it. The local hardware only has to be good enough to record the sound file, probably compress it, and then upload it to the server.
The server can then use some grunt to try and work out what it is you said, and what you meant. In theory, it should be able to do this by comparing your request against what a few hundred thousand other people asked for, and finding a good match. The server can then reply to the car with a small amount of data that conveys your intent.
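To put that flow in rough code terms (everything here is made up for illustration - the function names, the keyword matching, the JSON shape - real systems use proper speech models, not string matching):

```python
import json
import zlib
from dataclasses import dataclass


@dataclass
class Intent:
    action: str
    target: str


def record_and_compress(audio_bytes: bytes) -> bytes:
    # All the car has to do: capture the audio and compress it for upload.
    return zlib.compress(audio_bytes)


def server_recognise(payload: bytes) -> Intent:
    # Stand-in for the heavy server-side work: decompress, run speech
    # recognition, and map the recognised text to an intent. Here the
    # "recognition" is just a toy keyword check.
    text = zlib.decompress(payload).decode("utf-8")
    if "navigate to " in text:
        return Intent(action="navigate", target=text.split("navigate to ")[-1])
    return Intent(action="unknown", target=text)


def reply_to_car(intent: Intent) -> str:
    # The reply back to the car is just a small blob conveying intent,
    # not the audio or the full transcript.
    return json.dumps({"action": intent.action, "target": intent.target})


payload = record_and_compress(b"navigate to home")
print(reply_to_car(server_recognise(payload)))  # {"action": "navigate", "target": "home"}
```

The point of the sketch is the asymmetry: the car's end is a couple of cheap operations, while all the clever (and improvable) work sits behind `server_recognise`.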
It's the way Alexa works, and it's a good way for a system to learn. It can gather lots of data from lots of sources and use this data centrally to compare and analyse similar commands, patterns, requests, errors, etc.
Working in the cloud also means that future functionality is less likely to be limited by the hardware that's in our cars today.
Except, Tesla's current effort is absolutely pants. The voice control in my Mazda, which only had about 10 commands, was rock solid and I used it all the time. I've practically given up using the voice control in the Tesla - which isn't good for the system as a whole if it's to keep on learning and improving.