I think it's worth breaking the FSD challenge into 3 bits:
1) Getting data about the surroundings
2) Parsing that data into a 4D representation of your surroundings
3) Deciding how to navigate, given this 4D representation
As a programmer, I reckon 3) is WAY WAY easier than the other 2. This is stuff you can do in plain old-fashioned C++. You'd probably use a ton of fuzzy logic and weights, and frankly a neural net is likely very helpful here too, but it's really not that hard, given the insane clock speed of CPUs and the relative slowness of the surrounding world.
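To give a rough idea of what I mean by fuzzy logic and weights, here's a toy sketch (C++17). Everything in it is made up for illustration: the `Candidate` struct, the `danger` membership function, the thresholds and the weight values are all my assumptions, not how any real FSD stack works. It just scores a few candidate maneuvers and picks the cheapest.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical summary of one candidate maneuver, as the perception
// stage (bit 2) might hand it to the planner. All fields are assumptions.
struct Candidate {
    std::string name;
    double min_gap_m;     // closest predicted distance to any obstacle, metres
    double lateral_jerk;  // comfort proxy, arbitrary units
    double progress_m;    // distance gained toward the goal, metres
};

// Fuzzy "danger" membership in [0, 1]: 1 at or below near_m,
// 0 at or above far_m, linear in between. Thresholds are invented.
double danger(double gap_m, double near_m = 1.0, double far_m = 10.0) {
    double t = (far_m - gap_m) / (far_m - near_m);
    return std::clamp(t, 0.0, 1.0);
}

// Hand-picked illustrative weights; a real system would tune these.
struct Weights {
    double danger = 100.0;
    double jerk = 1.0;
    double progress = 0.5;
};

// Lower cost is better: penalise danger and discomfort, reward progress.
double cost(const Candidate& c, const Weights& w) {
    return w.danger * danger(c.min_gap_m)
         + w.jerk * c.lateral_jerk
         - w.progress * c.progress_m;
}

// Return the lowest-cost candidate. The list must not be empty.
const Candidate& pick_best(const std::vector<Candidate>& cs,
                           Weights w = Weights{}) {
    assert(!cs.empty());
    return *std::min_element(cs.begin(), cs.end(),
        [&w](const Candidate& a, const Candidate& b) {
            return cost(a, w) < cost(b, w);
        });
}
```

Given a "keep lane" option with a 0.5 m gap, a "swerve" with a 6 m gap and a "brake" with a 12 m gap, the danger weight dominates and `pick_best` goes for the brake. Run that a few hundred times a second and you've got the skeleton of bit 3), assuming bit 2) gave you gaps you can trust.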
1) is a hardware issue, and still a big unknown. The difficulty of 2) will depend slightly on 1), but adding cameras gets complex and expensive very fast. It's not just the cost of cameras + wiring, but the extra CPU processing of more data...
2) is the meat-and-potatoes. This is the hard stuff. Object recognition in 3D is hard. Object recognition in fog, and rain, with light bouncing off nearby surfaces, water droplet distortion on the lens, mud on another lens... is seriously hard.
TL;DR: FSD is mostly an object recognition challenge. Logic/Planning is flipping trivial. Humans are amazingly good at object recognition, hence it seems easy to us. We suck at maths though.