That's not what they said.
"Yeah. In terms of scaling loss, people in the community generally talk about model scaling loss where they increase the model size a lot and then their corresponding gains in performance, but we have also figured out scaling loss and other access in addition to the model side scaling, making also data scaling. You can increase the amount of data you use to train the neural network and that also gives similar gains and you can also scale up by training compute, you can train it for much longer and one more GPUs or more dojo nodes that also gives better performance, and you can also have architecture scaling where you count with better architectures for the same amount of compute produce better results. So, a combination of model size scaling, data scaling, training compute scaling and the architecture scaling, we can basically
extrapolate, OK, with the continue scaling based at this ratio, we can perfect big future performance."
In other words, they think that, based on how much they can improve the model by adding more data, they can extrapolate future performance.
ER Call Transcript
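To make the "extrapolate" part concrete, here is a minimal sketch of how an empirical scaling law along one axis (training compute) might be fit and projected forward. The power-law form, the loss floor, and all of the numbers are illustrative assumptions for the example, not anything stated on the call.

```python
# Illustrative only: fit a power law L(C) = a * C^(-b) + c to hypothetical
# (compute, loss) measurements, then extrapolate to a larger compute budget.
import numpy as np

def fit_power_law(compute, loss):
    """Fit log(loss - c) ~ log(a) - b*log(compute) by least squares.
    The irreducible-loss floor c is assumed to sit slightly below the best
    observed loss (a crude simplification for this sketch)."""
    c = 0.9 * loss.min()                          # assumed loss floor
    slope, intercept = np.polyfit(np.log(compute), np.log(loss - c), 1)
    return np.exp(intercept), -slope, c           # a, b, c in L(C) = a*C^-b + c

# Hypothetical measurements: training compute (arbitrary units) vs. loss.
compute = np.array([1e18, 1e19, 1e20, 1e21])
loss    = np.array([3.10, 2.55, 2.18, 1.92])

a, b, c = fit_power_law(compute, loss)

# Extrapolate: predicted loss if compute is scaled up another 10x.
next_compute = 1e22
predicted = a * next_compute ** (-b) + c
print(f"fitted L(C) = {a:.2f} * C^-{b:.3f} + {c:.2f}")
print(f"predicted loss at C = {next_compute:.0e}: {predicted:.2f}")
```

The same kind of fit could be repeated along the other axes they mention (model size, data, architecture efficiency); the claim in the quote is just that these curves are regular enough to project forward.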