AdaTape is characterized by elastic sequence lengths generated by the adaptive tape reading mechanism. This also introduces a new inductive bias that enables AdaTape to have the potential to solve tasks that are challenging for both standard transformers and existing adaptive transformers. By conducting comprehensive experiments on image recognition benchmarks, we demonstrate that AdaTape outperforms standard transformers and adaptive architecture transformers when computation is held constant.
AdaTape: Foundation model with adaptive computation and dynamic read-and-write
Posted by Fuzhao Xue, Research Intern, and Mostafa Dehghani, Research Scientist, Google Adaptive computation refers to the ability of a machine lea...
ai.googleblog.com