Frozen Pretrained Transformers as Universal Computation Engines
We investigate the capability of a transformer pretrained on natural language to generalize to other modalities with minimal finetuning -- in particular, without finetuning of the self-attention and feedforward layers of the residual blocks. We consider such a model, which we call a Frozen Pretrained Transformer (FPT), and study finetuning it on a variety of sequence classification tasks spanning numerical computation, vision, and protein fold prediction. In contrast to prior works that investigate finetuning on the same modality as the pretraining dataset, we show that pretraining on natural language can improve performance and compute efficiency on non-language downstream tasks. Additionally, we analyze the architecture, comparing the performance of a randomly initialized transformer to a randomly initialized LSTM. Combining these two insights, we find that language-pretrained transformers can obtain strong performance on a variety of non-language tasks.
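The recipe described in the abstract can be sketched in a few lines: the pretrained transformer's self-attention and feedforward sublayers are frozen, while small, newly added input and output projections (and, per the paper, layer norms and positional embeddings) remain trainable. Below is a minimal illustration assuming the Hugging Face transformers GPT-2 implementation; the FPTClassifier class, its layer sizes, and the last-token readout are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of the Frozen Pretrained Transformer (FPT) setup, assuming the
# Hugging Face `transformers` GPT-2 implementation. Names and readout choices
# below are illustrative, not the authors' released code.
import torch
import torch.nn as nn
from transformers import GPT2Model


class FPTClassifier(nn.Module):
    def __init__(self, input_dim: int, num_classes: int):
        super().__init__()
        self.gpt2 = GPT2Model.from_pretrained("gpt2")

        # Freeze the self-attention and feedforward (MLP) sublayers of every
        # residual block; layer norms and positional embeddings stay trainable,
        # as do the new input/output projections added below.
        for block in self.gpt2.h:
            for module in (block.attn, block.mlp):
                for p in module.parameters():
                    p.requires_grad = False

        hidden = self.gpt2.config.n_embd
        self.input_proj = nn.Linear(input_dim, hidden)      # new, trainable
        self.output_head = nn.Linear(hidden, num_classes)   # new, trainable

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, input_dim), e.g. flattened image patches or bit tokens
        h = self.input_proj(x)
        out = self.gpt2(inputs_embeds=h).last_hidden_state
        return self.output_head(out[:, -1])  # classify from the final position


if __name__ == "__main__":
    model = FPTClassifier(input_dim=16, num_classes=10)
    logits = model(torch.randn(2, 64, 16))
    print(logits.shape)  # torch.Size([2, 10])
```

Reading the prediction from the final sequence position is one common readout for a decoder-style backbone; the paper's exact output head and the full set of unfrozen parameters may differ from this sketch.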
Top-30 Journals
- Lecture Notes in Computer Science: 6 publications, 8.96%
- Briefings in Bioinformatics: 2 publications, 2.99%
- IEEE Transactions on Multimedia: 2 publications, 2.99%
- IEEE Transactions on Intelligent Transportation Systems: 2 publications, 2.99%
- Frontiers of Information Technology and Electronic Engineering: 2 publications, 2.99%
- Information (Switzerland): 1 publication, 1.49%
- Computers: 1 publication, 1.49%
- Nature Machine Intelligence: 1 publication, 1.49%
- Lecture Notes in Networks and Systems: 1 publication, 1.49%
- Autonomous Robots: 1 publication, 1.49%
- IEEE Transactions on Pattern Analysis and Machine Intelligence: 1 publication, 1.49%
- IEEE/ACM Transactions on Computational Biology and Bioinformatics: 1 publication, 1.49%
- Computers and Chemical Engineering: 1 publication, 1.49%
- IEEE Signal Processing Letters: 1 publication, 1.49%
- Reliability Engineering and System Safety: 1 publication, 1.49%
- IEEE Transactions on Information Forensics and Security: 1 publication, 1.49%
- Applied Sciences (Switzerland): 1 publication, 1.49%
- Neural Computing and Applications: 1 publication, 1.49%
- bioRxiv: 1 publication, 1.49%
- BioData Mining: 1 publication, 1.49%
- Technologies: 1 publication, 1.49%
- Neurocomputing: 1 publication, 1.49%
- Scientific Reports: 1 publication, 1.49%
- Energy Conversion and Management: 1 publication, 1.49%
- Geophysical Research Letters: 1 publication, 1.49%
- IEEE Transactions on Geoscience and Remote Sensing: 1 publication, 1.49%
- PLoS Computational Biology: 1 publication, 1.49%
- Information Fusion: 1 publication, 1.49%
- IEEE Transactions on Knowledge and Data Engineering: 1 publication, 1.49%
Publishers
- Institute of Electrical and Electronics Engineers (IEEE): 26 publications, 38.81%
- Springer Nature: 15 publications, 22.39%
- Association for Computing Machinery (ACM): 7 publications, 10.45%
- MDPI: 5 publications, 7.46%
- Elsevier: 5 publications, 7.46%
- Oxford University Press: 2 publications, 2.99%
- Cold Spring Harbor Laboratory: 2 publications, 2.99%
- Wiley: 1 publication, 1.49%
- Public Library of Science (PLoS): 1 publication, 1.49%
- IntechOpen: 1 publication, 1.49%
- Publications without a DOI are not taken into account.
- Statistics are recalculated weekly.