Memory transformer with hierarchical attention for long document processing

Publication type: Proceedings Article
Publication date: 2021-11-24
Abstract
Transformers have attracted considerable interest from researchers and have achieved state-of-the-art results across a wide range of natural language processing tasks, including sequence modeling tasks such as language understanding, text summarization, and translation, with more transformer variants still to come. Nevertheless, transformers remain limited in tasks that require long document processing. This paper introduces a new transformer variant: a sentence-level transformer with global memory pooling and hierarchical attention designed to cope with long texts. We replace the self-attention of the vanilla transformer with multi-head attention between memory and the sequence, and add a decoder sequence selector on top of the encoder output. In our architecture, sentences are encoded in parallel and then summarized with soft attention at every decoding step. The proposed model was validated on a machine translation task. We hypothesize that attaching memory slots to each sequence improves translation quality, and that tuning the model on a context-aware data set using pre-trained sequence-level weights helps to produce more precise translations and facilitates translating long documents. Results show that extending each sentence with a memory slot and applying attention over the encoder outputs improves translation results.
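The abstract describes the architecture only at a high level. Below is a minimal NumPy sketch (not the authors' code) of the two ideas it mentions: extending each sentence with memory slots that pool it into a fixed-size summary, and a decoding step that soft-attends over the per-sentence summaries. All names (encode_sentence, mem_slots, d_model), the single-head attention, and the dimensions are illustrative assumptions.

# Minimal sketch of memory-augmented, hierarchical attention as described in
# the abstract. Single-head attention and all dimensions are assumptions made
# for brevity; the paper uses multi-head attention trained end to end.
import numpy as np

rng = np.random.default_rng(0)
d_model, mem_slots = 16, 2          # assumed model width and memory size

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention (single head for brevity).
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores) @ v

def encode_sentence(tokens, memory):
    # Encode one sentence together with its memory slots: the slots act as
    # queries that pool the sentence into a fixed-size summary, standing in
    # for the paper's memory/sequence multi-head attention.
    x = np.concatenate([memory, tokens], axis=0)   # prepend memory to the sequence
    return attention(x, x, x)[:mem_slots]          # keep only the updated memory

# Toy document: 3 sentences of different lengths with random token embeddings.
sentences = [rng.normal(size=(n, d_model)) for n in (5, 7, 4)]
memory = rng.normal(size=(mem_slots, d_model))     # shared initial memory slots

# Sentences are encoded independently, so this loop could run in parallel.
summaries = np.stack([encode_sentence(s, memory) for s in sentences])
summaries = summaries.reshape(-1, d_model)          # (num_sentences * mem_slots, d_model)

# One decoding step: the current decoder state soft-attends over all
# sentence-level memory summaries to build a document-level context vector.
decoder_state = rng.normal(size=(1, d_model))
context = attention(decoder_state, summaries, summaries)
print(context.shape)   # (1, 16)

The sketch only illustrates the data flow; the decoder sequence selector and the translation objective from the paper are not modeled here.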

Top-30

Journals
  • Information (Switzerland): 1 publication, 16.67%
  • Knowledge-Based Systems: 1 publication, 16.67%
  • Studies in Computational Intelligence: 1 publication, 16.67%
  • Journal of Supercomputing: 1 publication, 16.67%

Publishers
  • Springer Nature: 2 publications, 33.33%
  • MDPI: 1 publication, 16.67%
  • Elsevier: 1 publication, 16.67%
  • Institute of Electrical and Electronics Engineers (IEEE): 1 publication, 16.67%
  • We do not take into account publications without a DOI.
  • Statistics recalculated weekly.

Metrics: 6
Cite this

GOST:
Al Adel A. et al. Memory transformer with hierarchical attention for long document processing // 2021 International Conference Engineering and Telecommunication (En&T). 2021.
GOST (all authors):
Al Adel A., Burtsev M. Memory transformer with hierarchical attention for long document processing // 2021 International Conference Engineering and Telecommunication (En&T). 2021.
RIS:
TY - CPAPER
DO - 10.1109/EnT50460.2021.9681776
UR - https://doi.org/10.1109/EnT50460.2021.9681776
TI - Memory transformer with hierarchical attention for long document processing
T2 - 2021 International Conference Engineering and Telecommunication (En&T)
AU - Al Adel, Arij
AU - Burtsev, Mikhail
PY - 2021
DA - 2021/11/24
PB - Institute of Electrical and Electronics Engineers (IEEE)
ER -
BibTeX:
@inproceedings{AlAdel2021,
  author    = {Arij Al Adel and Mikhail Burtsev},
  title     = {Memory transformer with hierarchical attention for long document processing},
  booktitle = {2021 International Conference Engineering and Telecommunication (En\&T)},
  year      = {2021},
  month     = {nov},
  publisher = {Institute of Electrical and Electronics Engineers (IEEE)},
  doi       = {10.1109/EnT50460.2021.9681776}
}