volume 158 pages 110967

HTR-VT: Handwritten text recognition with vision transformer

Yuting Li 1
Dexiong Chen 2
Tinglong Tang 1
Xi Shen 3
Publication type: Journal Article
Publication date: 2025-02-01
Scimago: Q1
WoS: Q1
SJR: 2.058
CiteScore: 15.8
Impact factor: 7.6
ISSN: 0031-3203, 1873-5142
Abstract
We explore the application of Vision Transformer (ViT) for handwritten text recognition. The limited availability of labeled data in this domain makes it difficult to achieve high performance with ViT alone. Previous transformer-based models required external data or extensive pre-training on large datasets to excel. To address this limitation, we introduce a data-efficient ViT method that uses only the encoder of the standard transformer. We find that incorporating a Convolutional Neural Network (CNN) for feature extraction instead of the original patch embedding, and employing the Sharpness-Aware Minimization (SAM) optimizer so that the model converges towards flatter minima, yield notable improvements. Furthermore, the span mask technique we introduce, which masks interconnected features in the feature map, acts as an effective regularizer. Empirically, our approach competes favorably with traditional CNN-based models on small datasets like IAM and READ2016. Additionally, it establishes a new benchmark on the LAM dataset, currently the largest dataset with 19,830 training text lines. The code will be publicly available at: https://github.com/YutingLi0606/HTR-VT.
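As a rough illustration of the span mask idea mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes the CNN output has been flattened to a sequence of shape (T, D), and it masks contiguous runs of time steps rather than isolated positions. The function name `span_mask`, its parameters (`mask_ratio`, `span_len`, `mask_value`) and their defaults are hypothetical and are not taken from the paper.

```python
import numpy as np


def span_mask(features, mask_ratio=0.4, span_len=4, mask_value=0.0, rng=None):
    """Mask contiguous spans of time steps in a (T, D) feature sequence.

    Hypothetical helper: names and default values are illustrative
    assumptions, not settings reported in the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    num_steps = features.shape[0]

    # Number of span starts to draw so that roughly mask_ratio of the
    # sequence is covered (overlaps and clipping can reduce the coverage).
    n_spans = int(num_steps * mask_ratio / span_len)

    mask = np.zeros(num_steps, dtype=bool)
    if n_spans > 0:
        starts = rng.choice(num_steps, size=n_spans, replace=False)
        for start in starts:
            # Slicing clips automatically at the end of the sequence.
            mask[start:start + span_len] = True

    masked = features.copy()
    # Assumption: masked positions are filled with a constant; a trained
    # model might substitute a learnable mask token instead.
    masked[mask] = mask_value
    return masked, mask


if __name__ == "__main__":
    demo = np.ones((128, 8))
    out, m = span_mask(demo, rng=np.random.default_rng(0))
    print(f"masked {m.sum()} of {m.size} time steps")
```

Because neighbouring features are hidden together, the model cannot recover a masked position simply by copying an adjacent one, which is the intuition behind treating the span mask as a regularizer.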

Top-30

Journals

Lecture Notes in Computer Science
6 publications, 21.43%
Expert Systems with Applications
2 publications, 7.14%
Pattern Recognition
2 publications, 7.14%
Neurocomputing
1 publication, 3.57%
International Journal of Heat and Mass Transfer
1 publication, 3.57%
ACM Transactions on Multimedia Computing, Communications and Applications
1 publication, 3.57%
Visual Computer
1 publication, 3.57%
IEEE Access
1 publication, 3.57%
International Journal on Document Analysis and Recognition
1 publication, 3.57%
Scientific Reports
1 publication, 3.57%
Advances in Intelligent Systems and Computing
1 publication, 3.57%
Journal of Documentation
1 publication, 3.57%
Digital Signal Processing: A Review Journal
1 publication, 3.57%
ACM Computing Surveys
1 publication, 3.57%
Lecture Notes in Networks and Systems
1 publication, 3.57%

Publishers

Springer Nature
11 publications, 39.29%
Elsevier
7 publications, 25%
Institute of Electrical and Electronics Engineers (IEEE)
7 publications, 25%
Association for Computing Machinery (ACM)
2 publications, 7.14%
Emerald
1 publication, 3.57%
  • We do not take into account publications without a DOI.
  • Statistics recalculated weekly.

Metrics
28
GOST
Li Y. et al. HTR-VT: Handwritten text recognition with vision transformer // Pattern Recognition. 2025. Vol. 158. p. 110967.
GOST (all authors)
Li Y., Chen D., Tang T., Shen X. HTR-VT: Handwritten text recognition with vision transformer // Pattern Recognition. 2025. Vol. 158. p. 110967.
RIS
TY - JOUR
DO - 10.1016/j.patcog.2024.110967
UR - https://linkinghub.elsevier.com/retrieve/pii/S0031320324007180
TI - HTR-VT: Handwritten text recognition with vision transformer
T2 - Pattern Recognition
AU - Li, Yuting
AU - Chen, Dexiong
AU - Tang, Tinglong
AU - Shen, Xi
PY - 2025
DA - 2025/02/01
PB - Elsevier
SP - 110967
VL - 158
SN - 0031-3203
SN - 1873-5142
ER -
BibTeX
@article{2025_Li,
author = {Yuting Li and Dexiong Chen and Tinglong Tang and Xi Shen},
title = {HTR-VT: Handwritten text recognition with vision transformer},
journal = {Pattern Recognition},
year = {2025},
volume = {158},
publisher = {Elsevier},
month = {feb},
url = {https://linkinghub.elsevier.com/retrieve/pii/S0031320324007180},
pages = {110967},
doi = {10.1016/j.patcog.2024.110967}
}