HTR-VT: Handwritten text recognition with vision transformer
Publication type: Journal Article
Publication date: 2025-02-01
SCImago: Q1
WoS: Q1
SJR: 2.058
CiteScore: 15.8
Impact factor: 7.6
ISSN: 0031-3203, 1873-5142
Abstract
We explore the application of the Vision Transformer (ViT) to handwritten text recognition. The limited availability of labeled data in this domain makes it difficult to achieve high performance by relying on ViT alone. Previous transformer-based models required external data or extensive pre-training on large datasets to excel. To address this limitation, we introduce a data-efficient ViT method that uses only the encoder of the standard transformer. We find that two changes yield notable improvements: incorporating a Convolutional Neural Network (CNN) for feature extraction in place of the original patch embedding, and employing the Sharpness-Aware Minimization (SAM) optimizer so that the model converges towards flatter minima. Furthermore, the span mask technique we introduce, which masks interconnected features in the feature map, acts as an effective regularizer. Empirically, our approach competes favorably with traditional CNN-based models on small datasets such as IAM and READ2016. Additionally, it establishes a new benchmark on the LAM dataset, currently the largest such dataset, with 19,830 training text lines. The code will be publicly available at: https://github.com/YutingLi0606/HTR-VT.
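The two training ingredients named in the abstract, span masking and SAM, can be illustrated with a minimal sketch. It is not the authors' implementation (the linked repository is the reference for that). It assumes the CNN feature map has already been flattened into a left-to-right sequence of feature vectors; the helper names `span_mask`, `apply_mask` and `sam_step`, and the values of the mask ratio, maximum span length, `rho` and learning rate, are hypothetical choices for illustration. Pure Python keeps the mechanics visible; a real model would operate on framework tensors.

```python
import math
import random


def span_mask(seq_len, mask_ratio=0.4, max_span=8, rng=None):
    """Return seq_len booleans; True marks a feature position to mask.

    Hypothetical helper: spans of 1..max_span adjacent positions are placed
    at random starts until round(mask_ratio * seq_len) positions are masked
    (capped at seq_len), so masked features come in contiguous runs.
    """
    rng = rng or random.Random()
    target = min(seq_len, int(round(mask_ratio * seq_len)))
    mask = [False] * seq_len
    covered = 0
    while covered < target:
        span = rng.randint(1, max_span)
        start = rng.randrange(seq_len)
        for i in range(start, min(start + span, seq_len)):
            if not mask[i] and covered < target:
                mask[i] = True
                covered += 1
    return mask


def apply_mask(features, mask, mask_token):
    """Replace masked feature vectors with a copy of mask_token.

    A fixed vector stands in here for the learnable mask embedding a real
    model would typically use (assumption).
    """
    return [list(mask_token) if m else list(f) for f, m in zip(features, mask)]


def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One Sharpness-Aware Minimization update on a list of parameters.

    1. Move to w + rho * g / ||g||, the first-order worst case in an L2 ball.
    2. Compute the gradient at that perturbed point.
    3. Apply that gradient to the ORIGINAL weights w.
    """
    g = grad_fn(w)
    norm = math.sqrt(sum(x * x for x in g)) + 1e-12  # avoid division by zero
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


if __name__ == "__main__":
    rng = random.Random(0)
    mask = span_mask(seq_len=32, mask_ratio=0.4, max_span=8, rng=rng)
    print("".join("#" if m else "." for m in mask))  # '#' = masked position
    print(sum(mask))  # 13, i.e. round(0.4 * 32)

    # Masked positions of a toy 32-step, 2-channel feature sequence get zeros.
    feats = [[float(t), float(-t)] for t in range(32)]
    masked = apply_mask(feats, mask, mask_token=[0.0, 0.0])
    assert all(row == [0.0, 0.0] for row, m in zip(masked, mask) if m)

    # Toy quadratic loss L(w) = sum(w_i^2), gradient 2w; minimum at [0, 0].
    def grad(v):
        return [2.0 * x for x in v]

    w = [1.0, -2.0]
    for _ in range(50):
        w = sam_step(w, grad, lr=0.1, rho=0.05)
    print(w)  # ends close to [0, 0]
```

The quadratic demo only exercises the SAM update rule (perturb along the normalized gradient, then descend from the original weights using the gradient taken at the perturbed point); it does not demonstrate flatness, which only matters on non-convex losses. Masking contiguous runs rather than scattered positions means a masked feature cannot simply be copied from an unmasked immediate neighbour.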
Top-30 citing journals
Lecture Notes in Computer Science (6 publications, 21.43%)
Expert Systems with Applications (2 publications, 7.14%)
Pattern Recognition (2 publications, 7.14%)
Neurocomputing (1 publication, 3.57%)
International Journal of Heat and Mass Transfer (1 publication, 3.57%)
ACM Transactions on Multimedia Computing, Communications and Applications (1 publication, 3.57%)
Visual Computer (1 publication, 3.57%)
IEEE Access (1 publication, 3.57%)
International Journal on Document Analysis and Recognition (1 publication, 3.57%)
Scientific Reports (1 publication, 3.57%)
Advances in Intelligent Systems and Computing (1 publication, 3.57%)
Journal of Documentation (1 publication, 3.57%)
Digital Signal Processing: A Review Journal (1 publication, 3.57%)
ACM Computing Surveys (1 publication, 3.57%)
Lecture Notes in Networks and Systems (1 publication, 3.57%)
Citing publishers
Springer Nature (11 publications, 39.29%)
Elsevier (7 publications, 25%)
Institute of Electrical and Electronics Engineers (IEEE) (7 publications, 25%)
Association for Computing Machinery (ACM) (2 publications, 7.14%)
Emerald (1 publication, 3.57%)
- Publications without a DOI are not taken into account.
- Statistics are recalculated weekly.
Metrics
Total citations: 28
Citations from 2024: 25 (89.29%)
Cite this
GOST
Li Y. et al. HTR-VT: Handwritten text recognition with vision transformer // Pattern Recognition. 2025. Vol. 158. p. 110967.
GOST (all authors, up to 50)
Li Y., Chen D., Tang T., Shen X. HTR-VT: Handwritten text recognition with vision transformer // Pattern Recognition. 2025. Vol. 158. p. 110967.
RIS
TY - JOUR
DO - 10.1016/j.patcog.2024.110967
UR - https://linkinghub.elsevier.com/retrieve/pii/S0031320324007180
TI - HTR-VT: Handwritten text recognition with vision transformer
T2 - Pattern Recognition
AU - Li, Yuting
AU - Chen, Dexiong
AU - Tang, Tinglong
AU - Shen, Xi
PY - 2025
DA - 2025/02/01
PB - Elsevier
SP - 110967
VL - 158
SN - 0031-3203
SN - 1873-5142
ER -
BibTeX (up to 50 authors)
@article{2025_Li,
author = {Yuting Li and Dexiong Chen and Tinglong Tang and Xi Shen},
title = {HTR-VT: Handwritten text recognition with vision transformer},
journal = {Pattern Recognition},
year = {2025},
volume = {158},
publisher = {Elsevier},
month = {feb},
url = {https://linkinghub.elsevier.com/retrieve/pii/S0031320324007180},
pages = {110967},
doi = {10.1016/j.patcog.2024.110967}
}