Pattern Recognition, volume 158, article 110967

HTR-VT: Handwritten text recognition with vision transformer

Yuting Li ¹

Dexiong Chen ²

Tinglong Tang ¹

Xi Shen ³

Affiliations:

¹ China Three Gorges University, China

² Max Planck Institute of Biochemistry, Germany

³ Intellindust, China

Publication type: Journal Article

Publication date: 2025-02-01

Publisher: Elsevier

Journal: Pattern Recognition

SCImago quartile: Q1

Web of Science quartile: Q1

SJR: 2.058

CiteScore: 15.8

Impact factor: 7.6

ISSN: 0031-3203 (print), 1873-5142

DOI: 10.1016/j.patcog.2024.110967


Abstract

We explore the application of the Vision Transformer (ViT) to handwritten text recognition. The limited availability of labeled data in this domain makes it difficult to achieve high performance when relying solely on ViT. Previous transformer-based models required external data or extensive pre-training on large datasets to excel. To address this limitation, we introduce a data-efficient ViT method that uses only the encoder of the standard transformer. We find that two changes yield notable improvements: incorporating a Convolutional Neural Network (CNN) for feature extraction in place of the original patch embedding, and employing the Sharpness-Aware Minimization (SAM) optimizer so that the model converges towards flatter minima. Furthermore, we introduce a span mask technique, which masks interconnected features in the feature map and acts as an effective regularizer. Empirically, our approach competes favorably with traditional CNN-based models on small datasets such as IAM and READ2016. Additionally, it establishes a new benchmark on the LAM dataset, currently the largest dataset with 19,830 training text lines. The code will be publicly available at: https://github.com/YutingLi0606/HTR-VT.
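The abstract names two training ingredients, span masking and SAM, without spelling out how they operate. The NumPy sketch below illustrates both under stated assumptions and is not the authors' code. `span_mask` hides contiguous runs of time steps in a (T, D) feature sequence; it assumes the CNN feature map has already been flattened into T feature vectors and that masked positions are overwritten with a constant, whereas the paper may use a learnable mask token (the abstract does not say). `sam_step` is the generic two-step SAM update of Foret et al.: perturb the weights by radius `rho` along the normalized gradient, then descend using the gradient measured at the perturbed point. The function names and the defaults `mask_ratio=0.4`, `span_len=8`, `lr=0.05` and `rho=0.05` are hypothetical values chosen for illustration.

```python
import numpy as np


def span_mask(features, mask_ratio=0.4, span_len=8, mask_value=0.0, rng=None):
    """Mask contiguous spans of time steps in a (T, D) feature sequence.

    Hypothetical helper: masked steps are overwritten with a constant
    `mask_value`; `mask_ratio` and `span_len` are illustrative defaults,
    not settings taken from the paper. Returns (masked copy, bool mask of shape (T,)).
    """
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must lie in [0, 1]")
    rng = np.random.default_rng() if rng is None else rng
    T = features.shape[0]
    target = int(round(mask_ratio * T))  # how many time steps to hide
    masked = np.zeros(T, dtype=bool)
    # Draw span start positions until at least `target` steps are covered.
    # Spans may overlap, so slightly more than `target` steps can be masked.
    while masked.sum() < target:
        start = rng.integers(0, max(T - span_len, 0) + 1)
        masked[start:start + span_len] = True
    out = features.copy()  # leave the caller's array untouched
    out[masked] = mask_value
    return out, masked


def sam_step(w, grad_fn, lr=0.05, rho=0.05):
    """One Sharpness-Aware Minimization update (generic form, Foret et al.)."""
    g = grad_fn(w)
    # Step 1: move to the approximate worst-case point within an L2 ball of radius rho.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Step 2: descend from w using the gradient evaluated at the perturbed point.
    return w - lr * grad_fn(w + eps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(32, 4))  # stand-in for a flattened CNN feature map
    masked_feats, m = span_mask(feats, rng=rng)
    print(f"masked {m.sum()} of {m.size} time steps")

    # Toy loss 0.5 * w^T A w with one flat (1.0) and one sharp (10.0) direction.
    A = np.diag([1.0, 10.0])

    def grad(w):
        return A @ w

    w = np.array([3.0, 3.0])
    for _ in range(200):
        w = sam_step(w, grad)
    print("final loss:", 0.5 * w @ A @ w)
```

On the toy quadratic, plain gradient descent with the same step size would converge to the origin; SAM instead settles into a small oscillation along the sharp direction whose size is set by `rho`, because the perturbation keeps a fixed length. In a real network the same update would be applied to all parameters, with both gradients computed by automatic differentiation.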

Found publications (top 30)

Journals:

Lecture Notes in Computer Science: 6 publications (21.43%)
Expert Systems with Applications: 2 publications (7.14%)
Pattern Recognition: 2 publications (7.14%)
Neurocomputing: 1 publication (3.57%)
International Journal of Heat and Mass Transfer: 1 publication (3.57%)
ACM Transactions on Multimedia Computing, Communications and Applications: 1 publication (3.57%)
Visual Computer: 1 publication (3.57%)
IEEE Access: 1 publication (3.57%)
International Journal on Document Analysis and Recognition: 1 publication (3.57%)
Scientific Reports: 1 publication (3.57%)
Advances in Intelligent Systems and Computing: 1 publication (3.57%)
Journal of Documentation: 1 publication (3.57%)
Digital Signal Processing: A Review Journal: 1 publication (3.57%)
ACM Computing Surveys: 1 publication (3.57%)
Lecture Notes in Networks and Systems: 1 publication (3.57%)

Publishers:

Springer Nature: 11 publications (39.29%)
Elsevier: 7 publications (25%)
Institute of Electrical and Electronics Engineers (IEEE): 7 publications (25%)
Association for Computing Machinery (ACM): 2 publications (7.14%)
Emerald: 1 publication (3.57%)

Publications without a DOI are not taken into account.
Statistics are recalculated weekly.


Cite this

GOST:

Li Y. et al. HTR-VT: Handwritten text recognition with vision transformer // Pattern Recognition. 2025. Vol. 158. p. 110967.

GOST (all authors, up to 50):

Li Y., Chen D., Tang T., Shen X. HTR-VT: Handwritten text recognition with vision transformer // Pattern Recognition. 2025. Vol. 158. p. 110967.

RIS:

TY  - JOUR
DO  - 10.1016/j.patcog.2024.110967
UR  - https://linkinghub.elsevier.com/retrieve/pii/S0031320324007180
TI  - HTR-VT: Handwritten text recognition with vision transformer
T2  - Pattern Recognition
AU  - Li, Yuting
AU  - Chen, Dexiong
AU  - Tang, Tinglong
AU  - Shen, Xi
PY  - 2025
DA  - 2025/02/01
PB  - Elsevier
SP  - 110967
VL  - 158
SN  - 0031-3203
SN  - 1873-5142
ER  -

BibTeX (up to 50 authors):

@article{2025_Li,
  author = {Yuting Li and Dexiong Chen and Tinglong Tang and Xi Shen},
  title = {HTR-VT: Handwritten text recognition with vision transformer},
  journal = {Pattern Recognition},
  year = {2025},
  volume = {158},
  publisher = {Elsevier},
  month = feb,
  url = {https://linkinghub.elsevier.com/retrieve/pii/S0031320324007180},
  pages = {110967},
  doi = {10.1016/j.patcog.2024.110967}
}
