pages 418-429
A Dataset of Vietnamese Documents for Text Detection
Deep Learning and Applications, Ho Chi Minh City, Vietnam
Publication type: Book Chapter
Publication date: 2023-11-17
scimago Q4
SJR: 0.182
CiteScore: 1.1
Impact factor: —
ISSN: 18650929, 18650937
Abstract
Document analysis and recognition is a crucial technique for automating the data entry of forms, receipts, and other documents at banks, government agencies, and companies. Driven by demand from both research and industry, datasets for document analysis and recognition are available in English, Chinese, Arabic, and Indic languages. However, no public dataset exists for Vietnamese document analysis and recognition. In this paper, we introduce VNDoc, a new dataset for Vietnamese document analysis, which aims to serve as a standard dataset for researching and developing Vietnamese document analysis systems. The dataset contains 226 documents captured with mobile phones and scanners. The documents are collected from diverse categories, such as legal and administrative documents, invoices, resumes, and handwritten forms, and target a variety of applications. In the first stage, we provide ground truth for text lines, which enables research in text detection and layout analysis. We also present a statistical analysis of text length and bounding boxes in the dataset, along with initial experiments using existing text detection methods. We plan to provide text transcriptions and make them available to the research community.
Cite this
GOST
Le A., Mai D. T. H., Lam T. A Dataset of Vietnamese Documents for Text Detection // Communications in Computer and Information Science. 2023. pp. 418-429.
RIS
TY - CHAP
DO - 10.1007/978-981-99-8296-7_30
UR - https://doi.org/10.1007/978-981-99-8296-7_30
TI - A Dataset of Vietnamese Documents for Text Detection
T2 - Communications in Computer and Information Science
AU - Le, Anh
AU - Mai, Dang Tran Hai
AU - Lam, Thanh
PY - 2023
DA - 2023/11/17
PB - Springer Nature
SP - 418
EP - 429
SN - 1865-0929
SN - 1865-0937
ER -
BibTex (up to 50 authors)
@incollection{2023_Le,
author = {Anh Le and Dang Tran Hai Mai and Thanh Lam},
title = {A Dataset of Vietnamese Documents for Text Detection},
series = {Communications in Computer and Information Science},
publisher = {Springer Nature},
year = {2023},
pages = {418--429},
month = {nov},
doi = {10.1007/978-981-99-8296-7_30}
}