Open Access
Volume 12, pages 190582-190597

Automated Text Structuring: Natural Language Processing and Regular Expressions in XML Tag Filling

Ivan Pavlovich Malashin 1
Andrei P. Gantimurov 1
Vladimir A Neluyb 1
Vladimir A Nelyub 1
Aleksei S. Borodulin 1
Aleksei Borodulin 1
Publication type: Journal Article
Publication date: 2024-12-05
Scimago: Q1
WoS: Q2
SJR: 0.849
CiteScore: 9.0
Impact factor: 3.6
ISSN: 2169-3536
Abstract
The conversion of documents into XML markup requires efficient algorithms and automated solutions. The focus is on tagging documents to meet NISO STS standards, ensuring compatibility across systems. A method combining Natural Language Processing (NLP) and Regular Expressions (regex) for automated XML tag filling is proposed. NLP enhances content understanding, while regex enables precise pattern matching. This approach streamlines the conversion process, reducing manual effort and ensuring standardized tagging. Through experiments, the effectiveness of the method in achieving accurate XML markup aligned with NISO STS guidelines is validated. This research advances automated data structuring, exemplified by the GOST R ontology within NISO STS standards, providing a template for other ontology-based document XML-structuring.
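To make the regex half of this idea concrete, here is a minimal sketch in Python. It is not the authors' implementation, and it omits the NLP stage entirely. The pattern for designations such as "GOST R 7.0.5-2008", the function name `fill_std_ident`, and the sample sentence are all assumptions introduced for illustration. The element names (`std-ident`, `originator`, `doc-number`, `year`) are borrowed loosely from NISO STS and have not been validated against the schema.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern for a standard designation like "GOST R 7.0.5-2008":
# originator, dotted document number, hyphen, four-digit year.
DESIGNATION = re.compile(
    r"\b(?P<originator>GOST R)\s+(?P<number>\d+(?:\.\d+)*)-(?P<year>\d{4})\b"
)


def fill_std_ident(text):
    """Return an illustrative <std-ident> element filled from the first
    designation found in `text`, or None if no designation matches."""
    match = DESIGNATION.search(text)
    if match is None:
        return None
    ident = ET.Element("std-ident")
    # Element names are illustrative, not checked against the NISO STS schema.
    ET.SubElement(ident, "originator").text = match.group("originator")
    ET.SubElement(ident, "doc-number").text = match.group("number")
    ET.SubElement(ident, "year").text = match.group("year")
    return ident


if __name__ == "__main__":
    sample = "This standard replaces GOST R 7.0.5-2008 in full."
    element = fill_std_ident(sample)
    print(ET.tostring(element, encoding="unicode"))
```

In a fuller pipeline, an NLP component would presumably decide which passages are titles, scopes, or references before patterns like this one fill the individual tags. That division of labour is an inference from the abstract, not a detail it states.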
Cite this
GOST
Malashin I. P. et al. Automated Text Structuring: Natural Language Processing and Regular Expressions in XML Tag Filling // IEEE Access. 2024. Vol. 12. pp. 190582-190597.
GOST (all authors)
Malashin I. P., Tynchenko V., Gantimurov A. P., Neluyb V. A., Nelyub V. A., Borodulin A. S., Borodulin A. Automated Text Structuring: Natural Language Processing and Regular Expressions in XML Tag Filling // IEEE Access. 2024. Vol. 12. pp. 190582-190597.
RIS
TY - JOUR
DO - 10.1109/access.2024.3511674
UR - https://ieeexplore.ieee.org/document/10778543/
TI - Automated Text Structuring: Natural Language Processing and Regular Expressions in XML Tag Filling
T2 - IEEE Access
AU - Malashin, Ivan Pavlovich
AU - Tynchenko, Vadim
AU - Gantimurov, Andrei P.
AU - Neluyb, Vladimir A
AU - Nelyub, Vladimir A
AU - Borodulin, Aleksei S.
AU - Borodulin, Aleksei
PY - 2024
DA - 2024/12/05
PB - Institute of Electrical and Electronics Engineers (IEEE)
SP - 190582
EP - 190597
VL - 12
SN - 2169-3536
ER -
BibTeX
@article{2024_Malashin,
author = {Ivan Pavlovich Malashin and Vadim Tynchenko and Andrei P. Gantimurov and Vladimir A Neluyb and Vladimir A Nelyub and Aleksei S. Borodulin and Aleksei Borodulin},
title = {Automated Text Structuring: Natural Language Processing and Regular Expressions in XML Tag Filling},
journal = {IEEE Access},
year = {2024},
volume = {12},
publisher = {Institute of Electrical and Electronics Engineers (IEEE)},
month = {dec},
url = {https://ieeexplore.ieee.org/document/10778543/},
pages = {190582--190597},
doi = {10.1109/access.2024.3511674}
}