Open Access
Automated Text Structuring: Natural Language Processing and Regular Expressions in XML Tag Filling
Ivan Pavlovich Malashin¹, Vadim Tynchenko¹, Andrei P. Gantimurov¹, Vladimir A Neluyb¹, Vladimir A Nelyub¹, Aleksei S. Borodulin¹, Aleksei Borodulin¹
Publication type: Journal Article
Publication date: 2024-12-05
SCImago: Q1
WoS: Q2
SJR: 0.849
CiteScore: 9.0
Impact factor: 3.6
ISSN: 2169-3536
Abstract
The conversion of documents into XML markup requires efficient algorithms and automated solutions. This work focuses on tagging documents to conform to NISO STS (Standards Tag Suite), ensuring compatibility across systems. A method combining Natural Language Processing (NLP) and regular expressions (regex) for automated XML tag filling is proposed: NLP supports understanding of the content, while regex enables precise pattern matching. The approach streamlines the conversion process, reduces manual effort, and ensures standardized tagging. Experiments validate the method's effectiveness in producing accurate XML markup aligned with NISO STS guidelines. The research advances automated data structuring, exemplified by the GOST R ontology within NISO STS, and provides a template for XML-structuring documents based on other ontologies.
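The abstract does not give the authors' patterns or code. To illustrate only the regex half of such a pipeline, the following minimal Python sketch assumes that clause headings in a GOST R-style plain-text document consist of a dotted clause number followed by a capitalised title. It wraps them, nested by depth, in JATS/NISO STS-style `<sec>`, `<label>`, `<title>` and `<p>` elements. The pattern `HEADING_RE`, the function `tag_clauses` and the `id` naming scheme are hypothetical, and the NLP component described in the abstract is not represented.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical heading pattern (an assumption, not the authors' regex):
# a dotted clause number such as "3" or "3.1.2", whitespace, then a title
# starting with a Latin or Cyrillic capital letter.
HEADING_RE = re.compile(r"^(?P<label>\d+(?:\.\d+)*)\s+(?P<title>[A-ZА-ЯЁ].*)$")


def tag_clauses(text: str) -> ET.Element:
    """Wrap numbered clauses of plain text in STS-style <sec>/<label>/<title>/<p>.

    Sub-clauses (e.g. 2.1 under 2) are nested inside their parent <sec>.
    """
    body = ET.Element("body")
    stack = []  # (depth, <sec> element) pairs for the currently open sections
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = HEADING_RE.match(line)
        if m:
            label = m.group("label")
            depth = label.count(".") + 1
            # Close any open sections at the same or a deeper level.
            while stack and stack[-1][0] >= depth:
                stack.pop()
            parent = stack[-1][1] if stack else body
            sec = ET.SubElement(parent, "sec", id="sec_" + label.replace(".", "_"))
            ET.SubElement(sec, "label").text = label
            ET.SubElement(sec, "title").text = m.group("title")
            stack.append((depth, sec))
        else:
            # Non-heading lines become paragraphs of the innermost open section.
            parent = stack[-1][1] if stack else body
            ET.SubElement(parent, "p").text = line
    return body


if __name__ == "__main__":
    sample = """1 Scope
This standard applies to steel pipes.
2 Normative references
2.1 General
The following documents are referenced."""
    print(ET.tostring(tag_clauses(sample), encoding="unicode"))
```

A pattern this simple will also match ordinary lines that happen to begin with a number and a capitalised word; resolving such ambiguous cases is the kind of task the abstract assigns to the NLP side of the method.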
Cite this
GOST
Malashin I. P. et al. Automated Text Structuring: Natural Language Processing and Regular Expressions in XML Tag Filling // IEEE Access. 2024. Vol. 12. pp. 190582-190597.
GOST (all authors)
Malashin I. P., Tynchenko V., Gantimurov A. P., Neluyb V. A., Nelyub V. A., Borodulin A. S., Borodulin A. Automated Text Structuring: Natural Language Processing and Regular Expressions in XML Tag Filling // IEEE Access. 2024. Vol. 12. pp. 190582-190597.
RIS
TY - JOUR
DO - 10.1109/access.2024.3511674
UR - https://ieeexplore.ieee.org/document/10778543/
TI - Automated Text Structuring: Natural Language Processing and Regular Expressions in XML Tag Filling
T2 - IEEE Access
AU - Malashin, Ivan Pavlovich
AU - Tynchenko, Vadim
AU - Gantimurov, Andrei P.
AU - Neluyb, Vladimir A
AU - Nelyub, Vladimir A
AU - Borodulin, Aleksei S.
AU - Borodulin, Aleksei
PY - 2024
DA - 2024/12/05
PB - Institute of Electrical and Electronics Engineers (IEEE)
SP - 190582
EP - 190597
VL - 12
SN - 2169-3536
ER -
BibTeX
@article{2024_Malashin,
author = {Ivan Pavlovich Malashin and Vadim Tynchenko and Andrei P. Gantimurov and Vladimir A Neluyb and Vladimir A Nelyub and Aleksei S. Borodulin and Aleksei Borodulin},
title = {Automated Text Structuring: Natural Language Processing and Regular Expressions in XML Tag Filling},
journal = {IEEE Access},
year = {2024},
volume = {12},
publisher = {Institute of Electrical and Electronics Engineers (IEEE)},
month = dec,
url = {https://ieeexplore.ieee.org/document/10778543/},
pages = {190582--190597},
doi = {10.1109/access.2024.3511674}
}