ACM Transactions on Asian and Low-Resource Language Information Processing, volume 24, issue 4, pages 1-15

A Hybrid Statistical and Rule-based Approach to Extremely Low-resource Machine Transliteration

Patrick Charles Connor 1
1 Prince Edward Island, Charlottetown, Canada
Publication type: Journal Article
Publication date: 2025-03-23
SCImago: Q2
SJR: 0.535
CiteScore: 3.6
Impact factor: 1.8
ISSN: 2375-4699, 2375-4702
Abstract

Machine transliteration work has focused primarily on languages with large parallel corpora and on language pairs whose orthographies are very different. In contrast, a large proportion of the world’s languages have vastly fewer resources and employ Roman-like alphabets, often with a large degree of orthographic overlap with high-resource languages. We propose that machine transliteration between languages with few training examples can be accomplished by a noisy-channel-like statistical model captured in a human-editable format with practical rule-based capabilities built in. This hybrid approach allows users to take advantage of an algorithm to find and apply common transformations in context while retaining rigorous control over the output. Effectiveness is evaluated on the Bible names translation matrix dataset of Wu et al. (2018), covering 591 languages, with 590 names on average per language pair. Our approach slightly exceeds past results and explores several features targeted at benefiting the extremely low-resource language domain.
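
To make the idea of a statistical model stored in a human-editable rule format more concrete, here is a minimal sketch in Python. It is not the paper's algorithm. The rule syntax (`source -> target : probability`), the probability values, the `copy_prob` fallback, and the function names `parse_rules` and `transliterate` are all assumptions chosen for illustration. The sketch scores every way of rewriting a word with weighted substring rules and keeps the highest-probability output, which loosely mirrors the channel-model part of a noisy-channel approach (no target-language model is included).

```python
import math
from collections import defaultdict

# Hypothetical, human-editable rule table. The values are illustrative
# and are NOT taken from the paper or from any dataset.
RULES_TEXT = """
# source -> target : probability
ph -> f  : 0.9
ph -> ph : 0.1
th -> t  : 0.7
c  -> k  : 0.6
"""

def parse_rules(text):
    """Parse 'source -> target : probability' lines into a dict of lists."""
    rules = defaultdict(list)
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        lhs, prob = line.rsplit(":", 1)
        src, tgt = (part.strip() for part in lhs.split("->"))
        rules[src].append((tgt, float(prob)))
    return rules

def transliterate(word, rules, copy_prob=0.5):
    """Return the highest-scoring rewrite of `word` under the rule table.

    Characters can always be copied unchanged with probability `copy_prob`
    (an assumed fallback), so every input has at least one output.
    """
    max_len = max((len(src) for src in rules), default=1)
    # best[i] holds (log-probability, output) for the best rewrite of word[:i]
    best = [(-math.inf, "")] * (len(word) + 1)
    best[0] = (0.0, "")
    for i in range(len(word)):
        score, out = best[i]
        # Option 1: copy the current character unchanged
        options = [(word[i], word[i], copy_prob)]
        # Option 2: apply any rule whose source matches at position i
        for n in range(1, max_len + 1):
            src = word[i:i + n]
            if len(src) < n:
                break
            for tgt, p in rules.get(src, []):
                options.append((src, tgt, p))
        for src, tgt, p in options:
            j = i + len(src)
            candidate = (score + math.log(p), out + tgt)
            if candidate[0] > best[j][0]:
                best[j] = candidate
    return best[len(word)][1]
```

Under these assumed rules, `transliterate("phoebe", parse_rules(RULES_TEXT))` yields `foebe`. Because the table is plain text, a user can override the learned behaviour by hand: appending the line `c -> c : 0.9` makes the same function keep `c` rather than rewrite it to `k`. This is the kind of manual control over output that the abstract describes, shown here in a much simpler form.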
