ACM Transactions on Asian and Low-Resource Language Information Processing, volume 24, issue 4, pages 1-15

A Hybrid Statistical and Rule-based Approach to Extremely Low-resource Machine Transliteration

Patrick Charles Connor ¹

Prince Edward Island, Charlottetown, Canada

Publication type: Journal Article

Publication date: 2025-03-23

Association for Computing Machinery (ACM)

Journal: ACM Transactions on Asian and Low-Resource Language Information Processing

scimago Q2

SJR: 0.535

CiteScore: 3.6

Impact factor: 1.8

ISSN: 2375-4699, 2375-4702

DOI: 10.1145/3720542

Abstract

Machine transliteration work has focused primarily on languages with large parallel corpora and on language pairs whose orthographies differ greatly. In contrast, a large proportion of the world’s languages have far fewer resources and employ Roman-like alphabets, often with a large degree of orthographic overlap with high-resource languages. We propose that machine transliteration between languages with few training examples can be accomplished by a noisy-channel-like statistical model captured in a human-editable format with practical rule-based capabilities built in. This hybrid approach lets users rely on an algorithm to find and apply common transformations in context while retaining rigorous control over the output. Effectiveness is evaluated on the Bible names translation matrix dataset of Wu et al. (2018), which covers 591 languages with 590 names on average per language pair. Our approach slightly exceeds past results and explores several features aimed at the extremely low-resource language domain.
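To make the idea of a statistical model stored in a human-editable form more concrete, here is a minimal sketch in Python. It is not the paper's method: it is a much simpler stand-in that applies weighted rewrite rules greedily, longest match first, with no channel or language model. The rule syntax, the rules themselves, their weights, and the names `RULES_TEXT`, `parse_rules`, and `transliterate` are all assumptions made up for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical rule table in a plain-text, hand-editable form:
#   source -> target : weight
# The rules and weights below are invented for illustration only.
RULES_TEXT = """
# digraphs
ph -> f : 0.9
th -> t : 0.8
# competing rules for the same source; the heavier one wins
c -> k : 0.6
c -> s : 0.4
"""

Rules = Dict[str, List[Tuple[str, float]]]


def parse_rules(text: str) -> Rules:
    """Parse 'source -> target : weight' lines, skipping blanks and comments."""
    rules: Rules = {}
    for line in text.strip().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        lhs, weight = line.rsplit(":", 1)
        src, tgt = (part.strip() for part in lhs.split("->"))
        rules.setdefault(src, []).append((tgt, float(weight)))
    return rules


def transliterate(word: str, rules: Rules) -> str:
    """Rewrite left to right, trying the longest matching source first.

    Among rules sharing a source, the highest-weight target is used.
    Characters that match no rule are copied through unchanged.
    """
    max_len = max((len(src) for src in rules), default=1)
    out: List[str] = []
    i = 0
    while i < len(word):
        for n in range(min(max_len, len(word) - i), 0, -1):
            chunk = word[i:i + n]
            if chunk in rules:
                best_target = max(rules[chunk], key=lambda pair: pair[1])[0]
                out.append(best_target)
                i += n
                break
        else:
            # No rule matched at this position: keep the character as-is.
            out.append(word[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    rules = parse_rules(RULES_TEXT)
    for name in ["philip", "thomas", "caleb"]:
        print(name, "->", transliterate(name, rules))
```

Because the table is plain text, a user could in principle override a learned rule by editing one line. This is the kind of control the abstract refers to, although the format described in the paper may differ substantially from this sketch.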

ISSN: 2375-4699 (Print), 2375-4702 (Electronic)