ACM Transactions on Asian and Low-Resource Language Information Processing, volume 24, issue 4, pages 1-28

A Survey of Document Stemming Algorithms in Information Retrieval Systems

Mona Alyousf ¹

Mohamad Firas Al Halabi ¹

Department of Mathematics (Section of Applied Mathematics and Informatics), Damascus University, Damascus, Syrian Arab Republic

Publication type: Journal Article

Publication date: 2025-03-23

Association for Computing Machinery (ACM)

Journal: ACM Transactions on Asian and Low-Resource Language Information Processing

SCImago: Q2

SJR: 0.535

CiteScore: 3.6

Impact factor: 1.8

ISSN: 23754699 (Print), 23754702 (Electronic)

DOI: 10.1145/3715120


Abstract

With the rapid growth in the size and diversity of databases, there is an urgent need for advanced Natural Language Processing (NLP) techniques, especially in the field of Information Retrieval (IR). One of the most widely used techniques for improving retrieval is the stemming of text documents. Given the importance of stemming for IR systems, this paper presents a detailed study of the main stemming approaches and of how the algorithms within each approach work. We analyzed and evaluated the most important algorithms by comparing them against specific criteria, including their stemming strength, their advantages, and their disadvantages. Based on this comparison, we identify the weaknesses of each stemming algorithm. Our main aim is to use these findings to help overcome those weaknesses while exploiting the algorithms' most important advantages, in order to develop a new, more efficient stemming algorithm for the English language.
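
To make the idea of stemming concrete, the following is a deliberately tiny suffix-stripping stemmer in Python. The suffix list, the `min_stem` threshold, and the helper names (`toy_stem`, `query_matches`, `strength_metrics`) are hypothetical illustrations, not one of the algorithms the survey compares. The sketch shows two things the abstract refers to: how stemming lets a query term match a morphological variant in a document, and how a stemmer's strength can be quantified. The two strength measures used here (mean number of distinct words per conflation class, and an index compression factor) are common in the stemming literature; whether they match the paper's exact criteria is an assumption.

```python
from collections import defaultdict

# Hypothetical toy suffix list (NOT Porter, Lovins, or any surveyed stemmer).
# Sorted longest-first so that, e.g., "ions" is tried before "s".
SUFFIXES = sorted(
    ["ational", "ation", "ness", "ions", "ing", "ion", "ed", "es", "s"],
    key=len,
    reverse=True,
)


def toy_stem(word: str, min_stem: int = 3) -> str:
    """Strip the longest suffix that leaves at least `min_stem` characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def query_matches(query: str, document: str) -> bool:
    """True if any stemmed query term equals some stemmed document term."""
    doc_stems = {toy_stem(t) for t in document.split()}
    return any(toy_stem(t) in doc_stems for t in query.split())


def strength_metrics(words: list[str]) -> dict[str, float]:
    """Two assumed stemmer-strength measures computed over a vocabulary."""
    vocab = {w.lower() for w in words}
    if not vocab:
        raise ValueError("empty vocabulary")
    classes = defaultdict(set)  # stem -> conflation class of distinct words
    for w in vocab:
        classes[toy_stem(w)].add(w)
    n, s = len(vocab), len(classes)
    return {
        "mean_class_size": n / s,          # distinct words per stem
        "index_compression": (n - s) / n,  # fraction of index terms saved
    }


if __name__ == "__main__":
    print(toy_stem("connections"))  # connect
    # Unstemmed, "connecting" would not match "connections"; stemmed, it does.
    print(query_matches("connecting", "network connections"))  # True
    vocab = ["connect", "connected", "connecting", "connection",
             "connections", "cat", "cats"]
    m = strength_metrics(vocab)
    print(m["mean_class_size"])  # 3.5
    print(round(m["index_compression"], 3))  # 0.714
```

In this toy, `running` stems to `runn` while `runs` stems to `run`, so the two fall into different conflation classes: an understemming error of the kind a comparison of stemmers looks for. Making the suffix list more aggressive raises both strength measures but increases the risk of overstemming, that is, merging unrelated words into one class.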

