ACM Transactions on Asian and Low-Resource Language Information Processing, volume 24, issue 4, pages 1-21

Exploring Semantic Attributes for Image Caption Synthesis in Low-Resource Assamese Language

Pankaj Choudhury ¹, Prithwijit Guha ², Sukumar Nandi ³

¹ Centre for Linguistic Science and Technology, Indian Institute of Technology Guwahati, Guwahati, India
² Electronics & Electrical Engineering, Indian Institute of Technology Guwahati, Guwahati, India
Publication type: Journal Article
Publication date: 2025-03-23
SCImago quartile: Q2
SJR: 0.535
CiteScore: 3.6
Impact factor: 1.8
ISSN: 2375-4699, 2375-4702
Abstract

Research on image caption generation has predominantly focused on resource-rich languages like English, leaving resource-poor languages (such as Assamese) largely understudied. In this context, this paper leverages both visual and semantic attribute-based features for generating captions in the Assamese language. Semantic attributes are the significant words that encode higher-level knowledge about the image content. The first contribution of this work is the effective use of features derived from semantic words in the low-resource Assamese language. The second contribution is a Visual-Semantic Self-Attention (VSSA) module that combines features derived from images and semantic attributes. The VSSA module enables the image captioning model to dynamically attend to relevant image regions as well as important semantic attributes, leading to more contextually relevant and linguistically accurate Assamese captions. Moreover, the VSSA module is incorporated into a Transformer model to exploit stacked attention for further performance improvement. The model is trained using both cross-entropy loss optimization and a reinforcement learning approach. The effectiveness of the proposed model is evaluated through qualitative and quantitative analyses (using BLEU-n and CIDEr metrics). The proposed model significantly outperforms previous methods in Assamese caption synthesis, achieving a CIDEr score of 93.7% on the COCO-Assamese Caption (COCO-AC) dataset.
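To make the VSSA idea concrete, below is a minimal PyTorch sketch of joint self-attention over visual region features and embedded semantic attribute words. It is not the authors' implementation: the class name, dimensions, the simple concatenate-then-attend fusion, and the residual normalization are all illustrative assumptions based only on the abstract's description.

# Hypothetical sketch of a VSSA-style fusion block; all names and
# hyperparameters are assumptions, not the paper's actual architecture.
import torch
import torch.nn as nn

class VSSASketch(nn.Module):
    def __init__(self, visual_dim=2048, attr_vocab=1000, d_model=512, n_heads=8):
        super().__init__()
        self.visual_proj = nn.Linear(visual_dim, d_model)    # project CNN region features
        self.attr_embed = nn.Embedding(attr_vocab, d_model)  # embed detected attribute words
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, regions, attr_ids):
        # regions: (B, R, visual_dim) image region features
        # attr_ids: (B, K) indices of predicted semantic attributes
        v = self.visual_proj(regions)
        s = self.attr_embed(attr_ids)
        tokens = torch.cat([v, s], dim=1)                 # joint visual-semantic sequence
        out, _ = self.self_attn(tokens, tokens, tokens)   # attend across both modalities
        return self.norm(tokens + out)                    # fused memory for the decoder

fused = VSSASketch()(torch.randn(2, 36, 2048), torch.randint(0, 1000, (2, 5)))
print(fused.shape)  # torch.Size([2, 41, 512])

In a setup like this, the fused output would serve as the encoder memory of a Transformer caption decoder, and the stacked decoder attention layers described in the abstract would attend over it; the reinforcement learning stage would then fine-tune the decoder with a sequence-level reward such as CIDEr.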
