Open Access · pages 428-441
Quantifying Image-Adjective Associations by Leveraging Large-Scale Pretrained Models
Chihaya Matsuhira [1], Marc A. Kastner [1,2], Takahiro Komamizu [1], Takatsugu Hirayama [1,3], Ichiro Ide [1]
[3] University of Human Environments, Okazaki, Japan
Publication type: Book Chapter
Publication date: 2025-01-01
scimago Q2
SJR: 0.352
CiteScore: 2.4
Impact factor: —
ISSN: 0302-9743, 1611-3349, 1861-2075, 1861-2083
Abstract
Quantifying the associations between images and adjectives, i.e., how much the visual characteristics of an image are connected with a certain adjective, is important for better image understanding. For instance, the appearance of a kitten can be associated with adjectives such as “soft”, “small”, and “cute” rather than their opposites “hard”, “large”, and “scary”. Thus, giving scores to a kitten photo according to the degree of its association with each antonym adjective pair (termed an adjective axis, e.g., “round” vs. “sharp”) aids in understanding the image content and its atmosphere. Existing methods rely on subjective human engagement, making it difficult to estimate the association of images with arbitrary adjective axes in a single framework. To enable the extension to arbitrary axes, we explore the use of large-scale pretrained models, including Large Language Models (LLMs) and Vision Language Models (VLMs). In the proposed training-free framework, users only need to specify a pair of antonym nouns that negatively and positively describe the target axis (e.g., “roundness” and “sharpness”). Evaluation confirms that the proposed framework can predict negative and positive associations between adjectives and images as accurately as a manually-assisted comparative method. The results also highlight the pros and cons of utilizing the VLM’s textual or visual embeddings for specific types of adjective axes. Furthermore, computing the similarities among four adjective axes unveils how the proposed framework connects them with each other, such as its tendency to regard a sharp object as being small, hard, and quick in motion.
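The core scoring idea described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a CLIP-like joint embedding space in which an image embedding can be compared against the text embeddings of the two pole nouns of an axis (e.g., “roundness” vs. “sharpness”), and the toy vectors below merely stand in for real VLM encoder outputs.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def axis_score(image_emb, neg_emb, pos_emb):
    """Score an image on an adjective axis in roughly [-2, 2].

    Negative values mean the image is closer to the negative pole
    (e.g., "roundness"); positive values mean it is closer to the
    positive pole (e.g., "sharpness").
    """
    return cosine(image_emb, pos_emb) - cosine(image_emb, neg_emb)

# Toy vectors standing in for VLM embeddings (hypothetical, for illustration).
round_emb = np.array([1.0, 0.0, 0.2])    # text embedding of "roundness"
sharp_emb = np.array([0.0, 1.0, 0.2])    # text embedding of "sharpness"
kitten_img = np.array([0.9, 0.1, 0.3])   # image embedding of a "round" object
knife_img = np.array([0.1, 0.95, 0.2])   # image embedding of a "sharp" object

print(axis_score(kitten_img, round_emb, sharp_emb))  # negative: round side
print(axis_score(knife_img, round_emb, sharp_emb))   # positive: sharp side
```

With a real VLM, the same scoring would be applied to encoder outputs rather than toy vectors; the sign of the difference then indicates which pole of the axis the image leans toward.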
Metrics
Total citations: 0
Cite this
GOST | RIS | BibTeX
GOST
Matsuhira C. et al. Quantifying Image-Adjective Associations by Leveraging Large-Scale Pretrained Models // Lecture Notes in Computer Science. 2025. pp. 428-441.
GOST all authors (up to 50)
Matsuhira C., Kastner M. A., Komamizu T., Hirayama T., Ide I. Quantifying Image-Adjective Associations by Leveraging Large-Scale Pretrained Models // Lecture Notes in Computer Science. 2025. pp. 428-441.
RIS
TY - GENERIC
DO - 10.1007/978-981-96-2071-5_31
UR - https://link.springer.com/10.1007/978-981-96-2071-5_31
TI - Quantifying Image-Adjective Associations by Leveraging Large-Scale Pretrained Models
T2 - Lecture Notes in Computer Science
AU - Matsuhira, Chihaya
AU - Kastner, Marc A
AU - Komamizu, Takahiro
AU - Hirayama, Takatsugu
AU - Ide, Ichiro
PY - 2025
DA - 2025/01/01
PB - Springer Nature
SP - 428-441
SN - 0302-9743
SN - 1611-3349
SN - 1861-2075
SN - 1861-2083
ER -
BibTeX (up to 50 authors)
@incollection{2025_Matsuhira,
author = {Chihaya Matsuhira and Marc A. Kastner and Takahiro Komamizu and Takatsugu Hirayama and Ichiro Ide},
title = {Quantifying Image-Adjective Associations by Leveraging Large-Scale Pretrained Models},
booktitle = {Lecture Notes in Computer Science},
publisher = {Springer Nature},
year = {2025},
pages = {428--441},
month = {jan},
doi = {10.1007/978-981-96-2071-5_31}
}