CMSL: Cross-modal Style Learning for Few-shot Image Generation

Yue Jiang ¹,²

Yueming Lyu ¹,²

Bo Peng ¹

Wei WANG ¹

Jing Dong ¹

New Laboratory of Pattern Recognition (NLPR), State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), Institute of Automation, Chinese Academy of Sciences, Beijing, China

School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China

Publication type: Journal Article

Publication date: 2025-03-20

Publisher: Springer Nature

Journal: Machine Intelligence Research

Scimago quartile: Q1

WoS quartile: Q1

SJR: 1.271

CiteScore: 13.2

Impact factor: 8.7

ISSN: 2731-538X (Print), 2731-5398 (Electronic)

DOI: 10.1007/s11633-024-1511-7

Abstract

Training generative adversarial networks is data-demanding, which limits their development on target domains with inadequate training data. Recent work reduces the data requirement by taking generative models pretrained on sufficient data and fine-tuning them on a small number of training samples. However, because these methods lack an explicit focus on target styles and concentrate disproportionately on generative consistency, they perform poorly at diversity preservation, which reflects the adaptation ability of few-shot generative models. To mitigate this diversity degradation, we propose a framework with two key strategies: 1) To obtain more diverse styles from limited training data, we propose a cross-modal module that explicitly captures the target styles using a style prototype space and text-guided style instructions. 2) To inherit the generation capability of the pretrained model, we constrain the similarity between generated and source images with a structural discrepancy alignment module that maintains the structure correlation in multiscale areas. Extensive experiments and analyses demonstrate that our method outperforms state-of-the-art methods in mitigating diversity degradation.

GOST

Jiang Y., Lyu Y., Peng B., WANG W., Dong J. CMSL: Cross-modal Style Learning for Few-shot Image Generation // Machine Intelligence Research. 2025.

RIS

TY - JOUR
DO - 10.1007/s11633-024-1511-7
UR - https://link.springer.com/10.1007/s11633-024-1511-7
TI - CMSL: Cross-modal Style Learning for Few-shot Image Generation
T2 - Machine Intelligence Research
AU - Jiang, Yue
AU - Lyu, Yueming
AU - Peng, Bo
AU - WANG, Wei
AU - Dong, Jing
PY - 2025
DA - 2025/03/20
PB - Springer Nature
SN - 2731-538X
SN - 2731-5398
ER -

BibTeX

@article{2025_Jiang,
  author = {Yue Jiang and Yueming Lyu and Bo Peng and Wei WANG and Jing Dong},
  title = {CMSL: Cross-modal Style Learning for Few-shot Image Generation},
  journal = {Machine Intelligence Research},
  year = {2025},
  publisher = {Springer Nature},
  month = {mar},
  url = {https://link.springer.com/10.1007/s11633-024-1511-7},
  doi = {10.1007/s11633-024-1511-7}
}
