Open Access
Volume 2, Issue 1, Article number 33

Patch is enough: naturalistic adversarial patch against vision-language pre-training models

Dehong Kong 1, 2
Siyuan Liang 3
Xiaopeng Zhu 4
Yuansheng Zhong 4
Wenqi Ren 1, 2
1 School of Cyber Science and Technology, Shenzhen, China
4 Guangdong Testing Institute of Product Quality Supervision, Guangzhou, China
Publication type: Journal Article
Publication date: 2024-12-13
CiteScore: 4.0
ISSN: 2731-9008, 2097-3330
Abstract

Vision-language pre-training (VLP) models have demonstrated significant success in various domains, but they remain vulnerable to adversarial attacks. Addressing these adversarial vulnerabilities is crucial for enhancing security in multi-modal learning. Traditionally, adversarial methods that target VLP models involve simultaneous perturbation of images and text. However, this approach faces significant challenges. First, adversarial perturbations often fail to translate effectively into real-world scenarios. Second, direct modifications to the text are conspicuously visible. To overcome these limitations, we propose a novel strategy that uses only image patches for attacks, thus preserving the integrity of the original text. Our method leverages prior knowledge from diffusion models to enhance the authenticity and naturalness of the perturbations. Moreover, to optimize patch placement and improve the effectiveness of our attacks, we utilize the cross-attention mechanism, which encapsulates inter-modal interactions by generating attention maps to guide strategic patch placement. Extensive experiments conducted in a white-box setting for image-to-text scenarios reveal that our proposed method significantly outperforms existing techniques, achieving a 100% attack success rate.
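
The abstract describes a placement-then-optimization pipeline: a cross-attention map between the image and its caption marks the region where the two modalities interact most, an adversarial patch is pasted there, and only the patch pixels are optimized to break the image-text alignment while the text is left untouched. The PyTorch sketch below is a minimal illustration of that loop under assumed interfaces, not the authors' code: `image_encoder` and `cross_attention` are placeholder callables standing in for a real VLP model, and the diffusion-model naturalness prior mentioned in the abstract is omitted.

```python
# Minimal sketch, not the authors' implementation: cross-attention-guided placement
# of an adversarial patch, optimized to push the image embedding away from the text
# embedding. The `image_encoder` and `cross_attention` callables are assumed
# interfaces for a real VLP model; the diffusion naturalness prior is omitted.
import torch
import torch.nn.functional as F


def place_patch(image, patch, attn_map):
    """Paste `patch` onto `image`, centered on the peak of the cross-attention map.

    image:    (3, H, W) tensor in [0, 1]
    patch:    (3, h, w) tensor in [0, 1]
    attn_map: (H, W) tensor; higher values = stronger image-text attention
    """
    _, H, W = image.shape
    _, h, w = patch.shape
    peak = int(torch.argmax(attn_map))
    cy, cx = peak // W, peak % W
    y0 = min(max(cy - h // 2, 0), H - h)   # clamp so the patch stays inside the image
    x0 = min(max(cx - w // 2, 0), W - w)
    patched = image.clone()
    patched[:, y0:y0 + h, x0:x0 + w] = patch
    return patched


def patch_attack(image, text_embed, image_encoder, cross_attention,
                 patch_size=32, steps=200, lr=0.01):
    """Optimize only the patch pixels; the text and the rest of the image stay untouched."""
    with torch.no_grad():
        attn_map = cross_attention(image, text_embed)         # assumed (H, W) saliency map
    patch = torch.rand(3, patch_size, patch_size, requires_grad=True)
    opt = torch.optim.Adam([patch], lr=lr)
    for _ in range(steps):
        adv = place_patch(image, patch.clamp(0, 1), attn_map)
        img_embed = image_encoder(adv.unsqueeze(0))           # assumed (1, D) image feature
        # Minimizing image-text cosine similarity misleads alignment-based VLP heads.
        loss = F.cosine_similarity(img_embed, text_embed.unsqueeze(0)).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return place_patch(image, patch.detach().clamp(0, 1), attn_map)


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs end to end; swap in a real VLP model in practice.
    D, H, W = 256, 224, 224
    proj = torch.nn.Linear(3 * H * W, D)
    image_encoder = lambda x: proj(x.flatten(1))
    cross_attention = lambda img, txt: torch.rand(H, W)
    adv = patch_attack(torch.rand(3, H, W), torch.rand(D),
                       image_encoder, cross_attention, steps=5)
    print(adv.shape)  # torch.Size([3, 224, 224])
```

Restricting the optimization to a localized patch is what keeps the perturbation physically realizable and the original text intact, which is the core argument of the abstract.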


Top-30

Journals
  • IEEE Transactions on Information Forensics and Security: 1 publication, 50%
  • Scientific Reports: 1 publication, 50%

Publishers
  • Institute of Electrical and Electronics Engineers (IEEE): 1 publication, 50%
  • Springer Nature: 1 publication, 50%
  • We do not take into account publications without a DOI.
  • Statistics recalculated weekly.

Cite this

GOST
Kong D. et al. Patch is enough: naturalistic adversarial patch against vision-language pre-training models // Visual Intelligence. 2024. Vol. 2. No. 1. 33
GOST (all authors)
Kong D., Liang S., Zhu X., Zhong Y., Ren W. Patch is enough: naturalistic adversarial patch against vision-language pre-training models // Visual Intelligence. 2024. Vol. 2. No. 1. 33
RIS
TY - JOUR
DO - 10.1007/s44267-024-00066-7
UR - https://link.springer.com/10.1007/s44267-024-00066-7
TI - Patch is enough: naturalistic adversarial patch against vision-language pre-training models
T2 - Visual Intelligence
AU - Kong, Dehong
AU - Liang, Siyuan
AU - Zhu, Xiaopeng
AU - Zhong, Yuansheng
AU - Ren, Wenqi
PY - 2024
DA - 2024/12/13
PB - Springer Nature
IS - 1
VL - 2
SN - 2731-9008
SN - 2097-3330
ER -
BibTeX
@article{2024_Kong,
author = {Dehong Kong and Siyuan Liang and Xiaopeng Zhu and Yuansheng Zhong and Wenqi Ren},
title = {Patch is enough: naturalistic adversarial patch against vision-language pre-training models},
journal = {Visual Intelligence},
year = {2024},
volume = {2},
publisher = {Springer Nature},
month = {dec},
url = {https://link.springer.com/10.1007/s44267-024-00066-7},
number = {1},
pages = {33},
doi = {10.1007/s44267-024-00066-7}
}