Research Paper

Entropy‑Guided Token Attacks on Vision‑Language Models

Authors: Mengqi He
Organization: Hugging Face
Published: 2026-01-09 • Added: 2026-01-09

Key Insights

  • Tokens with the highest predictive entropy dominate the semantic output of vision‑language (V‑L) models; perturbing only these few tokens yields large semantic degradation.
  • Entropy‑driven attacks achieve comparable (or greater) success with far lower perturbation budgets than naïve or gradient‑based token attacks.
  • The vulnerability transfers across diverse V‑L architectures (e.g., CLIP, BLIP, ViLT), indicating a systemic weakness in multimodal alignment mechanisms.
  • Computing token entropy from the model’s own output distribution provides an efficient, model‑agnostic way to select attack targets without requiring full gradient information.

Abstract

Selective adversarial attacks that target high-entropy tokens in vision-language models achieve significant semantic degradation at reduced perturbation budgets, and the underlying vulnerability transfers across different architectures.

Full Analysis

**Source:** [HuggingFace](https://huggingface.co/papers/2512.21815) | [arXiv](https://arxiv.org/abs/2512.21815)

---

*Topics: multimodal, ai-safety, computer-vision*
*Difficulty: advanced*
*Upvotes: 6*