publications | Hao Li

2025

Preprint
CrossGuard: Safeguarding MLLMs against Joint-Modal Implicit Malicious Attacks

Xu Zhang, Hao Li, and Zhichao Lu

In arXiv, 2025

Multimodal Large Language Models (MLLMs) achieve strong reasoning and perception capabilities but are increasingly vulnerable to jailbreak attacks. While existing work focuses on explicit attacks, where malicious content resides in a single modality, recent studies reveal implicit attacks, in which benign text and image inputs jointly express unsafe intent. Such joint-modal threats are difficult to detect and remain underexplored, largely due to the scarcity of high-quality implicit data. We propose ImpForge, an automated red-teaming pipeline that leverages reinforcement learning with tailored reward modules to generate diverse implicit samples across 14 domains. Building on this dataset, we further develop CrossGuard, an intent-aware safeguard providing robust and comprehensive defense against both explicit and implicit threats. Extensive experiments across safe and unsafe benchmarks, implicit and explicit attacks, and multiple out-of-domain settings demonstrate that CrossGuard significantly outperforms existing defenses, including advanced MLLMs and guardrails, achieving stronger security while maintaining high utility. This offers a balanced and practical solution for enhancing MLLM robustness against real-world multimodal threats.
@inproceedings{CrossGuard,
  author = {Zhang, Xu and Li, Hao and Lu, Zhichao},
  title = {CrossGuard: Safeguarding MLLMs against Joint-Modal Implicit Malicious Attacks},
  booktitle = {arXiv},
  year = {2025},
}
NeurIPS
DRIFT: Dynamic Rule-Based Defense with Injection Isolation for Securing LLM Agents

Hao Li, Xiaogeng Liu, Hung-Chun Chiu, Dianqi Li, Ning Zhang, and Chaowei Xiao

In NeurIPS, 2025

Large Language Models (LLMs) are increasingly central to agentic systems due to their strong reasoning and planning capabilities. By interacting with external environments through predefined tools, these agents can carry out complex user tasks. Nonetheless, this interaction also introduces the risk of prompt injection attacks, where malicious inputs from external sources can mislead the agent’s behavior, potentially resulting in economic loss, privacy leakage, or system compromise. System-level defenses have recently shown promise by enforcing static or predefined policies, but they still face two key challenges: dynamically updating security rules and isolating the memory stream. To address these challenges, we propose DRIFT, a Dynamic Rule-based Isolation Framework for Trustworthy agentic systems, which enforces both control- and data-level constraints. A Secure Planner first constructs a minimal function trajectory and a JSON-schema-style parameter checklist for each function node based on the user query. A Dynamic Validator then monitors deviations from the original plan, assessing whether changes comply with privilege limitations and the user’s intent. Finally, an Injection Isolator detects and masks any instructions that may conflict with the user query from the memory stream to mitigate long-term risks. We empirically validate the effectiveness of DRIFT on the AgentDojo benchmark, demonstrating its strong security performance while maintaining high utility across diverse models, showcasing both its robustness and adaptability.
@inproceedings{DRIFT,
  author = {Li, Hao and Liu, Xiaogeng and Chiu, Hung-Chun and Li, Dianqi and Zhang, Ning and Xiao, Chaowei},
  title = {DRIFT: Dynamic Rule-Based Defense with Injection Isolation for Securing LLM Agents},
  booktitle = {NeurIPS},
  year = {2025},
}
ACL
PIGuard: Prompt Injection Guardrail via Mitigating Overdefense for Free

Hao Li*, Xiaogeng Liu*, Ning Zhang, and Chaowei Xiao

In ACL, 2025

Prompt injection attacks pose a critical threat to large language models (LLMs), enabling goal hijacking and data leakage. Prompt guard models, though effective in defense, suffer from over-defense – falsely flagging benign inputs as malicious due to trigger word bias. To address this issue, we introduce NotInject, an evaluation dataset that systematically measures over-defense across various prompt guard models. NotInject contains 339 benign samples enriched with trigger words common in prompt injection attacks, enabling fine-grained evaluation. Our results show that state-of-the-art models suffer from over-defense issues, with accuracy dropping close to random guessing levels (60%). To mitigate this, we propose InjecGuard, a novel prompt guard model that incorporates a new training strategy, Mitigating Over-defense for Free (MOF), which significantly reduces the bias on trigger words. InjecGuard demonstrates state-of-the-art performance on diverse benchmarks including NotInject, surpassing the existing best model by 30.8%, offering a robust and open-source solution for detecting prompt injection attacks.
@inproceedings{PIGuard,
  author = {Li*, Hao and Liu*, Xiaogeng and Zhang, Ning and Xiao, Chaowei},
  title = {PIGuard: Prompt Injection Guardrail via Mitigating Overdefense for Free},
  booktitle = {ACL},
  year = {2025},
}
NAACL
Hello Again! LLM-powered Personalized Agent for Long-term Dialogue

Hao Li*, Chenghao Yang*, An Zhang, Yang Deng, Xiang Wang, and Tat-Seng Chua

In NAACL, 2025

Open-domain dialogue systems have seen remarkable advancements with the development of large language models (LLMs). Nonetheless, most existing dialogue systems predominantly focus on brief single-session interactions, neglecting the real-world demands for long-term companionship and personalized interactions with chatbots. Crucial to addressing this real-world need are event summary and persona management, which enable reasoning for appropriate long-term dialogue responses. Recent progress in the human-like cognitive and reasoning capabilities of LLMs suggests that LLM-based agents could significantly enhance automated perception, decision-making, and problem-solving. In response to this potential, we introduce a model-agnostic framework, the Long-term Dialogue Agent (LD-Agent), which incorporates three independently tunable modules dedicated to event perception, persona extraction, and response generation. For the event memory module, long and short-term memory banks are employed to separately focus on historical and ongoing sessions, while a topic-based retrieval mechanism is introduced to enhance the accuracy of memory retrieval. Furthermore, the persona module conducts dynamic persona modeling for both users and agents. The integration of retrieved memories and extracted personas is subsequently fed into the generator to induce appropriate responses. The effectiveness, generality, and cross-domain capabilities of LD-Agent are empirically demonstrated across various illustrative benchmarks, models, and tasks. The code is released at https://github.com/leolee99/LD-Agent.
@inproceedings{LD-Agent,
  author = {Li*, Hao and Yang*, Chenghao and Zhang, An and Deng, Yang and Wang, Xiang and Chua, Tat-Seng},
  title = {Hello Again! LLM-powered Personalized Agent for Long-term Dialogue},
  booktitle = {NAACL},
  year = {2025},
}

2024

AAAI
Negative Pre-aware for Noisy Cross-modal Matching

Xu Zhang*, Hao Li*, and Mang Ye

In AAAI, 2024

Cross-modal noise-robust learning is a challenging task since noisy correspondence is hard to recognize and rectify. Due to the cumulative and unavoidable negative impact of unresolved noise, existing methods cannot maintain stable performance as the noise increases. In this paper, we present a novel Negative Pre-aware Cross-modal (NPC) matching solution for large visual-language model fine-tuning on noisy downstream tasks. It has two key features: (1) For noise recognition and resistance, whereas previous methods usually filter out a noise subset directly, we propose to estimate the negative impact of each sample. This does not need additional correction mechanisms that may predict unreliable correction results, leading to self-reinforcing error. We assign a confidence weight to each sample according to its negative impact in the training process. This adaptively adjusts the contribution of each sample to avoid noise accumulation. (2) For maintaining stable performance with increasing noise, we utilize the memorization effect of DNNs by maintaining a memory bank. Specifically, we apply a GMM to select high-confidence clean samples as memory entries, which are used to estimate the negative impact of each sample. Since clean samples are more easily distinguished by the GMM with increasing noise, the memory bank can still maintain high quality at a high noise ratio. Compared to correction mechanisms that focus on noisy samples, memory bank-based estimation is more robust, which makes the model performance stable on noisy datasets. Extensive experiments demonstrate that our method significantly improves matching accuracy and performance stability at increasing noise ratios. Our approach also surpasses the state-of-the-art methods by a large margin.
@inproceedings{NPC,
  author = {Zhang*, Xu and Li*, Hao and Ye, Mang},
  title = {Negative Pre-aware for Noisy Cross-modal Matching},
  booktitle = {AAAI},
  year = {2024},
}
Preprint
One-step Noisy Label Mitigation

Hao Li*, Jiayang Gu*, Jingkuan Song, An Zhang, and Lianli Gao

In arXiv, 2024

Mitigating the detrimental effects of noisy labels on the training process has become increasingly critical, as obtaining entirely clean or human-annotated samples for large-scale pre-training tasks is often impractical. Nonetheless, existing noise mitigation methods often encounter limitations in practical applications due to their task-specific design, model dependency, and significant computational overhead. In this work, we exploit the properties of high-dimensional orthogonality to identify a robust and effective boundary in cone space for separating clean and noisy samples. Building on this, we propose One-step Anti-Noise (OSA), a model-agnostic noisy label mitigation paradigm that employs an estimator model and a scoring function to assess the noise level of input pairs through just one-step inference, a cost-efficient process. We empirically demonstrate the superiority of OSA, highlighting its enhanced training robustness, improved task transferability, ease of deployment, and reduced computational costs across various benchmarks, models, and tasks. Our code is released at https://github.com/leolee99/OSA.
@misc{OSA,
  author = {Li*, Hao and Gu*, Jiayang and Song, Jingkuan and Zhang, An and Gao, Lianli},
  title = {One-step Noisy Label Mitigation},
  eprint = {2410.01944},
  archiveprefix = {arXiv},
  year = {2024},
}

2023

NeurIPS
Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval

Hao Li, Jingkuan Song, Lianli Gao, Xiaosu Zhu, and Heng Tao Shen

In NeurIPS, 2023

Cross-modal retrieval methods build similarity relations between vision and language modalities by jointly learning a common representation space. However, the predictions are often unreliable due to aleatoric uncertainty, which is induced by low-quality data, e.g., corrupt images, fast-paced videos, and non-detailed texts. In this paper, we propose a novel Prototype-based Aleatoric Uncertainty Quantification (PAU) framework to provide trustworthy predictions by quantifying the uncertainty arising from the inherent data ambiguity. Concretely, we first construct a diverse set of learnable prototypes for each modality to represent the entire semantics subspace. Then Dempster-Shafer Theory and Subjective Logic Theory are utilized to build an evidential theoretical framework by associating evidence with Dirichlet Distribution parameters. The PAU model induces accurate uncertainty and reliable predictions for cross-modal retrieval. Extensive experiments are performed on four major benchmark datasets, MSR-VTT, MSVD, DiDeMo, and MS-COCO, demonstrating the effectiveness of our method. The code is accessible at https://github.com/leolee99/PAU.
@inproceedings{PAU,
  author = {Li, Hao and Song, Jingkuan and Gao, Lianli and Zhu, Xiaosu and Shen, Heng Tao},
  title = {Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval},
  booktitle = {NeurIPS},
  year = {2023},
}
Preprint
On Generative Agents in Recommendation

An Zhang*, Leheng Sheng*, Yuxin Chen*, Hao Li, Yang Deng, Xiang Wang, and Tat-Seng Chua

2023

Recommender systems are the cornerstone of today’s information dissemination, yet a disconnect between offline metrics and online performance greatly hinders their development. Addressing this challenge, we envision a recommendation simulator, capitalizing on recent breakthroughs in human-level intelligence exhibited by Large Language Models (LLMs). We propose Agent4Rec, a novel movie recommendation simulator, leveraging LLM-empowered generative agents equipped with user profile, memory, and actions modules specifically tailored for the recommender system. In particular, these agents’ profile modules are initialized using the MovieLens dataset, capturing users’ unique tastes and social traits; memory modules log both factual and emotional memories and are integrated with an emotion-driven reflection mechanism; action modules support a wide variety of behaviors, spanning both taste-driven and emotion-driven actions. Each agent interacts with personalized movie recommendations in a page-by-page manner, relying on a pre-implemented collaborative filtering-based recommendation algorithm. We delve into both the capabilities and limitations of Agent4Rec, aiming to explore an essential research question: to what extent can LLM-empowered generative agents faithfully simulate the behavior of real, autonomous humans in recommender systems? Extensive and multi-faceted evaluations of Agent4Rec highlight both the alignment and deviation between agents and user-personalized preferences. Beyond mere performance comparison, we explore insightful experiments, such as emulating the filter bubble effect and discovering the underlying causal relationships in recommendation tasks.
@misc{Agent4Rec,
  author = {Zhang*, An and Sheng*, Leheng and Chen*, Yuxin and Li, Hao and Deng, Yang and Wang, Xiang and Chua, Tat-Seng},
  title = {On Generative Agents in Recommendation},
  eprint = {2310.10108},
  archiveprefix = {arXiv},
  year = {2023},
}

2022

NeurIPS
A Differentiable Semantic Metric Approximation in Probabilistic Embedding for Cross-Modal Retrieval

Hao Li, Jingkuan Song, Lianli Gao, Pengpeng Zeng, Haonan Zhang, and Gongfu Li

In NeurIPS, 2022

Cross-modal retrieval aims to build correspondence between multiple modalities by learning a common representation space. Typically, an image can match multiple texts semantically and vice versa, which significantly increases the difficulty of this task. To address this problem, probabilistic embedding is proposed to quantify these many-to-many relationships. However, existing datasets (e.g., MS-COCO) and metrics (e.g., Recall@K) cannot fully represent these diverse correspondences due to non-exhaustive annotations. Based on this observation, we utilize semantic correlation computed by CIDEr to find the potential correspondences. Then we present an effective metric, named Average Semantic Precision (ASP), which can measure the ranking precision of semantic correlation for retrieval sets. Additionally, we introduce a novel and concise objective, coined Differentiable ASP Approximation (DAA). Concretely, DAA can optimize ASP directly by making the ranking function of ASP differentiable through a sigmoid function. To verify the effectiveness of our approach, extensive experiments are conducted on MS-COCO, CUB Captions, and Flickr30K, which are commonly used in cross-modal retrieval. The results show that our approach obtains superior performance over the state-of-the-art approaches on all metrics.
@inproceedings{DAA,
  author = {Li, Hao and Song, Jingkuan and Gao, Lianli and Zeng, Pengpeng and Zhang, Haonan and Li, Gongfu},
  title = {A Differentiable Semantic Metric Approximation in Probabilistic Embedding for Cross-Modal Retrieval},
  booktitle = {NeurIPS},
  volume = {35},
  pages = {11934--11946},
  year = {2022},
}
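
The DAA abstract above mentions making a ranking function differentiable through a sigmoid. The sketch below illustrates only that general idea and is not the paper's ASP or DAA objective: the step function that counts how many items outscore a given item is replaced by a sigmoid with a temperature. The function names `hard_rank` and `soft_rank` and the temperature parameter `tau` are assumptions introduced here for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def hard_rank(scores: list[float]) -> list[int]:
    """Exact rank of each item (1 = highest score). Piecewise constant, so it has no useful gradient."""
    return [
        1 + sum(1 for j, s_j in enumerate(scores) if j != i and s_j > s_i)
        for i, s_i in enumerate(scores)
    ]


def soft_rank(scores: list[float], tau: float = 0.01) -> list[float]:
    """Smooth rank: the indicator 1[s_j > s_i] is replaced by sigmoid((s_j - s_i) / tau).

    tau is a hypothetical temperature: smaller values track the hard rank more
    closely, larger values give smoother (but less exact) ranks.
    """
    return [
        1.0 + sum(sigmoid((s_j - s_i) / tau) for j, s_j in enumerate(scores) if j != i)
        for i, s_i in enumerate(scores)
    ]


if __name__ == "__main__":
    similarity_scores = [0.9, 0.2, 0.5]
    print(hard_rank(similarity_scores))
    print(soft_rank(similarity_scores, tau=0.01))
    print(soft_rank(similarity_scores, tau=1.0))
```

With a small `tau` the soft ranks stay close to the integer ranks, while a larger `tau` trades exactness for smoother values; any rank-based metric built on `soft_rank` then becomes differentiable with respect to the scores.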