Bio

Currently, I am fortunate to work with Prof. Chaowei Xiao on AI security. Previously, I was a master's student in computer science at the University of Electronic Science and Technology of China, co-advised by Prof. Jingkuan Song and Prof. Lianli Gao.

My research interests lie mainly in multi-modality, AI agents, and trustworthy learning. I have served as a reviewer for WWW 24, TMM, CVPR 24, ICML 24, ECCV 24, NeurIPS 24, AAAI 25, and ICLR 25.

News

May 15, 2025	One paper was accepted by ACL 2025.
Jan 24, 2025	One paper was accepted by NAACL 2025.
Dec 10, 2023	One paper was accepted by AAAI 2024. Congratulations to my collaborators Xu Zhang and Prof. Mang Ye.
Dec 10, 2023	I will attend NeurIPS 2023 in New Orleans!
Sep 22, 2023	One paper was accepted by NeurIPS 2023.
Sep 15, 2022	One paper was accepted by NeurIPS 2022.

Selected Publications

NeurIPS
Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval

Hao Li, Jingkuan Song, Lianli Gao, Xiaosu Zhu, and Heng Tao Shen

In NeurIPS, 2023

Cross-modal retrieval methods build similarity relations between vision and language modalities by jointly learning a common representation space. However, the predictions are often unreliable due to aleatoric uncertainty, which is induced by low-quality data, e.g., corrupt images, fast-paced videos, and non-detailed texts. In this paper, we propose a novel Prototype-based Aleatoric Uncertainty Quantification (PAU) framework to provide trustworthy predictions by quantifying the uncertainty arising from the inherent data ambiguity. Concretely, we first construct a set of various learnable prototypes for each modality to represent the entire semantics subspace. Then Dempster-Shafer Theory and Subjective Logic Theory are utilized to build an evidential theoretical framework by associating evidence with Dirichlet Distribution parameters. The PAU model induces accurate uncertainty and reliable predictions for cross-modal retrieval. Extensive experiments are performed on four major benchmark datasets of MSR-VTT, MSVD, DiDeMo, and MS-COCO, demonstrating the effectiveness of our method. The code is accessible at https://github.com/leolee99/PAU.
@inproceedings{PAU,
  author = {Li, Hao and Song, Jingkuan and Gao, Lianli and Zhu, Xiaosu and Shen, Heng Tao},
  title = {Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval},
  booktitle = {NeurIPS},
  year = {2023},
}
NeurIPS
A Differentiable Semantic Metric Approximation in Probabilistic Embedding for Cross-Modal Retrieval

Hao Li, Jingkuan Song, Lianli Gao, Pengpeng Zeng, Haonan Zhang, and Gongfu Li

In NeurIPS, 2022

Cross-modal retrieval aims to build correspondence between multiple modalities by learning a common representation space. Typically, an image can match multiple texts semantically and vice versa, which significantly increases the difficulty of this task. To address this problem, probabilistic embedding is proposed to quantify these many-to-many relationships. However, existing datasets (e.g., MS-COCO) and metrics (e.g., Recall@K) cannot fully represent these diverse correspondences due to non-exhaustive annotations. Based on this observation, we utilize semantic correlation computed by CIDEr to find the potential correspondences. Then we present an effective metric, named Average Semantic Precision (ASP), which can measure the ranking precision of semantic correlation for retrieval sets. Additionally, we introduce a novel and concise objective, coined Differentiable ASP Approximation (DAA). Concretely, DAA can optimize ASP directly by making the ranking function of ASP differentiable through a sigmoid function. To verify the effectiveness of our approach, extensive experiments are conducted on MS-COCO, CUB Captions, and Flickr30K, which are commonly used in cross-modal retrieval. The results show that our approach obtains superior performance over the state-of-the-art approaches on all metrics.
@inproceedings{DAA,
  author = {Li, Hao and Song, Jingkuan and Gao, Lianli and Zeng, Pengpeng and Zhang, Haonan and Li, Gongfu},
  title = {A Differentiable Semantic Metric Approximation in Probabilistic Embedding for Cross-Modal Retrieval},
  booktitle = {NeurIPS},
  volume = {35},
  pages = {11934--11946},
  year = {2022},
}
NAACL
Hello Again! LLM-powered Personalized Agent for Long-term Dialogue

Hao Li*, Chenghao Yang*, An Zhang, Yang Deng, Xiang Wang, and Tat-seng Chua

In NAACL, 2025

Open-domain dialogue systems have seen remarkable advancements with the development of large language models (LLMs). Nonetheless, most existing dialogue systems predominantly focus on brief single-session interactions, neglecting the real-world demands for long-term companionship and personalized interactions with chatbots. Crucial to addressing this real-world need are event summary and persona management, which enable reasoning for appropriate long-term dialogue responses. Recent progress in the human-like cognitive and reasoning capabilities of LLMs suggests that LLM-based agents could significantly enhance automated perception, decision-making, and problem-solving. In response to this potential, we introduce a model-agnostic framework, the Long-term Dialogue Agent (LD-Agent), which incorporates three independently tunable modules dedicated to event perception, persona extraction, and response generation. For the event memory module, long and short-term memory banks are employed to separately focus on historical and ongoing sessions, while a topic-based retrieval mechanism is introduced to enhance the accuracy of memory retrieval. Furthermore, the persona module conducts dynamic persona modeling for both users and agents. The integration of retrieved memories and extracted personas is subsequently fed into the generator to induce appropriate responses. The effectiveness, generality, and cross-domain capabilities of LD-Agent are empirically demonstrated across various illustrative benchmarks, models, and tasks. The code is released at https://github.com/leolee99/LD-Agent.
@inproceedings{LD-Agent,
  author = {Li*, Hao and Yang*, Chenghao and Zhang, An and Deng, Yang and Wang, Xiang and Chua, Tat-seng},
  title = {Hello Again! LLM-powered Personalized Agent for Long-term Dialogue},
  booktitle = {NAACL},
  year = {2025},
}
AAAI
Negative Pre-aware for Noisy Cross-modal Matching

Xu Zhang*, Hao Li*, and Mang Ye

In AAAI, 2024

Cross-modal noise-robust learning is a challenging task since noisy correspondence is hard to recognize and rectify. Due to the cumulative and unavoidable negative impact of unresolved noise, existing methods cannot maintain stable performance when the noise increases. In this paper, we present a novel Negative Pre-aware Cross-modal (NPC) matching solution for large visual-language model fine-tuning on noisy downstream tasks. It has two key features: (1) For noise recognition and resistance, whereas previous methods usually directly filter out a noise subset, we propose to estimate the negative impact of each sample. This does not need additional correction mechanisms that may predict unreliable correction results, leading to self-reinforcing error. We assign a confidence weight to each sample according to its negative impact in the training process. This adaptively adjusts the contribution of each sample to avoid noise accumulation. (2) For maintaining stable performance with increasing noise, we utilize the memorization effect of DNNs by maintaining a memory bank. Specifically, we apply a GMM to select high-confidence clean samples as memory entries, which are used to estimate the negative impact of each sample. Since clean samples are more easily distinguished by the GMM with increasing noise, the memory bank can still maintain high quality at a high noise ratio. Compared to the correction mechanism focusing on noise samples, memory bank-based estimation is more robust, which makes the model performance stable on noisy datasets. Extensive experiments demonstrate that our method significantly improves matching accuracy and performance stability at increasing noise ratios. Our approach also surpasses the state-of-the-art methods by a large margin.
@inproceedings{NPC,
  author = {Zhang*, Xu and Li*, Hao and Ye, Mang},
  title = {Negative Pre-aware for Noisy Cross-modal Matching},
  booktitle = {AAAI},
  year = {2024},
}
ACL
PIGuard: Prompt Injection Guardrail via Mitigating Overdefense for Free

Hao Li*, Xiaogeng Liu*, Ning Zhang, and Chaowei Xiao

In ACL, 2025

Prompt injection attacks pose a critical threat to large language models (LLMs), enabling goal hijacking and data leakage. Prompt guard models, though effective in defense, suffer from over-defense – falsely flagging benign inputs as malicious due to trigger word bias. To address this issue, we introduce NotInject, an evaluation dataset that systematically measures over-defense across various prompt guard models. NotInject contains 339 benign samples enriched with trigger words common in prompt injection attacks, enabling fine-grained evaluation. Our results show that state-of-the-art models suffer from over-defense issues, with accuracy dropping close to random guessing levels (60%). To mitigate this, we propose InjecGuard, a novel prompt guard model that incorporates a new training strategy, Mitigating Over-defense for Free (MOF), which significantly reduces the bias on trigger words. InjecGuard demonstrates state-of-the-art performance on diverse benchmarks including NotInject, surpassing the existing best model by 30.8%, offering a robust and open-source solution for detecting prompt injection attacks.
@inproceedings{PIGuard,
  author = {Li*, Hao and Liu*, Xiaogeng and Zhang, Ning and Xiao, Chaowei},
  title = {PIGuard: Prompt Injection Guardrail via Mitigating Overdefense for Free},
  booktitle = {ACL},
  year = {2025},
}
Preprint
One-step Noisy Label Mitigation

Hao Li*, Jiayang Gu*, Jingkuan Song, An Zhang, and Lianli Gao

arXiv preprint arXiv:2410.01944, 2024

Mitigating the detrimental effects of noisy labels on the training process has become increasingly critical, as obtaining entirely clean or human-annotated samples for large-scale pre-training tasks is often impractical. Nonetheless, existing noise mitigation methods often encounter limitations in practical applications due to their task-specific design, model dependency, and significant computational overhead. In this work, we exploit the properties of high-dimensional orthogonality to identify a robust and effective boundary in cone space for separating clean and noisy samples. Building on this, we propose One-step Anti-Noise (OSA), a model-agnostic noisy label mitigation paradigm that employs an estimator model and a scoring function to assess the noise level of input pairs through just one-step inference, a cost-efficient process. We empirically demonstrate the superiority of OSA, highlighting its enhanced training robustness, improved task transferability, ease of deployment, and reduced computational costs across various benchmarks, models, and tasks. Our code is released at https://github.com/leolee99/OSA.
@misc{OSA,
  author = {Li*, Hao and Gu*, Jiayang and Song, Jingkuan and Zhang, An and Gao, Lianli},
  title = {One-step Noisy Label Mitigation},
  eprint = {2410.01944},
  archiveprefix = {arXiv},
  year = {2024},
}
