PhD Proposal: Hallucinations in Multimodal Large Language Models: Evaluation, Mitigation, and Future Directions
Multimodal Large Language Models (MLLMs) have achieved impressive performance across a wide array of tasks. However, these models are prone to hallucinations, producing content that is not grounded in the input and that compromises their reliability. This thesis explores the phenomenon of hallucinations in MLLMs, focusing on their identification, underlying causes, and mitigation strategies. We first propose a systematic evaluation framework to quantify and analyze hallucinations across multiple modalities, leveraging diverse metrics tailored to real-world scenarios. Building on this foundation, we introduce novel mitigation strategies that combine architectural improvements, fine-tuning techniques, and data augmentation to reduce hallucination rates without sacrificing model versatility. Finally, we identify open challenges and outline future research directions. This work provides a comprehensive roadmap for understanding and addressing hallucinations in MLLMs, contributing to the broader goal of enhancing the robustness and reliability of AI systems.
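To make the evaluation component concrete, the sketch below illustrates one family of metrics the proposed framework could build on: a simplified, CHAIR-style object-hallucination rate for image captioning, which counts how often a generated caption mentions objects that have no ground-truth support. The function name, the word-matching scheme, and the toy annotations are illustrative assumptions, not part of the proposal itself.

```python
from typing import Dict, List, Set


def object_hallucination_rate(captions: List[str],
                              gt_objects: List[Set[str]],
                              vocabulary: Set[str]) -> Dict[str, float]:
    """Simplified CHAIR-style metric (assumed formulation): the share of
    mentioned objects absent from the image (instance level) and the share
    of captions containing at least one such object (sentence level)."""
    hallucinated, mentioned, flagged = 0, 0, 0
    for caption, truth in zip(captions, gt_objects):
        words = set(caption.lower().split())
        objects = words & vocabulary      # object words the model mentioned
        bad = objects - truth             # mentions with no ground-truth support
        mentioned += len(objects)
        hallucinated += len(bad)
        flagged += bool(bad)
    return {
        "instance_rate": hallucinated / max(mentioned, 1),
        "sentence_rate": flagged / max(len(captions), 1),
    }


# Toy usage with hypothetical captions and annotations.
caps = ["a dog sits on a bench", "a cat rides a skateboard"]
truth = [{"dog", "bench"}, {"cat"}]
vocab = {"dog", "cat", "bench", "skateboard"}
print(object_hallucination_rate(caps, truth, vocab))
```

In practice, the thesis framework would extend such object-level counts with metrics for other modalities and for attribute- and relation-level hallucinations; this sketch only shows the basic counting logic.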