ALOHa: A New Measure for Hallucination in Captioning Models
Despite the recent progress in multimodal pre-training for visual description, even the most advanced models can still generate captions that contain errors, such as hallucinating objects that are not present in a scene. The current leading metric for object hallucination, known as CHAIR, is restricted to a fixed set of MS COCO objects and their synonyms.
ALOHa, a newly proposed open-vocabulary metric, addresses this limitation by leveraging large language models (LLMs) to measure object hallucinations. It uses an LLM to extract groundable objects from a candidate caption, computes their semantic similarity to reference objects drawn from reference captions and object detections, and applies Hungarian matching to produce a final hallucination score.
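The matching step described above can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' implementation: it assumes the similarity scores between candidate and reference objects have already been computed (in practice by an embedding model), uses an exhaustive optimal assignment in place of the Hungarian algorithm for tiny inputs, and takes the minimum matched similarity as the caption-level score, so that a candidate object with no good reference match drags the score down. The example objects and similarity values are hypothetical.

```python
from itertools import permutations

def optimal_match(sim):
    """Exhaustive optimal assignment for small object sets: pair each
    candidate object (row) with a distinct reference object (column)
    so total similarity is maximized. Stand-in for Hungarian matching;
    assumes len(sim) <= len(sim[0])."""
    n_cand, n_ref = len(sim), len(sim[0])
    best_total, best_pairs = float("-inf"), None
    for perm in permutations(range(n_ref), n_cand):
        total = sum(sim[i][j] for i, j in enumerate(perm))
        if total > best_total:
            best_total, best_pairs = total, list(enumerate(perm))
    # Each candidate's matched similarity is its per-object score.
    return best_pairs, [sim[i][j] for i, j in best_pairs]

# Toy similarity matrix: rows = objects extracted from the candidate
# caption ("dog", "frisbee"); columns = reference objects
# ("puppy", "disc", "park"). Values are illustrative only.
sim = [
    [0.9, 0.1, 0.2],   # "dog" vs. each reference object
    [0.2, 0.8, 0.1],   # "frisbee" vs. each reference object
]
pairs, scores = optimal_match(sim)
# A low matched similarity flags a likely hallucinated object; one
# plausible caption-level aggregation is the minimum matched score.
caption_score = min(scores)
```

Here "dog" matches "puppy" (0.9) and "frisbee" matches "disc" (0.8), giving a caption-level score of 0.8; a hallucinated object with no semantically close reference would yield a much lower minimum.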
ALOHa correctly identifies 13.6% more hallucinated objects than CHAIR on HAT, a new gold-standard subset of MS COCO Captions annotated for hallucinations, and 30.8% more on nocaps, where objects extend beyond MS COCO categories. These results indicate that ALOHa is a significant improvement over existing hallucination metrics.