ALOHa: A New Measure for Hallucination in Captioning Models


By Yuvraj Singh | Last Updated on April 10th, 2024 6:57 am

Despite recent progress in multimodal pre-training for visual description, even the most advanced models still generate captions containing errors, such as hallucinating objects that are not present in the scene. The current leading metric for object hallucination, CHAIR, is restricted to a fixed set of MS COCO objects and their synonyms, so it cannot score hallucinations outside that vocabulary.

To address this, a new open-vocabulary metric, ALOHa, has been proposed. It leverages large language models (LLMs) to measure object hallucinations: an LLM extracts groundable objects from a candidate caption, their semantic similarity to reference objects (drawn from reference captions and object detections) is computed, and Hungarian matching over those similarities produces a final hallucination score.
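The matching step can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' implementation: it assumes the object lists have already been extracted by an LLM, and the choice of text encoder (a sentence-transformers model here) and the min-over-objects aggregation are illustrative assumptions.

```python
# Sketch of an ALOHa-style matching score. Assumes object lists are already
# extracted; the encoder and aggregation choices below are illustrative.
import numpy as np
from scipy.optimize import linear_sum_assignment
from sentence_transformers import SentenceTransformer  # assumed encoder choice

_model = SentenceTransformer("all-MiniLM-L6-v2")

def embed(words):
    """Return unit-norm embeddings for a list of object strings."""
    vecs = _model.encode(words)
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

def aloha_style_score(candidate_objects, reference_objects):
    """Match each candidate object to a reference object via Hungarian
    matching on cosine similarity; return per-object similarities and a
    caption-level score (the similarity of the worst-matched object)."""
    sim = embed(candidate_objects) @ embed(reference_objects).T
    rows, cols = linear_sum_assignment(sim, maximize=True)
    per_object = {candidate_objects[r]: float(sim[r, c])
                  for r, c in zip(rows, cols)}
    # A low minimum similarity flags a likely hallucinated object.
    return per_object, min(per_object.values())

per_object, score = aloha_style_score(
    ["dog", "frisbee", "surfboard"],      # objects named in the candidate caption
    ["dog", "frisbee", "park", "grass"],  # objects from references and detections
)
print(per_object, score)  # "surfboard" should receive the lowest similarity
```

One appeal of Hungarian matching here is that each reference object is credited at most once, so a caption cannot inflate its score by repeatedly naming a single object that appears in the references.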

ALOHa correctly identifies 13.6% more hallucinated objects than CHAIR on HAT, a new gold-standard subset of MS COCO Captions annotated for hallucinations, and 30.8% more on nocaps, where objects extend beyond MS COCO categories. This suggests that ALOHa offers a significant improvement over existing metrics.
