This paper introduces a novel and significant challenge for Vision Language Models (VLMs), which we term Unsolvable Problem Detection (UPD). UPD evaluates the ability of VLMs to withhold answers when confronted with unsolvable problems in Visual Question Answering (VQA) tasks. UPD encompasses three distinct settings: Absent Answer Detection (AAD), Incompatible Answer Set Detection (IASD), and Incompatible Visual Question Detection (IVQD).

To examine the UPD problem in depth, we conduct extensive experiments showing that most VLMs, including GPT-4V and LLaVA-Next-34B, struggle with our benchmarks to varying degrees, highlighting substantial room for improvement. To address UPD, we explore both training-free and training-based solutions, offering new insights into their effectiveness and limitations. We hope that our findings, together with future efforts within the proposed UPD settings, will contribute to a broader understanding and the development of more practical and reliable VLMs.