• Author(s): Hanwen Jiang, Arjun Karpur, Bingyi Cao, Qixing Huang, Andre Araujo

The field of image matching has seen rapid development of learnable feature matching techniques, which consistently push the boundaries of performance on standard benchmarks. A closer examination, however, reveals that despite these advancements, their applicability to real-world scenarios is limited by poor generalization to novel image domains.

OmniGlue, introduced in this paper, is the first learnable image matcher designed with generalization as its core principle. It leverages the broad knowledge of a vision foundation model to guide the feature matching process, substantially improving generalization to domains not seen at training time. In addition, a novel keypoint position-guided attention mechanism is proposed that disentangles spatial and appearance information, yielding stronger matching descriptors. Extensive experiments on 7 datasets spanning scene-level, object-centric, and aerial images demonstrate the effectiveness of OmniGlue's novel components.
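To make the disentanglement idea concrete, the minimal sketch below shows one simplified way a position-guided attention layer could be realized: keypoint positional encodings influence where attention looks (queries and keys), while only appearance descriptors are propagated as values, so spatial cues never contaminate the aggregated features. The function name, tensor shapes, and the additive query/key formulation are illustrative assumptions, not OmniGlue's actual implementation.

```python
import torch
import torch.nn.functional as F


def position_guided_attention(desc: torch.Tensor, pos_enc: torch.Tensor) -> torch.Tensor:
    """Illustrative self-attention where keypoint positions guide *where* to
    attend, but only appearance descriptors are propagated as values.

    desc:    (N, D) appearance descriptors for N keypoints
    pos_enc: (N, D) positional encodings of the keypoint locations
    """
    d = desc.shape[-1]
    # Queries and keys mix appearance with position, so the attention weights
    # are position-aware; values remain appearance-only, keeping the two
    # signals separated in the propagated features (illustrative choice).
    q = desc + pos_enc
    k = desc + pos_enc
    v = desc
    attn = F.softmax((q @ k.t()) / d ** 0.5, dim=-1)  # (N, N) attention map
    return desc + attn @ v                            # residual descriptor update


# Toy usage: 128 keypoints with 256-D descriptors and positional encodings.
desc = torch.randn(128, 256)
pos_enc = torch.randn(128, 256)
updated = position_guided_attention(desc, pos_enc)
```

In this toy formulation the descriptors that leave the layer are a mixture of appearance features only; the positional signal shapes the attention distribution but is never added to the output, which is the intuition behind separating spatial and appearance information.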

Relative to a directly comparable reference model, OmniGlue achieves a substantial 20.9% relative improvement on unseen domains, while also surpassing the recent LightGlue method by a relative 9.5%. These results highlight OmniGlue's potential as a highly generalizable, learnable image matcher, paving the way for more robust and versatile image matching in real-world scenarios.