• Author(s): Alberto Hojel, Yutong Bai, Amir Globerson, Amir Bar

Visual Prompting, a technique that enables models to learn and perform visual tasks through in-context examples without additional training, has recently gained attention. In this paper, we build upon this concept. By analyzing the activations of MAE-VQGAN, a state-of-the-art Visual Prompting model, we uncover task vectors: activations that encode task-specific information.

This discovery motivates a method that uses task vectors to direct the network to perform different tasks. Instead of relying on input-output examples, we compute the average intermediate activations per task and use the REINFORCE algorithm to search for the subset of activations most relevant to each task. Patching the resulting task vectors into the model guides it to perform the target task without any in-context examples, allowing a pre-trained model to adapt to a task more efficiently.
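To illustrate the procedure, the sketch below shows the two ingredients on a toy stand-in model: averaging intermediate activations over prompted examples of a task, then running a REINFORCE search over Bernoulli gates that decide which activations to patch back in at inference time. The toy network, the placeholder reward, and all hyperparameters are assumptions for illustration only; they are not the MAE-VQGAN setup used in the paper.

```python
# Minimal sketch (assumed toy setup): mean task activations + REINFORCE gate search.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for a visual prompting model: a stack of blocks whose
# intermediate activations we can read and overwrite.
blocks = nn.ModuleList([nn.Linear(16, 16) for _ in range(4)])

def forward(x, patch=None):
    """Run the toy model; optionally overwrite block outputs with task vectors.
    patch: dict {block_idx: tensor} of activations to inject."""
    acts = []
    for i, blk in enumerate(blocks):
        x = torch.relu(blk(x))
        if patch is not None and i in patch:
            x = patch[i].expand_as(x)  # inject the stored task vector
        acts.append(x)
    return x, acts

# 1) Average intermediate activations over prompted examples of one task.
prompted_inputs = torch.randn(32, 16)  # stand-in for in-context prompted inputs
with torch.no_grad():
    _, acts = forward(prompted_inputs)
mean_acts = [a.mean(dim=0, keepdim=True) for a in acts]  # one vector per block

# 2) REINFORCE search over which activations to patch (Bernoulli gates).
logits = torch.zeros(len(blocks), requires_grad=True)  # gate parameters
opt = torch.optim.Adam([logits], lr=0.1)

def task_score(output):
    """Placeholder reward; in practice this would be task performance
    on held-out queries. Here it is just a dummy objective."""
    return -output.pow(2).mean()

queries = torch.randn(8, 16)  # query inputs, no in-context examples
baseline = 0.0
for step in range(200):
    probs = torch.sigmoid(logits)
    mask = torch.bernoulli(probs.detach())  # sample which blocks to patch
    patch = {i: mean_acts[i] for i in range(len(blocks)) if mask[i] > 0.5}
    with torch.no_grad():
        out, _ = forward(queries, patch=patch)
        reward = task_score(out).item()
    # REINFORCE: push up log-prob of the sampled mask, weighted by advantage.
    log_prob = (mask * probs.clamp_min(1e-6).log()
                + (1 - mask) * (1 - probs).clamp_min(1e-6).log()).sum()
    advantage = reward - baseline
    baseline = 0.9 * baseline + 0.1 * reward  # running-average baseline
    loss = -advantage * log_prob
    opt.zero_grad()
    loss.backward()
    opt.step()

print("gate probabilities per block:", torch.sigmoid(logits).tolist())
```

In the full method, the gates would range over the model's internal activations (e.g., per attention head) rather than whole blocks, and the reward would be measured task performance of the patched model on held-out query images.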

Through extensive experiments, we demonstrate that our method improves the model's performance across a range of visual tasks. By eliminating the need for input-output examples at inference time, our approach simplifies and accelerates the adaptation of pre-trained models to new tasks, offering a more flexible and efficient way to leverage pre-trained models in visual applications.

Our contributions include a deeper understanding of how Visual Prompting models work, as well as a practical method that enhances their adaptability and performance. We believe this work opens up exciting possibilities for future research and applications in the field of computer vision and beyond.