• Author(s): Yiming Dou, Fengyu Yang, Yi Liu, Antonio Loquercio, Andrew Owens

The paper "Tactile-Augmented Radiance Fields" introduces a novel scene representation, the tactile-augmented radiance field (TaRF), which unifies vision and touch within a shared 3D space. This representation can be used to estimate both the visual and tactile signals at a given 3D position in a scene.
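To make this concrete, below is a minimal sketch of the kind of interface such a representation could expose; the class and function names are assumptions for illustration, not the authors' API. Given a 6-DoF pose in the shared space, the TaRF returns a rendered RGB-D view and an estimated tactile signal at that location.

```python
# Minimal interface sketch (assumed names, not the paper's code): a TaRF pairs a
# visual renderer with a touch estimator so both signals can be queried at a pose.
from typing import Callable, Tuple
import numpy as np

RenderFn = Callable[[np.ndarray], Tuple[np.ndarray, np.ndarray]]  # pose -> (rgb, depth)
TouchFn = Callable[[np.ndarray, np.ndarray], np.ndarray]          # (rgb, depth) -> tactile

class TaRF:
    def __init__(self, render_rgbd: RenderFn, estimate_touch: TouchFn):
        self.render_rgbd = render_rgbd        # e.g., a trained radiance-field renderer
        self.estimate_touch = estimate_touch  # e.g., a learned cross-modal predictor

    def query(self, pose: np.ndarray):
        """Return visual and tactile signals for a 6-DoF camera pose in the scene."""
        rgb, depth = self.render_rgbd(pose)
        tactile = self.estimate_touch(rgb, depth)
        return rgb, depth, tactile

# Toy usage with stand-in functions; the real components would be learned models.
tarf = TaRF(
    render_rgbd=lambda pose: (np.zeros((256, 256, 3)), np.zeros((256, 256))),
    estimate_touch=lambda rgb, depth: np.zeros((256, 256, 3)),
)
rgb, depth, tactile = tarf.query(np.eye(4))
```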

The TaRF of a scene is captured from a collection of photos and sparsely sampled touch probes. The approach leverages two key insights: first, that common vision-based touch sensors are built on ordinary cameras and can therefore be registered to the scene's images using methods from multi-view geometry; and second, that visually and structurally similar regions of a scene share the same tactile features. These insights are used to register the touch signals to the captured visual scene and to train a conditional diffusion model that, given an RGB-D image rendered from a neural radiance field, generates the corresponding tactile signal.
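The second stage, conditional generation, can be illustrated with a short sketch of DDPM-style sampling conditioned on an RGB-D image. The network, noise schedule, and tensor shapes here are illustrative assumptions rather than the authors' architecture.

```python
# Sketch (not the authors' code) of sampling a tactile image from a conditional
# diffusion model, conditioned on an RGB-D view rendered from the radiance field.
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    """Toy stand-in for the denoising network: predicts the noise in the tactile
    image from the noisy tactile image concatenated with the RGB-D condition."""
    def __init__(self, tactile_ch=3, cond_ch=4, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(tactile_ch + cond_ch + 1, hidden, 3, padding=1),  # +1 timestep channel
            nn.ReLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden, tactile_ch, 3, padding=1),
        )

    def forward(self, x_t, cond, t_frac):
        # Broadcast the normalized timestep as an extra spatial channel.
        t_map = t_frac.view(-1, 1, 1, 1).expand(-1, 1, *x_t.shape[2:])
        return self.net(torch.cat([x_t, cond, t_map], dim=1))

@torch.no_grad()
def sample_tactile(model, rgbd, steps=50):
    """DDPM-style ancestral sampling of a tactile image conditioned on RGB-D."""
    b, _, h, w = rgbd.shape
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn(b, 3, h, w)  # start from pure noise
    for t in reversed(range(steps)):
        t_frac = torch.full((b,), t / steps)
        eps = model(x, rgbd, t_frac)
        # Posterior mean of x_{t-1} given the predicted noise.
        x = (x - betas[t] / torch.sqrt(1 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
        if t > 0:
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)
    return x

# Usage: condition on a 256x256 RGB-D image rendered from the radiance field.
model = TinyDenoiser()
rgbd = torch.rand(1, 4, 256, 256)
tactile = sample_tactile(model, rgbd)
print(tactile.shape)  # torch.Size([1, 3, 256, 256])
```

In practice, the conditioning image would come from rendering the radiance field at the pose of a registered touch probe, so that each generated tactile signal is spatially aligned with the visual scene.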

To evaluate the approach, a dataset of TaRFs is collected. This dataset contains more touch samples than previous real-world datasets and provides a spatially aligned visual signal for each captured touch signal. The paper demonstrates the accuracy of the cross-modal generative model and the utility of the captured visual-tactile data on several downstream tasks. Overall, the work is a notable contribution to scene representation, showing how vision and touch can be integrated within a shared 3D space.