DGE: Direct Gaussian 3D Editing by Consistent Multi-view Editing

• Author(s) : Minghao Chen, Iro Laina, Andrea Vedaldi

Editing 3D objects and scenes from open-ended language instructions is a challenging task. The conventional approach is to use a 2D image generator or editor to guide the 3D editing process. However, this is often slow, because it requires repeatedly updating a computationally expensive 3D representation such as a neural radiance field, and it relies on guidance from a 2D model that is not inherently multi-view consistent and can therefore be contradictory across views.

To overcome these issues, the Direct Gaussian Editor (DGE) is introduced. The method improves the editing process in two ways. First, it modifies a high-quality image editor, such as InstructPix2Pix, so that its edits are multi-view consistent. This is achieved with a training-free approach that injects cues from the underlying 3D geometry of the scene into the editor.
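As a rough illustration of how 3D geometry can tie different views together, the sketch below maps pixels of one view to their corresponding pixels in another view by unprojecting with a depth map rendered from the scene and reprojecting with the second camera. This is a minimal, hypothetical example of the kind of geometric correspondence a multi-view-consistent editor can exploit (for instance, to share information across views inside the diffusion model's attention layers); the function name, tensor shapes, and camera conventions here are assumptions, not DGE's actual interface.

```python
import torch

def reproject_pixels(depth_a, K_a, c2w_a, K_b, w2c_b):
    """Map every pixel of view A to its corresponding pixel in view B.

    Unproject the pixels of view A into 3D using a depth map rendered from
    the scene, then project the resulting 3D points into view B. The output
    correspondences are the kind of geometric cue a multi-view-consistent
    editor can use to keep edits in different views in agreement.
    (Illustrative sketch; names and conventions are assumptions, not DGE's API.)
    """
    H, W = depth_a.shape
    ys, xs = torch.meshgrid(
        torch.arange(H, dtype=torch.float32),
        torch.arange(W, dtype=torch.float32),
        indexing="ij",
    )
    # Homogeneous pixel coordinates of view A, scaled by depth (pinhole model).
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=-1)           # (H, W, 3)
    cam_pts = (torch.linalg.inv(K_a) @ pix.reshape(-1, 3).T) * depth_a.reshape(1, -1)
    cam_pts_h = torch.cat([cam_pts, torch.ones(1, H * W)], dim=0)       # (4, H*W)

    # Camera A -> world -> camera B.
    world_pts = c2w_a @ cam_pts_h                                       # (4, H*W)
    cam_b = (w2c_b @ world_pts)[:3]                                     # (3, H*W)

    # Project into view B and normalise by depth.
    proj = K_b @ cam_b
    uv_b = proj[:2] / proj[2:].clamp(min=1e-6)
    return uv_b.T.reshape(H, W, 2)                                      # pixel coords in B


# Tiny sanity check with identity cameras and unit depth: every pixel maps to itself.
depth = torch.ones(4, 4)
K, c2w, w2c = torch.eye(3), torch.eye(4), torch.eye(4)
print(reproject_pixels(depth, K, c2w, K, w2c).shape)  # torch.Size([4, 4, 2])
```

A practical implementation would also need an occlusion check, for example comparing the reprojected depth against view B's own depth, before treating two pixels as corresponding.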

Second, once a multi-view consistent sequence of edited images of the object has been obtained, DGE directly and efficiently fits the 3D representation, which is based on 3D Gaussian Splatting, to those images. Because the edited views already agree with each other, the edits do not need to be applied incrementally and iteratively.
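To make this direct fitting step concrete, here is a minimal sketch of the kind of optimization loop one might write on top of any differentiable 3D Gaussian Splatting renderer: the Gaussian parameters are treated as learnable tensors and optimized with a simple photometric loss against the edited views. The `gaussians` dictionary and the `render` callable are placeholders for whatever 3DGS implementation is used; this is an illustration of the idea, not DGE's code.

```python
import torch
import torch.nn.functional as F

def fit_gaussians_to_edited_views(gaussians, cameras, edited_images,
                                  render, iters=1000, lr=1e-3):
    """Directly fit an existing Gaussian-splat scene to a set of edited views.

    `gaussians` is assumed to be a dict of learnable tensors (means, scales,
    rotations, opacities, colours), and `render(gaussians, camera)` a
    differentiable rasteriser returning an image tensor; both are hypothetical
    stand-ins for a concrete 3DGS implementation.
    """
    params = [p.requires_grad_(True) for p in gaussians.values()]
    opt = torch.optim.Adam(params, lr=lr)

    for step in range(iters):
        # Sample one edited view per step and render the scene from its camera.
        idx = torch.randint(len(cameras), (1,)).item()
        rendered = render(gaussians, cameras[idx])
        target = edited_images[idx]

        # Photometric loss between the rendering and the edited target image.
        loss = F.l1_loss(rendered, target)

        opt.zero_grad()
        loss.backward()
        opt.step()

    return gaussians
```

The point of such a direct fit is that, because the target views are already mutually consistent, a plain reconstruction-style loss suffices, in contrast to pipelines that interleave 2D editing with incremental 3D updates.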

As a result, DGE is significantly more efficient than existing approaches. It also offers additional benefits, such as selective editing of parts of the scene. This work thus contributes an efficient and effective method for 3D object and scene editing.