• Author(s): Tianqi Liu, Guangcong Wang, Shoukang Hu, Liao Shen, Xinyi Ye, Yuhang Zang, Zhiguo Cao, Wei Li, Ziwei Liu

The paper introduces MVSGaussian, a generalizable 3D Gaussian representation approach derived from Multi-View Stereo (MVS) that efficiently reconstructs unseen scenes. It targets both the quality and the speed of 3D scene reconstruction and novel view synthesis through three main contributions.

First, MVSGaussian uses MVS to encode geometry-aware Gaussian representations, which are then decoded into Gaussian parameters. Because the encoding is built on multi-view geometry, the predicted Gaussians follow the scene geometry. Second, the approach proposes a hybrid Gaussian rendering scheme that integrates an efficient volume rendering design into the Gaussian rendering pipeline for novel view synthesis, which further improves the quality of rendered views.
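The decoding step described above can be illustrated with a minimal sketch. It assumes a pixel-aligned setup in which each pixel has an estimated depth and a small vector of raw network outputs; the function names, the dictionary layout, and the choice of activations (sigmoid for opacity, softplus for scale, normalization for the rotation quaternion) are illustrative assumptions common in 3D Gaussian pipelines, not details confirmed by the paper summary.

```python
import math


def sigmoid(x):
    """Map a raw value to (0, 1); used here for opacity."""
    return 1.0 / (1.0 + math.exp(-x))


def softplus(x):
    """Numerically stable softplus; keeps scales strictly positive."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def unproject(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) at the given depth to a camera-space point."""
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return (x, y, depth)


def decode_gaussian(u, v, depth, raw, intrinsics):
    """Turn raw per-pixel outputs into one Gaussian (hypothetical layout).

    raw: dict with 'opacity' (float), 'scale' (3 floats), 'rotation' (4 floats).
    intrinsics: (fx, fy, cx, cy) of a pinhole camera.
    """
    fx, fy, cx, cy = intrinsics
    center = unproject(u, v, depth, fx, fy, cx, cy)
    opacity = sigmoid(raw["opacity"])
    scale = tuple(softplus(s) for s in raw["scale"])
    norm = math.sqrt(sum(q * q for q in raw["rotation"]))
    if norm == 0.0:
        rotation = (1.0, 0.0, 0.0, 0.0)  # fall back to the identity rotation
    else:
        rotation = tuple(q / norm for q in raw["rotation"])
    return {"center": center, "opacity": opacity,
            "scale": scale, "rotation": rotation}
```

In this sketch the Gaussian center comes directly from the estimated depth, which is one way a depth-based encoder can make the resulting Gaussians geometry-aware.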

Third, to support fast fine-tuning for specific scenes, MVSGaussian introduces a multi-view geometric consistent aggregation strategy. This strategy aggregates the point clouds generated by the generalizable model and uses the result as the initialization for per-scene optimization. Starting from this initialization shortens fine-tuning while retaining real-time rendering and high synthesis quality.
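The idea of a multi-view geometric consistency check can be sketched as follows. This is a generic MVS-style filter under stated assumptions, not the paper's exact procedure: a 3D point from a reference view is transformed into each source view, projected, and kept only if its depth agrees with the source depth map in enough views. The function names, the relative tolerance `rel_tol`, and the vote threshold `min_views` are hypothetical.

```python
def transform(point, R, t):
    """Apply a rigid transform (3x3 rotation R, translation t) to a 3D point."""
    return tuple(sum(R[i][j] * point[j] for j in range(3)) + t[i]
                 for i in range(3))


def project(point, fx, fy, cx, cy):
    """Pinhole projection of a camera-space point to pixel coordinates."""
    x, y, z = point
    return (fx * x / z + cx, fy * y / z + cy)


def is_consistent(point_ref, R, t, src_depth, intrinsics, rel_tol=0.01):
    """Check whether a reference-view point agrees with one source depth map."""
    fx, fy, cx, cy = intrinsics
    p_src = transform(point_ref, R, t)
    if p_src[2] <= 0:  # behind the source camera
        return False
    u, v = project(p_src, fx, fy, cx, cy)
    col, row = int(round(u)), int(round(v))
    if not (0 <= row < len(src_depth) and 0 <= col < len(src_depth[0])):
        return False  # projects outside the source image
    d = src_depth[row][col]
    # Keep the point only if the relative depth difference is small.
    return d > 0 and abs(d - p_src[2]) / d < rel_tol


def filter_points(points, views, intrinsics, min_views=2, rel_tol=0.01):
    """Keep points that are consistent with at least min_views source views.

    views: list of (R, t, depth_map) tuples, one per source view.
    """
    kept = []
    for p in points:
        votes = sum(is_consistent(p, R, t, depth, intrinsics, rel_tol)
                    for R, t, depth in views)
        if votes >= min_views:
            kept.append(p)
    return kept
```

The surviving points would then serve as the initial point cloud for per-scene optimization; filtering before aggregation is what keeps inconsistent depth estimates out of that initialization.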

Compared with previous generalizable NeRF-based methods, which typically require minutes of fine-tuning and seconds to render each image, MVSGaussian achieves real-time rendering with better synthesis quality. Compared with vanilla 3D-GS, it achieves better view synthesis at a lower training cost.

Extensive experiments on the DTU, Real Forward-facing, NeRF Synthetic, and Tanks and Temples datasets show that MVSGaussian achieves state-of-the-art performance. The results support its generalizability, real-time rendering speed, and fast per-scene optimization.

In summary, MVSGaussian advances generalizable 3D scene reconstruction and view synthesis by combining MVS-based encoding of Gaussian representations, hybrid Gaussian rendering, and a consistency-based initialization for fast fine-tuning. Together these improve both synthesis quality and efficiency over the generalizable NeRF-based methods and the vanilla 3D-GS baseline it is compared against.