• Author(s): Lingdong Kong, Xiang Xu, Jiawei Ren, Wenwei Zhang, Liang Pan, Kai Chen, Wei Tsang Ooi, Ziwei Liu

The paper presents LaserMix++, a framework for data-efficient 3D scene understanding in autonomous driving. It extends semi-supervised learning for LiDAR semantic segmentation by leveraging the spatial priors inherent in driving scenes and the complementary information from multiple sensors, so that unlabeled scans contribute more effectively to training.
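For intuition, a minimal sketch of this kind of semi-supervised setup could look like the following, assuming a PyTorch-style per-point segmentation model. The names (`student`, `teacher`, `semi_supervised_step`, `lam`) are illustrative placeholders, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def semi_supervised_step(student, teacher, labeled, unlabeled, lam=1.0):
    """One illustrative step: supervised loss on labeled scans plus a
    consistency loss pushing student predictions on unlabeled scans toward
    the teacher's pseudo-labels. All names here are placeholders."""
    points_l, labels_l = labeled          # labeled LiDAR points and per-point classes
    points_u = unlabeled                  # unlabeled LiDAR points

    # Supervised term on the (small) labeled split.
    logits_l = student(points_l)          # (N, num_classes) per-point logits
    loss_sup = F.cross_entropy(logits_l, labels_l)

    # Pseudo-labels from a frozen / EMA teacher on the unlabeled split.
    with torch.no_grad():
        pseudo = teacher(points_u).argmax(dim=-1)

    # Consistency term: the student should agree with the teacher's predictions.
    logits_u = student(points_u)
    loss_cons = F.cross_entropy(logits_u, pseudo)

    return loss_sup + lam * loss_cons
```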

LaserMix++ mixes laser beams from different LiDAR scans and incorporates LiDAR-camera correspondences to support data-efficient learning, strengthening 3D scene consistency regularization with multi-modal cues. Its components include a multi-modal LaserMix operation for fine-grained cross-sensor interaction (sketched below), camera-to-LiDAR feature distillation that enriches LiDAR feature learning, and language-driven knowledge guidance that generates auxiliary supervision with open-vocabulary models. Because it is agnostic to the underlying LiDAR representation, LaserMix++ applies across different segmentation backbones. The framework is validated through theoretical analysis and extensive experiments on popular driving perception datasets.
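A minimal sketch of the beam-level mixing idea behind the LaserMix operation (shown here for the uni-modal, LiDAR-only case) is given below. Partitioning points by inclination angle follows the general description in the paper, while the function names and the number of areas are illustrative assumptions.

```python
import numpy as np

def laser_mix(points_a, labels_a, points_b, labels_b, num_areas=4):
    """Mix two LiDAR scans by splitting points into non-overlapping
    inclination-angle areas and interleaving the areas (LaserMix-style).
    points_*: (N, 3+) arrays with x, y, z in the first three columns;
    labels_*: (N,) per-point (pseudo-)labels. Parameters are illustrative."""
    def inclination(pts):
        # Pitch angle of each point w.r.t. the sensor origin.
        return np.arctan2(pts[:, 2], np.linalg.norm(pts[:, :2], axis=1))

    phi_a, phi_b = inclination(points_a), inclination(points_b)
    lo = min(phi_a.min(), phi_b.min())
    hi = max(phi_a.max(), phi_b.max())
    edges = np.linspace(lo, hi, num_areas + 1)

    mixed_pts, mixed_lbl = [], []
    for i in range(num_areas):
        # Alternate the source scan area by area.
        src_pts, src_lbl, phi = (
            (points_a, labels_a, phi_a) if i % 2 == 0 else (points_b, labels_b, phi_b)
        )
        if i < num_areas - 1:
            mask = (phi >= edges[i]) & (phi < edges[i + 1])
        else:
            mask = (phi >= edges[i]) & (phi <= edges[i + 1])
        mixed_pts.append(src_pts[mask])
        mixed_lbl.append(src_lbl[mask])

    return np.concatenate(mixed_pts), np.concatenate(mixed_lbl)
```

In a semi-supervised setting, a mixed scan and its mixed (pseudo-)labels would then feed a consistency term like the one sketched earlier, encouraging predictions on mixed scans to stay consistent with the mixed pseudo-labels.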

The results show that LaserMix++ matches fully supervised alternatives while using five times fewer annotations and substantially improves over supervised-only baselines. This highlights the potential of semi-supervised approaches for reducing the reliance on extensive labeled data in LiDAR-based 3D scene understanding systems.