• Author(s): Jiahe Li, Jiawei Zhang, Xiao Bai, Jin Zheng, Xin Ning, Jun Zhou, Lin Gu

Radiance fields have demonstrated impressive capabilities in synthesizing lifelike 3D talking heads. However, the prevailing paradigm, which represents facial motions by directly modifying point appearance, may introduce distortions in dynamic regions because such steep appearance changes are difficult to fit.

To address this challenge, the researchers introduce TalkingGaussian, a deformation-based radiance fields framework for high-fidelity talking head synthesis. Leveraging point-based 3D Gaussian Splatting, TalkingGaussian represents facial motions by applying smooth and continuous deformations to persistent Gaussian primitives rather than altering their appearance. This sidesteps the difficult appearance changes that previous methods must learn, allowing precise facial motions to be synthesized while keeping facial features highly intact.
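To make the deformation idea concrete, here is a minimal PyTorch sketch of a point-wise deformation field over persistent Gaussian primitives. The network shape, the audio-feature dimension, and the offset parameterization are illustrative assumptions, not the paper's actual architecture; the point is only that motion is expressed as offsets to position, rotation, and scale while color and opacity stay fixed.

```python
import torch
import torch.nn as nn

class DeformationField(nn.Module):
    """Hypothetical deformation field: maps a canonical Gaussian center and a
    per-frame audio feature to geometric offsets for that primitive, leaving
    its appearance (color, opacity) untouched."""
    def __init__(self, audio_dim=64, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + audio_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            # Output: 3 position offsets + 4 quaternion offsets + 3 scale offsets.
            nn.Linear(hidden, 3 + 4 + 3),
        )

    def forward(self, xyz, audio_feat):
        # xyz: (N, 3) canonical Gaussian centers; audio_feat: (audio_dim,) for one frame.
        a = audio_feat.expand(xyz.shape[0], -1)
        d = self.mlp(torch.cat([xyz, a], dim=-1))
        return d.split([3, 4, 3], dim=-1)  # d_xyz, d_rot, d_scale

# Usage: deform the canonical Gaussians for one audio frame.
field = DeformationField()
xyz = torch.randn(1000, 3)        # canonical Gaussian centers
audio = torch.randn(64)           # per-frame audio feature (assumed dimension)
d_xyz, d_rot, d_scale = field(xyz, audio)
deformed_xyz = xyz + d_xyz        # appearance attributes are never modified
```

Because the MLP output varies smoothly with its inputs, the predicted motion is smooth and continuous by construction, which is exactly the property the deformation-based formulation exploits.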

Furthermore, the researchers identify a face-mouth motion inconsistency that hampers the learning of detailed speaking motions. To resolve this conflict, they decompose the model into two branches: one for the face area and another for the inside-mouth region. This decomposition simplifies the two learning tasks, enabling more accurate reconstruction of mouth motion and structure.
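The summary does not specify how the two branch outputs are combined, but a plausible reading is that each branch is rendered separately and then composited, with the inside-mouth render filling the opening left by the face branch. The sketch below illustrates that assumption; the alpha-blending scheme and tensor shapes are hypothetical.

```python
import torch

def composite_branches(face_rgba, mouth_rgb):
    """Hypothetical compositing of the two branches: the face branch renders
    RGB plus opacity (alpha < 1 where the mouth is open), and the inside-mouth
    branch shows through behind it via standard 'over' blending."""
    rgb, alpha = face_rgba[..., :3], face_rgba[..., 3:]
    return alpha * rgb + (1.0 - alpha) * mouth_rgb

face = torch.rand(256, 256, 4)    # face-branch render with per-pixel opacity
mouth = torch.rand(256, 256, 3)   # inside-mouth branch render
frame = composite_branches(face, mouth)  # (256, 256, 3) final image
```

Whatever the exact blending rule, the benefit of the decomposition is that each branch only has to fit a motion pattern that is internally consistent, rather than one model fitting two conflicting ones.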

Extensive experiments demonstrate that TalkingGaussian renders high-quality lip-synchronized talking head videos, with better facial fidelity and higher efficiency than previous methods. The proposed deformation-based approach paves the way for more realistic and natural talking head synthesis, with potential applications in virtual assistants, video conferencing, and multimedia entertainment.