• Author(s): Guoxing Sun, Rishabh Dabral, Pascal Fua, Christian Theobalt, Marc Habermann

Creating lifelike digital humans from a few simple camera shots is a tough nut to crack in the world of computer vision and graphics. Imagine trying to piece together a detailed 3D figure from just a handful of photographs—it’s like solving a puzzle with most of the pieces missing. You’re left guessing about parts you can’t see, like what’s hidden behind an arm or how far back something is. This guessing game gets even trickier with something as dynamic and complex as the human body.

Enter MetaCap, our new method. It works like a detective that takes a few scattered clues (sparse camera views) and pieces together a full, moving 3D figure. The secret sauce is meta-learning: by training on many multi-view videos of people, it learns network weights that capture how humans move and look from all sides. That learned prior serves as a strong starting point, so at test time it can make educated guesses about the parts it can't see from just one or two views.
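The core idea of "learning a good starting point from many sequences, then adapting to a new one" can be illustrated with a tiny Reptile-style sketch. This is an assumption-laden toy, not MetaCap's actual pipeline: the real method meta-learns the weights of a neural human representation, while here each "task" is just a toy linear fit standing in for one multi-view sequence, and `inner_fit` stands in for per-subject fine-tuning.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_task():
    # Toy stand-in for one captured sequence: fit y = a*x + b.
    a, b = rng.uniform(-1.0, 1.0, size=2)
    x = rng.uniform(-1.0, 1.0, size=16)
    return x, a * x + b

def inner_fit(w, x, y, steps=32, lr=0.1):
    # Plain gradient descent on squared error for the linear model
    # y = w[0]*x + w[1]; plays the role of test-time fine-tuning.
    for _ in range(steps):
        pred = w[0] * x + w[1]
        g0 = np.mean(2.0 * (pred - y) * x)
        g1 = np.mean(2.0 * (pred - y))
        w = np.array([w[0] - lr * g0, w[1] - lr * g1])
    return w

# Meta-training: nudge the shared initialization toward each
# task's solution (the Reptile meta-update).
meta_w = np.zeros(2)
for _ in range(200):
    x, y = make_task()
    task_w = inner_fit(meta_w, x, y)
    meta_w += 0.1 * (task_w - meta_w)

# "Test time": the learned initialization adapts to a new task
# with only a few inner steps, mimicking sparse-view fine-tuning.
x_new, y_new = make_task()
adapted = inner_fit(meta_w, x_new, y_new, steps=8)
```

The point of the sketch is the split between a slow outer loop over many sequences and a fast inner loop on one new sequence; the meta-learned `meta_w` encodes shared structure so the inner loop needs far fewer steps.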

But humans aren’t stiff mannequins; we move, bend, and twist in complex ways. To tackle this, MetaCap doesn’t learn human appearance in arbitrary poses. Instead, it reasons in a canonical space: observations are mapped back to a single standard pose, which factors out articulation and makes all the ways we can move far easier to model. Once it has learned this canonical representation, it can adapt to new poses, different lighting, and fresh viewpoints, even when working from a single image.
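"Mapping back to a standard pose" just means undoing the posing transform before learning. Real human models do this with linear blend skinning over many joints and learned skinning weights; the sketch below is a deliberately minimal stand-in with a single rigid bone, and the function names are illustrative, not from the paper.

```python
import numpy as np

def rot_z(theta):
    # Rotation about the z-axis; stands in for one bone's orientation.
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def to_canonical(p_posed, R, t):
    # Invert the rigid posing transform p_posed = R @ p_canonical + t,
    # so appearance can be learned once, in the rest pose.
    return R.T @ (p_posed - t)

# A point on the body in the canonical (rest) pose...
p_canon = np.array([0.5, 0.0, 1.0])
# ...observed after the body articulates (rotate + translate).
R, t = rot_z(np.pi / 4), np.array([0.1, 0.0, 0.2])
p_posed = R @ p_canon + t

recovered = to_canonical(p_posed, R, t)
# recovered matches p_canon: queries from any pose land at the same
# canonical location, which is what makes one shared model reusable.
```

The design point is that the model never has to memorize every pose; it only has to describe the body once, in canonical space, and the (known or estimated) skeletal transforms carry that description to each new pose.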

To put MetaCap to the test, we didn’t take the easy route. We captured people in motion with both dense multi-camera studio setups and just a few cameras in natural, in-the-wild settings, creating a challenging mix of images. The results? Our system achieved state-of-the-art reconstruction and rendering quality across these benchmarks, making it a powerful tool for bringing digital humans to life from minimal snapshots.