More human than human

Epic’s new motion-capture animation tech has to be seen to be believed

"MetaHuman Animator" goes from iPhone video to high-fidelity 3D movement in minutes.

Kyle Orland
Would you believe that creating this performance took only minutes of video processing and no human tweaking? Credit: Ninja Theory / Epic

SAN FRANCISCO—Every year at the Game Developers Conference, a handful of competing companies show off their latest motion-capture technology, which transforms human performances into 3D animations that can be used on in-game models. Usually, these technical demonstrations involve a lot of specialized hardware for the performance capture and a good deal of computer processing and manual artist tweaking to get the resulting data into a game-ready state.

Epic's upcoming MetaHuman facial animation tool looks set to revolutionize that kind of labor- and time-intensive workflow. In an impressive demonstration at Wednesday's State of Unreal stage presentation, Epic showed off the new machine-learning-powered system, which needed just a few minutes to generate impressively real, uncanny-valley-leaping facial animation from a simple head-on video taken on an iPhone.

The potential to get quick, high-end results from that kind of basic input "has literally changed how [testers] work or the kind of work they can take on," Epic VP of Digital Humans Technology Vladimir Mastilovic said in a panel discussion Wednesday afternoon.

A stunning demo

The new automatic animation technology builds on Epic's MetaHuman modeling tool, which launched in 2021 as a way to manually create highly detailed human models in Unreal Engine 5. Since that launch, over 1 million users have created millions of MetaHumans, Epic said, some from just a few minutes of processing on three photos of a human face.


The main problem with these MetaHumans, as Mastilovic put it on stage Wednesday morning, is that "animating them still wasn't easy." Even skilled studios would often need to use a detailed "4D capture" from specialized hardware and "weeks or months of processing time" and human tweaking to get game-usable animation, he said.

Watch Melina Juergens' performance transformed into a stunningly accurate 3D animation in just minutes.

MetaHuman Animator is designed to vastly streamline that process. To demonstrate that, Epic relied on Ninja Theory Performance Artist Melina Juergens, known for her role as Senua in 2017's Hellblade: Senua's Sacrifice.

Juergens' 15-second on-stage performance was captured on a stock iPhone mounted on a tripod in front of her. The resulting video of that performance was then processed on a high-end AMD machine in less than a minute, creating a 3D animation that was practically indistinguishable from the original video.

The speed and fidelity of the result drew a huge round of applause from the developers gathered at the Yerba Buena Center for the Arts and really needs to be seen to be believed. Tiny touches in Juergens' performance—from bared teeth to minuscule mouth quivers to sideways glances—are all incorporated into the animation in a way that makes it almost indistinguishable from the original video. Even realistic tongue movements are extrapolated from the captured audio, using an "audio to tongue" algorithm that "is what it sounds like," Mastilovic said.

What's more, Epic also showed how all those facial tics could be applied not just to Juergens' own MetaHuman model but to any model built on the same MetaHuman standard. Seeing Juergens' motions and words coming from the mouth of a highly stylized cartoon character, just minutes after she performed them, was striking, to say the least.

The human performance in this trailer "hasn't been polished or edited in any way and took a MetaHuman animator just minutes to process, start to finish."

The presentation finished with the debut of a performance-focused trailer for the upcoming Senua's Saga: Hellblade II. That trailer is made all the more impressive by Mastilovic saying that Juergens' full-body motion-captured performance in it "hasn't been polished or edited in any way and took a MetaHuman animator just minutes to process, start to finish."

The machines are learning

At a panel later in the day, Mastilovic discussed how MetaHuman Animator is powered in part by "a large, varied, highly curated database" of detailed facial-capture data that Epic has gathered over the years (with the help of acquisitions like 3Lateral, Cubic Motion, and Hyprsense). That wide array of curated faces is then processed into what Epic calls an "n-dimensional human space"—essentially a massive database of all the ways different parts of different head morphologies can move and stretch.

First, Epic's own "facial solver and landmark detector" identifies the key "rigging" points on the video-captured face. Then, using those points, a machine-learning algorithm essentially maps each video frame to its nearest neighbor in that massive n-dimensional database of captured faces. The algorithm uses a "semantic space solution" that Mastilovic said guarantees the resulting animation "will always work the same in any face logic... it just doesn't break when you move it onto something else."
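
Epic hasn't published the internals of that solver, but the basic idea it describes (detect facial landmarks on each video frame, then match the frame against a database of captured expressions embedded in a shared space) can be sketched in a few lines of Python. The sketch below is a hypothetical illustration only: the landmark count, the linear projection standing in for a learned embedding, and the random "expression database" are all assumptions, not Epic's pipeline or API.

```python
# Minimal sketch of a landmark-to-database matching step. Every name and
# number here is a hypothetical stand-in, not Epic's actual tooling.
import numpy as np

N_LANDMARKS = 68   # assumed number of detected facial landmarks per frame
N_DIMS = 128       # assumed dimensionality of the "n-dimensional human space"

def embed(landmarks: np.ndarray, projection: np.ndarray) -> np.ndarray:
    """Project flattened 2D landmarks into the expression space.
    A real system would use a learned, nonlinear mapping; a fixed linear
    projection stands in for it here."""
    return landmarks.reshape(-1) @ projection

def nearest_expression(frame_landmarks: np.ndarray,
                       database: np.ndarray,
                       projection: np.ndarray) -> int:
    """Return the index of the captured expression closest to this frame."""
    query = embed(frame_landmarks, projection)
    distances = np.linalg.norm(database - query, axis=1)
    return int(np.argmin(distances))

# Toy data: 10,000 captured expressions already embedded in the space,
# plus one incoming video frame's detected landmarks.
rng = np.random.default_rng(0)
projection = rng.standard_normal((N_LANDMARKS * 2, N_DIMS))
database = rng.standard_normal((10_000, N_DIMS))
frame = rng.standard_normal((N_LANDMARKS, 2))

print(f"Frame maps to captured expression #{nearest_expression(frame, database, projection)}")
```

In practice, per-frame matches like this would also need temporal smoothing and retargeting onto the MetaHuman rig's controls, which is where the curated capture data and the "semantic space solution" Mastilovic described come in.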

Epic shows how the results of its MetaHuman Animator algorithm can be tweaked, if necessary. Credit: Epic

Unlike some other machine-learning models, Mastilovic said MetaHuman Animator "doesn't hallucinate any details." The focus is on generating animation that exactly matches the performance that's put into it. And Mastilovic said the model is pretty resilient to low light and background distractions in the shot.

Tools to generate usable motion-capture animation data in a short timescale aren't entirely new—Epic showed off something similar in 2016, using a real-time performance of an early Hellblade cut scene. Back then, though, the performance required Juergens to be outfitted with special makeup and tracking dots and to wear a head-mounted lighting and camera rig. And the resulting "real-time" model, while impressive for the time, was better suited to quick "pre-visualization" than to final, cut-scene-ready performance data.

Epic's new MetaHuman Animator, on the other hand, looks like it could be used even by small developers to create highly convincing 3D animation without the usual time and labor investment. Mastilovic said he hopes the tool's wider launch this summer leads to a "democratization of complex character technologies" by "allowing you to work faster and see the results immediately."

Based on the demo we saw at GDC this week, we'd say that goal seems well within reach.

Listing image: Ninja Theory / Epic

Kyle Orland Senior Gaming Editor
Kyle Orland has been the Senior Gaming Editor at Ars Technica since 2012, writing primarily about the business, tech, and culture behind video games. He has journalism and computer science degrees from the University of Maryland. He once wrote a whole book about Minesweeper.
Staff Picks
SplatMan_DK
For me, the uncanny valley effect is still there, like how the skin around the eyes looks flat. It's good enough for a game but it might need a few more iterations before it's Hollywood-level. You should get worried then.
Give it a year. If the overall approach is solid (as it seems), then increasing the size of the AI training set to include more data points is going to be very easy.

Doing the first ML/AI model is really hard. Expanding it from there? Not so much.
G
Incredible stuff. I hope they eventually focus on physics modeling with similar intensity. "Real," to me, isn't just (or even primarily) how it looks. My favorite driving games are Spintires and BeamNG because they're the closest approximations I've found of physical reality. I circle back to Trespasser JP every few years just to experience a body and weaponry bounded by physics. Triple-A games where everything a character can do is a canned animation just aren't that interesting.
Here's a Rivian R1T implemented in Unreal Engine using physics modeling, in a procedurally generated environment: https://youtu.be/-lkEOEEKYD0
Edgar Allan Esquire
This is cool, but I think it'll have more use for more cartoony, less realistic faces. There's still all the asset props, texturing, and lighting that a small studio won't necessarily want to commit to for a photorealistic look, and disparate asset art styles aren't a great look. The biggest benefit might be for face mocap from home: the low tech hurdle allows small, geographically distributed teams to pool actors globally without travel constraints.
g
I can see this being amazing for modders or even game customization: you could load up face models of people you want as the main characters in your game, like friends or family, and have the story resemble them closely enough that it's almost as if you're really in the game interacting with people you know.
j
The coolest thing that comes to mind for this, besides time savings, is character creators now mapping easily and seamlessly across mocap actors. You can make the main character have great motion but make them look however you want without a ton of weirdness creeping in. Great step forward that will hopefully take some of the guesswork out of the pipeline by speeding up reviews. Cool stuff all around.

Unreal has been pushing things forward for so damn long that I feel like we sometimes take for granted how much they have come up with over the years. I still remember the first time I saw Unreal in '98 and how big a jump it was in graphics, and they have just kept doing it over and over again.