TikTok videos help computers learn

TikTok dances have taken the world by storm, emerging as a fun way to pass the time during the Covid-19 pandemic. But for the past year, University of Minnesota Twin Cities Ph.D. student Yasamin Jafarian has used dance videos from the viral social media platform for a different purpose: as training data for a computer algorithm that uses frame-by-frame video data to construct realistic 3D avatars of real people.
Jafarian studies computer science, specifically the field of computer vision, which involves training computers to interpret visual data such as images and video. She is a member of Assistant Professor Hyun-Soo Park’s lab in the Department of Computer Science and Engineering.
Jafarian’s interest lies in using machine learning and artificial intelligence to generate realistic 3D avatars that people can use in virtual reality environments. Currently, there are ways to create 3D avatars in virtual reality, but most are cartoon-like, not avatars that look exactly like the real person using them.
The entertainment industry is able to achieve this through CGI (computer-generated imagery), where a realistic avatar of an actor or actress is created for use in a movie or video game. However, film productions have a lot of resources that the average person doesn’t have. To generate these avatars, film crews often use thousands of cameras to scan a person’s body so that computers can reproduce it on screen.
“The problem with cinematic technology is that it’s not available to everyone,” says Jafarian. “It’s only accessible to those few people in the movie industry. With this research, I wanted to generate the same opportunity for the average person, so that she could just use her phone’s camera and be able to create a 3D avatar of herself.”
Jafarian’s goal was to design an algorithm that only required a single photo or video of a person in order to generate a realistic avatar. To do this, she needed a large data set of videos to “train” the algorithm. TikTok dance videos, which often feature just one person showing their full body length in several different poses, provided the perfect solution.
By the end of summer 2021, Jafarian had watched around 1,000 TikToks. She ended up using 340 of the videos from her dataset, each 10-15 seconds long. At a video frame rate of 30 frames per second, she had amassed over 100,000 images of people dancing.
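The frame count follows directly from the figures above. A quick sketch (using only the numbers stated in the article: 340 videos, 10 to 15 seconds each, 30 frames per second) confirms the total:

```python
# Sanity-check the dataset size described in the article:
# 340 TikTok videos, each 10-15 seconds long, at 30 frames per second.
NUM_VIDEOS = 340
MIN_SECONDS, MAX_SECONDS = 10, 15
FPS = 30

min_frames = NUM_VIDEOS * MIN_SECONDS * FPS   # shortest-case total
max_frames = NUM_VIDEOS * MAX_SECONDS * FPS   # longest-case total

print(f"Frame count range: {min_frames:,} to {max_frames:,}")
# Even at the low end, the total exceeds 100,000 images.
```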
So far, Jafarian has successfully used her algorithm to generate a 3D avatar of a person seen from the front. She published a paper on the topic, which won the Honorable Mention Award for Best Paper at the 2021 Conference on Computer Vision and Pattern Recognition (CVPR).
Jafarian plans to continue refining the algorithm so that it can generate a person’s full body using just one or two images. Her hope is that the technology could eventually be used by real people in virtual reality.
“The most important application of this is online social presence,” says Jafarian. “We saw this need throughout these Covid times, when all of our interactions were on Zoom. But it doesn’t have to be Zoom-only. We can have virtual environments, using VR glasses like Oculus for example, where we can see and interact with each other. If we can make these digital avatars realistic, it would make these interactions deeper and more interesting.”
Another future application of Jafarian’s research is customizing a realistic avatar’s clothing. Imagine a situation where a user sees a t-shirt online and can “try it on” in a virtual environment, skipping a trip to the store.
Learn more and watch videos from Jafarian’s TikTok search.

John C. Dent