Children’s videos on YouTube contain explicit words via auto-generated captions / Digital Information World

Rob the Robot, a popular YouTube channel known for its learning content for kids, has over four hundred thousand subscribers. However, the artificial intelligence system that generates automatic captions for such content can introduce adult language: it mishears a word and replaces it with an inappropriate word that sounds similar.

A recent study demonstrated that an AI captioning algorithm can turn a children’s educational video into content with unpleasant adult language. The researchers examined more than seven thousand videos from 24 children’s channels. About 40% of the videos had captions containing words from a list of more than thirteen hundred taboo terms, and some contained highly inappropriate vocabulary. Common examples include replacing crab with crap and buster with bastard.

Ashique KhudaBukhsh, an assistant professor at the Rochester Institute of Technology, is one of the researchers behind this discovery. He called the issue disturbing.

Although YouTube Kids is available, many parents still prefer the main YouTube app. A study by the Pew Research Center found that more than 80% of parents of children under the age of eleven allow their children to watch YouTube content, and more than 50% of these children watch it daily.

Jessica Gibby, a YouTube spokesperson, said the platform is working to make captions safer for children. Mistaking ordinary words for inappropriate ones is not limited to children’s content; it happens with other content as well. A reporter once discovered that the audio transcription software Trint misheard the name Negar and replaced it with the N-word.

Speech recognition still has many gaps to fill. Relying entirely on an algorithm to fill them is unwise, as it can create unwanted problems. One startup, for example, discovered that its system was transcribing inappropriate sexual content involving adolescents.

Every such system needs to be trained on data. Most of the audio data comes from adults, with far less coming from children. Even when children’s recordings are included, non-native accents can confuse the system.

Rachael Tatman, a linguist and co-author of this research, suggested that adjusting the subtitles might help reduce errors, but said it would not be an easy task.

KhudaBukhsh and his team have also been working on possible solutions. They ran the audio through other software, including Amazon’s, and found the same issue. An Amazon spokesperson pointed app developers to ways of filtering unwanted words according to the content.

John C. Dent