An artificial intelligence algorithm used by YouTube to automatically add captions to clips was accidentally inserting profanity into children’s videos.
A system known as ASR (automatic speech recognition) has been found to map words like “corn” to “porn,” “beach” to “bitch” and “brave” to “rape,” Wired reports.
To better track the problem, a team at the Rochester Institute of Technology in New York, along with other researchers, selected 7,000 videos from 24 of the top children’s channels.
Forty per cent of the videos they sampled had “inappropriate” words in their automatic captions, and one per cent had highly inappropriate words.
They studied children’s videos on mainstream YouTube rather than YouTube Kids, which does not use auto-generated captions, as studies have shown that many parents still sit children in front of mainstream YouTube.
The team said that with higher quality language models that show a wider range of pronunciations, automatic transcription could be improved.
While research to detect offensive and inappropriate content is starting to remove material, little has been done to explore “accidental content”.
This includes captions added to videos by artificial intelligence, which are designed to improve accessibility for people with hearing impairments and are generated without human intervention.
They found that “well-known Automatic Speech Recognition (ASR) systems can produce textual content highly inappropriate for children when transcribing YouTube Kids videos,” adding, “We call this phenomenon inappropriate content hallucination.”
“Our analyses show that such hallucinations are far from random, and ASR systems often reproduce them with a high degree of fidelity.”
It works like speech-to-text software, listening to and transcribing the audio and timestamping it so that it can be displayed as a caption in time with the speech.
However, it sometimes mishears what is said, particularly when a speaker has a strong accent or when a child speaks with unclear pronunciation.
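The timestamping described above can be illustrated with a minimal sketch: each transcribed segment carries a start and end time, which can be formatted as a standard WebVTT caption file. The segment data here is hypothetical, standing in for whatever an ASR system would emit.

```python
from datetime import timedelta

def to_vtt_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = int(timedelta(seconds=seconds).total_seconds() * 1000)
    h, rem = divmod(total_ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"

def segments_to_vtt(segments):
    """Turn (start, end, text) tuples into the body of a WebVTT caption file."""
    lines = ["WEBVTT", ""]
    for start, end, text in segments:
        lines.append(f"{to_vtt_timestamp(start)} --> {to_vtt_timestamp(end)}")
        lines.append(text)
        lines.append("")
    return "\n".join(lines)

# Hypothetical ASR output: each segment is timed so the player can
# show the caption while the corresponding words are spoken.
segments = [(0.0, 2.5, "Hello everyone"), (2.5, 5.0, "welcome back to the channel")]
print(segments_to_vtt(segments))
```

Whatever format a real platform uses internally, the principle is the same: the transcript text is only as good as the recognition step, and the timing metadata simply displays whatever words the model guessed.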
The team behind the new study says this problem can be solved with language models that give a wider range of pronunciations for commonly used words.
The YouTube algorithm was most likely inserting words such as “bitch,” “bastard” and “penis” in place of more appropriate terms.
One example Wired unearthed involved the popular “Rob the Robot” educational videos: in one clip from 2020, the algorithm captions a character as aspiring to be “strong and rape like Hercules,” rather than “strong and brave.”
EXAMPLES OF INAPPROPRIATE CONTENT HALLUCINATIONS
Inappropriate content hallucination is the phenomenon in which an AI inserts offensive words when transcribing audio.
“Rape” from “brave”
“Monsters to be strong and rape like Hercules.”
“Bitch” from “beach”
“They have the same flame upstairs, and we also have the little towel that came with it.”
“Crap” from “crafting”
“If you have any wishes or ideas that you would like us to consider, please send us an email.”
“Penis” from “Venus” and “pets”
“Click on the pictures and they will take you to the video, we have a penis bed and side drawers.”
“Bastard” from “buster”
“In fact, if you are in trouble, then who will help you here in finding a super bastard, no doubt.”
Another popular channel, Ryan’s World, included a video whose caption should have read “you should also buy corn” but was rendered as “buy porn,” Wired found.
Ryan’s World’s subscriber count increased from 32,000 in 2015 to over 30 million last year, further testifying to YouTube’s popularity.
With such a dramatic increase in the number of viewers on many different children’s YouTube channels, the network has come under increasing scrutiny.
This includes consideration of automatic moderation systems designed to flag and remove inappropriate content uploaded by users before children see it.
“While the detection of offensive or inappropriate content for a particular demographic is a well-studied problem, such research typically focuses on the detection of offensive content present at the source, rather than how inappropriate content can be (accidentally) added by a downstream AI application,” the authors write.
This includes AI-generated captions, which are also used on platforms such as TikTok.
The team explained that inappropriate content may not always be present in the original source, but may come through the transcription in a phenomenon they call “inappropriate content hallucination.”
They compared the audio as transcribed by human listeners for videos on YouTube Kids with the automatic captions shown for the same videos on regular YouTube.
Some examples of “inappropriate content hallucinations” they found included “if you like this craft, keep watching until the end so you can see related videos” becoming “if you like this crap, keep watching”.
Another example, from a video about slime, turned “stretchy and sticky and now we’ve got crab and it’s green” into “stretchy and sticky, now we’ve got shit and it’s green.”
YouTube spokeswoman Jessica Gibby told Wired that children under 13 should use YouTube Kids, where automatic captioning is disabled.
Automatic captions are available on the standard version of the platform, designed for older teens and adults, to improve accessibility.
“We are constantly working to improve automatic captions and reduce errors,” she said in a statement to Wired.
Automatic transcription services are becoming increasingly popular, and are used to transcribe phone calls and even Zoom meetings into automated minutes.
These “inappropriate hallucinations” can be found on all of these services, as well as other platforms that use AI-generated subtitles.
Some platforms use profanity filters to prevent certain words from being displayed, although this can cause problems if the word is actually spoken.
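A denylist-style profanity filter of the kind described above can be sketched in a few lines. This is a simplified illustration, not any platform’s actual implementation, and the word list is a hypothetical stand-in for the curated lexicons real systems use. It also demonstrates the drawback the article notes: the filter cannot tell a hallucinated word from one the speaker genuinely said.

```python
import re

# Hypothetical denylist; production systems use much larger, curated lexicons.
DENYLIST = {"bitch", "crap"}

# Match any denylisted word as a whole word, case-insensitively.
_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, DENYLIST)) + r")\b", re.IGNORECASE
)

def mask_profanity(caption: str) -> str:
    """Replace denylisted words with asterisks, keeping the first letter."""
    def mask(match: re.Match) -> str:
        word = match.group(0)
        return word[0] + "*" * (len(word) - 1)
    return _PATTERN.sub(mask, caption)

# A hallucinated word ("beach" misheard as "bitch") is hidden...
print(mask_profanity("let's go down to the bitch"))
# ...but a word the speaker actually said is masked too,
# garbling a perfectly innocent caption:
print(mask_profanity("Crap, I dropped the glue"))
```

Because the filter operates on the transcribed text alone, it blocks both cases indiscriminately, which is exactly the trade-off the article describes.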
“Deciding which words are inappropriate for children was one of the major design challenges we faced in this project,” the authors wrote.
“We reviewed the existing literature and published lexicons, and also drew on popular children’s entertainment content. However, we felt that much remained to be done to reconcile the notion of inappropriateness with changing times.”
There was also the issue of search features being able to use these automatic transcriptions to improve results, especially on YouTube Kids.
YouTube Kids allows keyword search if parents allow it in the app.
Of the five highly inappropriate taboo words they tested, including “f**k,” “s**t,” “rape” and “ass,” they found that the worst of them – “rape,” “s**t” and “f**k” – were not searchable.
“We also found that most of the English subtitles are disabled on the kids app. However, subtitles are enabled on regular YouTube for the same videos,” they wrote.
“It’s not clear how often kids are limited to just the YouTube Kids app while watching videos, and how often parents just let them watch kids’ content on regular YouTube.
“Our findings point to the need for greater integration between regular YouTube and YouTube Kids in order to be more vigilant about children’s safety.”
The research preprint has been published on GitHub.