ENCO Blog: Expanding Capabilities of Closed Captioning Through ASR

The art and science of automated real-time captioning for television has evolved considerably in the last several years, with continually increasing accuracy thanks in part to Artificial Intelligence-based learning (where the speech recognition “learns” via human feedback), faster processing and more innovative approaches to embracing the broader captioning workflow.

Even before any extra guidance from custom word lists, the finest automated captioning systems are delivering accuracy percentages in the mid- to high-90s today. Additional accuracy comes from ingesting unique words, people’s names, regional words and spellings that help to dial in the AI’s accuracy (“does she spell her name as Kris or Chris?”).

IMMEDIATE NEEDS

Totally on-premises captioning systems are generally the “Best in Class” option, and you can't find a much faster way to caption. That’s because most human captioners are remotely located, often working out of their homes and only listening to an audio feed via phone, whereas an on-premises solution removes much of the complexity of patching in and scheduling a remote human captioner. And if breaking news means you have to caption immediately, just flip the switch and you're on.

The speed of automatic speech recognition (ASR) has also improved a lot, and is faster than human captioners today. The smarter AI-based systems (such as ENCO’s enCaption product) know the words almost immediately, but pause a second or two to gain further accuracy via on-the-fly statistical analysis, predicting which words generally work together and which don't. That, coupled with being on-premises, makes for quick captioning (if your workflow requires cloud processing, there are many solutions for that as well).
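To make the idea of "predicting which words generally work together" concrete, here is a toy sketch in Python. It is not how enCaption or any particular ASR product works; it only illustrates the general principle with a simple word-pair (bigram) count. The function names `train_bigrams` and `pick_candidate` and the three-sentence corpus are made up for this illustration.

```python
from collections import Counter


def train_bigrams(sentences):
    """Count how often each adjacent word pair appears in a small corpus."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        counts.update(zip(words, words[1:]))
    return counts


def pick_candidate(previous_word, candidates, bigram_counts):
    """Pick the sound-alike candidate seen most often after previous_word."""
    prev = previous_word.lower()
    return max(candidates, key=lambda word: bigram_counts[(prev, word.lower())])


# Hypothetical training text; a real system would learn from far more data.
corpus = [
    "the weather forecast calls for rain",
    "the weather today is sunny",
    "tell me whether you agree",
]
counts = train_bigrams(corpus)

# "weather" and "whether" sound the same; the preceding word breaks the tie.
best = pick_candidate("the", ["whether", "weather"], counts)
print(best)
```

Waiting a moment before committing to a word gives a system more surrounding context for this kind of comparison, which is one plausible reason for the short pause described above.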

Captioning workflows are now far more than just embedding CEA-608/708 data into your transmission signal, too.

THE ‘THIRD RAIL’

Captioning is the “third rail” of your transmission media—you have video, audio and you have captions. While traditional captions have just included the dialog, today we see more value-added captions, such as the name of a song that’s playing or sound effects in the program. This means as a broadcaster, you have more ways to address your viewers than just video and sound.

Though there are an estimated 48 million Americans who report being hard of hearing or deaf, captions serve an even larger audience than that. Ever been in a restaurant with lots of TVs on a weekend? How many different sports games are on at the same time? And yet, the PA system is likely playing only one play-by-play. Thanks to captions, you can still gain insights into any of the other games being shown by reading their captions. Then there’s the opposite scenario, where your viewers are watching from a quiet location where they don't want to disturb anyone near them. Captions make it possible for them to enjoy your content there too (you do caption everything, right?).

When you caption everything, interesting possibilities start to emerge.

With more television broadcasters using automated video logging and compliance software, they may already have a powerful tool there to make captions even more useful. Many of these types of products allow for the logging of all the caption data, aligned with the video, such that you can jump to an index point just by clicking on the caption.

It then quickly becomes an easy task to search across all the feeds you’re recording, making it instantly possible to find the conversation, brand or person’s name you're looking for, with “point and click” precision. Moreover, this capability doesn't just have to be available to your producers, as you can easily bring such access online to your website and your viewers.
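As a rough illustration of that kind of search, the Python sketch below looks up a phrase in a small in-memory caption log and reports where in each recording it was said. The `CaptionEntry` record, the feed names and the sample text are all assumptions invented for this example; real logging and compliance products have their own storage formats and interfaces.

```python
from dataclasses import dataclass


@dataclass
class CaptionEntry:
    feed: str             # hypothetical name of the recorded feed
    start_seconds: float  # offset of the caption from the start of the recording
    text: str             # caption text as logged


def search_captions(log, phrase):
    """Return every logged caption containing the phrase, ignoring case."""
    needle = phrase.lower()
    return [entry for entry in log if needle in entry.text.lower()]


def format_timecode(seconds):
    """Convert an offset in seconds to HH:MM:SS for display."""
    total = int(seconds)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


# Made-up sample data standing in for a logged caption archive.
log = [
    CaptionEntry("News-A", 12.0, "Good evening, here are tonight's headlines."),
    CaptionEntry("News-A", 3725.5, "The mayor announced a new transit plan."),
    CaptionEntry("Sports-B", 940.0, "Transit delays kept some fans from the game."),
]

for hit in search_captions(log, "transit"):
    print(hit.feed, format_timecode(hit.start_seconds), hit.text)
```

Each hit carries the feed and the time offset, which is the information a player would need to jump straight to that point in the video.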

CAPTION EVERYTHING

Nowadays you can search YouTube videos with the results limited to only videos that are captioned—if you have any video that’s not captioned, it won't even show up in the results. Automated captioning means the more you caption (both live and offline files), the more economical ASR products become. Which means—caption everything.

Since it costs less using smart AI automation, why not use it to caption more of your audio feeds? How about captioning the audio associated with ISO camera feeds you may have at sporting or other special events? You don't always have enough resources to monitor all the audio or video coming in from these ISO feeds, but by using an ASR system to caption automatically, you can set up software to alert you when certain keywords are said on these audio feeds, helping improve your real-time access to more content faster and automatically (whether producing a live broadcast or in post production).
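A keyword alert of this sort can be sketched in a few lines of Python. This is only an illustration under simple assumptions: caption text arrives as (feed name, text) pairs, and the function name `find_keyword_alerts`, the feed names and the keywords are hypothetical. A production setup would read captions from the ASR system's own output and send notifications rather than print them.

```python
import re


def find_keyword_alerts(caption_lines, keywords):
    """Return (feed, keyword, text) for each whole-word keyword match."""
    if not keywords:
        return []
    # One case-insensitive pattern that matches any keyword as a whole word.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    alerts = []
    for feed, text in caption_lines:
        for match in pattern.finditer(text):
            alerts.append((feed, match.group(1).lower(), text))
    return alerts


# Made-up captions from three ISO camera feeds.
lines = [
    ("ISO-1", "Touchdown by the home team"),
    ("ISO-2", "Crowd noise only"),
    ("ISO-3", "Possible injury on the field"),
]

for feed, keyword, text in find_keyword_alerts(lines, ["touchdown", "injury"]):
    print(f"ALERT on {feed}: heard '{keyword}' in: {text}")
```

Matching on whole words avoids false alerts from longer words that merely contain a keyword.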

Be sure to consider what you're captioning online, as well. While the FCC regulates only some captioning for online streaming and OTT/VOD applications, the Americans with Disabilities Act may impact other aspects, and there are many organizations committed to growing accessibility standards across all media (some of which are not afraid to bring lawsuits in the process).

Therefore, if you’ve got an on-premises captioning system that’s not being used for real-time all the time, why not use it in file-based mode and caption the thousands of hours of legacy video you already have? Just like with live video, this can make your content more accessible to more people (and search engines), is good for business and it gives your content that “third rail.”

There’s a lot of innovation taking place in the captioning industry, bringing more value to more people, and in the process, giving you more ways to keep in touch with your audience.

Published

April 24, 2020