Smart Automatic Speech Recognition

Our groundbreaking live automated captioning solution

Ai-Media’s Smart Lexi™ uses the power of our expert team, human-curated custom dictionaries and Automatic Speech Recognition (ASR) to deliver results significantly better than standard automated captioning products on the market, at around half the typical price of premium captions.

Meet Smart Lexi, our groundbreaking live automated captioning solution.

Tony Abrahams
Co-Founder, Director and CEO
“We combined our automated technologies with the knowledge and skill of our expert captioning team to create the best ASR solution out there.”

What is Smart Lexi?

Layering our human curation onto Automatic Speech Recognition (ASR) engines, Smart Lexi hits a sweet spot between accuracy and price.

Human Curation

Human curation is performed by a team with over a decade of expertise in training language software to produce accurate live captions, curating and crafting preparation materials, delivering live broadcast captions, and implementing captioning technology and infrastructure on a global scale.

For each session, our expert captioning team conducts in-depth research using our specialized in-house database and customer-provided documentation. They use this to compile names, terms, phrases, spellings and pronunciations tailored to the needs of the session, and feed them into our comprehensive custom dictionaries and custom captioning filter.

Custom Dictionaries

Our models have been refined using more than 10 years of human expertise and data. Our custom dictionaries – overseen by our highly skilled team – teach the ASR engine key names and phrases tailored to every captioning session and its particular subject matter, as well as phonetic pronunciations.

They refine accuracy for particularly challenging terms and apply customer-specific formatting, standards and any censorship requirements to create the best ASR captions possible.

Not all Automatic Speech Recognition (ASR) is created equal.

“Our solution utilizes human-curated custom dictionaries and custom caption filtering, adding a layer of refinement to the raw ASR output resulting in greater accuracy”

– Ai-Media, Product Team

How does Smart Lexi™ compare on Accuracy?

The accuracy of live captions varies greatly. There are several options in the market that can deliver according to the needs of the consumer.

Out-of-the-box ASR:

Low Accuracy

Out-of-the-box ASR – which includes the free captions available on Zoom, YouTube and Google – has no human input and, as a result, the lowest accuracy. It is best suited to casual meetings where accuracy is not an important consideration.

Industry leader in ASR: Lexi

EEG’s Lexi product:

Medium Accuracy

EEG’s Lexi product currently tops the industry in ASR live captioning solutions. It is better quality than what out-of-the-box captions can offer, and is suited to live streaming and live broadcast situations where some level of accuracy is needed, but errors are acceptable.

Ai-Media’s Smart Lexi™:

High Accuracy

By layering Ai-Media’s technology on existing ASR products, Smart Lexi delivers a significant improvement over the performance of standard ASR products on the market.

It achieves accuracy outcomes approximately halfway between generic out-of-the-box ASR and Ai-Media’s premium service, a groundbreaking development in the industry.

Ai-Media Premium Live Captions

Highest Accuracy

For those who need the highest quality captioning available, Ai-Media’s premium, human captioning service remains the top choice. This service features high-quality live captions generated by Ai-Media’s skilled and experienced human captioners. 

It’s the best choice for content with multiple speakers and accents, and environments with poorer audio quality.

How We Measure Accuracy:

The NER Score

The NER system is the best way yet devised to measure the accuracy of live captioning. It is repeatable and usable across various industries and services, and is viewer-centric – meaning it decides how bad an error is by the impact that the error has on a viewer’s understanding of the program.

NER measurements are part of captioning regulations in countries including Australia and Canada.

The name ‘NER model’ comes from the equation the model uses to produce a quality score.

Score = (N-E-R)/N


N is the total number of words and punctuation marks in the captioned piece.

E is the sum of Edition errors, where a word or words have been spoken but do not appear in the captions (or, sometimes, where words have been added to the captions but were not spoken).

R is the sum of Recognition errors, where an incorrect word or words appear in the captions.
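To make the equation concrete, here is a minimal Python sketch of the calculation. The function name and the sample figures are hypothetical and chosen only for illustration; they are not measurements of any Ai-Media service. The sketch treats E and R as already-totalled error sums and does not model how individual errors are assessed.

```python
def ner_score(n: float, e: float, r: float) -> float:
    """Return the NER score (N - E - R) / N.

    n: total number of words and punctuation marks in the captioned piece
    e: sum of Edition errors
    r: sum of Recognition errors
    """
    if n <= 0:
        raise ValueError("N must be greater than zero")
    return (n - e - r) / n


# Hypothetical sample: 1,000 words and punctuation marks,
# with Edition errors summing to 10 and Recognition errors to 15.
score = ner_score(1000, 10, 15)
print(f"{score:.1%}")  # prints "97.5%"
```

In this made-up sample, the two error sums together remove 25 from a total of 1,000, giving a score of 0.975.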

Let us help you find the right solution for your needs.

Each of our services is unique and has specific applications. Get in touch and our team will gladly assist you in choosing the most appropriate service for your needs.

Smart Lexi is designed to meet a gap in the market where ASR ‘out-of-the-box’ is not high-quality enough and our premium human-captioned service is not affordable enough for the job.

Smart Lexi is particularly well suited to live single-speaker situations that use clear, high-quality audio and a predictable dictionary of defined terms. It is the perfect answer for those who need a scalable live captioning solution at a lower price than our premium service.

Request a Quote


Frequently Asked Questions

What is Smart Lexi?

Ai-Media’s Smart Lexi is a groundbreaking live captioning solution that is automated and human-curated by our expert team. Smart Lexi represents the next generation of automated live captioning, thanks to the skill and technical experience of our captioning curation team and our state-of-the-art custom dictionaries.

What is Smart Lexi best used for?

Ai-Media’s Smart Lexi is a fantastic option for those wanting live captions at a lower cost than premium human captioning and a higher accuracy than ‘out-of-the-box’ automated captions. It is perfect for live settings with a single speaker and a clear audio feed, with minimal background noise, music or overlapping dialogue. 

How accurate are Smart Lexi captions?

Ai-Media’s Smart Lexi uses human-curated custom dictionaries to add a layer of human refinement to raw ASR output, meaning better accuracy for you and your users. The accuracy of Smart Lexi captions is significantly higher than out-of-the-box ASR. Accuracy will also vary from session to session, depending on the quality of the audio feed and the accents of the speakers.

How is Ai-Media’s ASR solution different from its competitors?

The Ai-Media ASR difference comes down to our expert global team and our in-house technical development. In our team, we have decades of experience in understanding which terms and phrases are typically difficult to caption, and we have built our own end-to-end caption delivery system to make the process as smooth as possible. As a result, our system is optimized for maximum accuracy and minimal delays.

What is the price of Smart Lexi?

The price of our Smart Lexi service varies depending on the specifics of implementing the service into your infrastructure and workflow and your volume of content. As a guide, Smart Lexi captions are usually around half the cost of human-generated live captions.

How does the speed of Smart Lexi captions compare to human-generated live captions?

Smart Lexi captions have a similar or shorter time delay than human-generated live captions. The delay between the audio and the Smart Lexi captions is usually around two to four seconds, whereas human-generated captions are usually delayed by around four to seven seconds.

What are custom dictionaries and how do they work?

Custom dictionaries are databases of terms and phrases that our captioning team uses to teach an ASR engine, so it produces the words correctly when it ‘hears’ them. Our team first researches and compiles key names and phrases on the session’s subject matter. Next, they use their in-depth knowledge of speech recognition software to program phonetic pronunciations into our Smart Lexi engine. This process makes the live captions more accurate by the time they reach viewers’ screens.
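As a rough illustration of the general idea only, the Python sketch below applies a small term dictionary to raw ASR text as a post-processing filter. It is an assumption made for illustration, not a description of how Smart Lexi works internally: the function name, the dictionary entries and the text-substitution approach are all hypothetical, and the process described above involves programming phonetic pronunciations into the engine itself.

```python
import re
from typing import Mapping

# Hypothetical entries: likely misrecognition -> curated spelling.
CUSTOM_DICTIONARY = {
    "smart lexie": "Smart Lexi",
    "a i media": "Ai-Media",
}


def apply_custom_dictionary(raw_text: str, dictionary: Mapping[str, str]) -> str:
    """Replace known misrecognitions with curated spellings.

    Matching is case-insensitive and limited to whole words or phrases.
    """
    result = raw_text
    # Process longer phrases first so they win over shorter overlapping ones.
    for wrong in sorted(dictionary, key=len, reverse=True):
        replacement = dictionary[wrong]
        pattern = re.compile(r"\b" + re.escape(wrong) + r"\b", re.IGNORECASE)
        # A function replacement keeps the curated spelling literal.
        result = pattern.sub(lambda _match, text=replacement: text, result)
    return result


raw = "welcome to a i media and smart lexie"
print(apply_custom_dictionary(raw, CUSTOM_DICTIONARY))
```

A real dictionary would be compiled per session from research and customer-provided documentation, as described above; the two entries here are placeholders.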

Does Smart Lexi meet regulatory requirements for the provision of live captioning?

Regulatory guidelines for live captioning vary greatly between countries and regions. While Smart Lexi meets regulatory requirements for quality and time delay in many regions, some countries have specific requirements that Smart Lexi does not yet meet, such as strict speaker change indication and on-screen positioning. If you are unsure of your country’s requirements, please get in touch for more information.

Do we store your data?

You can read our full Terms of Service and Privacy Policy on our website. 

Will Smart Lexi impact human captioning?

Human input will always be essential for the accuracy and customization of captions. Smart Lexi would not be possible without the time and effort of professional captioners and technical staff, who operate and refine the custom dictionaries and automation systems used to make the live captions.
