Feature focus: Auto-bilingual term extraction

Grace Cowan

Updated on April 18, 2025


Well-curated multilingual terminology glossaries are essential for consistent brand messaging across a company’s content ecosystem. Creating terminology lists from parallel sets of source and translated files enables key terms to be added to a company’s glossary.

When organizations want to extract terminology from previous translations, they may be dealing with thousands of translated file pairs. The challenge is to align those files and extract terminology efficiently and automatically while still meeting a high quality bar. This calls for a fully automated approach that reliably identifies the right source and target term pairs.

In this article we will shed light on the auto-bilingual term extraction functionality in XTM Cloud, what it does, and how it makes the terminologist’s task easier. The net impact of this functionality is improved productivity, increased translation quality across your target languages and faster time-to-market.

Upping the terminology game with Inter-language Vector Space

With the release of XTM Cloud 12.4, users can run term extraction on parallel source and target texts during the alignment process. Thanks to advances in computational linguistics, Big Data and AI, very little human intervention is required to build bilingual glossaries. Supported by Inter-language Vector Space, this functionality lets terminologists create termbases with fewer headaches and with greater accuracy and quality. The framework uses algorithms to calculate the probability that a given target word is the equivalent of a source word.
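The article does not spell out how Inter-language Vector Space computes these probabilities. As a minimal sketch of the general idea, assuming that source and target words are represented as vectors in one shared cross-lingual space, candidate target words can be ranked by cosine similarity to the source word, and the scores turned into probabilities with a softmax. The vectors, the `temperature` parameter and the function names below are hypothetical and chosen only for illustration; this is not XTM's implementation.

```python
import math

# Hypothetical toy embeddings. In a real inter-language (cross-lingual) vector
# space, words with the same meaning in different languages lie close together.
# These 3-dimensional vectors are invented purely for illustration.
SOURCE_VECTORS = {
    "invoice": [0.9, 0.1, 0.0],
}
TARGET_VECTORS = {  # German candidates
    "Rechnung": [0.88, 0.12, 0.05],
    "Bestellung": [0.4, 0.8, 0.1],
    "Kunde": [0.1, 0.2, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def equivalence_probabilities(source_word, temperature=0.1):
    """Turn similarity scores into a probability distribution over target
    candidates with a softmax (an assumed scoring choice, not XTM's)."""
    src = SOURCE_VECTORS[source_word]
    sims = {t: cosine(src, vec) for t, vec in TARGET_VECTORS.items()}
    exps = {t: math.exp(s / temperature) for t, s in sims.items()}
    total = sum(exps.values())
    return {t: e / total for t, e in exps.items()}


probs = equivalence_probabilities("invoice")
best = max(probs, key=probs.get)
print(best, round(probs[best], 3))
```

A lower temperature concentrates probability on the closest candidate. Real cross-lingual embeddings have hundreds of dimensions and are learned from large corpora rather than written by hand.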

Andrzej Zydroń, CTO, said: “Identifying bilingual term candidates manually is a very labour-intensive task for linguists. Inter-language Vector Space, the advanced AI technology that underpins bilingual term extraction, can automate up to 90% of this important task by automatically identifying equivalent target language terms. This way, linguists can create termbases in a much more efficient way while focusing on what they do best – translating.”

Manual vs. automated method – it’s up to you

Imagine the following scenario: as a project manager, you receive a 20,000+ word translation project with highly specialized content from one of your clients. You'd like to use the terminology from previously translated texts to enhance the translation process for the current project. There are two ways to create a glossary from existing translations.

The first way is manual: a linguist goes through the source and target texts to identify term candidates. Two common criteria for identifying a term are frequency and degree of specialization: if a term appears frequently in the text, or is uncommon and requires research, it should be documented and added to the glossary. It's a time-consuming and somewhat tedious process, which slows down the initial phase of projects.

Bilingual term extraction view in XTM Cloud

The second way is for the terminologist to run the bilingual term extraction function in a translation management system. In XTM Cloud, they can select this option during the alignment process, as shown in the image. Behind the scenes, the parallel source and target texts are aligned at the sentence level, and source nouns and noun phrases are identified and sorted by frequency. The next, and most challenging, step is to single out the equivalent target-language nouns and noun phrases, including the various forms in which they occur and their corresponding context.
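The pipeline described above (align sentences, collect source nouns and noun phrases by frequency, then find the matching target terms) can be sketched with a classic statistical stand-in: the Dice coefficient, which scores how consistently a source term and a target term co-occur across aligned sentence pairs. This is a swapped-in technique for illustration, not the vector-space method XTM describes. The data, the `min_freq` threshold and the function names are hypothetical, noun-phrase detection is assumed to have already happened, and inflected forms are not handled.

```python
from collections import Counter

# Hypothetical sentence-aligned pairs. Each side is already reduced to the
# nouns and noun phrases found in it; a real system would need a POS tagger
# or parser for this step, which is out of scope here.
ALIGNED_PAIRS = [
    (["invoice", "customer"], ["Rechnung", "Kunde"]),
    (["invoice", "due date"], ["Rechnung", "Fälligkeitsdatum"]),
    (["customer", "account"], ["Kunde", "Konto"]),
    (["invoice"], ["Rechnung"]),
]


def extract_term_pairs(pairs, min_freq=2):
    """Sort source terms by frequency, then pick the target term that
    co-occurs with each one most consistently (Dice coefficient)."""
    src_freq = Counter(t for src, _ in pairs for t in set(src))
    tgt_freq = Counter(t for _, tgt in pairs for t in set(tgt))
    co_freq = Counter(
        (s, t) for src, tgt in pairs for s in set(src) for t in set(tgt)
    )
    results = []
    for s, f in src_freq.most_common():  # most frequent source terms first
        if f < min_freq:
            continue
        # Dice = 2 * joint count / (source count + target count)
        scores = {
            t: 2 * co_freq[(s, t)] / (f + tgt_freq[t])
            for t in tgt_freq
            if co_freq[(s, t)]
        }
        if not scores:
            continue
        best = max(scores, key=scores.get)
        results.append((s, best, f, round(scores[best], 2)))
    return results


for row in extract_term_pairs(ALIGNED_PAIRS):
    print(row)
```

Frequent source terms are processed first, mirroring the frequency sort described above; a score of 1.0 means the two terms always appear together in the aligned pairs.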

The output of the extraction is exported to a Microsoft Excel sheet, as presented below.

For detailed instructions on how to extract bilingual terms, see the XTM knowledge base.

Sara Basile, Product Manager at XTM International, noted: “The XTM AI team has developed a new technology to take a mundane and tedious process away from the terminologist. The bilingual term extraction performed during the alignment of the parallel source and target texts produces an Excel spreadsheet with all the data required to review and add terminology. One implication of this is that XTM users will see an 80% productivity improvement over manual methods.”
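The article does not list the columns of XTM Cloud's spreadsheet. As a minimal sketch, assuming a hypothetical layout of term pair, frequency and context sentences, extracted pairs could be written as CSV, a format Excel opens directly. The column names, sample row and `write_term_sheet` function are illustrative only.

```python
import csv
import io

# Hypothetical column layout; XTM Cloud's actual spreadsheet may differ.
HEADER = ["Source term", "Target term", "Frequency", "Source context", "Target context"]


def write_term_sheet(rows, stream):
    """Write extracted term pairs as CSV, which Excel can open directly."""
    writer = csv.writer(stream)
    writer.writerow(HEADER)
    writer.writerows(rows)


rows = [
    ("invoice", "Rechnung", 3,
     "Send the invoice to the customer.",
     "Senden Sie die Rechnung an den Kunden."),
]
buffer = io.StringIO()
write_term_sheet(rows, buffer)
print(buffer.getvalue())

# To write a real file for Excel, open it with newline="" (as the csv module
# requires) and encoding="utf-8-sig" so Excel recognizes the UTF-8 text:
# with open("terms.csv", "w", newline="", encoding="utf-8-sig") as f:
#     write_term_sheet(rows, f)
```

Keeping the context sentences next to each pair is what lets a reviewer accept or reject a proposal without going back to the source files.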

When you might want to use bilingual term extraction

The following use cases are typical for bilingual term extraction:

Consistent terminology is critical when producing technical and marketing content. You can use existing bilingual corpora to quickly and efficiently build your terminology prior to project launch.

Older termbases may be of questionable quality. Manually inspecting old entries in a large and outdated database would require an immense effort. With bilingual term extraction, you can build a brand-new termbase from scratch based on your existing Translation Memories (TMs) or on selected source and target files.

You want to add terms to your termbase quickly, in batches, at regular intervals. Instead of having linguists add source and target terms segment by segment and project by project, you can run a bilingual term extraction every few months on all the projects completed in XTM Cloud and add the terminology in one batch. This is much more time-efficient and controlled than having individual linguists add small increments of terminology in each project.

You want to check the quality and consistency of existing TM data. By showing the different target-language term proposals and their context sentences, bilingual term extraction will highlight inconsistencies in the target terms so that you can assess the overall quality of the TM.
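Two of these use cases, building a termbase from existing TMs and checking TM consistency, start from translation memory data. As a minimal sketch, assuming the TM has been exported as TMX (the XML-based exchange standard for translation memories), the code below reads source and target segment pairs with Python's standard library and flags source terms whose proposed target equivalents appear in more than one form across the TM. The TMX snippet, the `PROPOSALS` dictionary and the function names are hypothetical; the article does not say which format or method XTM Cloud uses internally, and the plain substring matching here ignores tokenization and inflection.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# TMX 1.4 marks each variant's language with xml:lang, which ElementTree
# exposes under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# A tiny inline TMX document standing in for an exported translation memory.
TMX = """<tmx version="1.4">
  <header srclang="en" datatype="plaintext" segtype="sentence"/>
  <body>
    <tu><tuv xml:lang="en"><seg>Pay the invoice today.</seg></tuv>
        <tuv xml:lang="de"><seg>Bezahlen Sie die Rechnung heute.</seg></tuv></tu>
    <tu><tuv xml:lang="en"><seg>The invoice is overdue.</seg></tuv>
        <tuv xml:lang="de"><seg>Die Faktura ist überfällig.</seg></tuv></tu>
    <tu><tuv xml:lang="en"><seg>Contact the customer.</seg></tuv>
        <tuv xml:lang="de"><seg>Kontaktieren Sie den Kunden.</seg></tuv></tu>
  </body>
</tmx>"""

# Hypothetical proposals: target-language candidates per source term, as a
# term extraction step might propose them.
PROPOSALS = {"invoice": ["Rechnung", "Faktura"], "customer": ["Kunde"]}


def read_tmx_pairs(tmx_text, src="en", tgt="de"):
    """Return (source, target) segment pairs from a TMX document.
    Language codes are matched exactly (e.g. "en", not "en-US")."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segs = {tuv.get(XML_LANG): tuv.findtext("seg") for tuv in tu.iter("tuv")}
        if src in segs and tgt in segs:
            pairs.append((segs[src], segs[tgt]))
    return pairs


def find_inconsistencies(pairs, proposals):
    """Map each source term to the target variants actually used in the TM;
    more than one variant signals inconsistent terminology.
    Uses naive substring matching for brevity."""
    used = defaultdict(set)
    for src_seg, tgt_seg in pairs:
        for term, variants in proposals.items():
            if term in src_seg.lower():
                used[term].update(v for v in variants if v in tgt_seg)
    return {term: sorted(v) for term, v in used.items() if len(v) > 1}


pairs = read_tmx_pairs(TMX)
print(find_inconsistencies(pairs, PROPOSALS))
```

Here "invoice" is reported because the TM renders it both as "Rechnung" and as "Faktura", which is the kind of inconsistency a reviewer would then resolve by choosing one approved term.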

90% accuracy across 50 languages, productivity improvement up to 80%

Auto-bilingual term extraction produces output for 50 languages, with accuracy above 90% and time savings of up to 80%. XTM AI bilingual term extraction is a breakthrough technology that speeds up project turnarounds and ensures high-quality, consistent communication across channels.

Would you like to find out more about Inter-language Vector Space and how it ensures a seamless process of glossary creation? Head over to XTM TechTalk with Andrzej Zydroń, CTO, and Rafał Jaworski, Linguistic AI Expert.