Term extraction, or terminology extraction, is the automatic analysis of text to identify phrases that meet term criteria. Terminology extraction has applications in translation and terminology management, but also in text analytics, where it is used for topic modelling, data mining and information retrieval from unstructured text.
Access to terminology extractors and the ability to create automatic glossaries are crucial for efficient management of multilingual content.
Optimal term extraction
The best term extraction provides as clean and accurate a term list as possible, requiring little manual cleaning. Many traditional extraction methods rely mainly on how often a candidate occurs in the text. Frequency alone is not optimal, because the resulting term list still has to be checked and cleaned by hand. Manual cleaning can be reduced by applying linguistic criteria in combination with statistics.
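To illustrate the difference between frequency alone and frequency combined with a linguistic criterion, here is a minimal Python sketch. The part-of-speech lexicon `POS`, the sample sentence and the pattern "adjective or noun followed by a noun" are all assumptions made for this example; a real extractor would use a proper tagger and richer patterns.

```python
from collections import Counter

# Toy part-of-speech lexicon (hypothetical); a real system would use a tagger.
POS = {
    "machine": "NOUN", "translation": "NOUN", "quality": "NOUN",
    "neural": "ADJ", "high": "ADJ",
    "improves": "VERB", "is": "VERB",
    "the": "DET", "of": "ADP",
}

def is_term_like(bigram):
    """Linguistic criterion: (ADJ or NOUN) followed by NOUN."""
    first, second = (POS.get(word) for word in bigram)
    return first in {"ADJ", "NOUN"} and second == "NOUN"

tokens = (
    "neural machine translation improves the quality of machine translation "
    "the quality of neural machine translation is high"
).split()

# Statistical step: count every pair of adjacent words.
counts = Counter(zip(tokens, tokens[1:]))

# Frequency only: keeps noise such as ("the", "quality") and ("quality", "of").
frequent = {bigram for bigram, n in counts.items() if n >= 2}

# Frequency plus the linguistic criterion: only noun-phrase-like pairs remain.
filtered = {bigram for bigram in frequent if is_term_like(bigram)}

print(sorted(frequent))
print(sorted(filtered))
```

In this toy input the frequency threshold alone lets function-word pairs through, while the added pattern check leaves only the two noun-phrase-like candidates.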
We can tune the behaviour of the extractor by setting the minimum number of characters in a term, the number of words in a term and the frequency of occurrence in the source data. We can also limit the extracted vocabulary according to whether it occurs in the common vocabulary.
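The settings listed above can be sketched as filters over word n-grams. This is only an illustration under stated assumptions, not the behaviour of any particular tool: the function `extract_terms`, its parameter names (`min_chars`, `max_words`, `min_freq`) and the tiny `COMMON_WORDS` list are hypothetical, and the "number of words" setting is interpreted here as an upper limit.

```python
import re
from collections import Counter

# Hypothetical stand-in for a common-vocabulary list; a real one is far larger.
COMMON_WORDS = {"the", "a", "an", "of", "in", "is", "and", "to", "for", "with", "from"}

def extract_terms(text, min_chars=4, max_words=3, min_freq=2, common=COMMON_WORDS):
    """Return {candidate: count} for word n-grams that pass all filters."""
    counts = Counter()
    # Split on punctuation so that candidates never span a sentence boundary.
    for segment in re.split(r"[.,;:!?()\n]+", text.lower()):
        words = re.findall(r"[a-z]+(?:-[a-z]+)*", segment)
        for n in range(1, max_words + 1):
            for i in range(len(words) - n + 1):
                gram = words[i:i + n]
                # Candidates may not start or end with a common word.
                if gram[0] in common or gram[-1] in common:
                    continue
                candidate = " ".join(gram)
                # Minimum length in characters (spaces included).
                if len(candidate) >= min_chars:
                    counts[candidate] += 1
    # Keep only candidates that occur often enough in the source data.
    return {term: n for term, n in counts.items() if n >= min_freq}

sample = (
    "Term extraction is the analysis of text. "
    "Term extraction helps translation. "
    "The term base stores translation data."
)
terms = extract_terms(sample)
print(terms)
```

Raising `min_freq` or `min_chars` shortens the list at the cost of missing rare or short terms, so the thresholds are best adjusted per project.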