Multiword Extractor

A multiword is a set of words that, when put together, form a new unit of meaning.

Information about Multiword Extraction


By multiword we understand a unit of meaning that can be broken down into several words, each with lexical, semantic and pragmatic relevance in the language. Examples are constructions such as kick the bucket or San Francisco, which acquire a new meaning when combined and carry that meaning only when the words appear together.

How does a multi-word extractor work?

Multi-word extractors are somewhat similar to keyword extractors. In this case, the machine scans the text for fixed grammatical patterns that are commonly found in multi-word constructions (e.g. adjective-noun, noun-adjective, noun-preposition-noun) and measures the statistical association between the words that fill these patterns.
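
The pattern-scanning step can be illustrated with a minimal Python sketch. It assumes the text has already been part-of-speech tagged; the tag names (ADJ, NOUN, ADP for preposition), the PATTERNS list and the extract_candidates function are hypothetical names chosen for this example, not part of any particular extractor.

```python
from collections import Counter

# Illustrative tag patterns only: adjective-noun, noun-adjective,
# noun-preposition-noun. A real extractor would use a fuller list.
PATTERNS = [
    ("ADJ", "NOUN"),
    ("NOUN", "ADJ"),
    ("NOUN", "ADP", "NOUN"),
]

def extract_candidates(tagged):
    """Count word sequences whose tag sequence matches one of PATTERNS."""
    counts = Counter()
    for pattern in PATTERNS:
        n = len(pattern)
        # Slide a window of the pattern's length over the tagged text.
        for i in range(len(tagged) - n + 1):
            window = tagged[i:i + n]
            if tuple(tag for _, tag in window) == pattern:
                counts[" ".join(word.lower() for word, _ in window)] += 1
    return counts

# Hand-tagged toy input; in practice the tags would come from a POS tagger.
tagged_text = [
    ("She", "PRON"), ("drank", "VERB"), ("hot", "ADJ"), ("chocolate", "NOUN"),
    ("and", "CCONJ"), ("a", "DET"), ("cup", "NOUN"), ("of", "ADP"), ("tea", "NOUN"),
    ("then", "ADV"), ("more", "ADJ"), ("hot", "ADJ"), ("chocolate", "NOUN"),
]

candidates = extract_candidates(tagged_text)
for phrase, count in candidates.most_common():
    print(phrase, count)
```

On this toy input the candidates are "hot chocolate" (twice) and "cup of tea" (once). Matching a pattern only makes a phrase a candidate; whether it is a true multiword still depends on the statistical step.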

This process can be done by resorting to a reference corpus or dataset (text mining), from which contrastive data is extracted. That is, we first count the recognised, linguistically valid multi-word patterns in the current text, and then run a query-like operation on the mined data to check whether each pattern also has high statistical relevance there, which validates the construction as a multi-word pattern.
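
As a rough sketch of the validation step, the Python example below scores candidate word pairs against a reference corpus. The text above does not name a specific statistic, so pointwise mutual information (PMI) is used here as one common association measure; the thresholds min_pmi and min_count, the function names and the tiny reference corpus are all assumptions made for illustration.

```python
import math
from collections import Counter

def pmi_scores(reference_tokens):
    """Pointwise mutual information for every adjacent word pair."""
    unigrams = Counter(reference_tokens)
    bigrams = Counter(zip(reference_tokens, reference_tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (w1, w2), count in bigrams.items():
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        # PMI compares the observed pair probability with the probability
        # expected if the two words occurred independently.
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores

def validate(candidates, reference_tokens, min_pmi=2.0, min_count=2):
    """Keep candidates that are frequent and strongly associated in the reference."""
    bigrams = Counter(zip(reference_tokens, reference_tokens[1:]))
    scores = pmi_scores(reference_tokens)
    return {
        pair: scores[pair]
        for pair in candidates
        if bigrams[pair] >= min_count and scores[pair] >= min_pmi
    }

# Toy reference corpus; a real one would be far larger.
reference = (
    "san francisco is a big city . i love san francisco . "
    "the big dog saw the city . san francisco has a big bridge ."
).split()

# Candidate pairs as they might come out of the pattern-matching step.
candidates = [("san", "francisco"), ("big", "city"), ("kick", "bucket")]
validated = validate(candidates, reference)
print(validated)
```

Here only ("san", "francisco") survives: "big city" occurs just once in the reference and "kick bucket" never does. The frequency floor matters because PMI tends to overrate pairs that are seen only once or twice.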

