The keyword extractor identifies the most important words of a text and assigns each one a score.

Information about Keyword Extraction


According to Wikipedia, keyword extraction is tasked with the automatic identification of terms that best describe the subject of a document.

How does a keyword extractor work?

Keyword extractors are usually based on frequency calculations, and there are several approaches.

Keyword extractors statistically calculate the weight of each word in the text and compare the frequency of a given word against the data in order to determine whether it is statistically relevant in the current context. This process is usually preceded by tokenization of the text and removal of stop words and punctuation. Only words that belong to a linguistically more relevant part of speech (PoS), e.g. nouns, noun phrases or verbs, are taken into account. Sometimes stemming techniques and word segmentation are also part of the model.
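As a rough illustration of the steps described above, the following is a minimal sketch in Python, not the implementation of any particular tool. It assumes a tiny, hand-picked stop-word list (STOP_WORDS), a simple regular-expression tokenizer, and relative frequency as the score. The function names tokenize and extract_keywords are hypothetical. PoS filtering and stemming are left out because they need an external language model.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stop-word list; real systems use far larger ones.
STOP_WORDS = {
    "a", "an", "and", "the", "of", "in", "is", "it", "to", "on",
    "for", "with", "that", "this", "are", "as", "by",
}


def tokenize(text):
    """Lowercase the text and keep only alphabetic tokens (drops punctuation)."""
    return re.findall(r"[a-z]+", text.lower())


def extract_keywords(text, top_n=3):
    """Return the top_n (word, score) pairs, scored by relative frequency."""
    tokens = [t for t in tokenize(text) if t not in STOP_WORDS]
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return []
    scored = [(word, count / total) for word, count in counts.items()]
    # Highest score first; ties are broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


sample = "The cat sat on the mat. The cat is a happy cat."
keywords = extract_keywords(sample, top_n=2)
print(keywords)
```

Here "cat" comes out on top because it accounts for three of the six tokens that remain after stop-word removal. A real extractor would normally compare these frequencies with frequencies from other data rather than rank by raw counts alone.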

Some approaches take into account more advanced linguistic information, such as co-occurrences (fixed linguistic patterns that appear together, or that can even acquire a different meaning when appearing together) or discourse markers (phrases such as "well" or "you know", which are used to reorganize speech or add emphasis).
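One simple way to surface co-occurrences is to count how often two words appear next to each other. The sketch below is only an illustration under that assumption; the function name cooccurrence_counts and the window parameter are hypothetical, and real systems typically add statistical association measures on top of raw counts.

```python
import re
from collections import Counter


def cooccurrence_counts(text, window=2):
    """Count ordered word pairs whose positions lie within `window` tokens.

    With window=2 only directly adjacent words are paired. Sentence
    boundaries are ignored in this simplified version.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    pairs = Counter()
    for i, left in enumerate(tokens):
        # Pair the current token with the following (window - 1) tokens.
        for right in tokens[i + 1 : i + window]:
            pairs[(left, right)] += 1
    return pairs


sample = "Machine learning helps. Machine learning scales. Deep learning helps."
pairs = cooccurrence_counts(sample)
print(pairs[("machine", "learning")])
```

Pairs that recur, such as ("machine", "learning") in the sample, are candidates for being treated as a single multi-word keyword.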

Stanford CoreNLP