Part of speech tagging

PoS tagging is the process of marking up each word of a text with its grammatical category (noun, verb, adjective, adverb, etc.). The tag is assigned based on both the word's definition and its relationship with adjacent and related words in a phrase. It is also called word-category disambiguation or grammatical tagging.

Perform PoS Tagging

Information about Part of Speech Tagging


According to Wikipedia, part-of-speech tagging, also called grammatical tagging or word-category disambiguation, is the process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both its definition and its context, i.e., its relationship with adjacent and related words in a phrase, sentence, or paragraph. A simplified form of this is commonly taught to school-age children, in the identification of words as nouns, verbs, adjectives, adverbs, etc.

How does a PoS tagger work?

PoS taggers process the input data and categorize (tag) each token (word) with a part-of-speech tag. But what counts as a part of speech? It depends on the language. In the Western grammatical tradition, nine parts of speech are commonly listed: noun, verb, article, adjective, preposition, pronoun, adverb, conjunction, and interjection.
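To make the tagging step concrete, here is a minimal sketch of a most-frequent-tag baseline in Python. The training pairs, the `train` and `tag_sentence` helpers, and the default tag for unseen words are all assumptions made up for illustration. This baseline only looks up the word itself, whereas real taggers also use the surrounding context.

```python
from collections import Counter, defaultdict

# Tiny hand-made sample of (word, tag) pairs; purely illustrative data.
TRAINING = [
    ("the", "Article"), ("dog", "Noun"), ("barks", "Verb"),
    ("the", "Article"), ("cat", "Noun"), ("sleeps", "Verb"),
    ("a", "Article"), ("loud", "Adjective"), ("dog", "Noun"),
    ("runs", "Verb"), ("quickly", "Adverb"),
]


def train(pairs):
    """Count how often each word occurs with each tag and keep the most frequent tag."""
    counts = defaultdict(Counter)
    for word, pos in pairs:
        counts[word.lower()][pos] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag_sentence(sentence, model, default="Noun"):
    """Split on whitespace and tag each token; unseen words receive the default tag."""
    return [(token, model.get(token.lower(), default)) for token in sentence.split()]


model = train(TRAINING)
print(tag_sentence("The loud cat runs", model))
# [('The', 'Article'), ('loud', 'Adjective'), ('cat', 'Noun'), ('runs', 'Verb')]
```

Because the lookup ignores neighbouring words, it cannot tell apart uses such as "runs" as a verb and "runs" as a plural noun, and that is the ambiguity context-aware taggers are built to resolve.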

The list varies depending on the language and tradition. For instance, a functional classification considers certain classes to be closed, i.e. no new words can be added to them, while others are open and can therefore be expanded at any time. The following is an example of a functional classification for Western languages.

  • Open classes:
      • adjectives
      • adverbs
      • nouns
      • verbs
      • interjections
  • Closed classes:
      • auxiliary verbs
      • clitics
      • coverbs
      • conjunctions
      • determiners
      • particles
      • measure words
      • adpositions
      • preverbs
      • pronouns
      • contractions
      • cardinals
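One practical consequence of the open/closed distinction can be sketched in Python: closed classes can be listed in a fixed lexicon, so a word missing from every closed list can reasonably be assumed to belong to an open class. The word lists below are deliberately incomplete English examples, and the `candidate_classes` helper is hypothetical. A real lexicon would also allow a word to belong to several classes at once.

```python
# Illustrative, deliberately incomplete closed-class word lists for English.
CLOSED_CLASS_WORDS = {
    "determiner": frozenset({"the", "a", "an", "this", "that"}),
    "conjunction": frozenset({"and", "or", "but"}),
    "pronoun": frozenset({"i", "you", "he", "she", "it", "we", "they"}),
    "adposition": frozenset({"in", "on", "at", "of", "with"}),
}

# Open classes accept new words, so they cannot be enumerated in advance.
OPEN_CLASSES = ("adjective", "adverb", "noun", "verb", "interjection")


def candidate_classes(word):
    """Return the classes a word may belong to under this toy lexicon."""
    lowered = word.lower()
    found = [cls for cls, words in CLOSED_CLASS_WORDS.items() if lowered in words]
    # A word absent from every closed list is treated as an open-class word.
    return found if found else list(OPEN_CLASSES)


print(candidate_classes("The"))     # ['determiner']
print(candidate_classes("selfie"))  # ['adjective', 'adverb', 'noun', 'verb', 'interjection']
```

A newly coined word such as "selfie" falls through to the open classes, which matches the idea that only those classes can be expanded.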

This list is greatly expanded for computational tasks. Categories such as nouns need to be split behind the scenes into noun-singular, noun-plural, noun-dual, and so on. All this variation has led to the design of classifications with over 100 tags. Below is the list of PoS tags used in the Universal Dependencies proposal. The other two most widely adopted typologies in the West are Penn Treebank and EAGLES. The Penn Treebank project publishes its list of PoS tags, and a full description of the EAGLES tagging system is also available.

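  • Open classes: ADJ (adjective), ADV (adverb), INTJ (interjection), NOUN (noun), PROPN (proper noun), VERB (verb)
  • Closed classes: ADP (adposition), AUX (auxiliary), CCONJ (coordinating conjunction), DET (determiner), NUM (numeral), PART (particle), PRON (pronoun), SCONJ (subordinating conjunction)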
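The relationship between a fine-grained tag set and a coarse one can be sketched as a lookup table. The example below maps the Penn Treebank noun and verb tags onto Universal Dependencies coarse tags. The `coarsen` helper is hypothetical, the table covers only nouns and verbs, and it simplifies one point: auxiliary verbs carry Penn verb tags but are tagged AUX in Universal Dependencies, which a plain table like this cannot capture.

```python
# Penn Treebank noun and verb tags mapped to Universal Dependencies coarse tags.
# Simplification: auxiliaries share the Penn verb tags but are AUX in UD.
PENN_TO_UPOS = {
    "NN": "NOUN",     # noun, singular or mass
    "NNS": "NOUN",    # noun, plural
    "NNP": "PROPN",   # proper noun, singular
    "NNPS": "PROPN",  # proper noun, plural
    "VB": "VERB",     # verb, base form
    "VBD": "VERB",    # verb, past tense
    "VBG": "VERB",    # verb, gerund or present participle
    "VBN": "VERB",    # verb, past participle
    "VBP": "VERB",    # verb, non-3rd person singular present
    "VBZ": "VERB",    # verb, 3rd person singular present
}


def coarsen(tagged_tokens, unknown="X"):
    """Replace each fine-grained tag with its coarse tag; unmapped tags become `unknown`."""
    return [(word, PENN_TO_UPOS.get(fine, unknown)) for word, fine in tagged_tokens]


print(coarsen([("dogs", "NNS"), ("bark", "VBP"), ("London", "NNP")]))
# [('dogs', 'NOUN'), ('bark', 'VERB'), ('London', 'PROPN')]
```

Going from fine to coarse loses information such as number and tense, which is why the mapping only works in this direction.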

This classification is used as a baseline by many NLP libraries. For example, spaCy documents its own adaptation and expansion of Universal Dependencies and OntoNotes (partly based on the work by Penn Treebank).

