Language tools and resources

In the course of our R&D activities, and as assets instrumental to our projects, we have developed or are developing the following tools and resources:

LXGram
Computational grammar for deep linguistic processing of Portuguese. Developed within the DELPH-IN consortium.

LX-NER
Named Entity Recognizer.

LX-Conjugator
Fully fledged automatic verbal conjugator for Portuguese, including all forms of clitic conjugation.

LX-Lemmatizer
Fully fledged automatic verbal lemmatizer for Portuguese.

LX-Inflector
Fully fledged automatic nominal lemmatizer for Portuguese.

LX-Inflection Tables
Full coverage tables with rules and exceptions for Portuguese verbal and nominal inflection.

LX-Suite
The three tools below are available online, as the LX-Suite.

LX-Tagger
Automatic part-of-speech tagger for Portuguese.

LX-Tokenizer
Automatic segmenter of lexemes of Portuguese.

LX-Splitter
Automatic segmenter of paragraphs and sentences of Portuguese.

LX-Closed Classes Lexicon
Full coverage lexicon for the Portuguese POS closed classes.

Constituency Parser
LX-Parser is a freely available online service for constituency parsing of Portuguese sentences.

Treebank Searcher
CINTIL-Treebank Online Searcher is a freely available online service for searching and viewing the parse and dependency trees of the CINTIL-Treebank.

MWNPT-International WordNet of Portuguese
WordNet of Portuguese with ca. 16 500 concepts and 21 000 word senses (May 2008). Developed in cooperation with the MultiWordnet project of ITC-Irst in Trento, Italy.

CINTIL - Corpus Internacional do Português
High-quality, linguistically interpreted corpus of 1 million tokens, accurately hand-tagged with respect to POS, inflection and NER. To be distributed soon via ELRA. Developed and maintained in cooperation with CLUL-Centro de Linguística da Universidade de Lisboa.

CINTIL Concordancer
Advanced, freely available online concordancer for the CINTIL corpus. Developed and maintained in cooperation with CLUL-Centro de Linguística da Universidade de Lisboa.

CINTIL TagSet
Exhaustive set of part-of-speech tags for Portuguese, including coverage of transcriptions of verbal productions. This is the tagset used in the annotation of the CINTIL corpus. It is also the tagset assumed by the tools LX-Tagger, LX-Inflector, LX-Conjugator and LX-Lemmatizer.

CINTIL Annotation Manual
The companion manual of the CINTIL corpus, with explicit guidelines for annotation and interpretation.

CINTIL-Treebank
The CINTIL-Treebank is a corpus of syntactic constituency trees, composed of sentences taken from the CINTIL-International Corpus of Portuguese.

Nexing Corpus
Corpus with the transcriptions of syllogistic reasoning protocols.