Language tools and resources
In the course of our R&D activities, and as instrumental assets for the execution of our projects, we have developed or are developing the following tools and resources:
LXGram
Computational grammar for deep linguistic processing of Portuguese. Developed within the DELPH-IN consortium.
LX-NER
Named Entity Recognizer.
LX-Conjugator
Fully fledged automatic verbal conjugator for Portuguese, including all forms of
clitic conjugation.
LX-Lemmatizer
Fully fledged automatic verbal lemmatizer for Portuguese.
LX-Inflector
Fully fledged automatic nominal lemmatizer for Portuguese.
LX-Inflection Tables
Full coverage tables with rules and exceptions for Portuguese verbal
and nominal inflection.
LX-Suite
The three tools below are available online as the LX-Suite.
LX-Tagger
Automatic part of speech tagger for Portuguese.
LX-Tokenizer
Automatic segmenter of lexemes of Portuguese.
LX-Splitter
Automatic segmenter of paragraphs and sentences of Portuguese.
LX-Closed Classes Lexicon
Full coverage lexicon for the Portuguese POS closed classes.
Constituency Parser
LX-Parser is a freely available online service for constituency parsing of Portuguese sentences.
Treebank Searcher
CINTIL-Treebank Online Searcher is a freely available online service to search and view the parse and dependency trees of the CINTIL-Treebank.
MWNPT-International WordNet of Portuguese
WordNet of Portuguese with ca. 16 500 concepts and 21 000 word senses (May 2008). Developed
in cooperation with the MultiWordNet project of ITC-Irst from Trento, Italy.
CINTIL - Corpus Internacional do Português
High-quality, linguistically interpreted 1-million-token corpus, accurately hand-tagged
for POS, inflection and named entities. To be distributed soon via ELRA. Developed and
maintained in cooperation with CLUL-Centro de Linguística da Universidade de Lisboa.
CINTIL Concordancer
Advanced, freely available online concordancer for the CINTIL corpus. Developed and
maintained in cooperation with CLUL-Centro de Linguística da Universidade de Lisboa.
CINTIL TagSet
Exhaustive set of part of speech tags for Portuguese, including
coverage of transcriptions of verbal productions. This is the tagset
used in the annotation of the CINTIL corpus. It is also the tagset
assumed for the operation of the tools LX-Tagger, LX-Inflector, LX-Conjugator and LX-Lemmatizer.
CINTIL Annotation Manual
The companion manual of the CINTIL corpus, with explicit guidelines for annotation/interpretation.
CINTIL-Treebank
The CINTIL-Treebank is a corpus of syntactic constituency trees, composed of sentences taken from the CINTIL-International Corpus of Portuguese.
Nexing Corpus
Corpus with the transcriptions of syllogistic reasoning protocols.