TimeBankPT is a corpus of Portuguese text annotated with temporal information.
The annotation scheme used is similar to TimeML.
TimeBankPT is the result of adapting the English corpus used in the first TempEval challenge to the Portuguese language.
The preferred citation is Costa and Branco (2012).
Further details about the corpus can be found in the following publications:
- Costa, Francisco and Branco, António. 2010. Temporal Information Processing
of a New Language: Fast Porting with Minimal Resources. In
ACL2010-Proceedings of the 48th Annual Meeting of the Association for
- Costa, Francisco and Branco, António. 2012. TimeBankPT: A
TimeML Annotated Corpus of Portuguese. In Proceedings of LREC2012.
- Costa, Francisco. to appear. Processing Temporal Information in
Unstructured Documents. Ph.D. thesis, Universidade de Lisboa, Lisbon.
Some of the features of TimeBankPT:
- It uses the new Portuguese spelling.
- It was automatically checked for errors using reasoning code.
- It contains around 70,000 words of text, divided into a training set and a test set.
- It contains annotations for events, temporal expressions and temporal relations.
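The automatic error check mentioned in the list above is not described in detail here. As a rough illustration of the kind of reasoning such a check could involve, the Python sketch below looks for contradictory orderings among temporal relations: if A is before B, B is before C, and A is after C, at least one annotation must be wrong. The function name `find_before_cycle`, the `(source, relation, target)` triple format, and the restriction to `BEFORE` and `AFTER` are assumptions made for this example; it is not the procedure actually used to check TimeBankPT.

```python
from collections import defaultdict


def find_before_cycle(relations):
    """Return a list of ids forming a cycle of BEFORE links, or None.

    `relations` is an iterable of (source, relation, target) triples.
    Only BEFORE and AFTER are interpreted; any other relation type is
    ignored in this simplified sketch.
    """
    # Build a directed graph where an edge a -> b means "a is before b".
    graph = defaultdict(set)
    for source, rel, target in relations:
        if rel == "BEFORE":
            graph[source].add(target)
        elif rel == "AFTER":
            graph[target].add(source)

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    state = defaultdict(int)
    path = []

    def visit(node):
        state[node] = GREY
        path.append(node)
        for nxt in sorted(graph[node]):
            if state[nxt] == GREY:
                # Reached a node already on the current path: a cycle.
                return path[path.index(nxt):] + [nxt]
            if state[nxt] == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        state[node] = BLACK
        return None

    for node in sorted(graph):
        if state[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Hypothetical annotations: e1 < e2 < e3, but e1 is also marked AFTER e3.
inconsistent = [("e1", "BEFORE", "e2"),
                ("e2", "BEFORE", "e3"),
                ("e1", "AFTER", "e3")]
consistent = [("e1", "BEFORE", "e2"),
              ("e3", "AFTER", "e2")]
```

Calling `find_before_cycle(inconsistent)` returns the offending chain of identifiers, while `find_before_cycle(consistent)` returns `None`. A depth-first search is enough here because only strict orderings are modelled; handling overlap or vague relations would need a richer temporal algebra.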
Size of TimeBankPT
| | Train | Test |
|---|---|---|
| Sentences | 2,281 | 351 |
| Word tokens (according to white space) | 60,782 | 8,920 |
| Word tokens (splitting contractions and detaching punctuation) | 68,351 | 9,829 |
| Events | 6,790 | 1,097 |
| Temporal expressions | 1,244 | 165 |
| Temporal relations | 5,781 | 758 |
A short sample text from TimeBankPT illustrates what the corpus contains.
Version 1 of TimeBankPT is available for download.