The Polish Parliamentary Corpus / Korpus Dyskursu Parlamentarnego
The Polish Parliamentary Corpus (PPC) is a large collection of linguistically analysed documents from the proceedings of the Polish Parliament (Sejm and Senate). It is based on the Polish Sejm Corpus, which was prepared in October 2011 and co-funded by the CESAR project, and it is currently being updated within the CLARIN-PL infrastructure.
Corpus files are made available in the TEI P5 format, compatible with the annotation used by the National Corpus of Polish. The resource contains automatically created annotation of:
- utterance-level segmentation,
- disambiguated morphosyntactic description,
- syntactic words,
- syntactic groups,
- named entities.
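Because the files are TEI P5 XML, they can be inspected with any standard XML parser. The following is a minimal sketch, not the project's own tooling: it counts elements in the TEI namespace using Python's standard library. The inline sample document and its element names (`s` for sentences, `seg` for segments) are illustrative assumptions; the actual file layout and element inventory of the corpus should be checked against the downloaded data.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# The TEI P5 namespace URI.
TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical, hand-written sample; real corpus files are far richer.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <p>
        <s xml:id="s1">
          <seg xml:id="seg1"/>
          <seg xml:id="seg2"/>
        </s>
        <s xml:id="s2">
          <seg xml:id="seg3"/>
        </s>
      </p>
    </body>
  </text>
</TEI>"""


def count_tei_elements(xml_text):
    """Return a Counter mapping TEI element names to their frequency."""
    root = ET.fromstring(xml_text)
    prefix = "{" + TEI_NS + "}"
    counts = Counter()
    for element in root.iter():
        # ElementTree stores namespaced tags as "{uri}localname".
        if element.tag.startswith(prefix):
            counts[element.tag[len(prefix):]] += 1
    return counts


counts = count_tei_elements(SAMPLE)
print(counts["s"], counts["seg"])  # sentences and segments in the sample
```

For a file on disk, `ET.parse(path).getroot()` can replace `ET.fromstring`; for multi-gigabyte data, a streaming approach such as `ET.iterparse` is likely a better fit.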
Senate: Sittings terms 2-9, 5.2 GB
For now, please cite the Polish Sejm Corpus paper:
Searching the corpus
Online search of the corpus with the Poliqarp search engine is currently available, separately for Sejm and Senate data, at http://sejm.nlp.ipipan.waw.pl/.
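Poliqarp queries describe each token as a bracketed set of attribute constraints, for example `[base="ustawa"]` for any form of a lemma. As a rough illustration, the sketch below assembles such query strings in Python for pasting into the search form. The helper names `token_query` and `sequence_query` are hypothetical, and the attribute names used (`base`, `pos`, `case`) and their values are assumed to follow the National Corpus of Polish tagset; this should be verified against the search engine's own help pages.

```python
def token_query(**constraints):
    """Build a single-token Poliqarp expression such as [base="ustawa"].

    With no constraints, returns "[]", which matches any token.
    Values are inserted as-is; quoting of special characters is not handled.
    """
    if not constraints:
        return "[]"
    parts = [f'{attr}="{value}"' for attr, value in sorted(constraints.items())]
    return "[" + " & ".join(parts) + "]"


def sequence_query(*tokens):
    """Join token expressions into a query over adjacent tokens."""
    return " ".join(tokens)


# Any form of the lemma "ustawa" followed by a noun in the genitive case
# (attribute values are assumptions about the tagset).
query = sequence_query(
    token_query(base="ustawa"),
    token_query(pos="subst", case="gen"),
)
print(query)  # [base="ustawa"] [case="gen" & pos="subst"]
```

The sketch only builds strings and does not contact the search service, since no programmatic interface is described here.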
Maciej Ogrodniczuk, Institute of Computer Science, Polish Academy of Sciences