The Polish Parliamentary Corpus / Korpus Dyskursu Parlamentarnego
The Polish Parliamentary Corpus (PPC) is a large collection of linguistically analysed documents from the proceedings of the Polish Parliament (the Sejm and the Senate). It is based on the Polish Sejm Corpus, co-funded by the CESAR project, and is currently being updated within the CLARIN-PL infrastructure.
Corpus files are available in TEI P5 format, compatible with the annotation scheme used by the National Corpus of Polish. The resource contains automatically generated annotation of:
- utterance-level segmentation,
- disambiguated morphosyntactic description,
- syntactic words,
- syntactic groups,
- named entities.
The Polish Parliamentary Corpus contains:
- Sejm sittings from 1919–present (including Legislative Sejm and State National Council)
- Sejm committee sittings from 1993–present
- Sejm interpellations and questions from 1997–present
- Senate sittings from 1922–1939 and 1989–present
- Senate committee sittings from 2015–present
Please download the unannotated TEI version (15 GB): https://legis.nlp.ipipan.waw.pl/download/PPC-nanno.tar
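Once the archive is saved locally, its TEI files can be read without unpacking all 15 GB to disk. The following is a minimal sketch, not an official tool: the function names are hypothetical, the internal layout of the archive is not assumed (every regular file ending in .xml is treated as a candidate), and the use of the TEI `<u>` (utterance) element to hold speech text is an assumption that should be checked against the actual files.

```python
import io
import tarfile
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"  # standard TEI P5 namespace


def iter_tei_documents(archive):
    """Yield (member name, parsed XML root) for each .xml file in a tar archive.

    `archive` is an open, seekable binary file object. The archive layout is
    not assumed: every regular file whose name ends in .xml is parsed.
    """
    with tarfile.open(fileobj=archive, mode="r:*") as tar:
        for member in tar:
            if member.isfile() and member.name.endswith(".xml"):
                handle = tar.extractfile(member)
                yield member.name, ET.parse(handle).getroot()


def utterance_texts(root):
    """Return the whitespace-normalised text of each TEI <u> element.

    Assumption: utterances are encoded as <u> in the TEI namespace.
    """
    return [
        " ".join("".join(u.itertext()).split())
        for u in root.iter(TEI_NS + "u")
    ]
```

A typical call would open the downloaded file in binary mode (for example `open("PPC-nanno.tar", "rb")`), pass it to `iter_tei_documents`, and apply `utterance_texts` to each root. Files are parsed one at a time, so memory use stays proportional to the largest single document rather than to the whole corpus.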
The parliamentary data is in the public domain. The corpus annotations are available under the CC-BY (attribution) licence.
See also the slides from the CLARIN-PLUS workshop "Working with Parliamentary Records", Sofia, 27–29 March 2017.
Searching the corpus
Maciej Ogrodniczuk, Institute of Computer Science, Polish Academy of Sciences