The Polish Parliamentary Corpus / Korpus Dyskursu Parlamentarnego
The Polish Parliamentary Corpus (PPC) is a large collection of linguistically analysed documents from the proceedings of the Polish Parliament (the Sejm and the Senate). It is based on the Polish Sejm Corpus, which was co-funded by the CESAR project, and is currently being updated by the CLARIN-PL infrastructure.
Corpus format
Corpus files are made available in TEI P5 format, compatible with the annotation used by the National Corpus of Polish. The resource contains automatically generated annotation at the following levels:
- utterance-level segmentation,
- tokenization,
- lemmatization,
- disambiguated morphosyntactic description,
- syntactic words,
- syntactic groups,
- named entities.
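As a rough illustration of working with these annotation layers, the sketch below reads token forms from a TEI P5 fragment using Python's standard library. The element layout (`seg` elements holding an `fs type="morph"` feature structure with an `orth` feature) is an assumption modelled on the National Corpus of Polish conventions, and the sample fragment is invented for this example; check the actual corpus files before relying on these paths.

```python
import xml.etree.ElementTree as ET

# TEI P5 namespace; elements in TEI files are normally bound to it.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical fragment for illustration only; not copied from the corpus.
SAMPLE = """<teiCorpus xmlns="http://www.tei-c.org/ns/1.0">
  <TEI><text><body><p><s>
    <seg>
      <fs type="morph">
        <f name="orth"><string>Wysoka</string></f>
      </fs>
    </seg>
    <seg>
      <fs type="morph">
        <f name="orth"><string>Izbo</string></f>
      </fs>
    </seg>
  </s></p></body></text></TEI>
</teiCorpus>"""


def orth_forms(xml_text):
    """Return the orthographic form of every token (seg) in document order."""
    root = ET.fromstring(xml_text)
    forms = []
    for seg in root.iterfind(".//tei:seg", TEI_NS):
        # Assumed path to the surface form inside the morphosyntactic layer.
        node = seg.find(
            "tei:fs[@type='morph']/tei:f[@name='orth']/tei:string", TEI_NS
        )
        if node is not None and node.text:
            forms.append(node.text)
    return forms


if __name__ == "__main__":
    print(orth_forms(SAMPLE))
```

The same pattern could be extended to the other layers (lemmas, named entities) once their feature names have been confirmed against a real file.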
Corpus data
The Polish Parliamentary Corpus (http://sejm.nlp.ipipan.waw.pl/static/PPC-all.tar) contains:
- Sejm: Interpellations and questions terms 3-8, sittings terms 1-8
- Senate: Sittings terms 2-9
- Historical data
- Sejm and Senate: Committee sittings
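Assuming the corpus has been downloaded as a tar archive, the following sketch lists the XML files it contains without unpacking anything to disk, which may be useful given the size of the data. The function name `list_xml_members` is hypothetical, and nothing is assumed here about the directory layout inside the archive.

```python
import tarfile


def list_xml_members(fileobj):
    """Yield (name, size_in_bytes) for each regular .xml file in a tar stream.

    The archive is read sequentially (stream mode), so it is never
    extracted and never loaded into memory as a whole.
    """
    with tarfile.open(fileobj=fileobj, mode="r|*") as archive:
        for member in archive:
            if member.isfile() and member.name.endswith(".xml"):
                yield member.name, member.size
```

A possible use is to open the downloaded file in binary mode (`open(path, "rb")`), pass the file object to `list_xml_members`, and group the resulting names by their top-level directory to see how the Sejm, Senate, historical, and committee data are organised.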
Licence
The parliamentary data is in the public domain. The corpus annotations are available under the CC-BY (attribution) licence.
Publications
Other information
See also the slides from the CLARIN-PLUS workshop "Working with Parliamentary Records" (Sofia, 27–29 March 2017).
Searching the corpus
Contact information
Maciej Ogrodniczuk, Institute of Computer Science, Polish Academy of Sciences