The Polish Parliamentary Corpus / Korpus Dyskursu Parlamentarnego
The Polish Parliamentary Corpus (PPC) is a large collection of linguistically analysed documents from the proceedings of the Polish Parliament (the Sejm and the Senate). It is based on the Polish Sejm Corpus, which was co-funded by the CESAR project, and is currently being updated by the CLARIN-PL infrastructure.
Corpus format
Corpus files are made available in TEI P5 format compatible with the annotation used by the National Corpus of Polish. The resource contains automatically created annotation of:
- utterance-level segmentation,
- tokenization,
- lemmatization,
- disambiguated morphosyntactic description,
- syntactic words,
- syntactic groups,
- named entities.
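As a rough illustration of working with TEI P5 data, the sketch below counts sentences and segments in a TEI document using Python's standard library. The sample fragment is hypothetical and heavily simplified: the element names `s` and `seg` are assumed here for utterance-level segmentation and tokenization, and the actual corpus files may be organised differently, so check them before relying on this.

```python
import xml.etree.ElementTree as ET

# Standard TEI namespace; all TEI P5 elements live in it.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, simplified fragment for illustration only.
# Real corpus files carry much richer annotation than this.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <p>
        <s>
          <seg>Dziękuję</seg>
          <seg>bardzo</seg>
          <seg>.</seg>
        </s>
        <s>
          <seg>Otwieram</seg>
          <seg>posiedzenie</seg>
          <seg>.</seg>
        </s>
      </p>
    </body>
  </text>
</TEI>
"""


def count_units(xml_text):
    """Return (number of sentences, number of segments) in a TEI string.

    Assumes sentences are <s> elements and tokens are <seg> elements.
    """
    root = ET.fromstring(xml_text)
    sentences = root.findall(".//tei:s", TEI_NS)
    segments = root.findall(".//tei:seg", TEI_NS)
    return len(sentences), len(segments)


if __name__ == "__main__":
    n_sentences, n_segments = count_units(SAMPLE)
    print(f"sentences: {n_sentences}, segments: {n_segments}")
```

For a file on disk, `ET.parse(path).getroot()` can replace `ET.fromstring(...)`. Because the corpus is tens of gigabytes, processing one file at a time is likely preferable to loading many documents into memory.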
Corpus data
Sejm: interpellations and questions (terms 3-8), sittings (terms 1-8), 37.2 GB
Senate: sittings (terms 2-9), 5.2 GB
Licence
The parliamentary data is in the public domain. The corpus annotations are available under the CC-BY (Attribution) licence.
Publications
Ogrodniczuk M. (2012). The Polish Sejm Corpus. In: D. Fišer, M. Eskevich, F. de Jong (eds.) Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012), pp. 2219–2223, European Language Resources Association (ELRA).
Ogrodniczuk M. (2018). Polish Parliamentary Corpus. In: D. Fišer, M. Eskevich, F. de Jong (eds.) Proceedings of the LREC 2018 Workshop “ParlaCLARIN: Creating and Using Parliamentary Corpora”. European Language Resources Association (ELRA). ISBN 979-10-95546-02-3.
Other information
Please see also the slides from the CLARIN-PLUS workshop "Working with Parliamentary Records", Sofia, 27–29 March 2017.
Searching the corpus
Contact information
Maciej Ogrodniczuk, Institute of Computer Science, Polish Academy of Sciences