The Polish Parliamentary Corpus / Korpus Dyskursu Parlamentarnego
The Polish Parliamentary Corpus (PPC) is a large collection of linguistically analysed documents from the proceedings of the Polish Parliament (the Sejm and the Senate). It is based on the Polish Sejm Corpus, which was co-funded by the CESAR project, and is currently being updated by the CLARIN-PL infrastructure.
Corpus format
Corpus files are available in the TEI P5 format, compatible with the annotation used by the National Corpus of Polish. The resource contains automatically created annotation of:
- utterance-level segmentation,
- tokenization,
- lemmatization,
- disambiguated morphosyntactic description,
- syntactic words,
- syntactic groups,
- named entities.
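As a rough illustration of how the utterance-level segmentation and tokenization layers might be read, the following Python sketch extracts sentences and tokens from a TEI-namespaced fragment. The sample XML is hypothetical and heavily simplified: the element layout, the identifiers, and the inline token text are assumptions made for the example, and the actual corpus files may use stand-off references to the source text instead.

```python
import xml.etree.ElementTree as ET

# TEI P5 namespace and the expanded name of the xml:id attribute.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical, simplified fragment; not copied from the corpus.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <p xml:id="p1">
      <s xml:id="s1">
        <seg xml:id="seg1">Wysoki</seg>
        <seg xml:id="seg2">Sejmie</seg>
        <seg xml:id="seg3">!</seg>
      </s>
      <s xml:id="s2">
        <seg xml:id="seg4">Dziękuję</seg>
        <seg xml:id="seg5">.</seg>
      </s>
    </p>
  </body></text>
</TEI>"""


def sentences(xml_text):
    """Return a list of (sentence id, [token strings]) pairs."""
    root = ET.fromstring(xml_text)
    result = []
    for s in root.iterfind(".//tei:s", TEI_NS):
        # Each <seg> directly under <s> is treated as one token.
        tokens = [(seg.text or "").strip() for seg in s.iterfind("tei:seg", TEI_NS)]
        result.append((s.get(XML_ID), tokens))
    return result


if __name__ == "__main__":
    for sent_id, tokens in sentences(SAMPLE):
        print(sent_id, tokens)
```

The other layers (lemmas, morphosyntactic tags, syntactic words and groups, named entities) would need their own element and attribute names, which should be checked against the downloaded files before adapting this sketch.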
Corpus data
Sejm: interpellations and questions (terms 3–8), sittings (terms 1–8), 37.2 GB
Senate: sittings (terms 2–9), 5.2 GB
Licence
The parliamentary data is in the public domain. The corpus annotations are available under the CC-BY (attribution) licence.
Publications
Ogrodniczuk M. (2012). The Polish Sejm Corpus. N. Calzolari, K. Choukri, T. Declerck, M. U. Doğan, B. Maegaard, J. Mariani, A. Moreno, J. Odijk, S. Piperidis (eds.) Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012), pp. 2219–2223, European Language Resources Association (ELRA).
Ogrodniczuk M. (2018). Polish Parliamentary Corpus. D. Fišer, M. Eskevich, F. de Jong (eds.) Proceedings of the LREC 2018 Workshop “ParlaCLARIN: Creating and Using Parliamentary Corpora”. European Language Resources Association (ELRA). ISBN 979-10-95546-02-3.
Other information
See also the slides from the CLARIN-PLUS workshop "Working with Parliamentary Records" (Sofia, 27–29 March 2017).
Searching the corpus
Contact information
Maciej Ogrodniczuk, Institute of Computer Science, Polish Academy of Sciences