
The Polish Parliamentary Corpus / Korpus Dyskursu Parlamentarnego

The Polish Parliamentary Corpus (PPC) is a large collection of linguistically analysed documents from the proceedings of the Polish Parliament, i.e. the Sejm and the Senate. It is based on the Polish Sejm Corpus, co-funded by the CESAR project, and is currently being updated within the CLARIN-PL infrastructure.

Corpus format

Corpus files are made available in TEI P5 format, compatible with the annotation scheme used by the National Corpus of Polish (NKJP). The resource contains automatically created annotation at the following levels:

utterance-level segmentation,
tokenization,
lemmatization,
disambiguated morphosyntactic description,
syntactic words,
syntactic groups,
named entities.
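
As a rough illustration of how the disambiguated morphosyntactic layer might be consumed, the Python sketch below reads a fragment laid out like an NKJP-style morphosyntax file (assumed here to resemble NKJP's `ann_morphosyntax.xml`) and pairs each token's orthographic form with its disambiguated interpretation. The element layout, the two-token sample, and the helper names (`local_name`, `string_value`, `disambiguated_tokens`) are assumptions for illustration only; check them against actual PPC files before relying on this.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Hypothetical two-token fragment in an ASSUMED NKJP-style morphosyntax layout.
SAMPLE = """<teiCorpus xmlns="http://www.tei-c.org/ns/1.0">
  <TEI><text><body><p><s>
    <seg xml:id="morph_1.1-seg">
      <fs type="morph">
        <f name="orth"><string>Panie</string></f>
        <f name="interps">
          <fs type="lex">
            <f name="base"><string>pan</string></f>
            <f name="ctag"><symbol value="subst"/></f>
            <f name="msd"><symbol value="sg:voc:m1"/></f>
          </fs>
        </f>
        <f name="disamb">
          <fs type="tool_report">
            <f name="interpretation"><string>pan:subst:sg:voc:m1</string></f>
          </fs>
        </f>
      </fs>
    </seg>
    <seg xml:id="morph_1.2-seg">
      <fs type="morph">
        <f name="orth"><string>Marszałku</string></f>
        <f name="disamb">
          <fs type="tool_report">
            <f name="interpretation"><string>marszałek:subst:sg:voc:m1</string></f>
          </fs>
        </f>
      </fs>
    </seg>
  </s></p></body></text></TEI>
</teiCorpus>"""


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix ElementTree adds to qualified tag names."""
    return tag.rsplit("}", 1)[-1]


def string_value(feature: ET.Element) -> str | None:
    """Return the text of the first <string> child of an <f> element, if any."""
    for child in feature:
        if local_name(child.tag) == "string":
            return child.text
    return None


def disambiguated_tokens(root: ET.Element) -> list[tuple[str | None, str | None]]:
    """Pair each <seg>'s orthographic form with its disambiguated interpretation."""
    tokens = []
    for seg in root.iter():
        if local_name(seg.tag) != "seg":
            continue
        orth = interpretation = None
        # Walk every <f> feature inside this segment; only two names matter here.
        for feature in seg.iter():
            if local_name(feature.tag) != "f":
                continue
            name = feature.get("name")
            if name == "orth":
                orth = string_value(feature)
            elif name == "interpretation":
                interpretation = string_value(feature)
        tokens.append((orth, interpretation))
    return tokens


if __name__ == "__main__":
    for orth, interp in disambiguated_tokens(ET.fromstring(SAMPLE)):
        print(orth, interp)
```

Matching on local tag names keeps the sketch independent of the TEI namespace prefix. For a real file, `ET.parse(path).getroot()` would replace `ET.fromstring(SAMPLE)`; the interpretation string is returned as-is because lemmas such as punctuation may themselves contain colons.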

Corpus data

Sejm: interpellations and questions from terms 3-8, sittings from terms 1-8 (37.2 GB)
Senate: sittings from terms 2-9 (5.2 GB)

Licence

The parliamentary data is in the public domain. The corpus annotations are available under the CC BY (Attribution) licence.

Publications

For the time being, please cite the Polish Sejm Corpus paper:

List of publications

Maciej Ogrodniczuk. The Polish Sejm Corpus. In Proceedings of the Eighth International Conference on Language Resources and Evaluation, LREC 2012, pages 2219–2223, Istanbul, Turkey, 2012. European Language Resources Association (ELRA).


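Maciej Ogrodniczuk. Polish Parliamentary Corpus. In D. Fišer, M. Eskevich, F. de Jong (eds.), Proceedings of the LREC 2018 Workshop “ParlaCLARIN: Creating and Using Parliamentary Corpora”, 2018. European Language Resources Association (ELRA). ISBN 979-10-95546-02-3. Summary: http://lrec-conf.org/workshops/lrec2018/W2/summaries/11_W2.html, PDF: http://lrec-conf.org/workshops/lrec2018/W2/pdf/11_W2.pdf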

Searching the corpus

using the Poliqarp search engine
using the Smyrna search engine
using the ngram viewer

Contact information

Maciej Ogrodniczuk, Institute of Computer Science, Polish Academy of Sciences
