Revision 1 as of 2016-10-20 16:52:25

PPC

The Polish Parliamentary Corpus / Polski Korpus Parlamentarny

The Polish Parliamentary Corpus (PPC) is planned to be a large collection of linguistically analysed documents from the proceedings of both chambers of the Polish Parliament, the Sejm and the Senate.

The resource was collated from the following corpora:

Format

Corpus files are made available in TEI P5 format. The resource contains automatically generated annotation at the following levels:

  • utterance-level segmentation,
  • tokenization,
  • lemmatization,
  • disambiguated morphosyntactic description,
  • syntactic words,
  • syntactic groups,
  • named entities.
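
As an illustration of working with TEI P5 files, the sketch below reads utterances and their speakers from a small TEI document using the Python standard library. The sample XML, the speaker identifiers and the function name utterances are hypothetical and are not taken from the corpus; the actual file layout and the encoding of the annotation layers may differ. Only the TEI namespace and the <u> (utterance) element with its who attribute come from the TEI P5 guidelines.

```python
import xml.etree.ElementTree as ET

# TEI P5 namespace, as defined by the TEI guidelines.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical sample for illustration only; not an excerpt from the corpus.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div>
        <u who="#Marszalek">Otwieram posiedzenie.</u>
        <u who="#PoselA">Dziękuję, panie marszałku.</u>
      </div>
    </body>
  </text>
</TEI>"""


def utterances(xml_text):
    """Return a list of (speaker_id, text) pairs, one per <u> element."""
    root = ET.fromstring(xml_text)
    result = []
    for u in root.iterfind(".//tei:u", TEI_NS):
        # The who attribute is a pointer such as "#PoselA"; drop the leading "#".
        speaker = u.get("who", "").lstrip("#")
        # Join all nested text and normalise whitespace.
        text = " ".join("".join(u.itertext()).split())
        result.append((speaker, text))
    return result


if __name__ == "__main__":
    for speaker, text in utterances(SAMPLE):
        print(f"{speaker}: {text}")
```

For real corpus files, ET.parse(path).getroot() could replace ET.fromstring; the other annotation layers would need their own element paths, which are not specified here.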

Data

Please use the Polish Sejm Corpus data until a newer version is made available.

Whole corpus

Divided by term and document type

Publications

For now, please cite the Polish Sejm Corpus paper:

List of publications

Maciej Ogrodniczuk. The Polish Sejm Corpus. In Proceedings of the Eighth International Conference on Language Resources and Evaluation, LREC 2012, pages 2219–2223, Istanbul, Turkey, 2012. European Language Resources Association (ELRA).

Searching the corpus

Online search of the corpus will be available soon.