The Polish Round Table Corpus / Korpus Okrągłego Stołu

The Polish Round Table Corpus (PRTC) is a dataset documenting the Polish Round Table negotiations, held from 6 February to 5 April 1989 between the authorities of the communist People's Republic of Poland and part of the opposition (the Solidarity movement, led by Lech Wałęsa). The scanned transcripts were acquired from the Library of the Polish Sejm (https://biblioteka.sejm.gov.pl/okragly_stol/), OCR-ed, manually corrected and indexed in a concordancer.

Access to the corpus

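The corpus is currently available for download (https://clip.ipipan.waw.pl/PRTC?action=AttachFile&do=get&target=kos_ccl.tgz) and for online search (https://kos.nlp.ipipan.waw.pl/).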
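A minimal sketch of reading the downloaded archive locally. It assumes, as the file name kos_ccl.tgz suggests but this page does not confirm, that the archive is a gzipped tar of CCL-style XML files in which each token is a `<tok>` element with an `<orth>` form and one or more `<lex>` interpretations, the chosen one marked `disamb="1"` and carrying a `<base>` lemma. The function name `iter_ccl_tokens` is hypothetical.

```python
import tarfile
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple


def iter_ccl_tokens(archive_path: str) -> Iterator[Tuple[str, str]]:
    """Yield (orth, base) pairs from every .xml member of a .tgz archive.

    Assumption: members are CCL-style XML, i.e.
    <tok><orth>...</orth><lex disamb="1"><base>...</base>...</lex></tok>.
    """
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar:
            # Skip directories and non-XML files (e.g. README or licence notes).
            if not member.isfile() or not member.name.endswith(".xml"):
                continue
            fh = tar.extractfile(member)
            if fh is None:
                continue
            root = ET.parse(fh).getroot()
            for tok in root.iter("tok"):
                orth = tok.findtext("orth", default="")
                # Prefer the disambiguated interpretation; fall back to the first <lex>.
                lex = tok.find("lex[@disamb='1']")
                if lex is None:
                    lex = tok.find("lex")
                base = lex.findtext("base", default="") if lex is not None else ""
                yield orth, base
```

For example, `collections.Counter(base for _, base in iter_ccl_tokens("kos_ccl.tgz")).most_common(10)` would list the ten most frequent lemmas, provided the archive really follows the assumed layout. Streaming tokens from the tar file avoids unpacking the whole archive to disk.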

Licence

The data is in the public domain. The corpus annotations are available under the CC BY (Attribution) licence.

Publications

Maciej Ogrodniczuk, Ryszard Tuora, and Beata Wójtowicz. Polish Round Table Corpus. In Darja Fišer, Maria Eskevich, and David Bordon, editors, Proceedings of the IV Workshop on Creating, Analysing, and Increasing Accessibility of Parliamentary Corpora (ParlaCLARIN) @ LREC-COLING 2024, pages 43–47, Torino, Italy, 2024. ELRA and ICCL.

Acknowledgments

Preparation of the transcripts was financed by the European Regional Development Fund as part of the 2014–2020 Smart Growth Operational Programme, CLARIN (Common Language Resources and Technology Infrastructure), project no. POIR.04.02.00-00C002/19.