Diff for "PRTC"

Differences between revisions 1 and 4 (spanning 3 versions)
Revision 1 as of 2024-02-15 08:54:23
Size: 2752
Revision 4 as of 2024-02-21 12:00:57
Size: 1067
Line 3: Line 3:
Revision 1:
The Polish Round Table Corpus (PRTC) is ...
Revision 4:
The Polish Round Table Corpus (PRTC) is a dataset documenting [[https://en.wikipedia.org/wiki/Polish_Round_Table_Agreement|negotiations]] between the authorities of the communist People's Republic of Poland and a section of the opposition (the Solidarity movement, led by Lech Wałęsa), held between 6 February and 5 April 1989. The scanned transcripts were acquired from the [[https://biblioteka.sejm.gov.pl/okragly_stol/|Library of Polish Sejm]], OCR-ed, manually corrected and indexed in a concordancer.
Line 5: Line 5:
Revision 1:
The first edition of the PSC was prepared in October 2011 and was co-funded by project [[http://clip.ipipan.waw.pl/CESAR|CESAR]]. The update of the resource with newer data was co-funded by [[CLARIN-PL-2|CLARIN-PL]] infrastructure.
Revision 4:
== Access to the corpus ==
Line 7: Line 7:
Revision 1:
The Sejm Corpus was recently included in the [[http://clip.ipipan.waw.pl/PPC|Polish Parliamentary Corpus]].
Revision 4:
The corpus is currently available for [[https://clip.ipipan.waw.pl/PRTC?action=AttachFile&do=get&target=kos_ccl.tgz|download]] and [[https://kos.nlp.ipipan.waw.pl/|online search]].
Line 9: Line 9:
Revision 1:
== Corpus data ==
Revision 4:
== Acknowledgments ==
Line 11: Line 11:
Revision 1:
The corpus contains transcripts of Sejm sessions saved in TEI P5 format compatible with the annotation used by the [[http://nkjp.pl/index.php?page=0&lang=1|National Corpus of Polish]]. The resource contains automatically created annotation of:
 * utterance-level segmentation,
 * tokenization,
 * lemmatization,
 * disambiguated morphosyntactic description,
 * syntactic words,
 * syntactic groups,
 * named entities.

=== Single package ===
 * Interpellations and questions terms 3-8, sittings terms 1-8, 37.2 GB

=== Divided by term and document type ===

||<style="width:0.5%;text-align:center">'''Term''' ||<style="width:3%;">'''Years''' ||<5%>'''Sittings''' ||<5%>'''Interpellations and questions'''||<style="border:0;width:20%">||
||<:> 1 || 1991–93 || [[http://sejm.nlp.ipipan.waw.pl/static/PSC-Posiedzenia-kad1.tar|1.0 GB]] || ||
||<:> 2 || 1993–97 || [[http://sejm.nlp.ipipan.waw.pl/static/PSC-Posiedzenia-kad2.tar|2.6 GB]] || ||
||<:> 3 || 1997–2001 || [[http://sejm.nlp.ipipan.waw.pl/static/PSC-Posiedzenia-kad3.tar|2.9 GB]] || [[http://sejm.nlp.ipipan.waw.pl/static/PSC-Interpelacje-kad3.tar|1.7 GB]] ||
||<:> 4 || 2001–05 || [[http://sejm.nlp.ipipan.waw.pl/static/PSC-Posiedzenia-kad4.tar|3.4 GB]] || [[http://sejm.nlp.ipipan.waw.pl/static/PSC-Interpelacje-kad4.tar|2.4 GB]] ||
||<:> 5 || 2005–07 || [[http://sejm.nlp.ipipan.waw.pl/static/PSC-Posiedzenia-kad5.tar|1.4 GB]] || [[http://sejm.nlp.ipipan.waw.pl/static/PSC-Interpelacje-kad5.tar|2.0 GB]] ||
||<:> 6 || 2007–11 || [[http://sejm.nlp.ipipan.waw.pl/static/PSC-Posiedzenia-kad6.tar|3.2 GB]] || [[http://sejm.nlp.ipipan.waw.pl/static/PSC-Interpelacje-kad6.tar|4.8 GB]] ||
||<:> 7 || 2011–15 || [[http://sejm.nlp.ipipan.waw.pl/static/PSC-Posiedzenia-kad7.tar|2.7 GB]] || [[http://sejm.nlp.ipipan.waw.pl/static/PSC-Interpelacje-kad7.tar|8.0 GB]] ||
||<:> 8 || 2015– || 0.7 GB || [[http://sejm.nlp.ipipan.waw.pl/static/PSC-Interpelacje-kad8.tar|1.1 GB]] ||

== Publications ==


== Searching the corpus ==

Online search of the corpus (including Senate data) is available at http://sejm.nlp.ipipan.waw.pl/.
Revision 4:
Preparation of the transcripts was financed by the European Regional Development Fund as part of the 2014-2020 Smart Growth Operational Programme, CLARIN - Common Language Resources and Technology Infrastructure, project no. POIR.04.02.00-00C002/19.

The Polish Round Table Corpus / Korpus Okrągłego Stołu

The Polish Round Table Corpus (PRTC) is a dataset documenting negotiations between the authorities of the communist People's Republic of Poland and a section of the opposition (the Solidarity movement, led by Lech Wałęsa), held between 6 February and 5 April 1989. The scanned transcripts were acquired from the Library of Polish Sejm, OCR-ed, manually corrected and indexed in a concordancer.

Access to the corpus

The corpus is currently available for download and online search.
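
For readers who prefer to fetch the archive from a script, the following is a minimal sketch rather than an official tool. The URL is the one behind the download link in the wiki source of this page; the function names are hypothetical, and nothing is assumed about the internal layout of the archive beyond it being a gzip-compressed tar file, as the .tgz extension suggests.

```python
import tarfile
import urllib.request
from pathlib import Path

# URL taken from the download link in the wiki source of this page.
PRTC_ARCHIVE_URL = (
    "https://clip.ipipan.waw.pl/PRTC"
    "?action=AttachFile&do=get&target=kos_ccl.tgz"
)


def download_archive(url, destination):
    """Download the archive to a local file and return its path (hypothetical helper)."""
    urllib.request.urlretrieve(url, str(destination))
    return Path(destination)


def list_archive_files(archive_path):
    """Return the sorted names of regular files in a gzip-compressed tar archive."""
    with tarfile.open(archive_path, mode="r:gz") as archive:
        return sorted(m.name for m in archive.getmembers() if m.isfile())


def unpack_archive(archive_path, target_dir):
    """Extract the archive into target_dir, refusing members that escape it."""
    target = Path(target_dir).resolve()
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, mode="r:gz") as archive:
        for member in archive.getmembers():
            destination = (target / member.name).resolve()
            # Reject paths such as "../x" that would land outside target_dir.
            if destination != target and target not in destination.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        archive.extractall(target)
    return target
```

A typical session would call download_archive(PRTC_ARCHIVE_URL, "kos_ccl.tgz"), inspect the result with list_archive_files, and only then unpack it into a working directory.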


Acknowledgments

Preparation of the transcripts was financed by the European Regional Development Fund as part of the 2014-2020 Smart Growth Operational Programme, CLARIN - Common Language Resources and Technology Infrastructure, project no. POIR.04.02.00-00C002/19.