Revision 4 as of 2024-02-21 12:00:57
The Polish Round Table Corpus / Korpus Okrągłego Stołu
The Polish Round Table Corpus (PRTC) is a dataset documenting the negotiations (https://en.wikipedia.org/wiki/Polish_Round_Table_Agreement) between the authorities of the communist People's Republic of Poland and a section of the opposition (the Solidarity movement, led by Lech Wałęsa), held between 6 February and 5 April 1989. The scanned transcripts were acquired from the Library of the Polish Sejm (https://biblioteka.sejm.gov.pl/okragly_stol/), OCR-ed, manually corrected and indexed in a concordancer.
Access to the corpus
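The corpus is currently available for download (https://clip.ipipan.waw.pl/PRTC?action=AttachFile&do=get&target=kos_ccl.tgz) and for online search (https://kos.nlp.ipipan.waw.pl/).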
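For readers who prefer to fetch the download package from a script, the following is a minimal sketch using only the Python standard library. The URL is the page's download link; the helper names `download` and `unpack` are hypothetical, and the internal layout of the archive is not described on this page, so the code only lists and extracts whatever regular files it finds.

```python
import shutil
import tarfile
import urllib.request
from pathlib import Path

# Download link of the corpus package, as given on this page.
ARCHIVE_URL = (
    "https://clip.ipipan.waw.pl/PRTC"
    "?action=AttachFile&do=get&target=kos_ccl.tgz"
)


def download(url: str, dest: Path) -> Path:
    """Fetch url into dest; skip the request if dest already exists."""
    if not dest.exists():
        with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
            shutil.copyfileobj(resp, out)
    return dest


def unpack(archive: Path, target: Path) -> list[str]:
    """Extract a gzip-compressed tar archive into target.

    Returns the sorted names of the regular files in the archive.
    """
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        names = sorted(m.name for m in tar.getmembers() if m.isfile())
        # Use the safer "data" extraction filter where this Python supports it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(target, **kwargs)
    return names
```

A typical call would be `unpack(download(ARCHIVE_URL, Path("kos_ccl.tgz")), Path("kos_ccl"))`, which keeps a local copy of the archive so that repeated runs do not download it again.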
Acknowledgments
Preparation of the transcripts was financed by the European Regional Development Fund as part of the 2014-2020 Smart Growth Operational Programme, CLARIN - Common Language Resources and Technology Infrastructure, project no. POIR.04.02.00-00C002/19.