The Polish Sejm Corpus / Polski Korpus Sejmowy
The Polish Sejm Corpus (PSC) is a large (300M-segment) collection of documents (stenographic transcripts, interpellations and questions) from sittings of the Polish Sejm, covering terms of office 1 through 8.
The first edition of the PSC was prepared in October 2011 and was co-funded by the CESAR European project. The current edition is being co-funded by CLARIN-PL.
Corpus data
The corpus contains transcripts of Sejm sessions saved in TEI P5 format. The resource contains automatically created annotation of:
- utterance-level segmentation,
- tokenization,
- lemmatization,
- disambiguated morphosyntactic description,
- syntactic words,
- syntactic groups,
- named entities.
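As a rough illustration of working with TEI P5 transcripts, the following minimal sketch reads utterances and their speakers from a TEI document using Python's standard library. The sample fragment, the speaker identifiers and the function name `extract_utterances` are hypothetical; the actual PSC files may organise their annotation layers differently, so the element paths would need to be adapted after inspecting the real data.

```python
import xml.etree.ElementTree as ET

# TEI P5 documents use this XML namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, hand-written fragment for illustration only;
# it is not taken from the corpus.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div>
        <u who="#Marszalek">Otwieram posiedzenie Sejmu.</u>
        <u who="#PoselA">Dziękuję, panie marszałku.</u>
      </div>
    </body>
  </text>
</TEI>"""


def extract_utterances(xml_text):
    """Return a list of (speaker, text) pairs, one per TEI <u> element."""
    root = ET.fromstring(xml_text)
    utterances = []
    for u in root.iterfind(".//tei:u", TEI_NS):
        speaker = u.get("who", "")            # empty string if no speaker is given
        text = "".join(u.itertext()).strip()  # concatenate all nested text
        utterances.append((speaker, text))
    return utterances


if __name__ == "__main__":
    for speaker, text in extract_utterances(SAMPLE):
        print(speaker, text, sep="\t")
```

The same namespace-aware lookup can be pointed at other TEI elements once the layout of the annotation files is known.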
Whole corpus
Divided by term and document type
Term | Years | Sittings | Interpellations and questions
1 | 1991–93 | |
2 | 1993–97 | |
3 | 1997–2001 | |
4 | 2001–05 | |
5 | 2005–07 | |
6 | 2007–11 | |
7 | 2011–15 | |
8 | 2015– | |
Publications
Searching the corpus
Online search of the corpus is available at http://sejm.nlp.ipipan.waw.pl/. You can also use the Poliqarp image of the corpus: http://sejm.nlp.ipipan.waw.pl/static/PSC_poliqarp.tar.gz (826 MB).