The Polish Parliamentary Corpus / Korpus Dyskursu Parlamentarnego
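The Polish Parliamentary Corpus (PPC) is a large collection of linguistically analysed documents from the proceedings of both houses of the Polish Parliament, the [[http://www.sejm.gov.pl/english.html|Sejm]] and the [[http://www.senat.gov.pl/en/|Senate]]. It is based on the [[PSC|Polish Sejm Corpus]], which was co-funded by the [[http://clip.ipipan.waw.pl/CESAR|CESAR]] project, and was later extended with the support of the [[http://clip.ipipan.waw.pl/CLARIN-PL|CLARIN-PL]], [[http://clip.ipipan.waw.pl/MARCELL|MARCELL]] and [[http://zil.ipipan.waw.pl/ParlaMint|ParlaMint]] projects.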
Corpus data
As of 22 May 2025, the corpus contains over 870M segments; the table below shows their distribution over houses, periods, and document types. Apart from the stenographic records of plenary sittings (271M segments) and committee sittings (330M segments), the corpus contains 199M segments of interpellations and questions.
|| '''Years''' || '''Sejm period''' || '''Sejm sittings: docs''' || '''Sejm sittings: segments''' || '''Sejm committees: docs''' || '''Sejm committees: segments''' || '''Sejm interpellations: docs''' || '''Sejm interpellations: segments''' || '''Senate period''' || '''Senate sittings: docs''' || '''Senate sittings: segments''' || '''Senate committees: docs''' || '''Senate committees: segments''' ||
|| 1919–1922 || Legislative Sejm || 312 || 6 945 162 || – || – || – || – || – || – || – || – || – ||
|| 1922–1927 || 1st term of office || 277 || 7 338 355 || – || – || – || – || 1st term || 96 || 1 979 541 || – || – ||
|| 1928–1930 || 2nd || 58 || 2 139 835 || – || – || – || – || 2nd || 3 || 171 345 || – || – ||
|| 1930–1935 || 3rd || 72 || 2 404 267 || – || – || – || – || 3rd || 64 || 1 804 635 || – || – ||
|| 1935–1938 || 4th || 73 || 2 133 181 || – || – || – || – || 4th || 29 || 724 687 || – || – ||
|| 1938–1939 || 5th || 23 || 610 455 || – || – || – || – || 5th || 20 || 347 430 || – || – ||
|| 1943–1947 || State National Council || 6 || 234 514 || – || – || – || – || – || – || – || – || – ||
|| 1947–1952 || Legislative Sejm || 107 || 2 575 136 || – || – || – || – || – || – || – || – || – ||
|| 1952–1956 || 1st term of office || 39 || 1 172 333 || – || – || – || – || – || – || – || – || – ||
|| 1957–1961 || 2nd || 59 || 2 502 936 || – || – || – || – || – || – || – || – || – ||
|| 1961–1965 || 3rd || 32 || 1 388 862 || – || – || – || – || – || – || – || – || – ||
|| 1965–1969 || 4th || 23 || 1 163 336 || – || – || – || – || – || – || – || – || – ||
|| 1969–1972 || 5th || 17 || 526 277 || – || – || – || – || – || – || – || – || – ||
|| 1972–1976 || 6th || 32 || 1 176 712 || – || – || – || – || – || – || – || – || – ||
|| 1976–1980 || 7th || 29 || 918 993 || – || – || – || – || – || – || – || – || – ||
|| 1980–1985 || 8th || 70 || 3 377 139 || – || – || – || – || – || – || – || – || – ||
|| 1985–1989 || 9th || 45 || 2 641 788 || – || – || – || – || – || – || – || – || – ||
|| 1989–1991 || 10th || 77 || 6 674 111 || – || – || – || – || 1st term || 60 || 3 170 293 || – || – ||
|| 1991–1993 || 1st term of office || 142 || 7 739 147 || – || – || – || – || 2nd || 48 || 1 459 440 || – || – ||
|| 1993–1997 || 2nd || 317 || 22 134 682 || 3 858 || 41 756 476 || – || – || 3rd || 125 || 5 051 677 || – || – ||
|| 1997–2001 || 3rd || 320 || 24 138 142 || 4 690 || 42 510 604 || 23 507 || 12 101 453 || 4th || 187 || 8 255 897 || – || – ||
|| 2001–2005 || 4th || 337 || 28 743 846 || 4 942 || 49 302 521 || 30 986 || 17 519 177 || 5th || 175 || 6 485 347 || – || – ||
|| 2005–2007 || 5th || 148 || 11 737 186 || 2 359 || 18 970 036 || 26 689 || 14 777 377 || 6th || 74 || 3 571 293 || – || – ||
|| 2007–2011 || 6th || 298 || 22 415 708 || 5 565 || 51 752 446 || 59 353 || 36 412 001 || 7th || 167 || 8 819 116 || – || – ||
|| 2011–2015 || 7th || 292 || 22 488 262 || 5 126 || 44 645 569 || 85 599 || 61 565 989 || 8th || 159 || 7 100 841 || – || – ||
|| 2015–2019 || 8th || 239 || 19 431 789 || 4 828 || 46 735 363 || 79 194 || 56 720 590 || 9th || 204 || 10 240 444 || 2 156 || 16 012 609 ||
|| 2019–2023 || 9th || 198 || 15 074 054 || 3 768 || 32 117 982 || – || – || 10th || 148 || 7 992 356 || 1 859 || 16 777 361 ||
|| 2023– || 10th || 98 || 7 994 439 || 1 876 || 13 209 520 || – || – || 11th || 53 || 1 609 769 || 506 || 4 238 073 ||
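The segment counts in the table use spaces as thousands separators and an en dash for missing data, so they need light normalisation before any arithmetic. The following is a minimal Python sketch rather than part of the corpus tooling: the helper name `parse_count` is purely illustrative, and the six figures are the Sejm interpellation segment counts copied from the table above.

```python
def parse_count(cell: str) -> int:
    """Convert a table cell such as '12 101 453' to an int.

    A dash or an empty cell is treated as 'no data' and counted as 0.
    """
    cell = cell.strip()
    if cell in {"–", "-", ""}:
        return 0
    # Remove ordinary and non-breaking spaces used as thousands separators.
    return int(cell.replace(" ", "").replace("\u00a0", ""))


# Sejm interpellation segment counts for the six terms from 1997–2001
# to 2015–2019, copied from the table above.
interpellation_segments = [
    "12 101 453",
    "17 519 177",
    "14 777 377",
    "36 412 001",
    "61 565 989",
    "56 720 590",
]

total = sum(parse_count(cell) for cell in interpellation_segments)
print(f"{total:,}")  # 199,096,587
```

The result agrees with the figure of 199M segments of interpellations and questions quoted in the text.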
Corpus format
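Corpus files are made available in XML TEI P5 format compatible with the annotation used by the [[http://nkjp.pl/index.php?page=0&lang=1|National Corpus of Polish]]. The resource contains automatically created annotation of:
 * utterance-level segmentation, tokenization and lemmatization produced with [[http://morfeusz.sgjp.pl/|Morfeusz2]],
 * disambiguated morphosyntactic description produced with [[http://zil.ipipan.waw.pl/Concraft|Concraft2]],
 * named entities produced with [[http://nlp.pwr.wroc.pl/narzedzia-i-zasoby/narzedzia/liner2|Liner2]],
 * dependency structures produced with [[http://zil.ipipan.waw.pl/PDB/PDBparser|COMBO parser]].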
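Because the files are TEI P5 XML, they can be read with any namespace-aware XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` to count sentences and segments. Only the TEI namespace URI is standard; the embedded sample is a hypothetical, heavily simplified fragment in the spirit of NKJP-style segmentation, and the assumption that segments appear as `<seg>` elements inside `<s>` elements should be checked against the actual corpus files.

```python
import xml.etree.ElementTree as ET

# Standard TEI P5 namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, simplified fragment; real corpus files are larger
# and may be organised differently.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <p>
        <s xml:id="s1">
          <seg xml:id="seg1"/>
          <seg xml:id="seg2"/>
          <seg xml:id="seg3"/>
        </s>
        <s xml:id="s2">
          <seg xml:id="seg4"/>
        </s>
      </p>
    </body>
  </text>
</TEI>"""


def count_segments(xml_text: str) -> dict:
    """Count <s> elements and the <seg> elements directly inside them."""
    root = ET.fromstring(xml_text)
    sentences = root.findall(".//tei:s", TEI_NS)
    segments = sum(len(s.findall("tei:seg", TEI_NS)) for s in sentences)
    return {"sentences": len(sentences), "segments": segments}


print(count_segments(SAMPLE))  # {'sentences': 2, 'segments': 4}
```

For multi-gigabyte archives, a streaming parser such as `ET.iterparse` would probably be a better fit than loading each file whole.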
Download
Please use the links below to download the data for individual terms of office. The XML archives contain the TEI-encoded source data, metadata, and linguistic annotation in CCL format; the PDF archives contain the original source files (often non-searchable).
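|| 1919–1922 || [[https://kdp.ipipan.waw.pl/static/ppcdump-tei/1919-1922-tei.zip|XML]] || [[https://kdp.ipipan.waw.pl/static/ppcdump-pdf/1919-1922-pdf.zip|PDF]] ||
|| 1922–1927 || [[https://kdp.ipipan.waw.pl/static/ppcdump-tei/1922-1927-tei.zip|XML]] || [[https://kdp.ipipan.waw.pl/static/ppcdump-pdf/1922-1927-pdf.zip|PDF]] ||
|| 1928–1930 || [[https://kdp.ipipan.waw.pl/static/ppcdump-tei/1928-1930-tei.zip|XML]] || [[https://kdp.ipipan.waw.pl/static/ppcdump-pdf/1928-1930-pdf.zip|PDF]] ||
|| 1930–1935 || [[https://kdp.ipipan.waw.pl/static/ppcdump-tei/1930-1935-tei.zip|XML]] || [[https://kdp.ipipan.waw.pl/static/ppcdump-pdf/1930-1935-pdf.zip|PDF]] ||
|| 1935–1938 || [[https://kdp.ipipan.waw.pl/static/ppcdump-tei/1935-1938-tei.zip|XML]] || [[https://kdp.ipipan.waw.pl/static/ppcdump-pdf/1935-1938-pdf.zip|PDF]] ||
|| 1938–1939 || [[https://kdp.ipipan.waw.pl/static/ppcdump-tei/1938-1939-tei.zip|XML]] || [[https://kdp.ipipan.waw.pl/static/ppcdump-pdf/1938-1939-pdf.zip|PDF]] ||
|| 1943–1947 || [[https://kdp.ipipan.waw.pl/static/ppcdump-tei/1943-1947-tei.zip|XML]] || [[https://kdp.ipipan.waw.pl/static/ppcdump-pdf/1943-1947-pdf.zip|PDF]] ||
|| 1947–1952 || [[https://kdp.ipipan.waw.pl/static/ppcdump-tei/1947-1952-tei.zip|XML]] || [[https://kdp.ipipan.waw.pl/static/ppcdump-pdf/1947-1952-pdf.zip|PDF]] ||
|| 1952–1956 || [[https://kdp.ipipan.waw.pl/static/ppcdump-tei/1952-1956-tei.zip|XML]] || [[https://kdp.ipipan.waw.pl/static/ppcdump-pdf/1952-1956-pdf.zip|PDF]] ||
|| 1957–1961 || [[https://kdp.ipipan.waw.pl/static/ppcdump-tei/1957-1961-tei.zip|XML]] || [[https://kdp.ipipan.waw.pl/static/ppcdump-pdf/1957-1961-pdf.zip|PDF]] ||
|| 1961–1965 || [[https://kdp.ipipan.waw.pl/static/ppcdump-tei/1961-1965-tei.zip|XML]] || [[https://kdp.ipipan.waw.pl/static/ppcdump-pdf/1961-1965-pdf.zip|PDF]] ||
|| 1965–1969 || [[https://kdp.ipipan.waw.pl/static/ppcdump-tei/1965-1969-tei.zip|XML]] || [[https://kdp.ipipan.waw.pl/static/ppcdump-pdf/1965-1969-pdf.zip|PDF]] ||
|| 1969–1972 || [[https://kdp.ipipan.waw.pl/static/ppcdump-tei/1969-1972-tei.zip|XML]] || [[https://kdp.ipipan.waw.pl/static/ppcdump-pdf/1969-1972-pdf.zip|PDF]] ||
|| 1972–1976 || [[https://kdp.ipipan.waw.pl/static/ppcdump-tei/1972-1976-tei.zip|XML]] || [[https://kdp.ipipan.waw.pl/static/ppcdump-pdf/1972-1976-pdf.zip|PDF]] ||
|| 1976–1980 || [[https://kdp.ipipan.waw.pl/static/ppcdump-tei/1976-1980-tei.zip|XML]] || [[https://kdp.ipipan.waw.pl/static/ppcdump-pdf/1976-1980-pdf.zip|PDF]] ||
|| 1980–1985 || [[https://kdp.ipipan.waw.pl/static/ppcdump-tei/1980-1985-tei.zip|XML]] || [[https://kdp.ipipan.waw.pl/static/ppcdump-pdf/1980-1985-pdf.zip|PDF]] ||
|| 1985–1989 || [[https://kdp.ipipan.waw.pl/static/ppcdump-tei/1985-1989-tei.zip|XML]] || [[https://kdp.ipipan.waw.pl/static/ppcdump-pdf/1985-1989-pdf.zip|PDF]] ||
|| 1989–1991 || [[https://kdp.ipipan.waw.pl/static/ppcdump-tei/1989-1991-tei.zip|XML]] || [[https://kdp.ipipan.waw.pl/static/ppcdump-pdf/1989-1991-pdf.zip|PDF]] ||
|| 1991–1993 || [[https://kdp.ipipan.waw.pl/static/ppcdump-tei/1991-1993-tei.zip|XML]] || [[https://kdp.ipipan.waw.pl/static/ppcdump-pdf/1991-1993-pdf.zip|PDF]] ||
|| 1993–1997 || [[https://kdp.ipipan.waw.pl/static/ppcdump-tei/1993-1997-tei.zip|XML]] || [[https://kdp.ipipan.waw.pl/static/ppcdump-pdf/1993-1997-pdf.zip|PDF]] ||
|| 1997–2001 || [[https://kdp.ipipan.waw.pl/static/ppcdump-tei/1997-2001-tei.zip|XML]] || [[https://kdp.ipipan.waw.pl/static/ppcdump-pdf/1997-2001-pdf.zip|PDF]] ||
|| 2001–2005 || [[https://kdp.ipipan.waw.pl/static/ppcdump-tei/2001-2005-tei.zip|XML]] || [[https://kdp.ipipan.waw.pl/static/ppcdump-pdf/2001-2005-pdf.zip|PDF]] ||
|| 2005–2007 || [[https://kdp.ipipan.waw.pl/static/ppcdump-tei/2005-2007-tei.zip|XML]] || [[https://kdp.ipipan.waw.pl/static/ppcdump-pdf/2005-2007-pdf.zip|PDF]] ||
|| 2007–2011 || [[https://kdp.ipipan.waw.pl/static/ppcdump-tei/2007-2011-tei.zip|XML]] || [[https://kdp.ipipan.waw.pl/static/ppcdump-pdf/2007-2011-pdf.zip|PDF]] ||
|| 2011–2015 || [[https://kdp.ipipan.waw.pl/static/ppcdump-tei/2011-2015-tei.zip|XML]] || [[https://kdp.ipipan.waw.pl/static/ppcdump-pdf/2011-2015-pdf.zip|PDF]] ||
|| 2015–2019 || [[https://kdp.ipipan.waw.pl/static/ppcdump-tei/2015-2019-tei.zip|XML]] || [[https://kdp.ipipan.waw.pl/static/ppcdump-pdf/2015-2019-pdf.zip|PDF]] ||
|| 2019–2023 || [[https://kdp.ipipan.waw.pl/static/ppcdump-tei/2019-2023-tei.zip|XML]] || [[https://kdp.ipipan.waw.pl/static/ppcdump-pdf/2019-2023-pdf.zip|PDF]] ||
|| 2023–2027 || [[https://kdp.ipipan.waw.pl/static/ppcdump-tei/2023-2027-tei.zip|XML]] || [[https://kdp.ipipan.waw.pl/static/ppcdump-pdf/2023-2027-pdf.zip|PDF]] ||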
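The per-term archives follow a regular naming pattern, `https://kdp.ipipan.waw.pl/static/ppcdump-<kind>/<years>-<kind>.zip` with `<kind>` being `tei` or `pdf`, so the links can be generated rather than copied by hand. A minimal Python sketch under that assumption; the function name `archive_url` is illustrative, and the code only builds the URL without checking that the archive exists:

```python
BASE_URL = "https://kdp.ipipan.waw.pl/static"


def archive_url(term: str, kind: str = "tei") -> str:
    """Build the archive URL for a term such as '1919–1922'.

    kind is 'tei' for the XML archive or 'pdf' for the source PDFs.
    """
    if kind not in ("tei", "pdf"):
        raise ValueError(f"unknown archive kind: {kind!r}")
    # The page writes year ranges with an en dash; file names use a hyphen.
    years = term.replace("–", "-")
    return f"{BASE_URL}/ppcdump-{kind}/{years}-{kind}.zip"


print(archive_url("1919–1922"))
print(archive_url("2023–2027", "pdf"))
```

The resulting URL can be passed to `urllib.request.urlretrieve` or any other download tool; note that the archives for recent terms are large.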
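You can also take a look at:
 * [[attachment:ppc-sample.zip|a small sample with data from different periods]] (39 MB)
 * [[http://git.nlp.ipipan.waw.pl/PPC/ppc|PPC data on GitLab]]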
Searching the corpus
 * [[https://kdp.ipipan.waw.pl/|Korpus Dyskursu Parlamentarnego]] search engine (in Polish)
Licence
The parliamentary data is in the public domain. The corpus annotations are available under the CC-BY (attribution) licence.
See also
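 * [[https://www.clarin.eu/sites/default/files/2-ogrodniczuk.pdf|The slides]] from the [[https://www.clarin.eu/event/2017/clarin-plus-workshop-working-parliamentary-records|CLARIN-PLUS Workshop "Working with Parliamentary Records"]], Sofia, 27–29 March 2017.
 * [[https://www.youtube.com/watch?v=KEG_6WsTT5I|Webinar on PPC]] from a CLARIN-PL workshop session (November 2020).
 * [[https://www.clarin.eu/content/parlamint-towards-comparable-parliamentary-corpora|ParlaMint]], a project reusing a portion of data from the Polish Parliamentary Corpus in a multilingual setting.
 * [[https://clip.ipipan.waw.pl/PRTC|Polish Round Table Corpus]], a parliamentary-like dataset of negotiations between the authorities of the communist People's Republic of Poland and a section of the opposition, encoded in the PPC format.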
Contact
[[http://zil.ipipan.waw.pl/MaciejOgrodniczuk|Maciej Ogrodniczuk]], Institute of Computer Science, Polish Academy of Sciences