The Polish Parliamentary Corpus / Korpus Dyskursu Parlamentarnego
The Polish Parliamentary Corpus (PPC) is a large collection of linguistically analysed documents from the proceedings of the Polish Parliament (the Sejm and the Senate). It is based on the Polish Sejm Corpus, which was co-funded by the CESAR project, and is currently being updated within the CLARIN-PL infrastructure.
Corpus format
Corpus files are made available in a TEI P5 format compatible with the annotation used by the National Corpus of Polish. The resource contains automatically created annotation of:
- utterance-level segmentation,
- tokenization,
- lemmatization,
- disambiguated morphosyntactic description,
- named entities.
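As a rough illustration of working with TEI P5 files, the sketch below reads utterances from a small TEI document using only the Python standard library. The TEI namespace and the `u` (utterance) element with its `who` attribute come from the TEI Guidelines; the sample document, the speaker identifiers, and the assumption that PPC files follow exactly this layout are hypothetical, so check the element names against the downloaded files before relying on it.

```python
import xml.etree.ElementTree as ET

# Standard TEI P5 namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented sample for illustration only; real corpus files may be
# structured differently and split across several XML files.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div>
        <u who="#Marszalek">Otwieram posiedzenie.</u>
        <u who="#PoselA">Dziękuję, panie marszałku.</u>
      </div>
    </body>
  </text>
</TEI>"""


def utterances(xml_text):
    """Return (speaker, text) pairs for every <u> element in a TEI document."""
    root = ET.fromstring(xml_text)
    return [
        (u.get("who"), "".join(u.itertext()).strip())
        for u in root.iterfind(".//tei:u", TEI_NS)
    ]


for speaker, text in utterances(SAMPLE):
    print(speaker, text)
```

For files on disk, `ET.parse(path).getroot()` can replace `ET.fromstring`; the namespace-aware search stays the same.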
Corpus data
The corpus currently contains 749M segments; their distribution over houses, periods, and document types is presented below. Apart from the stenographic records of plenary sittings (261M segments) and committee sittings (288M segments), the corpus contains 199M segments of interpellations and questions.
Years | Period | Sittings docs | Sittings segments | Committees docs | Committees segments | Interpellations docs | Interpellations segments
1919–1922 | Legislative Sejm | 312 | 6 945 162 | – | – | – | –
1922–1927 | 1st term of office | 277 | 7 338 355 | – | – | – | –
1928–1930 | 2nd | 58 | 2 139 835 | – | – | – | –
1930–1935 | 3rd | 72 | 2 404 267 | – | – | – | –
1935–1938 | 4th | 73 | 2 133 181 | – | – | – | –
1938–1939 | 5th | 23 | 610 455 | – | – | – | –
1943–1947 | State National Council | 6 | 234 441 | – | – | – | –
1947–1952 | Legislative Sejm | 107 | 2 575 136 | – | – | – | –
1952–1956 | 1st term of office | 39 | 1 172 333 | – | – | – | –
1957–1961 | 2nd | 59 | 2 502 936 | – | – | – | –
1961–1965 | 3rd | 32 | 1 388 862 | – | – | – | –
1965–1969 | 4th | 23 | 1 163 336 | – | – | – | –
1969–1972 | 5th | 17 | 526 277 | – | – | – | –
1972–1976 | 6th | 32 | 1 176 712 | – | – | – | –
1976–1980 | 7th | 29 | 918 993 | – | – | – | –
1980–1985 | 8th | 70 | 3 377 139 | – | – | – | –
1985–1989 | 9th | 45 | 2 641 788 | – | – | – | –
1989–1991 | 10th | 77 | 6 674 111 | – | – | – | –
1991–1993 | 1st term of office | 142 | 7 739 147 | – | – | – | –
1993–1997 | 2nd | 317 | 22 134 682 | 3 858 | 41 756 476 | – | –
1997–2001 | 3rd | 320 | 24 138 142 | 4 691 | 42 510 604 | 23 507 | 12 101 453
2001–2005 | 4th | 337 | 28 743 846 | 4 945 | 49 302 521 | 30 986 | 17 519 177
2005–2007 | 5th | 148 | 11 737 186 | 2 359 | 18 970 036 | 26 689 | 14 777 377
2007–2011 | 6th | 298 | 22 415 708 | 5 565 | 44 363 063 | 59 353 | 36 412 001
2011–2015 | 7th | 292 | 20 765 505 | 5 126 | 38 541 083 | 85 679 | 61 565 989
2015–2019 | 8th | 239 | 19 131 000 | 4 561 | 36 708 873 | 79 194 | 56 720 590
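The counts in the table use spaces as thousands separators and an en dash for missing data, so they need a little cleaning before they can be summed. The following minimal sketch shows one way to do that in Python and adds up the interpellation segment counts copied from the table above; the function name `parse_count` is a hypothetical helper, not part of any corpus tooling.

```python
def parse_count(cell):
    """Convert a table cell such as '6 945 162' to an int; '–' means no data."""
    cleaned = "".join(cell.split())  # drops ordinary and non-breaking spaces
    if cleaned in ("", "–", "-"):
        return None
    return int(cleaned)


# Interpellation segment counts for the Sejm terms from 1993 to 2019,
# copied from the table.
interpellation_segments = [
    "–",
    "12 101 453",
    "17 519 177",
    "14 777 377",
    "36 412 001",
    "61 565 989",
    "56 720 590",
]

counts = [parse_count(cell) for cell in interpellation_segments]
total = sum(c for c in counts if c is not None)
print(f"{total:,}")  # 199,096,587
```

The result rounds to the 199M segments of interpellations and questions quoted in the text.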
Download
At present you can download the unannotated TEI version (1.7 GB). A linguistically annotated version will be added soon.
Searching the corpus
Licence
The parliamentary data is in the public domain. The corpus annotations are available under a CC-BY (attribution) licence.
Publications
See also the slides from the CLARIN-PLUS workshop "Working with Parliamentary Records", Sofia, 27–29 March 2017.
Contact
Maciej Ogrodniczuk, Institute of Computer Science, Polish Academy of Sciences