
Diff for "PL196x"

Differences between revisions 17 and 18
Revision 17 as of 2012-03-17 22:57:57
Size: 10963
Comment:
Revision 18 as of 2012-03-17 23:01:52
Size: 11133
Comment:
Line 23: Line 23:
 * Kurcz, Ida; Lewicki, Andrzej; Sambor, Jadwiga; Woronczak, Jerzy. ''[[http://rcin.org.pl/dlibra/docmetadata?id=3279|Słownictwo współczesnego języka polskiego. Listy frekwencyjne. Tom I. Teksty popularnonaukowe – część I]] i II''. (In Polish, EN: ''Vocabulary of contemporary Polish. Frequency lists. Volume I. Scientific texts''). Warszawa, 1974. Warsaw University.
 * Kurcz, Ida; Lewicki, Andrzej; Sambor, Jadwiga; Woronczak, Jerzy. ''Słownictwo współczesnego języka polskiego. Listy frekwencyjne. Tom II. Drobne wiadomości prasowe''. (In Polish, EN: ''Vocabulary of contemporary Polish. Frequency lists. Volume II. News''). Warszawa, 1974. Warsaw University.
 * Kurcz, Ida; Lewicki, Andrzej; Sambor, Jadwiga; Woronczak, Jerzy. ''[[http://rcin.org.pl/dlibra/docmetadata?id=3279|Słownictwo współczesnego języka polskiego. Listy frekwencyjne. Tom I. Teksty popularnonaukowe – część I]] i [[http://rcin.org.pl/dlibra/docmetadata?id=3282|II]]''. (In Polish, EN: ''Vocabulary of contemporary Polish. Frequency lists. Volume I. Scientific texts''). Warszawa, 1974. Warsaw University.
 * Kurcz, Ida; Lewicki, Andrzej; Sambor, Jadwiga; Woronczak, Jerzy. ''[[http://rcin.org.pl/dlibra/docmetadata?id=3280|Słownictwo współczesnego języka polskiego. Listy frekwencyjne. Tom II. Drobne wiadomości prasowe – część I]]''. (In Polish, EN: ''Vocabulary of contemporary Polish. Frequency lists. Volume II. News''). Warszawa, 1974. Warsaw University.
Line 26: Line 26:
 * Kurcz, Ida; Lewicki, Andrzej; Sambor, Jadwiga; Woronczak, Jerzy. ''[[http://rcin.org.pl/dlibra/docmetadata?id=3286|Słownictwo współczesnego języka polskiego. Listy frekwencyjne. Tom IV. Proza artystyczna – część I]] i [[http://rcin.org.pl/dlibra/docmetadata?id=3288|II]]''. (In Polish, EN: ''Vocabulary of contemporary Polish. Frequency lists. Volume IV. Fiction''). Warszawa, 1976. Warsaw University.
 * Kurcz, Ida; Lewicki, Andrzej; Sambor, Jadwiga; Woronczak, Jerzy. ''[[http://rcin.org.pl/dlibra/docmetadata?id=3286|Słownictwo współczesnego języka polskiego. Listy frekwencyjne. Tom IV. Proza artystyczna – część I]], [[http://rcin.org.pl/dlibra/docmetadata?id=3288|II]] i [[http://rcin.org.pl/dlibra/docmetadata?id=3292|III]]''. (In Polish, EN: ''Vocabulary of contemporary Polish. Frequency lists. Volume IV. Fiction''). Warszawa, 1976. Warsaw University.

Polish language of the 1960s

This page is dedicated to the corpus of the frequency dictionary of contemporary Polish. The corpus was originally compiled in order to create a general frequency dictionary of contemporary Polish. Work started in 1967. Partial results were published between 1972 and 1977, and the completed dictionary followed in 1990. The corpus was later augmented in various respects, through both manual editing and automated procedures.

The corpus data contain 10,000 samples divided into 5 parts: essays, news, scientific texts, fiction and plays. Every sample is approximately 50 words long, comes from a text published between 1963 and 1967, and contains a bibliographic description of its source. Each word is tagged with its base form and some morphological properties. Sentence boundaries are also marked.
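
The structure described above (samples carrying a source description, sentences, and words tagged with a base form) can be modelled roughly as follows. This is only an illustrative sketch: the class names, field names, tag notation and the two demo sentences are invented for the example and do not reflect the actual file format of the corpus.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class Token:
    form: str   # word as it appears in the text
    lemma: str  # base form
    tags: str   # morphological properties (hypothetical notation)


@dataclass
class Sample:
    style: str   # one of the five parts, e.g. "news"
    source: str  # bibliographic description of the source text
    sentences: List[List[Token]] = field(default_factory=list)


def lemma_frequencies(samples):
    """Count how often each base form occurs across the given samples."""
    counts = Counter()
    for sample in samples:
        for sentence in sample.sentences:
            counts.update(token.lemma for token in sentence)
    return counts


# Invented demo data: one tiny sample with two sentences.
demo = Sample(
    style="news",
    source="(placeholder bibliographic description)",
    sentences=[
        [Token("Dom", "dom", "noun"), Token("stoi", "stać", "verb")],
        [Token("Domy", "dom", "noun"), Token("stoją", "stać", "verb")],
    ],
)

freq = lemma_frequencies([demo])
print(freq.most_common())

# Rough corpus size implied by the description: 10,000 samples of about 50 words.
approx_words = 10_000 * 50
print(approx_words)
```

Counting by base form rather than by surface form is what turns a tagged corpus into a frequency list of dictionary entries: inflected variants such as "dom" and "domy" are counted under a single entry.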

In 2001 the corpus authors agreed to publish the data on the Internet under a GNU licence. This site presents the corpus data in base and extended (enhanced) versions, along with additional materials and corpus documentation.

Corpus documentation

Selected bibliography

Corpus licence

Corpus data

                           Cluster samples  "Raw" without codes  "Raw" with codes  Enhanced version  TEI P4 XML
Style A: Scientific texts  1 MB             1.5 MB               1.1 MB            4.0 MB            10 MB
Style B: News              1 MB             1.5 MB               1.2 MB            3.9 MB            9 MB
Style C: Essays            1 MB             1.5 MB               1.2 MB            4.0 MB            10 MB
Style D: Fiction           1 MB             1.5 MB               1.1 MB            4.1 MB            11 MB
Style E: Plays             1 MB             1.5 MB               1.1 MB            4.4 MB            12 MB

Auxiliary files for the TEI P4-encoded XML version:

ISO image of the CD-ROM with most of the materials.

Concordances