Revision 4 as of 2011-04-11 12:06:53

PL196x

The Polish language of the 1960s

This page is dedicated to the corpus of the frequency dictionary of contemporary Polish. The corpus was originally compiled to create a general frequency dictionary of contemporary Polish. Work started in 1967. Partial results were published between 1972 and 1977, and the completed dictionary in 1990. The corpus was later augmented in various respects, both by manual editing and by automated procedures.

The corpus contains 10,000 samples divided into five parts: essays, news, scientific texts, fiction, and plays. Each sample is approximately 50 words long, comes from a text published between 1963 and 1967, and includes a bibliographic description of its source. Each word is tagged with its base form and some morphological properties. Sentence boundaries are also marked.
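The structure described above can be pictured with a small Python sketch. It is only an illustration: the class names, field names, tag strings, and toy data are all hypothetical and do not reflect the actual file format of the corpus, which is described in the corpus documentation. The sketch models a sample as a list of sentences of tagged words and counts base forms, which is the basic operation behind a frequency dictionary.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List

# Figures taken from the description above; everything else is an assumption.
SAMPLE_COUNT = 10_000
APPROX_WORDS_PER_SAMPLE = 50
APPROX_CORPUS_SIZE = SAMPLE_COUNT * APPROX_WORDS_PER_SAMPLE  # about 500,000 words

STYLES = ("essays", "news", "scientific texts", "fiction", "plays")


@dataclass
class Token:
    form: str        # the word as it appears in the text
    base: str        # its base form
    tags: str = ""   # morphological properties, kept opaque in this sketch


@dataclass
class Sample:
    style: str       # one of STYLES
    source: str      # bibliographic description of the source text
    sentences: List[List[Token]] = field(default_factory=list)


def base_form_frequencies(samples):
    """Count how often each base form occurs across the given samples."""
    counts = Counter()
    for sample in samples:
        for sentence in sample.sentences:
            counts.update(token.base for token in sentence)
    return counts


# Invented toy data, not taken from the corpus; the tag strings are placeholders.
demo = Sample(
    style="news",
    source="(hypothetical bibliographic description)",
    sentences=[
        [Token("domy", "dom", "hypothetical-tag"), Token("stoją", "stać", "hypothetical-tag")],
        [Token("dom", "dom", "hypothetical-tag")],
    ],
)
freq = base_form_frequencies([demo])
```

Grouping tokens by sentence inside each sample mirrors the marked sentence boundaries, and keeping the style on the sample makes it straightforward to build a separate frequency list for each of the five parts.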

In 2001 the corpus authors agreed to publish the data on the Internet under GNU licences. This site presents the corpus data in base and extended (enhanced) versions, together with additional materials and corpus documentation.

Corpus documentation

  • Bień, Janusz S.; Woliński, Marcin. Enhanced corpus of the frequency dictionary of contemporary Polish. (In Polish). December 17th, 2001.
  • Bień, Janusz S.; Woliński, Marcin. Numerical grammatical codes in enhanced corpus of the frequency dictionary. (In Polish). December 17th, 2001.
  • Głowińska, Katarzyna. Morphological taxonomy for the frequency dictionary. (In Polish)
  • Kurcz, Ida; Lewicki, Andrzej; Sambor, Jadwiga; Szafran, Krzysztof; Woronczak, Jerzy. Polish language in the sixties (introduction to the printed edition of the frequency dictionary).
  • Ogrodniczuk, Maciej. Enhancing the corpus of the frequency dictionary with new grammatical codes. (In Polish)

Selected bibliography

  • Bień, Janusz S.; Woliński, Marcin. Enhanced corpus of the Frequency dictionary of contemporary Polish. (In Polish) [In:] Prace lingwistyczne dedykowane prof. Jadwidze Sambor. Jadwiga Linde-Usiekniewicz (ed.), pp. 6-10, Warszawa 2003, Faculty of Polish Philology, Warsaw University.
  • Kurcz, Ida; Lewicki, Andrzej; Sambor, Jadwiga; Woronczak, Jerzy. Vocabulary of contemporary Polish. Frequency lists. Volume I. Scientific texts. (In Polish) Warszawa, 1974. Warsaw University.
  • Kurcz, Ida; Lewicki, Andrzej; Sambor, Jadwiga; Woronczak, Jerzy. Vocabulary of contemporary Polish. Frequency lists. Volume II. News. (In Polish) Warszawa, 1974. Warsaw University.
  • Lewicki, Andrzej; Masłowski, Władysław; Sambor, Jadwiga; Woronczak, Jerzy. Vocabulary of contemporary Polish. Frequency lists. Volume III. Essays. (In Polish) Warszawa, 1975. Warsaw University.
  • Kurcz, Ida; Lewicki, Andrzej; Sambor, Jadwiga; Woronczak, Jerzy. Vocabulary of contemporary Polish. Frequency lists. Volume IV. Fiction. (In Polish) Warszawa, 1976. Warsaw University.
  • Kurcz, Ida; Lewicki, Andrzej; Sambor, Jadwiga; Woronczak, Jerzy. Vocabulary of contemporary Polish. Frequency lists. Volume V. Plays. (In Polish) Warszawa, 1977. Warsaw University.
  • Kurcz, Ida; Lewicki, Andrzej; Sambor, Jadwiga; Szafran, Krzysztof; Woronczak, Jerzy. Frequency dictionary of contemporary Polish. (In Polish) Kraków, 1990. Institute of Polish Philology, Polish Academy of Sciences.
  • Nazarczuk, Marta. Initial preparation of the corpus of Frequency dictionary of contemporary Polish for CD-ROM distribution. (In Polish) Master's thesis prepared under the supervision of prof. Janusz S. Bień. Warsaw, 1997. Institute of Polish Philology, Warsaw University. 59 pages, CD-ROM.
  • Ogrodniczuk, Maciej. New edition of the Enhanced corpus of the Frequency dictionary. (In Polish) [In:] Językoznawstwo w Polsce. Stan i perspektywy. Stanisław Gajda (ed.) Institute of Polish Philology, Polish Academy of Sciences - Linguistics Committee, Opole University. Opole 2003, pp. 181-190. ISBN 83-86881-36-4.
  • Ogrodniczuk, Maciej. Augmenting the morphological description in the corpus of Frequency dictionary of contemporary Polish. (In Polish) [In:] Prace lingwistyczne dedykowane prof. Jadwidze Sambor. Jadwiga Linde-Usiekniewicz (ed.), pp. 164-168, Warszawa 2003, Faculty of Polish Philology, Warsaw University.
  • Ogrodniczuk, Maciej. Encoding of Polish linguistic data with SGML and TEI. (In Polish) Master's thesis prepared under the supervision of prof. Janusz S. Bień. Warsaw, 2000. Institute of Informatics, Warsaw University. 83 pages, CD-ROM.
  • Saloni, Zygmunt. Frequency dictionary of contemporary Polish. (In Polish) ComputerWorld, November 4th 1991, pp. 16-17.

Corpus licence

  • GNU Free Documentation Licence for corpus documentation.
  • GNU General Public Licence for corpus data.

Corpus data

|| Style || "Raw" version (without codes) || Enhanced version (with codes) || TEI P4 XML version (with codes) ||
|| Style A: Scientific texts || 1.1 MB || 4.0 MB || 10 MB ||
|| Style B: News || 1.2 MB || 3.9 MB || 9 MB ||
|| Style C: Essays || 1.2 MB || 4.0 MB || 10 MB ||
|| Style D: Fiction || 1.1 MB || 4.1 MB || 11 MB ||
|| Style E: Plays || 1.1 MB || 4.4 MB || 12 MB ||

Auxiliary files for the TEI P4-encoded XML version:

  • Master file with TEI header for the corpus
  • Feature library (assembles library of feature elements)
  • Feature structure library (assembles library of feature structure elements)
  • Writing system declaration
  • Feature structures representing morphological descriptions for Polish

Concordances

  • Concordances with location [17.9 MB]
  • Concordances without location [24 MB]