
Diff for "LRT"

Differences between revisions 384 and 385
Revision 384 as of 2018-03-27 14:17:15
Size: 28452
Revision 385 as of 2018-05-10 06:56:44
Size: 28755
Line 33: Line 33:
 * [[PL196x|Polish language of the 1960s / Frequency corpus]] (I. Kurcz, A. Lewicki, J. Sambor, J. Woronczak, K. Szafran, J. S. Bień, M. Woliński),
 * [[http://scriptores.pl/efontes/|eFontes Mediae et Infimae Latinitatis Polonorum]] (1000–1550, IJP PAN)
 * [[https://www.ijp-pan.krakow.pl/publikacje-elektroniczne/korpus-tekstow-staropolskich|Corpus of old Polish (up to 1500)]] (IJP PAN)
 * [[http://stnt.ijp.pan.pl/|15th-century New Testament translations]] (IJP PAN)
 * [[https://szukajwslownikach.uw.edu.pl/IMPACT_GT_1/|IMPACT project corpus]] (1570–1756, KLF UW)
 * [[http://www.spxvi.edu.pl/korpus/|Corpus of 16th-century Polish]] (IBL PAN)
 * [[http://fedora.clarin-d.uni-saarland.de/poldilemma/|PolDiLemma]], the Middle Polish Diachrone Lemmatised Corpus (16–18th c., R. Meyer)
 * [[http://korba.edu.pl|KORBA]], electronic corpus of 17th and 18th century Polish texts (1601–1772, IJP PAN)
 * [[http://www.f19.uw.edu.pl/|Corpus of 19th-century Polish]] (1830–1918, IJP UW)
Line 35: Line 42:
 * [[http://www.f19.uw.edu.pl/|Microcorpus of Polish: 1830-1918]], (M. Derwojedowa),
 * [[http://korba.edu.pl|KORBA]], electronic corpus of 17th and 18th century Polish texts (W. Gruszczyński),
 * [[http://fedora.clarin-d.uni-saarland.de/poldilemma/|PolDiLemma]], the Middle Polish Diachrone Lemmatised Corpus (R. Meyer),
 * [[http://www.spxvi.edu.pl/korpus/|Corpus of 16th-century Polish]] (IBL PAN),
 * [[https://www.ijp-pan.krakow.pl/publikacje-elektroniczne/korpus-tekstow-staropolskich|Corpus of old Polish (up to 1500)]] (IJP PAN).
 * [[PL196x|Polish language of the 1960s / Frequency corpus]] (I. Kurcz, A. Lewicki, J. Sambor, J. Woronczak, K. Szafran, J. S. Bień, M. Woliński)

Language Tools and Resources for Polish

This page contains a list of publicly available language tools and resources.

Written corpora of contemporary Polish

Written corpora of historical Polish

Spoken corpora

Language models

Parallel corpora and translation memories

Machine-readable dictionaries

Human-readable dictionaries

Morphological tools and resources


Parsers, grammars, treebanks

Sentiment analysis


Speech analysis and synthesis tools

Machine translation demonstrations



Named Entity Recognition

  • Nerf, a tool for named entity recognition, available under GNU GPL v.3 (J. Waszczuk),

  • Liner2, a named entity recognizer released under the GNU GPL, with models that recognize 5 and 56 categories of proper names (M. Marcińczuk and M. Janicki),

  • TIMEX, a model for Liner2 to recognize and normalize temporal expressions (J. Kocoń and M. Marcińczuk).

Aggregating services


  • Mobile plWordNet, a free mobile application for browsing plWordNet (J. Kocoń),

  • WSDDE, a system for designing and performing Word Sense Disambiguation experiments (R. Młodzki et al.),

  • Frazeo, a search engine and clusterer of news in Polish (P. Pęzik),

  • Segment, a rule-based sentence tokenizer supporting the SRX standard (J. Lipski; the Polish rules are available in the LanguageTool project, see here for short instructions on how to use the tool),

  • Toki, a tokenizer supporting the SRX standard, available as a C++ library and toolkit (T. Śniatowski and A. Radziszewski),

  • Translatica SRX sentence segmentation rules for Polish (LGPL),

  • SyMGIZA++, an extension of Giza++ that computes symmetric word alignment models,

  • Hipisek, an experimental question answering system (M. Walas),

  • Narzędzia dygitalizacji tekstów (text digitization tools), Poliqarp for DjVu and other programs,

  • Fextor, a feature extraction framework,

  • LexCSD, a system for semi-automatic sense disambiguation,

  • SuperMatrix, a general tool for lexical semantic knowledge acquisition,

  • WordnetLoom, a wordnet editor application,

  • Toposław, a tool for creating electronic inflectional dictionaries of multi-word units,

  • CorpCor, a web-based tool for correcting morphosyntactic annotation in TEI XML-encoded corpora (e.g. NKJP),

  • Stylo 2, a stylometry demo,

  • TermoPL, a multiword expression extraction tool,

  • DeepEvents, event extraction for Polish based on deep neural networks,

  • Word similarity, an online service that calculates the similarity of words based on word embeddings,

  • Baltoslav, with several script converters (Romanizer, Cyrillizer, IPA Converter etc.).