Diff for "LRT"

Differences between revisions 1 and 418 (spanning 417 versions)
Revision 1 as of 2011-02-28 17:54:09
Size: 2751
Revision 418 as of 2020-04-01 17:06:48
Size: 31390
This page contains information about natural language processing resources and tools that can be run locally or over the Internet. Web pages that merely describe such resources or tools, or only give examples of how they work without offering the full functionality of the final product, are not listed here. Links to web pages presenting substantial and useful samples of resources, however, are included. Since the criteria for selecting the resources presented below are not clear-cut, all comments are welcome.

Corpora of Polish texts:
 * A version of the [[http://www.mimuw.edu.pl/polszczyzna/pl196x/|Polish frequency dictionary]]
 * [[http://korpus.pwn.pl/|PWN Corpus]]
 * [[http://www.korpus.pl/|IPI PAN Corpus]]
 * [[http://www.staff.amu.edu.pl/~przemka/picle.html|PICLE corpus]] (the Polish part of the International Corpus of Learner English; P. Kaszubski)

Morphological analysers / Inflectional dictionaries:
 * Morphological analyser [[http://sgjp.pl/morfeusz/|Morfeusz SGJP]]
 * Morphological analyser SAM-95 (K. Szafran)
 * Inflectional database (W. Lubaszewski et al.; demo)
 * [[http://gram.neurosoft.pl/|Neurosoft Gram]] (demo)
 * Xerox tools (tokenizer, morphological analyser, disambiguator; demo)
 * [[http://www.eti.pg.gda.pl/katedry/kiw/pracownicy/Jan.Daciuk/personal/fsa_polski.html|Lexical tools based on finite-state automata]] (J. Daciuk)
 * D. Weiss's [[http://www.cs.put.poznan.pl/dweiss/xml/projects/lametyzator/index.xml?lang=en|lemmatizer]]
 * [[http://getopt.org/stempel/|Stempel]], yet another lemmatizer (A. Białecki)

Syntactic parsers / Electronic grammars:
 * Prototype HPSG grammar of Polish (IPI PAN)

Machine translation systems (demos available online):
 * [[http://www.translatica.pl/|Translatica]] (bidirectional English-Polish translation system)
 * [[http://www.tranexp.com/|InterTran]] (various language pairs)
 * [[http://www.poltran.com/|LingvoBit]] (bidirectional English-Polish translation system)
 * [[http://www.systran.co.uk/|Systran]] (English to Polish and Polish to French, among other language pairs)
 * Thetos (a system translating Polish text into Polish Sign Language)

Other:
 * [[http://plwordnet.pwr.wroc.pl/wordnet|plWordNet]], the Polish WordNet (M. Piasecki)
 * [[http://www.mimuw.edu.pl/polszczyzna/kolokacje/index.htm|Kolokacje]], a program for finding collocations (A. Buczyński)
 * [[http://nlp.ipipan.waw.pl/CRIT2/|IPI PAN collection of Polish test sentences]]
 * [[http://www.lingwistyka.uni.wroc.pl/bql/|Bibliography of Polish quantitative linguistics]] (A. Pawłowski)
= Language Tools and Resources for Polish =

This page contains a list of ''publicly available'' language tools and resources.

/* * [[attachment:NKJP-PodkorpusMilionowy-1.0.tgz]], the manually annotated 1-million word subcorpus of the NKJP, available under GNU GPL v.3, */
/* * [[attachment:NKJP-PodkorpusMilionowy-1.1.tgz]], the manually annotated 1-million word subcorpus of the NKJP, available under GNU GPL v.3, */
/* * [[attachment:NKJP-PodkorpusMilionowy-1.0-poliqarp-bin.tgz]], the binary version of corpus to be used with standalone Poliqarp tool. */

== Written corpora of contemporary Polish ==
 * [[NationalCorpusOfPolish|National Corpus of Polish]] (NKJP),
 * [[http://korpus.pwn.pl/|PWN Corpus]],
 * [[http://nfjp.pl/|National Photocorpus of Polish]] (NFJP),
 * [[http://poliqarp.wbl.klf.uw.edu.pl/|Dictionaries as Corpora]],
 * [[PPC|Polish Parliamentary Corpus]],
 * [[http://zil.ipipan.waw.pl/PolishSummariesCorpus|Polish Summaries Corpus]],
 * [[http://ifa.amu.edu.pl/~ifaconc/blog/?page_id=60|PICLE corpus]], the Polish sub-corpus of the [[http://www.fltr.ucl.ac.be/fltr/germ/etan/cecl/Cecl-Projects/Icle/icle.htm|International Corpus of Learner English]] (ICLE),
 * [[http://dl.psnc.pl/activities/projekty/impact/results/| IMPACT ground-truth data]] for selected Polish historical documents from PIONIER Digital Libraries Federation,
  * Now available also as corpora in the Poliqarp for !DjVu [[http://poliqarp.wbl.klf.uw.edu.pl|search engine]],
 * [[http://nlp.pwr.wroc.pl/narzedzia-i-zasoby/zasoby/kpwr|KPWr]], Polish Corpus of Wrocław University of Technology, a collection of documents available under a Creative Commons license and annotated with syntactic chunks, proper names, semantic relations, anaphora and word senses,
 * [[http://www.pcsn.uni.wroc.pl/|Polish Corpus of Suicide Notes]],
 * [[PolishWikipediaCorpus|Polish Wikipedia Corpus]],
 * [[http://zil.ipipan.waw.pl/gpwEcono|gpwEcono]], a corpus of stock market reports, with manual word sense annotation,
 * [[http://zil.ipipan.waw.pl/plWikiEcono|plWikiEcono]], a corpus of Polish Wikipedia articles from the domain of economy,
 * [[http://zil.ipipan.waw.pl/PolishCoreferenceCorpus|Polish Coreference Corpus]], a corpus of Polish coreference relations, created as part of the [[http://core.ipipan.waw.pl/About|CORE project]],
 * [[http://argumentacja.pdg.pl/argdbpl/|ArgDB-pl]], a Polish corpus of arguments in natural contexts,
 * [[http://www.staff.amu.edu.pl/~romang/wiki_errors_pl.php|PIEWiC]], a Polish corpus of errors automatically extracted from Wikipedia revisions,
 * [[http://clip.ipipan.waw.pl/PolEval|PolEval]], corpora and other text resources created for [[http://www.poleval.pl|PolEval]] shared tasks,
 * [[http://clip.ipipan.waw.pl/PARSEME-PL|Polish PARSEME corpus]], annotated manually for verbal multiword expressions in 20 languages including Polish, used in the [[http://multiword.sourceforge.net/sharedtask2018/|PARSEME shared task 1.1]]; the Polish subcorpus is aligned with automatic dependency annotations in the [[http://universaldependencies.org/guidelines.html|UD]] format (A. Savary),
 * [[http://clip.ipipan.waw.pl/MweLitRead|MweLitRead]] - a corpus of literal readings of Polish verbal MWEs stemming from the [[http://clip.ipipan.waw.pl/PARSEME-PL|Polish PARSEME corpus]] (A. Savary, S. Cordeiro),
 * [[http://pelcra.pl/plec/downloads|PELCRA Learner English Corpus]] (PLEC),
 * [[https://www.sketchengine.eu/user-guide/user-manual/corpora/by-language/polish-text-corpora/|Polish text corpora]] included in Sketch Engine,
 * [[http://synamet.polon.uw.edu.pl/|Microcorpus of Synesthetic Metaphors]].

== Written corpora of historical Polish ==
 * [[http://scriptores.pl/efontes/|eFontes Mediae et Infimae Latinitatis Polonorum]] (1000–1550, IJP PAN)
 * [[https://www.ijp-pan.krakow.pl/publikacje-elektroniczne/korpus-tekstow-staropolskich|Corpus of old Polish (up to 1500)]] (IJP PAN)
 * [[http://stnt.ijp.pan.pl/|15th-century New Testament translations]] (IJP PAN)
 * [[https://szukajwslownikach.uw.edu.pl/IMPACT_GT_1/|IMPACT project corpus]] (1570–1756, KLF UW)
 * [[http://spxvi.edu.pl/korpus/|Corpus of 16th-century Polish]] (IBL PAN)
 * [[http://fedora.clarin-d.uni-saarland.de/poldilemma/|PolDiLemma]], the Middle Polish Diachrone Lemmatised Corpus (16–18th c., R. Meyer)
 * [[http://rhssl1.uni-regensburg.de/SlavKo/korpus/poldi|PolDi]], a Polish Diachronic Online Corpus (R. Meyer)
 * [[http://korba.edu.pl|KORBA]], electronic corpus of 17th and 18th century Polish texts (1601–1772, IJP PAN)
 * [[http://www.f19.uw.edu.pl/|Corpus of 19th-century Polish]] (1830–1918, IJP UW)
 * [[http://korpus19.nlp.ipipan.waw.pl/|Manually annotated and transcribed corpus of 19th-century Polish]] (1830–1918, IPI PAN)
 * [[http://chronopress.clarin-pl.eu/|ChronoPress]], corpus of press texts from 1945–1954 (A. Pawłowski),
 * [[PL196x|Polish language of the 1960s / Frequency corpus]] (I. Kurcz, A. Lewicki, J. Sambor, J. Woronczak, K. Szafran, J. S. Bień, M. Woliński).

== Corpus-related tools and resources ==
 * [[http://poliqarp.sourceforge.net/|Poliqarp]], a corpus indexing and search engine (please see also [[http://nlp.ipipan.waw.pl/Poliqarp/|the beta version of Poliqarp 1.1]] and [[http://clip.ipipan.waw.pl/Poliqarp|1.3]] with statistical extensions and [[http://liszt.ipipan.waw.pl/|several corpora indexed with Poliqarp 2]]),
 * [[http://zil.ipipan.waw.pl/Anotatornia|Anotatornia]], a system for multi-level manual annotation of corpora,
 * [[http://zil.ipipan.waw.pl/Anotatornia2|Anotatornia2]], new version of Anotatornia geared towards annotation of historical corpora,
 * [[http://nlp.pwr.wroc.pl/en/tools-and-resources/inforex|Inforex]], a web-based system designed for managing and annotating text corpora on the semantic level,
 * [[http://smyrna.danieljanus.pl/|Smyrna]], a simple, light-weight Polish concordancer,
 * [[http://korpusy.net/|korpusy.net]], a corpus research-related website (B. Gałkowski),
 * [[http://korpusomat.nlp.ipipan.waw.pl/|Korpusomat]], a tool for creation of searchable own corpora (Ł. Kobyliński, W. Kieraś).

== Spoken corpora ==
 * The PELCRA conversational corpus of Polish: approx. 2.2 million words of casual conversational spoken Polish collected and processed in the years 2001-2015 in a number of research projects, including PELCRA, NKJP, CESAR and CLARIN-PL. Available under CC-BY-NC. All transcriptions can be accessed through the [[http://spokes.clarin-pl.eu|Spokes web interface]] and programmatically through [[http://clarin.pelcra.pl/apidocs/spokes| a REST API]].
 * [[http://clip.ipipan.waw.pl/LUNA|The annotated corpus of spoken dialogues]] (LUNA project, corpus data available at the end of the page),
 * [[http://www.dsp.agh.edu.pl/doku.php?id=en:resources:korpusmowy|AGH speech corpus]], around 9 hours, word-annotated Polish speech corpus (AGH DSP),
 * [[http://www.dsp.agh.edu.pl/doku.php?id=en:resources:korpusav|Audiovideo corpus]] of Polish speech (AGH DSP),
 * [[http://nkjp.uni.lodz.pl/spoken.jsp|NKJP search engine for spoken-conversational data]],
 * [[http://catalog.elra.info/product_info.php?cPath=37_39&products_id=1164|Acoustic database for Polish unit selection speech synthesis]] (ELRA resources),
 * [[http://catalog.elra.info/product_info.php?cPath=37_39&products_id=1168|Acoustic database for Polish concatenative speech synthesis]] (ELRA resources),
 * [[http://www.dsp.agh.edu.pl/doku.php?id=pl:resources:korpusemo|Corpus of emotions in speech]] (AGH DSP).

== Language models ==
 * [[http://zil.ipipan.waw.pl/NKJPNGrams|N-grams from the balanced subcorpus of the National Corpus of Polish]],
 * [[http://www.dsp.agh.edu.pl/doku.php?id=pl:resources:ngram|N-gram model of Polish]] (AGH DSP),
 * [[http://mozart.ipipan.waw.pl/~axw/models/|Distributional semantic models]] trained on orthographical, lemmatized word forms (A. Wawer),
 * Even more [[http://dsmodels.nlp.ipipan.waw.pl|distributional semantic models]] based on NKJP (A. Mykowiecka, M. Marciniak, P. Rychlik),
 * [[https://github.com/deepmipt/Slavic-BERT-NER|Slavic BERT NER]] (see also [[http://docs.deeppavlov.ai/en/master/features/pretrained_vectors.html#bert|Deep Pavlov website]]),
 * [[https://github.com/sdadas/polish-nlp-resources|RoBERTa]] and links to many other useful resources (S. Dadas),
 * [[https://github.com/kldarek/polbert|Polbert]] (D. Kłeczek).


== Parallel corpora and translation memories ==
 * [[http://parasolcorpus.org/|ParaSol]], a parallel corpus of Slavic and other languages,
 * [[http://www.domeczek.pl/~polukr/index.php?option=search|PolUKR]], a Polish-Ukrainian parallel corpus,
 * [[http://opus.lingfil.uu.se/index.php|OPUS]], an open source parallel corpus (European Parliament, EMEA, KDE, movie subtitles),
 * [[http://nl.ijs.si/ME/V4|"1984"]], an annotated parallel corpus of George Orwell's "1984" in 15 languages, MULTEXT-East, v.4 (licensed download),
 * [[http://www.korpus.cz/intercorp/?req=page:info|InterCorp]], a multilingual parallel corpus,
 * [[http://corpus.leeds.ac.uk/internet.html|Leeds collection of Internet corpora]],
 * [[http://korpus.hiztegia.org/|LAGUN corpus]],
 * [[http://pelcra.pl/new/cesar|PELCRA Parallel corpora]], a collection of downloadable parallel corpora developed by the PELCRA team and available under the CC-BY and CC-BY-NC licenses,
 * [[http://paralela.clarin-pl.eu|Paralela Polish-English corpus]],
 * [[http://langtech.jrc.it/JRC-Acquis.html|JRC-Acquis Multilingual Parallel Corpus]],
 * [[http://psi.amu.edu.pl/en/index.php?title=Parallel_Corpora|PSI collection of parallel corpora]], a growing collection of parallel corpora pairing Polish with other European languages,
 * [[http://www.pol-ros.polon.uw.edu.pl/|Polish-Russian Parallel Corpus]],
 * [[http://mymemory.translated.net/|MyMemory]], freely available multilingual TM,
 * [[http://www.tausdata.org/|TAUS Data]], a multilingual TM from the members of TAUS Data Association,
 * [[http://glosbe.com/|Glosbe]], an open source TM.

== Machine-readable dictionaries ==
 * [[http://plwordnet.pwr.wroc.pl/wordnet|plWordNet, Polish WordNet, Słowosieć]] (M. Piasecki),
 * [[http://www.ltc.amu.edu.pl/polnet/|POLNET, another Polish Wordnet]] (Z. Vetulani),
 * [[http://synonimy.ux.pl/|Polish OpenThesaurus]], słownik synonimów – a crowdsourced Polish thesaurus (M. Miłkowski),
 * [[http://www.sjp.pl/|Słownik języka polskiego (d. alternatywny)]], Polish ispell dictionaries, along with some definitions and online form display,
 * [[Nowy_slownik_angielsko-polski|Nowy słownik angielsko-polski]] (T. Piotrowski, Z. Saloni),
 * [[http://zil.ipipan.waw.pl/OpenCYCPL|Polish OpenCYC]] (A. Pohl),
 * [[http://www.slowniki.org.pl/pol.html|Polish machine-generated dictionaries]], available on Creative Commons (J. Kazojć),
 * [[http://futrega.org/etc/nazwiska.zip|List of all Polish surnames]], licence unknown, see [[http://futrega.org/etc/nazwiska.html|further information on this resource]],
 * [[http://clip.ipipan.waw.pl/Gazetteer|Gazetteer for Polish Named Entities]] (A. Savary, M. Lenart, J. Piskorski),
 * [[http://zil.ipipan.waw.pl/PNET|Triggers for Polish Named Entities]] (M. Baron, L. Manicki, A. Savary),
 * [[http://nlp.pwr.wroc.pl/en/tools-and-resources/nelexicon|NELexicon]], a lexicon of more than 1.4 million proper names (M. Marcińczuk, A. Musiał, M. Janicki),
 * [[http://zil.ipipan.waw.pl/Walenty|Walenty]], the Polish Valence Dictionary (E. Hajnicz, W. Kieraś, A. Patejuk, A. Przepiórkowski, F. Skwarski, M. Świdziński, M. Woliński),
 * [[http://zil.ipipan.waw.pl/SGDPV|Syntactic-generative dictionary of Polish verbs]] (K. Polański),
 * [[http://zil.ipipan.waw.pl/SAWA|SAWA]], the Grammatical Lexicon of Warsaw Urban Proper Names (M. Marciniak, C. Heliasz, J. Rabiega-Wiśniewska, P. Sikora, M. Woliński, A. Savary),
 * [[http://zil.ipipan.waw.pl/SEJF|SEJF]], the Grammatical Lexicon of Polish Phraseology (M. Czerepowicka, A. Savary),
 * [[http://zil.ipipan.waw.pl/SEJFEK|SEJFEK]], the Grammatical Lexicon of Polish Economical Phraseology (F. Makowiecki, A. Savary),
 * [[http://zil.ipipan.waw.pl/WikiTopoPl|WikiTopoPl]], a multilingual lexicon of 155,000 Polish geographical proper names extracted from Wikipedia and their equivalents in Bulgarian, Croatian, English, German, modern Greek, Hungarian, Romanian, Serbian and Slovak (L. Manicki),
 * [[http://zil.ipipan.waw.pl/Prolexbase|Prolexbase 2.0]], a multilingual relational dictionary of proper names in Polish, English and French (M. Baron, B. Bouchou-Markhoff, L. Manicki, D. Maurel, A. Savary, M. Tran),
 * [[http://clip.ipipan.waw.pl/DeepEREntityLibrary|DeepER Entity Library]], a database containing around 900,000 entities, each described by its textual representations in Polish (names) and `WordNet` synsets,
 * [[http://publications.it.p.lodz.pl/2016/word_embeddings/|Word embeddings for Polish - Wikipedia based]] (M. Rogalski, P. Szczepaniak).

== Human-readable dictionaries ==
 * [[http://sgjp.pl|Słownik gramatyczny języka polskiego]],
 * [[http://www.wsjp.pl/|Wielki Słownik Języka Polskiego]],
 * [[http://doroszewski.pwn.pl|Słownik języka polskiego PAN pod red. W. Doroszewskiego]],

 * [[http://pl.wiktionary.org|Wikisłownik]],
 * [[http://www.slownik-online.pl/index.php|Słownik wyrazów obcych i zwrotów obcojęzycznych Władysława Kopalińskiego]],
 * [[http://leksykony.interia.pl/synonim|Słownik synonimów i antonimów Piotra Żmigrodzkiego]],
 * [[http://kpbc.umk.pl/dlibra/publication?id=17781|Słownik polszczyzny XVI wieku]],
 * [[https://sxvii.pl/|Elektroniczny słownik języka polskiego XVII i XVIII wieku]],
 * [[http://poliqarp.wbl.klf.uw.edu.pl/slownik-warszawski/| Poliqarp for DjVu search engine]] for J. Karłowicz, A. Kryński, W. Niedźwiedzki. Dictionary of Polish. Warsaw 1900–1927,
 * [[http://poliqarp.wbl.klf.uw.edu.pl/slownik-polszczyzny-xvi-wieku/| Poliqarp for DjVu search engine]] for S. Bąk, M. R. Mayenowa, F. Pepłowski (eds.). Dictionary of the 16th century Polish. Wrocław — Warszawa, 1966-???? (work in progress),
 * [[http://poliqarp.wbl.klf.uw.edu.pl/slownik-lindego/| Poliqarp for DjVu search engine]] for M. Samuel Bogumił Linde. Dictionary of Polish (2nd edition). Lwów 1854-1861,
 * [[http://poliqarp.wbl.klf.uw.edu.pl/slownik-geograficzny/| Poliqarp for DjVu search engine]] for B. Chlebowski, F. Sulimierski, W. Walewski (eds.), The Geographical Dictionary of the Polish Kingdom and other Slavic Countries, Warszawa 1880-1902,
 * [[http://eswil.ijp-pan.krakow.pl/|Electronic edition of Słownik wileński]],
 * PELCRA HASK Collocation Dictionaries generated for [[http://pelcra.pl/hask_pl|Polish]] and [[http://pelcra.pl/hask_en|English]],
 * [[http://clip.ipipan.waw.pl/UkrPolDict|Słownik ukraińsko-polski]] (Ukrainian-Polish dictionary), edited by Janusz A. Rieger. Dictionary materials: letter „O”.

== Morphological tools and resources ==
 * [[http://sgjp.pl|SGJP]], Grammatical Dictionary of Polish (the list of inflected forms is available with [[http://download.sgjp.pl/morfeusz/|Morfeusz]]),
 * [[http://zil.ipipan.waw.pl/PoliMorf|PoliMorf]], an inflectional dictionary of Polish,
 * [[http://morfeusz.sgjp.pl/|Morfeusz SGJP]], morphological analyser,
 * [[http://sgjp.pl/siat/|Index a tergo of Polish word forms]] (J. Tokarski, Z. Saloni),
 * [[http://morfologik.blogspot.com/|Morfologik]], morphological analyser (M. Miłkowski, D. Weiss),
 * [[http://duch.mimuw.edu.pl/~kszafran/index.php?option=com_docman&task=cat_view&gid=49&Itemid=93|SAM]], morphological analyser (K. Szafran),
 * [[http://utt.amu.edu.pl/|UAM Text Tools]] (P. Obrębski, Z. Vetulani; see also [[http://utt.wmi.amu.edu.pl/trac/wiki/]]),
 * [[http://nl.ijs.si/ME/V4/msd/html|MULTEXT-East, v.4 ]], morphosyntactic specifications and documentation for 16 languages,
 * [[http://nl.ijs.si/ME/V4/doc/index.html#sec-lex|Sample morphosyntactic Polish lexicon]], the MULTEXT-East morphosyntactic lexicons,
 * [[http://www.domeczek.pl/~polukr/mte-conv|KIPI->MTE]], a converter from TaKIPI to MULTEXT-East morphosyntactic format (A. Radziszewski, N. Kotsyba),
 * [[http://nlp.pwr.wroc.pl/redmine/projects/libpltagger/wiki|MACA]], Morphological Analysis Converter and Aggregator (A. Radziszewski, T. Śniatowski),
 * [[http://sgalus.republika.pl/indexe.html|Lexical analyser and a Polish proof-reader]] (S. Galus),
 * [[http://gram.neurosoft.pl/|Neurosoft Gram]] (demo of a morphological analyser),
 * [[http://winnie.ics.agh.edu.pl/proj_uk/fleksbaz/|Baza fleksyjna języka polskiego]], inflection database of Polish words (W. Lubaszewski, B. Moskal, P. Pietras, P. Pisarek, T. Rokicka),
 * [[http://www.eti.pg.gda.pl/katedry/kiw/pracownicy/Jan.Daciuk/personal/fsa_polski.html|Finite state utilities]] (J. Daciuk),
 * [[http://getopt.org/stempel/|Stempel]], another stemmer (A. Białecki),
 * [[http://nlp.pwr.wroc.pl/redmine/projects/joskipi/wiki/|WCCL]], toolkit for morphosyntactic feature generation (A. Radziszewski, A. Wardyński, T. Śniatowski, P. Kędzia),
 * [[http://lemmatise.ijs.si/Services|LemmaGen]], Multilingual Open Source Lemmatisation for 11 EU languages, including Polish (M. Jursic, T. Erjavec et al.),
 * [[http://zil.ipipan.waw.pl/LemmaPL|LemmaPL]], a lemmatization tool for Polish.

== Taggers ==
 * [[http://nlp.pwr.wroc.pl/takipi/|TaKIPI]], a morphosyntactic tagger for Polish (Decision Trees),
 * [[http://zil.ipipan.waw.pl/PANTERA|PANTERA]], a morphosyntactic tagger for Polish (Transformation-Based Learning),
 * [[http://nlp.pwr.wroc.pl/redmine/projects/wmbt/wiki|WMBT]], a morphosyntactic tagger for Polish (Memory-Based Learning),
 * [[http://zil.ipipan.waw.pl/TaCo|TaCo]], a statistical morphosyntactic tagset converter for positional tagsets (e.g. Polish),
 * [[http://nlp.pwr.wroc.pl/redmine/projects/wcrft/wiki|WCRFT]], a morphosyntactic tagger for Polish (Conditional Random Fields),
 * [[http://zil.ipipan.waw.pl/Concraft|Concraft]], a morphosyntactic disambiguation tool for Polish (Constrained Conditional Random Fields),
 * [[http://zil.ipipan.waw.pl/PoliTa|PoliTa]], a morphosyntactic meta-tagger,
 * [[https://github.com/kwrobel-nlp/krnnt|KRNNT]], a morphological tagger for Polish based on recurrent neural networks,
 * [[http://zil.ipipan.waw.pl/NKJP%20model%20for%20TnT%20Tagger|NKJP model for TnT Tagger]], a trained model usable on Morfeusz-segmented text with [[http://www.coli.uni-saarland.de/~thorsten/tnt/|TnT Tagger]],
 * [[http://clarin.pelcra.pl/tools/tagger|A PoS tagger trained on the 1M NKJP corpus and using Morfeusz]] [[http://ltc.amu.edu.pl/book/papers/PolEval1-3.pdf|(Pęzik & Laskowski 2017)]] with a [[http://clarin.pelcra.pl/apt_pl/?sentences=%5B%22Ala%20lubi%20kota.%22%2C%22Jurek%20ma%20worek.%22%5D|REST API]].
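The REST API link in the last entry above passes its input as a `sentences` query parameter holding a percent-encoded JSON array. A minimal sketch of how such a request URL could be assembled is given below; the base URL and parameter name are taken from that link, while the function name `build_tagger_url` is hypothetical and the handling of non-ASCII characters is an assumption, since the service's documentation is not quoted on this page.

```python
import json
from urllib.parse import quote

# Base URL copied from the REST API link in the list above.
BASE_URL = "http://clarin.pelcra.pl/apt_pl/"


def build_tagger_url(sentences):
    """Return a request URL whose `sentences` parameter is a percent-encoded JSON array.

    Hypothetical helper: it only builds the URL and does not contact the service.
    """
    # Compact separators reproduce the spacing seen in the linked example URL.
    # ensure_ascii=False keeps Polish letters as UTF-8 (an assumption about the service).
    payload = json.dumps(list(sentences), ensure_ascii=False, separators=(",", ":"))
    # safe="" makes quote() percent-encode brackets, quotes, commas and spaces.
    return f"{BASE_URL}?sentences={quote(payload, safe='')}"


if __name__ == "__main__":
    print(build_tagger_url(["Ala lubi kota.", "Jurek ma worek."]))
```

For the two sample sentences used in the link, this reproduces the linked URL; the format of the response is not described here and would need to be checked against the service itself.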

== Parsers, grammars, treebanks ==
 * [[http://zil.ipipan.waw.pl/PDB|PDB 2.0]], a dependency treebank of Polish (A. Wróblewska),
 * [[http://git.nlp.ipipan.waw.pl/alina/PDBUD|PDB-UD]], a version of PDB 2.0 in Universal Dependencies format (A. Wróblewska),
 * [[http://zil.ipipan.waw.pl/PDB/PDBparser|PDBparser]], a Polish dependency parser (A. Wróblewska),
 * Składnica, a hybrid constituency/dependency treebank of Polish,
   * [[http://zil.ipipan.waw.pl/Sk%C5%82adnica|Składnica main page]],
   * [[http://zil.ipipan.waw.pl/Sk%C5%82adnicaMWE|SkładnicaMWE]], a constituency version of Składnica with multiword expression annotations (J. Waszczuk, A. Savary),
   * [[http://treebank.nlp.ipipan.waw.pl/|Składnica search engine]] (M. Woliński),
 * [[http://zil.ipipan.waw.pl/plTAG|TAG grammar of Polish]],
 * [[http://zil.ipipan.waw.pl/LFG|POLFIE, an LFG grammar of Polish]]
   * [[http://iness.mozart.ipipan.waw.pl/iness/xle-web|POLFIE as a web service]].
 * [[http://zil.ipipan.waw.pl/%C5%9Awigra|Świgra]], a DCG parser,
   * [[http://swigra.nlp.ipipan.waw.pl/|On-line demo]],
 * Spejd, a shallow parsing and disambiguation system,
   * the [[http://zil.ipipan.waw.pl/Spejd|current version]] of the system,
   * Spejd [[attachment:gramatyka_Spejd_NKJP_1.0.zip|grammar of Polish]] (version 1.0), developed by K. Głowińska within [[http://nkjp.pl/|NKJP]], available on GNU GPL v.3,
   * Spejd [[http://clip.ipipan.waw.pl/SpejdLemmatizingGrammar|grammar of Polish with lemmatisation of Polish nominal syntactic groups]],
   * [[http://zil.ipipan.waw.pl/SEJFEK4Spejd|SEJFEK4Spejd]] - a Spejd grammar version of [[http://zil.ipipan.waw.pl/SEJFEK|SEJFEK]] and a converter from dictionary to grammar,
 * [[http://sourceforge.net/projects/dendrarium/|Dendrarium]], a treebank development system,
 * [[http://nlp.ipipan.waw.pl/CRIT2/|A Treebank / Test Suite for Polish]],
 * [[ftp://ftp.mimuw.edu.pl/pub/People/polszczyzna/AS/index.html|Analizator syntaktyczny AS]] (M. Woliński),
 * [[http://www.site.uottawa.ca/~szpak/oldStuff/|Formalny opis składniowy zdań polskich]] (S. Szpakowicz),
 * [[http://las.aei.polsl.pl/las2/|Serwer LAS / Linguistic Analysis Server]],
 * [[http://nlp.pwr.wroc.pl/en/tools-and-resources/disaster|Disaster]] (DISAmbiguator and STatistical chunkER) – a Python module for chunking and morphosyntactic disambiguation,
 * [[http://nlp.pwr.wroc.pl/redmine/projects/iobber/wiki|Iobber]], a CRF chunker for Polish,
 * [[http://zil.ipipan.waw.pl/Krzaki|Krzaki (bushes)]], a 20k-sentence corpus of Polish manually annotated for dependency structure,
 * [[http://zil.ipipan.waw.pl/ENIAM|ENIAM]] (W. Jaworski).

== Semantic resources ==
 * [[http://zil.ipipan.waw.pl/Scwad/CDSCorpus|CDSCorpus]], a dataset of 10k pairs of Polish sentences manually annotated for semantic relatedness and entailment (A. Wróblewska)
 * [[http://git.nlp.ipipan.waw.pl/Scwad/SCWAD-probing-data|Probing datasets]], Polish and English probing datasets for linguistic verification of sentence embeddings (A. Wróblewska)

== Sentiment analysis ==
 * [[http://zil.ipipan.waw.pl/SlownikWydzwieku|Polish sentiment dictionary]], with sentiment scores computed using supervised methods (A. Wawer),
 * [[http://zil.ipipan.waw.pl/LCM-PL|Polish Linguistic Category Model]] following a typology of verb categorization in terms of their abstractness, also a tool to measure language abstraction (A. Wawer),
 * [[http://zil.ipipan.waw.pl/TreebankWydzwieku|Polish dependency treebank with sentiment annotations]] (A. Wawer),
 * [[http://zil.ipipan.waw.pl/HateSpeech|HateSpeech corpus]], 2000 manually annotated documents representing various types and degrees of offensive language expressed toward minorities,
 * [[http://zil.ipipan.waw.pl/Korpus%20Szczerosci|Sincerity Corpus (Korpus Szczerości)]], a collection of fake and real reviews,
 * you can also test Sentipejd – sentiment analysis tool in the [[http://multiservice.nlp.ipipan.waw.pl/|Multiservice]] (please select a tagger first).

== Coreference ==
 * [[http://zil.ipipan.waw.pl/PolishCoreferenceCorpus|Polish Coreference Corpus]], a 500 M corpus of general nominal coreference in Polish (M. Ogrodniczuk),
 * [[http://zil.ipipan.waw.pl/PolishCoreferenceTools|Polish Coreference Tools]], a suite of Polish coreference resolution tools, created as part of the [[http://zil.ipipan.waw.pl/CORE|CORE project]].

== Speech analysis and synthesis tools ==
 * [[http://skrybot.pl/en/products/skrybot-home-speech-recognition/|Skrybot]], commercial speech recognition system (L. Pawlaczyk, P. Bosky),
 * [[http://www.ivona.com/|Ivona]], commercial text-to-speech system (Expressivo),
 * [[http://techmo.pl/index.php?option=com_content&view=article&id=54&Itemid=166&lang=pl|Techmo]] TTS demo (Techmo),
 * [[http://www.nuance.com/landing-pages/playground/Vocalizer_Demo2/vocalizer_modal.html?demo=true|Vocalizer]], commercial text-to-speech system (Nuance),
 * [[http://www.acapela-group.com/text-to-speech-interactive-demo.html|Acapela]], text to speech demo,
 * [[http://www.syntezamowy.pjwstk.edu.pl/index.html|Synteza mowy polskiej]], automatic speech recognition and speech synthesis demos, with background information (K. Szklanny),
 * [[http://www.staff.amu.edu.pl/~fonetyka/synteza/index.htm|System syntezy mowy ciągłej]] (G. Demenko, S. Grocholewski),
 * [[http://www.tcts.fpms.ac.be/synthesis/mbrola/|Polish MBROLA database]] (K. Szklanny, K. Marasek),
 * [[http://www.neurosoft.pl/?page_name=Produkty_SynTalk|SynTalk]], commercial speech synthesis system (!NeuroSoft),
 * [[http://www.primespeech.pl/|PrimeSpeech]], commercial speech recognition systems,
 * [[http://www.dsp.agh.edu.pl/doku.php?id=pl:resources:ortfon|OrtFon]], phonetic transcriber (AGH DSP),
 * [[http://www.dsp.agh.edu.pl/doku.php?id=pl:resources:asr|Sarmata]], automatic speech recognition system for Polish (AGH DSP),
 * [[http://www.dsp.agh.edu.pl/doku.php?id=pl:resources:anotator|Anotator]], a speech corpus annotator for Polish focused on connecting existing resources (AGH DSP),
 * [[http://www.dsp.agh.edu.pl/doku.php?id=pl:resources:spkreco|System rozpoznawania mówcy]], a speaker recognition system (AGH DSP).

== Machine translation demonstrations ==
 * [[http://itranslate4.eu/|iTranslate4.eu]] (multiple languages, allows comparing translation engines),
 * [[http://www.microsofttranslator.com/|Bing Translator]] (multilingual),
 * [[http://translate.google.com/|Google Translate]] (multilingual),
 * [[http://www.tranexp.com/|InterTran]] (multilingual),
 * [[http://www.poltran.com/|LingvoBit]] (EN-PL-EN),
 * [[http://www.systran.co.uk/|Systran]] (EN-PL, PL-FR and some more),
 * [[http://www.xdobry.de/esperantoedit/index_pl.html|Esperantilo]] (integrated Esperanto editor, with MT for EO-PL-DE-EN-SV),
 * [[http://thetos.polsl.pl/|Thetos]] (PL-Sign language).

== Summarizers ==
 * [[http://www.cs.put.poznan.pl/dweiss/research/lakon/|Lakon]], a system for news summarization (A. Dudczak),
 * [[http://las.aei.polsl.pl/PolSum/#/Home|PolSum]] (S. Kulików),
 * [[http://clip.ipipan.waw.pl/Summar|Summar]] (Ł. Pawluczuk),
 * [[http://clip.ipipan.waw.pl/Summarizer|Summarizer]] (J. Świetlicka),
 * you can also test Lakon, Open Text Summarizer and Summarizer in [[http://multiservice.nlp.ipipan.waw.pl/|Multiservice]]
 * and take a look at the [[http://zil.ipipan.waw.pl/PolishSummariesCorpus|Polish Summaries Corpus]].

== Diacritization ==
 * [[http://www.gzegzolka.com/poliszynel/|Poliszynel]] (P. Sawicki),
 * [[http://www.spolszcz.pl/|spolszcz.pl]] (P. Sawicki),
 * [[http://www.polszczyzna.info/polonizator|Polonizator]] (TiP),
 * [[http://slowniki.zoni.pl/?s=ogonki|Polonizer]],
 * [[http://galaxy.eti.pg.gda.pl/katedry/kiw/pracownicy/Jan.Daciuk/personal/man/fsa_accent.1.html|fsa_accent]] (J. Daciuk),
 * [[http://wm.ite.pl/proj/pliterki/index.html|pliterki]] (W. Muła).

== Named entity recognition ==
 * [[http://zil.ipipan.waw.pl/Nerf|Nerf]], a tool for named entity recognition, available on GNU GPL v.3 (J. Waszczuk),
 * [[http://nlp.pwr.wroc.pl/narzedzia-i-zasoby/narzedzia/liner2|Liner2]], named entity recognizer released on GNU GPL with models to recognize 5 and 56 categories of proper names (M. Marcińczuk and M. Janicki),
 * [[https://clarin-pl.eu/dspace/handle/11321/302|TIMEX]], a model for Liner2 to recognize and normalize temporal expressions (J. Kocoń and M. Marcińczuk).

== Multiword expression software ==
 * [[http://zil.ipipan.waw.pl/TermoPL|TermoPL]], multiword expression extraction tool,
 * [[http://multiword.sourceforge.net/sharedtaskresults2018/|VMWE identifiers]], systems having participated in the PARSEME shared task for automatic identification of verbal MWEs, 13 out of 17 systems submitted results for Polish,
 * [[https://mwedemonstrator.atilf.fr/mwetools/accueil/|PARSEME-FR demonstrator]], including the ATILF-LLF multiword expression identifier for Polish,

== Aggregating services ==
 * [[http://multiservice.nlp.ipipan.waw.pl/|Multiservice]], a sample interface for running NLP Web services for Polish (see also [[http://redmine.nlp.ipipan.waw.pl/redmine/projects/multiserwis/wiki/Usage|usage]] and [[http://redmine.nlp.ipipan.waw.pl/redmine/projects/multiserwis/wiki/InOut|format]]),
 * [[http://ws.clarin-pl.eu/|Online demos of tools for processing Polish texts]] (CLARIN-PL),
 * [[http://psi-toolkit.wmi.amu.edu.pl/index.html|PSI-Toolkit]], a chain of publicly available tools for automatic processing of Polish.

== Other ==
 * [[https://play.google.com/store/apps/details?id=com.pwr.plwordnet|Mobile plWordNet]], free mobile application for plWordNet browsing (J. Kocoń), /* * [[http://www.mimuw.edu.pl/polszczyzna/kolokacje/index.htm|Kolokacje]], a Web crawler and collocation finder (A. Buczyński),*/
 * [[http://zil.ipipan.waw.pl/WSDDE|WSDDE]], a system for designing and performing Word Sense Disambiguation experiments (R. Młodzki ''et al.''),
 * [[http://frazeo.pl/|Frazeo]], a search engine and clusterer of news in Polish (P. Pęzik),
 * [[http://segment.sourceforge.net/|Segment]], a rule-based sentence tokenizer supporting SRX standard (J. Lipski; the Polish rules are available in [[http://sourceforge.net/p/languagetool/code/HEAD/tree/trunk/languagetool/languagetool-core/src/main/resources/org/languagetool/resource/segment.srx?format=raw|LanguageTool project]], see [[http://zil.ipipan.waw.pl/Segment|here]] for short instructions on how to use the tool),
 * [[http://nlp.pwr.wroc.pl/redmine/projects/toki/wiki|Toki]], a tokenizer supporting SRX standard, C++ library and toolkit (T. Śniatowski and A. Radziszewski),
 * [[http://poleng.pl/translatica-pl.srx|Translatica SRX sentence segmentation rules for Polish (LGPL)]],
 * [[http://psi.amu.edu.pl/en/index.php?title=SyMGIZA%2B%2B|SyMGIZA++]], an extension of Giza++ that computes symmetric word alignment models,
 * [[http://hipisek.pl|Hipisek]], an experimental question answering system (M. Walas),
 * [[https://bitbucket.org/jsbien/ndt|Narzędzia dygitalizacji tekstów]] (text digitization tools), Poliqarp for !DjVu and other programs,
 * [[http://www.nlp.pwr.wroc.pl/en/tools-and-resources/fextor|Fextor]], a feature extraction framework,
 * [[http://www.nlp.pwr.wroc.pl/en/tools-and-resources/lexcsd|LexCSD]], a system for semi-automatic sense disambiguation,
 * [[http://www.nlp.pwr.wroc.pl/en/tools-and-resources/supermatrix|SuperMatrix]], a general tool for lexical semantic knowledge acquisition,
 * [[http://nlp.pwr.wroc.pl/en/tools-and-resources/wordnetloom|WordnetLoom]], a wordnet editor application,
 * [[http://zil.ipipan.waw.pl/Toposlaw|Toposław]], a tool for the creation of electronic inflectional dictionaries of multi-word units,
 * [[http://zil.ipipan.waw.pl/CorpCor|CorpCor]], a web-based tool for correcting morphosyntactic annotation in TEI XML encoded corpora (e.g. NKJP),
 * [[http://ws.clarin-pl.eu/demo/stylo2.html|Stylo 2]], stylometry demo,
 * [[http://clip.ipipan.waw.pl/DeepEvents|DeepEvents]], event extraction in Polish, based on deep neural networks,
 * [[http://dsmodels.nlp.ipipan.waw.pl/sim1.html|Word similarity]], an online service that calculates the similarity of words based on word embeddings,
 * [[http://baltoslav.eu/?mova=pl|Baltoslav]], with several script converters (Romanizer, Cyrillizer, IPA Converter etc.),
 * [[http://zil.ipipan.waw.pl/SpacyPL|SpacyPL]], Polish language models and resources for [[https://spacy.io|spaCy]],
 * [[https://jasnopis.pl/|Jasnopis]], an analyzer of the obscurity level of texts,
 * [[http://zil.ipipan.waw.pl/Scwad/AIDe|AIDe]], a corpus of image descriptions in Polish (A. Wróblewska).
