= Language Tools and Resources for Polish =

This page contains a list of ''publicly available'' language tools and resources.

/* * [[attachment:NKJP-PodkorpusMilionowy-1.0.tgz]], the manually annotated 1-million word subcorpus of the NKJP, available on GNU GPL v.3, */
/* * [[attachment:NKJP-PodkorpusMilionowy-1.1.tgz]], the manually annotated 1-million word subcorpus of the NKJP, available on GNU GPL v.3, */
/* * [[attachment:NKJP-PodkorpusMilionowy-1.0-poliqarp-bin.tgz]], the binary version of the corpus, to be used with the standalone Poliqarp tool. */

== Written corpora of contemporary Polish ==
 * [[NationalCorpusOfPolish|National Corpus of Polish]] (NKJP),
 * [[http://korpus.pwn.pl/|PWN Corpus]],
 * [[http://nfjp.pl/|National Photocorpus of Polish]] (NFJP),
 * [[http://poliqarp.wbl.klf.uw.edu.pl/|Dictionaries as Corpora]],
 * [[PPC|Polish Parliamentary Corpus]],
 * [[http://zil.ipipan.waw.pl/PolishSummariesCorpus|Polish Summaries Corpus]],
 * [[http://ifa.amu.edu.pl/~ifaconc/blog/?page_id=60|PICLE corpus]], the Polish sub-corpus of the [[http://www.fltr.ucl.ac.be/fltr/germ/etan/cecl/Cecl-Projects/Icle/icle.htm|International Corpus of Learner English]] (ICLE),
 * [[http://dl.psnc.pl/activities/projekty/impact/results/| IMPACT ground-truth data]] for selected Polish historical documents from PIONIER Digital Libraries Federation,
  * Now available also as corpora in the Poliqarp for !DjVu [[http://poliqarp.wbl.klf.uw.edu.pl|search engine]],
 * [[http://nlp.pwr.wroc.pl/narzedzia-i-zasoby/zasoby/kpwr|KPWr]], the Polish Corpus of Wrocław University of Technology, a collection of documents available under a Creative Commons license and annotated with syntactic chunks, proper names, semantic relations, anaphora and word senses,
 * [[http://www.pcsn.uni.wroc.pl/|Polish Corpus of Suicide Notes]],
 * [[PolishWikipediaCorpus|Polish Wikipedia Corpus]],
 * [[http://zil.ipipan.waw.pl/gpwEcono|gpwEcono]], a corpus of stock market reports, with manual word sense annotation,
 * [[http://zil.ipipan.waw.pl/plWikiEcono|plWikiEcono]], a corpus of Polish Wikipedia articles from the domain of economy,
 * [[http://zil.ipipan.waw.pl/PolishCoreferenceCorpus|Polish Coreference Corpus]], a corpus of Polish coreference relations, created as part of the [[http://core.ipipan.waw.pl/About|CORE project]],
 * [[http://argumentacja.pdg.pl/argdbpl/|ArgDB-pl]], a Polish corpus of arguments in natural contexts,
 * [[http://www.staff.amu.edu.pl/~romang/wiki_errors_pl.php|PIEWiC]], a Polish corpus of errors automatically extracted from Wikipedia revisions,
 * [[http://clip.ipipan.waw.pl/PolEval|PolEval]], corpora and other text resources created for [[http://www.poleval.pl|PolEval]] shared tasks,
 * [[http://clip.ipipan.waw.pl/PARSEME-PL|Polish PARSEME corpus]], annotated manually for verbal multiword expressions in 20 languages including Polish, used in the [[http://multiword.sourceforge.net/sharedtask2018/|PARSEME shared task 1.1]]; the Polish subcorpus is aligned with automatic dependency annotations in the [[http://universaldependencies.org/guidelines.html|UD]] format (A. Savary),
 * [[http://clip.ipipan.waw.pl/MweLitRead|MweLitRead]], a corpus of literal readings of Polish verbal MWEs stemming from the [[http://clip.ipipan.waw.pl/PARSEME-PL|Polish PARSEME corpus]] (A. Savary, S. Cordeiro),
 * [[http://pelcra.pl/plec/downloads|PELCRA Learner English Corpus]] (PLEC),
 * [[https://www.sketchengine.eu/user-guide/user-manual/corpora/by-language/polish-text-corpora/|Polish text corpora]] included in Sketch Engine,
 * [[http://synamet.polon.uw.edu.pl/|Microcorpus of Synesthetic Metaphors]].

== Written corpora of historical Polish ==
 * [[http://scriptores.pl/efontes/|eFontes Mediae et Infimae Latinitatis Polonorum]] (1000–1550, IJP PAN)
 * [[https://www.ijp-pan.krakow.pl/publikacje-elektroniczne/korpus-tekstow-staropolskich|Corpus of old Polish (up to 1500)]] (IJP PAN)
 * [[http://stnt.ijp.pan.pl/|15th-century New Testament translations]] (IJP PAN)
 * [[https://szukajwslownikach.uw.edu.pl/IMPACT_GT_1/|IMPACT project corpus]] (1570–1756, KLF UW)
 * [[http://spxvi.edu.pl/korpus/|Corpus of 16th-century Polish]] (IBL PAN)
 * [[http://fedora.clarin-d.uni-saarland.de/poldilemma/|PolDiLemma]], the Middle Polish Diachrone Lemmatised Corpus (16–18th c., R. Meyer)
 * [[http://rhssl1.uni-regensburg.de/SlavKo/korpus/poldi|PolDi]], a Polish Diachronic Online Corpus (R. Meyer)
 * [[http://korba.edu.pl|KORBA]], electronic corpus of 17th and 18th century Polish texts (1601–1772, IJP PAN)
 * [[http://www.f19.uw.edu.pl/|Corpus of 19th-century Polish]] (1830–1918, IJP UW)
 * [[http://korpus19.nlp.ipipan.waw.pl/|Manually annotated and transcribed corpus of 19th-century Polish]] (1830–1918, IPI PAN)
 * [[http://chronopress.clarin-pl.eu/|ChronoPress]], corpus of press texts from 1945–1954 (A. Pawłowski),
 * [[PL196x|Polish language of the 1960s / Frequency corpus]] (I. Kurcz, A. Lewicki, J. Sambor, J. Woronczak, K. Szafran, J. S. Bień, M. Woliński).

== Corpus-related tools and resources ==
 * [[http://poliqarp.sourceforge.net/|Poliqarp]], a corpus indexing and search engine (please see also [[http://nlp.ipipan.waw.pl/Poliqarp/|the beta version of Poliqarp 1.1]] and [[http://clip.ipipan.waw.pl/Poliqarp|1.3]] with statistical extensions and [[http://liszt.ipipan.waw.pl/|several corpora indexed with Poliqarp 2]]),
 * [[http://zil.ipipan.waw.pl/Anotatornia|Anotatornia]], a system for multi-level manual annotation of corpora,
 * [[http://zil.ipipan.waw.pl/Anotatornia2|Anotatornia2]], a new version of Anotatornia geared towards annotation of historical corpora,
 * [[http://nlp.pwr.wroc.pl/en/tools-and-resources/inforex|Inforex]], a web-based system designed for managing and annotating text corpora on the semantic level,
 * [[http://smyrna.danieljanus.pl/|Smyrna]], a simple, light-weight Polish concordancer,
 * [[http://korpusy.net/|korpusy.net]], a corpus research-related website (B. Gałkowski),
 * [[http://zil.ipipan.waw.pl/Korpusomat|Korpusomat]], a tool for creating one's own searchable corpora (Ł. Kobyliński, W. Kieraś).

== Spoken corpora ==
 * The PELCRA conversational corpus of Polish: approx. 2.2 million words of casual conversational spoken Polish collected and processed in the years 2001-2015 in a number of research projects, including PELCRA, NKJP, CESAR and CLARIN-PL. Available under CC-BY-NC. All transcriptions can be accessed through the [[http://spokes.clarin-pl.eu|Spokes web interface]] and programmatically through [[http://clarin.pelcra.pl/apidocs/spokes| a REST API]].
 * [[http://clip.ipipan.waw.pl/LUNA|The annotated corpus of spoken dialogues]] (LUNA project, corpus data available at the end of the page),
 * [[http://www.dsp.agh.edu.pl/doku.php?id=en:resources:korpusmowy|AGH speech corpus]], around 9 hours, word-annotated Polish speech corpus (AGH DSP),
 * [[http://www.dsp.agh.edu.pl/doku.php?id=en:resources:korpusav|Audiovideo corpus]] of Polish speech (AGH DSP),
 * [[http://nkjp.uni.lodz.pl/spoken.jsp|NKJP search engine for spoken-conversational data]],
 * [[http://catalog.elra.info/product_info.php?cPath=37_39&products_id=1164|Acoustic database for Polish unit selection speech synthesis]] (ELRA resources),
 * [[http://catalog.elra.info/product_info.php?cPath=37_39&products_id=1168|Acoustic database for Polish concatenative speech synthesis]] (ELRA resources),
 * [[http://www.dsp.agh.edu.pl/doku.php?id=pl:resources:korpusemo|Corpus of emotions in speech]] (AGH DSP).

== Language models ==
 * [[http://zil.ipipan.waw.pl/NKJPNGrams|N-grams from the balanced subcorpus of the National Corpus of Polish]],
 * [[http://mozart.ipipan.waw.pl/~axw/models/|Distributional semantic models]] trained on orthographical, lemmatized word forms (A. Wawer),
 * Even more [[http://dsmodels.nlp.ipipan.waw.pl|distributional semantic models]] based on NKJP (A. Mykowiecka, M. Marciniak, P. Rychlik),
 * [[http://publications.it.p.lodz.pl/2016/word_embeddings/|Wikipedia-based word embeddings for Polish]] (M. Rogalski, P. Szczepaniak),
 * [[https://wikipedia2vec.github.io/wikipedia2vec/pretrained/|Wikipedia2Vec – pretrained embeddings for Polish]] (I. Yamada, A. Asai, H. Shindo, H. Takeda, Y. Takefuji),
 * [[https://github.com/deepmipt/Slavic-BERT-NER|Slavic BERT NER]] (see also [[http://docs.deeppavlov.ai/en/master/features/pretrained_vectors.html#bert|Deep Pavlov website]]),
 * [[https://github.com/sdadas/polish-nlp-resources|RoBERTa]] and links to many other useful resources (S. Dadas),
 * [[https://github.com/kldarek/polbert|Polbert]] (D. Kłeczek).

== Parallel corpora and translation memories ==
 * [[http://parasolcorpus.org/|ParaSol]], a parallel corpus of Slavic and other languages,
 * [[http://www.domeczek.pl/~polukr/index.php?option=search|PolUKR]], a Polish-Ukrainian parallel corpus,
 * [[http://opus.lingfil.uu.se/index.php|OPUS]], an open source parallel corpus (European Parliament, EMEA, KDE, movie subtitles),
 * [[http://nl.ijs.si/ME/V4|"1984"]], an annotated parallel corpus of George Orwell's "1984" in 15 languages, MULTEXT-East, v.4 (licensed download),
 * [[http://www.korpus.cz/intercorp/?req=page:info|InterCorp]], a multilingual parallel corpus,
 * [[http://corpus.leeds.ac.uk/internet.html|Leeds collection of Internet corpora]],
 * [[http://korpus.hiztegia.org/|LAGUN corpus]],
 * [[http://pelcra.pl/new/cesar|PELCRA Parallel corpora]], a collection of downloadable parallel corpora developed by the PELCRA team and available under the CC-BY and CC-BY-NC licenses,
 * [[http://paralela.clarin-pl.eu|Paralela Polish-English corpus]],
 * [[http://langtech.jrc.it/JRC-Acquis.html|JRC-Acquis Multilingual Parallel Corpus]],
 * [[http://psi.amu.edu.pl/en/index.php?title=Parallel_Corpora|PSI collection of parallel corpora]], a growing collection of parallel corpora pairing Polish with other European languages,
 * [[http://www.pol-ros.polon.uw.edu.pl/|Polish-Russian Parallel Corpus]],
 * [[http://mymemory.translated.net/|MyMemory]], freely available multilingual TM,
 * [[http://www.tausdata.org/|TAUS Data]], a multilingual TM from the members of TAUS Data Association,
 * [[http://glosbe.com/|Glosbe]], an open source TM,
 * [[https://github.com/poethan/AlphaMWE|AlphaMWE]], a parallel English-Chinese, English-Polish and English-German corpus annotated with multiword expressions.

== Machine-readable dictionaries ==
 * [[http://plwordnet.pwr.wroc.pl/wordnet|plWordNet, Polish WordNet, Słowosieć]] (M. Piasecki),
 * [[http://www.ltc.amu.edu.pl/polnet/|POLNET, another Polish Wordnet]] (Z. Vetulani),
 * [[http://synonimy.ux.pl/|Polish OpenThesaurus]], słownik synonimów – a crowdsourced Polish thesaurus (M. Miłkowski),
 * [[http://www.sjp.pl/|Słownik języka polskiego (d. alternatywny)]], Polish ispell dictionaries, along with some definitions and online form display,
 * [[Nowy_slownik_angielsko-polski|Nowy słownik angielsko-polski]] (T. Piotrowski, Z. Saloni),
 * [[http://zil.ipipan.waw.pl/OpenCYCPL|Polish OpenCYC]] (A. Pohl),
 * [[http://www.slowniki.org.pl/pol.html|Polish machine-generated dictionaries]], available under a Creative Commons license (J. Kazojć),
 * [[http://futrega.org/etc/nazwiska.zip|List of all Polish surnames]], licence unknown, see [[http://futrega.org/etc/nazwiska.html|further information on this resource]],
 * [[http://clip.ipipan.waw.pl/Gazetteer|Gazetteer for Polish Named Entities]] (A. Savary, M. Lenart, J. Piskorski),
 * [[http://zil.ipipan.waw.pl/PNET|Triggers for Polish Named Entities]] (M. Baron, L. Manicki, A. Savary),
 * [[http://www.nlp.pwr.wroc.pl/narzedzia-i-zasoby/zasoby/nelexicon|NELexicon]], containing more than 1.4 million proper names (M. Marcińczuk, A. Musiał, M. Janicki),
 * [[http://zil.ipipan.waw.pl/Walenty|Walenty]], the Polish Valence Dictionary (E. Hajnicz, W. Kieraś, A. Patejuk, A. Przepiórkowski, F. Skwarski, M. Świdziński, M. Woliński),
 * [[http://zil.ipipan.waw.pl/SGDPV|Syntactic-generative dictionary of Polish verbs]] (K. Polański),
 * [[http://zil.ipipan.waw.pl/SAWA|SAWA]], the Grammatical Lexicon of Warsaw Urban Proper Names (M. Marciniak, C. Heliasz, J. Rabiega-Wiśniewska, P. Sikora, M. Woliński, A. Savary),
 * [[http://zil.ipipan.waw.pl/SEJF|SEJF]], the Grammatical Lexicon of Polish Phraseology (M. Czerepowicka, A. Savary),
 * [[http://zil.ipipan.waw.pl/SEJFEK|SEJFEK]], the Grammatical Lexicon of Polish Economical Phraseology (F. Makowiecki, A. Savary),
 * [[http://zil.ipipan.waw.pl/WikiTopoPl|WikiTopoPl]], a multilingual lexicon of 155,000 Polish geographical proper names extracted from Wikipedia and their equivalents in Bulgarian, Croatian, English, German, modern Greek, Hungarian, Romanian, Serbian and Slovak (L. Manicki),
 * [[http://zil.ipipan.waw.pl/Prolexbase|Prolexbase 2.0]], a multilingual relational dictionary of proper names in Polish, English and French (M. Baron, B. Bouchou-Markhoff, L. Manicki, D. Maurel, A. Savary, M. Tran),
 * [[http://clip.ipipan.waw.pl/DeepEREntityLibrary|DeepER Entity Library]], a database containing around 900,000 entities, each described by its textual representations in Polish (names) and `WordNet` synsets.

== Human-readable dictionaries ==
 * [[http://sgjp.pl|Słownik gramatyczny języka polskiego]],
 * [[http://www.wsjp.pl/|Wielki Słownik Języka Polskiego]],
 * [[http://doroszewski.pwn.pl|Słownik języka polskiego PAN pod red. W. Doroszewskiego]],
 * [[http://pl.wiktionary.org|Wikisłownik]],
 * [[http://www.slownik-online.pl/index.php|Słownik wyrazów obcych i zwrotów obcojęzycznych Władysława Kopalińskiego]],
 * [[http://leksykony.interia.pl/synonim|Słownik synonimów i antonimów Piotra Żmigrodzkiego]],
 * [[http://kpbc.umk.pl/dlibra/publication?id=17781|Słownik polszczyzny XVI wieku]],
 * [[https://sxvii.pl/|Elektroniczny słownik języka polskiego XVII i XVIII wieku]],
 * [[http://poliqarp.wbl.klf.uw.edu.pl/slownik-warszawski/| Poliqarp for DjVu search engine]] for J. Karłowicz, A. Kryński, W. Niedźwiedzki. Dictionary of Polish. Warsaw 1900–1927,
 * [[http://poliqarp.wbl.klf.uw.edu.pl/slownik-polszczyzny-xvi-wieku/| Poliqarp for DjVu search engine]] for S. Bąk, M. R. Mayenowa, F. Pepłowski (eds.). Dictionary of 16th-century Polish. Wrocław – Warszawa, 1966-???? (work in progress),
 * [[http://poliqarp.wbl.klf.uw.edu.pl/slownik-lindego/| Poliqarp for DjVu search engine]] for M. Samuel Bogumił Linde. Dictionary of Polish (2nd edition). Lwów 1854-1861,
 * [[http://poliqarp.wbl.klf.uw.edu.pl/slownik-geograficzny/| Poliqarp for DjVu search engine]] for B. Chlebowski, F. Sulimierski, W. Walewski (eds.), The Geographical Dictionary of the Polish Kingdom and other Slavic Countries, Warszawa 1880-1902,
 * [[http://eswil.ijp-pan.krakow.pl/|Edycja elektroniczna Słownika wileńskiego]],
 * PELCRA HASK Collocation Dictionaries generated for [[http://pelcra.pl/hask_pl|Polish]] and [[http://pelcra.pl/hask_en|English]],
 * [[http://clip.ipipan.waw.pl/UkrPolDict|Słownik ukraińsko-polski]] pod redakcją Janusza A. Riegera. Materiały do słownika: Litera „O”.

== Morphological tools and resources ==
 * [[http://sgjp.pl|SGJP]], Grammatical Dictionary of Polish (the list of inflected forms is available with [[http://morfeusz.sgjp.pl/download/|Morfeusz]]),
 * [[http://zil.ipipan.waw.pl/PoliMorf|PoliMorf]], an inflectional dictionary of Polish,
 * [[http://morfeusz.sgjp.pl/|Morfeusz SGJP]], morphological analyser,
 * [[http://sgjp.pl/siat/|Index a tergo of Polish word forms]] (J. Tokarski, Z. Saloni),
 * [[http://morfologik.blogspot.com/|Morfologik]], morphological analyser (M. Miłkowski, D. Weiss),
 * [[http://duch.mimuw.edu.pl/~kszafran/index.php?option=com_docman&task=cat_view&gid=49&Itemid=93|SAM]], morphological analyser (K. Szafran),
 * [[http://utt.amu.edu.pl/|UAM Text Tools]] (P. Obrębski, Z. Vetulani; see also [[http://utt.wmi.amu.edu.pl/trac/wiki/]]),
 * [[http://nl.ijs.si/ME/V4/msd/html|MULTEXT-East, v.4]], morphosyntactic specifications and documentation for 16 languages,
 * [[http://nl.ijs.si/ME/V4/doc/index.html#sec-lex|Sample morphosyntactic Polish lexicon]], the MULTEXT-East morphosyntactic lexicons,
 * [[http://www.domeczek.pl/~polukr/mte-conv|KIPI->MTE]], a converter from TaKIPI to MULTEXT-East morphosyntactic format (A. Radziszewski, N. Kotsyba),
 * [[http://nlp.pwr.wroc.pl/redmine/projects/libpltagger/wiki|MACA]], Morphological Analysis Converter and Aggregator (A. Radziszewski, T. Śniatowski),
 * [[http://sgalus.republika.pl/indexe.html|Lexical analyser and a Polish proof-reader]] (S. Galus),
 * [[http://gram.neurosoft.pl/|Neurosoft Gram]] (demo of a morphological analyser),
 * [[http://winnie.ics.agh.edu.pl/proj_uk/fleksbaz/|Baza fleksyjna języka polskiego]], inflection database of Polish words (W. Lubaszewski, B. Moskal, P. Pietras, P. Pisarek, T. Rokicka),
 * [[http://www.eti.pg.gda.pl/katedry/kiw/pracownicy/Jan.Daciuk/personal/fsa_polski.html|Finite state utilities]] (J. Daciuk),
 * [[http://getopt.org/stempel/|Stempel]], another stemmer (A. Białecki),
 * [[http://nlp.pwr.wroc.pl/redmine/projects/joskipi/wiki/|WCCL]], toolkit for morphosyntactic feature generation (A. Radziszewski, A. Wardyński, T. Śniatowski, P. Kędzia),
 * [[http://lemmatise.ijs.si/Services|LemmaGen]], Multilingual Open Source Lemmatisation for 11 EU languages, including Polish (M. Jursic, T. Erjavec et al.),
 * [[http://zil.ipipan.waw.pl/LemmaPL|LemmaPL]], a lemmatization tool for Polish.

== Taggers ==
 * [[http://nlp.pwr.wroc.pl/takipi/|TaKIPI]], a morphosyntactic tagger for Polish (Decision Trees),
 * [[http://zil.ipipan.waw.pl/PANTERA|PANTERA]], a morphosyntactic tagger for Polish (Transformation-Based Learning),
 * [[http://nlp.pwr.wroc.pl/redmine/projects/wmbt/wiki|WMBT]], a morphosyntactic tagger for Polish (Memory-Based Learning),
 * [[http://zil.ipipan.waw.pl/TaCo|TaCo]], a statistical morphosyntactic tagset converter for positional tagsets (e.g. Polish),
 * [[http://nlp.pwr.wroc.pl/redmine/projects/wcrft/wiki|WCRFT]], a morphosyntactic tagger for Polish (Conditional Random Fields),
 * [[http://zil.ipipan.waw.pl/Concraft|Concraft]], a morphosyntactic disambiguation tool for Polish (Constrained Conditional Random Fields),
 * [[http://zil.ipipan.waw.pl/PoliTa|PoliTa]], a morphosyntactic meta-tagger,
 * [[https://github.com/kwrobel-nlp/krnnt|KRNNT]], a morphological tagger for Polish based on recurrent neural networks,
 * [[http://zil.ipipan.waw.pl/NKJP%20model%20for%20TnT%20Tagger|NKJP model for TnT Tagger]], a trained model usable on Morfeusz-segmented text with [[http://www.coli.uni-saarland.de/~thorsten/tnt/|TnT Tagger]],
 * [[http://clarin.pelcra.pl/tools/tagger|A PoS tagger trained on the 1M NKJP corpus and using Morfeusz]] [[http://ltc.amu.edu.pl/book/papers/PolEval1-3.pdf|(Pęzik & Laskowski 2017)]] with a [[http://clarin.pelcra.pl/apt_pl/?sentences=%5B%22Ala%20lubi%20kota.%22%2C%22Jurek%20ma%20worek.%22%5D|REST API]].

== Parsers, grammars, treebanks ==
 * [[http://zil.ipipan.waw.pl/PDB|PDB 2.0]], a dependency treebank of Polish (A. Wróblewska),
 * [[http://git.nlp.ipipan.waw.pl/alina/PDBUD|PDB-UD]], a version of PDB 2.0 in Universal Dependencies format (A. Wróblewska),
 * [[http://zil.ipipan.waw.pl/PDB/PDBparser|PDBparser]], a Polish dependency parser (A. Wróblewska),
 * Składnica, a hybrid constituency/dependency treebank of Polish,
  * [[http://zil.ipipan.waw.pl/Sk%C5%82adnica|Składnica main page]],
  * [[http://zil.ipipan.waw.pl/Sk%C5%82adnicaMWE|SkładnicaMWE]], a constituency version of Składnica with multiword expression annotations (J. Waszczuk, A. Savary),
  * [[http://treebank.nlp.ipipan.waw.pl/|Składnica search engine]] (M. Woliński),
 * [[http://zil.ipipan.waw.pl/plTAG|TAG grammar of Polish]],
 * [[http://zil.ipipan.waw.pl/LFG|POLFIE, an LFG grammar of Polish]],
  * [[http://iness.mozart.ipipan.waw.pl/iness/xle-web|POLFIE as a web service]],
 * [[http://zil.ipipan.waw.pl/%C5%9Awigra|Świgra]], a DCG parser,
  * [[http://swigra.nlp.ipipan.waw.pl/|On-line demo]],
 * Spejd, a shallow parsing and disambiguation system,
  * the [[http://zil.ipipan.waw.pl/Spejd|current version]] of the system,
  * Spejd [[attachment:gramatyka_Spejd_NKJP_1.0.zip|grammar of Polish]] (version 1.0), developed by K. Głowińska within [[http://nkjp.pl/|NKJP]], available on GNU GPL v.3,
  * Spejd [[http://clip.ipipan.waw.pl/SpejdLemmatizingGrammar|grammar of Polish with lemmatisation of Polish nominal syntactic groups]],
  * [[http://zil.ipipan.waw.pl/SEJFEK4Spejd|SEJFEK4Spejd]], a Spejd grammar version of [[http://zil.ipipan.waw.pl/SEJFEK|SEJFEK]] and a converter from dictionary to grammar,
 * [[http://sourceforge.net/projects/dendrarium/|Dendrarium]], a treebank development system,
 * [[http://nlp.ipipan.waw.pl/CRIT2/|A Treebank / Test Suite for Polish]],
 * [[ftp://ftp.mimuw.edu.pl/pub/People/polszczyzna/AS/index.html|Analizator syntaktyczny AS]] (M. Woliński),
 * [[http://www.site.uottawa.ca/~szpak/oldStuff/|Formalny opis składniowy zdań polskich]] (S. Szpakowicz),
 * [[http://las.aei.polsl.pl/las2/|Serwer LAS / Linguistic Analysis Server]],
 * [[http://nlp.pwr.wroc.pl/en/tools-and-resources/disaster|Disaster]] (DISAmbiguator and STatistical chunkER) – a Python module for chunking and morphosyntactic disambiguation,
 * [[http://nlp.pwr.wroc.pl/redmine/projects/iobber/wiki|Iobber]], a CRF chunker for Polish,
 * [[http://zil.ipipan.waw.pl/Krzaki|Krzaki (bushes)]], a 20k-sentence corpus of Polish manually annotated for dependency structure,
 * [[http://zil.ipipan.waw.pl/ENIAM|ENIAM]] (W. Jaworski).

== Semantic resources ==
 * [[http://zil.ipipan.waw.pl/Scwad/CDSCorpus|CDSCorpus]], a dataset of 10k pairs of Polish sentences manually annotated for semantic relatedness and entailment (A. Wróblewska),
 * [[http://git.nlp.ipipan.waw.pl/Scwad/SCWAD-probing-data|Probing datasets]], Polish and English probing datasets for linguistic verification of sentence embeddings (A. Wróblewska).

== Sentiment analysis, opinion mining ==
 * [[http://zil.ipipan.waw.pl/SlownikWydzwieku|Polish sentiment dictionary]], with sentiment scores computed using supervised methods (A. Wawer),
 * [[http://zil.ipipan.waw.pl/LCM-PL|Polish Linguistic Category Model]] following a typology of verb categorization in terms of their abstractness, also a tool to measure language abstraction (A. Wawer),
 * [[http://zil.ipipan.waw.pl/TreebankWydzwieku|Polish dependency treebank with sentiment annotations]] (A. Wawer),
 * [[http://zil.ipipan.waw.pl/HateSpeech|HateSpeech corpus]], 2000 manually annotated documents representing various types and degrees of offensive language expressed toward minorities,
 * [[http://zil.ipipan.waw.pl/Korpus%20Szczerosci|Sincerity Corpus (Korpus Szczerości)]], a collection of fake and real reviews,
 * you can also test Sentipejd, a sentiment analysis tool, in the [[http://multiservice.nlp.ipipan.waw.pl/|Multiservice]] (please select a tagger first),
 * [[https://exp.lobi.nencki.gov.pl/nawl-analysis|Nencki Affective Word List]].

== Coreference ==
 * [[http://zil.ipipan.waw.pl/PolishCoreferenceCorpus|Polish Coreference Corpus]], a 500 M corpus of general nominal coreference in Polish (M. Ogrodniczuk),
 * [[http://zil.ipipan.waw.pl/PolishCoreferenceTools|Polish Coreference Tools]], a suite of Polish coreference resolution tools, created as part of the [[http://zil.ipipan.waw.pl/CORE|CORE project]].

== Speech analysis and synthesis tools ==
 * [[http://skrybot.pl/en/products/skrybot-home-speech-recognition/|Skrybot]], commercial speech recognition system (L. Pawlaczyk, P. Bosky),
 * [[http://www.ivona.com/|Ivona]], commercial text-to-speech system (Expressivo),
 * [[http://techmo.pl/index.php?option=com_content&view=article&id=54&Itemid=166&lang=pl|Techmo]] TTS demo (Techmo),
 * [[http://www.nuance.com/landing-pages/playground/Vocalizer_Demo2/vocalizer_modal.html?demo=true|Vocalizer]], commercial text-to-speech system (Nuance),
 * [[http://www.acapela-group.com/text-to-speech-interactive-demo.html|Acapela]], text-to-speech demo,
 * [[http://www.syntezamowy.pjwstk.edu.pl/index.html|Synteza mowy polskiej]], automatic speech recognition and speech synthesis demos, with background information (K. Szklanny),
 * [[http://www.staff.amu.edu.pl/~fonetyka/synteza/index.htm|System syntezy mowy ciągłej]] (G. Demenko, S. Grocholewski),
 * [[http://www.tcts.fpms.ac.be/synthesis/mbrola/|Polish MBROLA database]] (K. Szklanny, K. Marasek),
 * [[http://www.neurosoft.pl/?page_name=Produkty_SynTalk|SynTalk]], commercial speech synthesis system (!NeuroSoft),
 * [[http://www.primespeech.pl/|PrimeSpeech]], commercial speech recognition systems,
 * [[http://www.dsp.agh.edu.pl/doku.php?id=pl:resources:ortfon|OrtFon]], phonetic transcriber (AGH DSP),
 * [[http://www.dsp.agh.edu.pl/doku.php?id=pl:resources:asr|Sarmata]], automatic speech recognition system for Polish (AGH DSP),
 * [[http://www.dsp.agh.edu.pl/doku.php?id=pl:resources:anotator|Anotator]], a speech corpus annotator dedicated to Polish and focused on connecting existing resources (AGH DSP),
 * [[http://www.dsp.agh.edu.pl/doku.php?id=pl:resources:spkreco|System rozpoznawania mówcy]] (AGH DSP).

== Machine translation demonstrations ==
 * [[http://itranslate4.eu/|iTranslate4.eu]] (multiple languages, allows comparing translation engines),
 * [[http://www.microsofttranslator.com/|Bing Translator]] (multilingual),
 * [[http://translate.google.com/|Google Translate]] (multilingual),
 * [[http://www.tranexp.com/|InterTran]] (multilingual),
 * [[http://www.poltran.com/|LingvoBit]] (EN-PL-EN),
 * [[http://www.systran.co.uk/|Systran]] (EN-PL, PL-FR and some more),
 * [[http://www.xdobry.de/esperantoedit/index_pl.html|Esperantilo]] (integrated Esperanto editor, with MT for EO-PL-DE-EN-SV),
 * [[http://thetos.polsl.pl/|Thetos]] (PL-Sign language).

== Summarizers ==
 * [[http://www.cs.put.poznan.pl/dweiss/research/lakon/|Lakon]], a system for news summarization (A. Dudczak),
 * [[http://las.aei.polsl.pl/PolSum/#/Home|PolSum]] (S. Kulików),
 * [[http://clip.ipipan.waw.pl/Summar|Summar]] (Ł. Pawluczuk),
 * [[http://clip.ipipan.waw.pl/Summarizer|Summarizer]] (J. Świetlicka),
 * you can also test Lakon, Open Text Summarizer and Summarizer in [[http://multiservice.nlp.ipipan.waw.pl/|Multiservice]]
 * and take a look at the [[http://zil.ipipan.waw.pl/PolishSummariesCorpus|Polish Summaries Corpus]].

== Diacritization ==
 * [[http://www.gzegzolka.com/poliszynel/|Poliszynel]] (P. Sawicki),
 * [[http://www.spolszcz.pl/|spolszcz.pl]] (P. Sawicki),
 * [[http://www.polszczyzna.info/polonizator|Polonizator]] (TiP),
 * [[http://slowniki.zoni.pl/?s=ogonki|Polonizer]],
 * [[http://galaxy.eti.pg.gda.pl/katedry/kiw/pracownicy/Jan.Daciuk/personal/man/fsa_accent.1.html|fsa_accent]] (J. Daciuk),
 * [[http://wm.ite.pl/proj/pliterki/index.html|pliterki]] (W. Muła).

== Named entity recognition ==
 * [[http://zil.ipipan.waw.pl/Nerf|Nerf]], a tool for named entity recognition, available on GNU GPL v.3 (J. Waszczuk),
 * [[http://nlp.pwr.wroc.pl/narzedzia-i-zasoby/narzedzia/liner2|Liner2]], named entity recognizer released on GNU GPL with models to recognize 5 and 56 categories of proper names (M. Marcińczuk and M. Janicki),
 * [[https://clarin-pl.eu/dspace/handle/11321/302|TIMEX]], a model for Liner2 to recognize and normalize temporal expressions (J. Kocoń and M. Marcińczuk).

== Multiword expression software ==
 * [[http://zil.ipipan.waw.pl/TermoPL|TermoPL]], multiword expression extraction tool,
 * [[http://multiword.sourceforge.net/sharedtaskresults2018/|VMWE identifiers]], systems that participated in the PARSEME shared task for automatic identification of verbal MWEs; 13 out of 17 systems submitted results for Polish,
 * [[https://mwedemonstrator.atilf.fr/mwetools/accueil/|PARSEME-FR demonstrator]], including the ATILF-LLF multiword expression identifier for Polish.

== Aggregating services ==
 * [[http://multiservice.nlp.ipipan.waw.pl/|Multiservice]], a sample interface for running NLP Web services for Polish (see also [[http://redmine.nlp.ipipan.waw.pl/redmine/projects/multiserwis/wiki/Usage|usage]] and [[http://redmine.nlp.ipipan.waw.pl/redmine/projects/multiserwis/wiki/InOut|format]]),
 * [[http://ws.clarin-pl.eu/|Online demos of tools for processing Polish texts]] (CLARIN-PL),
 * [[http://psi-toolkit.wmi.amu.edu.pl/index.html|PSI-Toolkit]], a chain of publicly available tools for automatic processing of Polish.

== Other ==
 * [[https://play.google.com/store/apps/details?id=com.pwr.plwordnet|Mobile plWordNet]], free mobile application for plWordNet browsing (J. Kocoń),
/* * [[http://www.mimuw.edu.pl/polszczyzna/kolokacje/index.htm|Kolokacje]], a Web crawler and collocation finder (A. Buczyński),*/
 * [[http://zil.ipipan.waw.pl/WSDDE|WSDDE]], a system for designing and performing Word Sense Disambiguation experiments (R. Młodzki ''et al.''),
 * [[http://frazeo.pl/|Frazeo]], a search engine and clusterer of news in Polish (P. Pęzik),
 * [[http://segment.sourceforge.net/|Segment]], a rule-based sentence tokenizer supporting SRX standard (J. Lipski; the Polish rules are available in [[http://sourceforge.net/p/languagetool/code/HEAD/tree/trunk/languagetool/languagetool-core/src/main/resources/org/languagetool/resource/segment.srx?format=raw|LanguageTool project]], see [[http://zil.ipipan.waw.pl/Segment|here]] for short instructions on how to use the tool),
 * [[http://nlp.pwr.wroc.pl/redmine/projects/toki/wiki|Toki]], a tokenizer supporting SRX standard, C++ library and toolkit (T. Śniatowski and A. Radziszewski),
 * [[http://poleng.pl/translatica-pl.srx|Translatica SRX sentence segmentation rules for Polish (LGPL)]],
 * [[http://psi.amu.edu.pl/en/index.php?title=SyMGIZA%2B%2B|SyMGIZA++]], an extension of Giza++ that computes symmetric word alignment models,
 * [[http://hipisek.pl|Hipisek]], an experimental question answering system (M. Walas),
 * [[https://bitbucket.org/jsbien/ndt|Narzędzia dygitalizacji tekstów]], Poliqarp for !DjVu and other programs,
 * [[http://www.nlp.pwr.wroc.pl/en/tools-and-resources/fextor|Fextor]], a feature extraction framework,
 * [[http://www.nlp.pwr.wroc.pl/en/tools-and-resources/lexcsd|LexCSD]], a system for semi-automatic sense disambiguation,
 * [[http://www.nlp.pwr.wroc.pl/en/tools-and-resources/supermatrix|SuperMatrix]], a general tool for lexical semantic knowledge acquisition,
 * [[http://nlp.pwr.wroc.pl/en/tools-and-resources/wordnetloom|WordnetLoom]], a wordnet editor application,
 * [[http://zil.ipipan.waw.pl/Toposlaw|Toposław]], a tool for the creation of electronic inflectional dictionaries of multi-word units,
 * [[http://zil.ipipan.waw.pl/CorpCor|CorpCor]], a web-based tool for correcting morphosyntactic annotation in TEI XML encoded corpora (e.g. NKJP),
 * [[http://ws.clarin-pl.eu/demo/stylo2.html|Stylo 2]], stylometry demo,
 * [[http://clip.ipipan.waw.pl/DeepEvents|DeepEvents]], event extraction in Polish, based on deep neural networks,
 * [[http://dsmodels.nlp.ipipan.waw.pl/sim1.html|Word similarity]], calculation of the similarity of words based on word embeddings, on-line service,
 * [[http://baltoslav.eu/?mova=pl|Baltoslav]], with several script converters (Romanizer, Cyrillizer, IPA Converter etc.),
 * [[http://zil.ipipan.waw.pl/SpacyPL|SpacyPL]], Polish language models and resources for [[https://spacy.io|Spacy]],
 * [[https://jasnopis.pl/|Jasnopis]], an analyzer of text obscurity level,
 * [[http://zil.ipipan.waw.pl/Scwad/AIDe|AIDe]], a corpus of image descriptions in Polish (A. Wróblewska).