Size: 3372
Comment:
|
Size: 9185
Comment:
|
Line 2: | Line 2: |
This page contains a list of ''publicly available'' language tools and resources. |
|
Line 8: | Line 10: |
* [[http://www.mimuw.edu.pl/polszczyzna/pl196x/index_en.htm|Polish language of the XX century sixties]], * [[http://ifa.amu.edu.pl/~ifaconc/blog/?page_id=60|PICLE corpus]] (the Polish sub-corpus of the [[http://www.fltr.ucl.ac.be/fltr/germ/etan/cecl/Cecl-Projects/Icle/icle.htm|International Corpus of Learner English]] (ICLE), * [[http://poliqarp.sourceforge.net/|Poliqarp]] – a corpus indexing and search engine, * [[http://nlp.ipipan.waw.pl/Anotatornia/|Anotatornia]] – a system for multi-level manual annotation of corpora. |
* [[Polish language of the XX century sixties]], * [[http://www.ijp-pan.krakow.pl/index2.php?strona=korpus_tekst_star|Old Polish corpus]], * [[http://ifa.amu.edu.pl/~ifaconc/blog/?page_id=60|PICLE corpus]], the Polish sub-corpus of the [[http://www.fltr.ucl.ac.be/fltr/germ/etan/cecl/Cecl-Projects/Icle/icle.htm|International Corpus of Learner English]] (ICLE), * [[http://poliqarp.sourceforge.net/|Poliqarp]], a corpus indexing and search engine, * [[http://nlp.ipipan.waw.pl/Anotatornia/|Anotatornia]], a system for multi-level manual annotation of corpora, * [[http://smyrna.danieljanus.pl/|Smyrna]], a simple, light-weight Polish concordancer. |
Line 14: | Line 18: |
* [[http://opus.lingfil.uu.se/index.php|OPUS]] – an open source parallel corpus (European Parliament, EMEA, KDE, movie subtitles), | * [[http://parasol.unibe.ch|ParaSol]], a parallel corpus of Slavic and other languages, * [[http://www.domeczek.pl/~polukr/index.php?option=search|PolUKR]], a Polish-Ukrainian parallel corpus, * [[http://opus.lingfil.uu.se/index.php|OPUS]], an open source parallel corpus (European Parliament, EMEA, KDE, movie subtitles), * [[http://nl.ijs.si/ME/V4|"1984"]], an annotated parallel corpus of George Orwell's "1984" in 15 languages, MULTEXT-East, v.4 (licensed download), * [[http://www.korpus.cz/intercorp/?req=page:info|InterCorp]], a multilingual parallel corpus, |
Line 18: | Line 26: |
* [[http://psi.amu.edu.pl/en/index.php?title=Parallel_Corpora|PSI collection of parallel corpora]], a growing collection of parallel corpora pairing Polish with other european languages. == Translation memories == * [[http://mymemory.translated.net/|MyMemory]], freely available multilingual TM, * [[http://www.tausdata.org/|TAUS Data]], a multilingual TM from the members of TAUS Data Association. |
|
Line 20: | Line 34: |
* [[http://sgjp.pl/morfeusz/|Morfeusz SGJP]] – morphological analyser (Z. Saloni, W. Gruszczyński, M. Woliński, R. Wołosz), * [[http://morfologik.blogspot.com/|Morfologik]] – morphological analyser (M. Miłkowski), * [[http://sgalus.republika.pl/indexe.html]] – lexical analyser and a Polish proof-reader (S. Galus), |
* [[http://sgjp.pl/morfeusz/|Morfeusz SGJP]], morphological analyser (Z. Saloni, W. Gruszczyński, M. Woliński, R. Wołosz), * [[http://sgjp.pl/siat/|Index a tergo of Polish word forms]] (J. Tokarski, Z. Saloni), * [[http://morfologik.blogspot.com/|Morfologik]], morphological analyser (M. Miłkowski, D. Weiss), * [[ftp://ftp.mimuw.edu.pl/pub/users/polszczyzna/SAM-95/|SAM]], morphological analyser (K. Szafran), * [[http://utt.amu.edu.pl/|UAM Text Tools]] (P. Obrębski, Z. Vetulani; see also [[http://utt.wmi.amu.edu.pl/trac/wiki/]]), * [[http://nl.ijs.si/ME/V4/msd/html|MULTEXT-East, v.4 ]], morphosyntactic specifications and documentation for 16 languages, * [[http://www.domeczek.pl/~polukr/mte-conv|KIPI->MTE]], a converter from TaKIPI to MULTEXT-East morphosyntactic format (A. Radziszewski, N. Kotsyba), * [[http://nlp.pwr.wroc.pl/redmine/projects/libpltagger/wiki|MACA]], Morphological Analysis Converter and Aggregator (A. Radziszewski, T. Śniatowski), * [[http://sgalus.republika.pl/indexe.html|Lexical analyser and a Polish proof-reader]] (S. Galus), |
Line 24: | Line 44: |
* [[http://winnie.ics.agh.edu.pl/proj_uk/fleksbaz/|Baza fleksyjna języka polskiego]], inflection database of Polish words (W. Lubaszewski, B. Moskal, P. Pietras, P. Pisarek, T. Rokicka), | |
Line 25: | Line 46: |
* [[http://www.cs.put.poznan.pl/dweiss/xml/projects/lametyzator/index.xml?lang=en|Stemming engine for Polish]] (D. Weiss), | |
Line 29: | Line 49: |
* [[http://nlp.pwr.wroc.pl/takipi/|TaKIPI]] – a morphosyntactic tagger for Polish, * [[http://code.google.com/p/pantera-tagger/|PANTERA]] – a morphosyntactic tagger for Polish, |
* [[http://nlp.pwr.wroc.pl/takipi/|TaKIPI]], a morphosyntactic tagger for Polish, * [[http://code.google.com/p/pantera-tagger/|PANTERA]], a morphosyntactic tagger for Polish. |
Line 33: | Line 53: |
* [[http://nlp.ipipan.waw.pl/~wolinski/swigra/|Świgra]] – a DCG parser, * [[http://nlp.ipipan.waw.pl/Spejd/|Spejd]] – a shallow parsing and disambiguation system, * [[http://sourceforge.net/projects/dendrarium/|Dendrarium]] – a treebank development system (under development), * [[http://nlp.ipipan.waw.pl/CRIT2/|A Treebank / Test Suite for Polish]]. |
* [[http://nlp.ipipan.waw.pl/~wolinski/swigra/|Świgra]], a DCG parser, * [[Spejd]], a shallow parsing and disambiguation system, * [[http://sourceforge.net/projects/dendrarium/|Dendrarium]], a treebank development system (under development), * [[http://nlp.ipipan.waw.pl/CRIT2/|A Treebank / Test Suite for Polish]], * [[ftp://ftp.mimuw.edu.pl/pub/People/polszczyzna/AS/index.html|Analizator syntaktyczny AS]] (M. Woliński), * [[ftp://ftp.mimuw.edu.pl/pub/People/polszczyzna/Szpakowicz/|Formalny opis składniowy zdań polskich]] (S. Szpakowicz), * [[http://thetos.aei.polsl.pl/las/|Serwer LAS / Linguistic Analysis Server]]. == Machine-readable dictionaries == * [[http://plwordnet.pwr.wroc.pl/wordnet|plWordNet, Polish WordNet]] (M. Piasecki), * [[http://synonimy.ux.pl/|Polish OpenThesaurus]], a crowdsourced Polish thesaurus (M. Miłkowski), * [[http://www.sjp.pl/|Słownik języka polskiego (d. alternatywny)]], Polish ispell dictionaries, along with some definitions and online form display. * [[http://nl.ijs.si/ME/V4/doc/index.html#sec-lex|Sample morphosyntactic Polish lexicon]], the MULTEXT-East morphosyntactic lexicons, * [[http://www.ispan.waw.pl/zakjez/pracjcz/slowniki/slowniki.html|Słownik składniowy języka polskiego]] (Z. Greń). /* * [[http://home.agh.edu.pl/~bziolko/dokuwiki/lib/exe/fetch.php?media=art:ziolkofull.pdf|N-gram model of Polish]] (B. Ziółko, D. Skurzok) */ == Human-readable dictionaries == * [[http://www.wsjp.pl/|Wielki Słownik Języka Polskiego]], * [[http://pl.wiktionary.org|Wikisłownik]], * [[http://www.slownik-online.pl/index.php|Słownik wyrazów obcych i zwrotów obcojęzycznych Władysława Kopalińskiego]], * [[http://leksykony.interia.pl/synonim|Słownik synonimów i antonimów Piotra Żmigrodzkiego]]. == Speech analysis and synthesis tools == * [[http://skrybot.pl/en/products/skrybot-home-speech-recognition/|Skrybot]], commercial speech recognition system (L. Pawlaczyk, P. 
Bosky), * [[http://www.ivona.com/|Ivona]], commercial text-to-speech system (Expressivo), * [[http://www.acapela-group.com/text-to-speech-interactive-demo.html|Acapela]], text to speech demo, * [[http://www.syntezamowy.pjwstk.edu.pl/index.html|Synteza mowy polskiej]], automatic speech recognition and speech synthesis demos, with background information (K. Szklanny), * [[http://www.staff.amu.edu.pl/~fonetyka/synteza/index.htm|System syntezy mowy ciągłej]] (G. Demenko, S. Grocholewski), * [[http://www.tcts.fpms.ac.be/synthesis/mbrola/|Polish MBROLA database]] (K. Szklanny, K. Marasek), * [[http://www.neurosoft.pl/?page_name=Produkty_SynTalk|SynTalk]], commercial speech synthesis system (NeuroSoft), * [[http://www.primespeech.pl/|PrimeSpeech]], commercial speech recognition systems. /* * [[http://home.agh.edu.pl/~bziolko/dokuwiki/lib/exe/fetch.php?media=art:4154a450.pdf|ASR]], an automatic speech recognition system for Polish (M. Ziółko, J.Gałka, B. Ziółko, T. Jadczyk, D. Skurzok). */ /* * [[http://home.agh.edu.pl/~bziolko/dokuwiki/lib/exe/fetch.php?media=art:anotator.pdf|Anotator]], a fast speech corpora anotator dedicated for Polish and focused on connecting existing resources (B. Ziółko, B. Miga). */ |
Line 39: | Line 88: |
* [[http://www.translatica.pl/|Translatica]] (EN-PL-EN), | * [[http://www.translatica.pl/|Translatica]] (EN-PL-EN, DE-PL-DE, RU-PL-RU), see also [[http://poleng.pl|Poleng website]] with an experimental FR-PL-FR version, * [[http://www.microsofttranslator.com/|Bing Translator]] (multilingual), * [[http://translate.google.com/|Google Translate]] (multilingual), |
Line 42: | Line 93: |
* [[http://www.systran.co.uk/|Systran]] (EN-PL, PL-FR and some more). | * [[http://www.systran.co.uk/|Systran]] (EN-PL, PL-FR and some more), * [[http://www.xdobry.de/esperantoedit/index_pl.html|Esperantilo]] (integrated Esperanto editor, with MT for EO-PL-DE-EN-SV), * [[http://thetos.aei.polsl.pl/|Thetos]] (PL-Sign language). |
Line 45: | Line 99: |
* [[http://plwordnet.pwr.wroc.pl/wordnet|plWordNet, Polish WordNet]] (M. Piasecki), * [[http://www.mimuw.edu.pl/polszczyzna/kolokacje/index.htm|Kolokacje]], a Web crawler and collocation finder (A. Buczyński) * [[http://nlp.ipipan.waw.pl/WSDDE/|WSDDE]] – a system for designing and performing Word Sense Disambiguation experiments (forthcoming), * [[http://nlp.ipipan.waw.pl/PPJP/|etc.]] |
* [[http://www.mimuw.edu.pl/polszczyzna/kolokacje/index.htm|Kolokacje]], a Web crawler and collocation finder (A. Buczyński), * [[http://nlp.ipipan.waw.pl/WSDDE/|WSDDE]], a system for designing and performing Word Sense Disambiguation experiments (R. Młodzki ''et al.''), * [[http://frazeo.pl/|Frazeo]], a search engine and clusterer of news in Polish (P. Pęzik), * [[http://segment.sourceforge.net/|Segment]], a rule-based sentence tokenizer supporting SRX standard (J. Lipski; the Polish rules are available in [[http://languagetool.svn.sourceforge.net/viewvc/languagetool/trunk/JLanguageTool/src/resource/segment.srx|LanguageTool project]]), * [[http://poleng.pl/translatica-pl.srx|Translatica SRX sentence segmentation rules for Polish (LGPL)]] * [[http://www.cs.put.poznan.pl/dweiss/research/lakon/|Lakon]], a system for news summarization (master's thesis by A. Dudczak), * [[http://psi.amu.edu.pl/en/index.php?title=SyMGIZA%2B%2B|SyMGIZA++]], an extension of Giza++ that computes symmetric word alignment models. |
Language Tools and Resources for Polish
This page contains a list of publicly available language tools and resources.
Written corpora and corpus-related tools
National Corpus of Polish (under development),
PICLE corpus, the Polish sub-corpus of the International Corpus of Learner English (ICLE),
Poliqarp, a corpus indexing and search engine,
Anotatornia, a system for multi-level manual annotation of corpora,
Smyrna, a simple, light-weight Polish concordancer.
Parallel corpora
ParaSol, a parallel corpus of Slavic and other languages,
PolUKR, a Polish-Ukrainian parallel corpus,
OPUS, an open source parallel corpus (European Parliament, EMEA, KDE, movie subtitles),
"1984", an annotated parallel corpus of George Orwell's "1984" in 15 languages, MULTEXT-East, v.4 (licensed download),
InterCorp, a multilingual parallel corpus,
PSI collection of parallel corpora, a growing collection of parallel corpora pairing Polish with other European languages.
Translation memories
MyMemory, a freely available multilingual TM,
TAUS Data, a multilingual TM from the members of TAUS Data Association.
Morphological tools and resources
Morfeusz SGJP, morphological analyser (Z. Saloni, W. Gruszczyński, M. Woliński, R. Wołosz),
Index a tergo of Polish word forms (J. Tokarski, Z. Saloni),
Morfologik, morphological analyser (M. Miłkowski, D. Weiss),
SAM, morphological analyser (K. Szafran),
UAM Text Tools (P. Obrębski, Z. Vetulani; see also http://utt.wmi.amu.edu.pl/trac/wiki/),
MULTEXT-East, v.4, morphosyntactic specifications and documentation for 16 languages,
KIPI->MTE, a converter from TaKIPI to MULTEXT-East morphosyntactic format (A. Radziszewski, N. Kotsyba),
MACA, Morphological Analysis Converter and Aggregator (A. Radziszewski, T. Śniatowski),
Lexical analyser and a Polish proof-reader (S. Galus),
Neurosoft Gram (demo of a morphological analyser),
Baza fleksyjna języka polskiego, inflection database of Polish words (W. Lubaszewski, B. Moskal, P. Pietras, P. Pisarek, T. Rokicka),
Finite state utilities (J. Daciuk),
Stempel, another stemmer (A. Białecki).
Taggers
TaKIPI, a morphosyntactic tagger for Polish,
PANTERA, a morphosyntactic tagger for Polish.
Parsers, grammars, treebanks
Świgra, a DCG parser,
Spejd, a shallow parsing and disambiguation system,
Dendrarium, a treebank development system (under development),
Analizator syntaktyczny AS (M. Woliński),
Formalny opis składniowy zdań polskich (S. Szpakowicz),
Serwer LAS / Linguistic Analysis Server.
Machine-readable dictionaries
plWordNet, Polish WordNet (M. Piasecki),
Polish OpenThesaurus, a crowdsourced Polish thesaurus (M. Miłkowski),
Słownik języka polskiego (d. alternatywny), Polish ispell dictionaries, along with some definitions and online form display,
Sample morphosyntactic Polish lexicon, the MULTEXT-East morphosyntactic lexicons,
Słownik składniowy języka polskiego (Z. Greń).
Human-readable dictionaries
Wielki Słownik Języka Polskiego,
Wikisłownik,
Słownik wyrazów obcych i zwrotów obcojęzycznych Władysława Kopalińskiego,
Słownik synonimów i antonimów Piotra Żmigrodzkiego.
Speech analysis and synthesis tools
Skrybot, commercial speech recognition system (L. Pawlaczyk, P. Bosky),
Ivona, commercial text-to-speech system (Expressivo),
Acapela, text-to-speech demo,
Synteza mowy polskiej, automatic speech recognition and speech synthesis demos, with background information (K. Szklanny),
System syntezy mowy ciągłej (G. Demenko, S. Grocholewski),
Polish MBROLA database (K. Szklanny, K. Marasek),
PrimeSpeech, commercial speech recognition systems.
Machine translation demonstrations
Translatica (EN-PL-EN, DE-PL-DE, RU-PL-RU), see also Poleng website with an experimental FR-PL-FR version,
Bing Translator (multilingual),
Google Translate (multilingual),
InterTran (multilingual),
LingvoBit (EN-PL-EN),
Systran (EN-PL, PL-FR and some more),
Esperantilo (integrated Esperanto editor, with MT for EO-PL-DE-EN-SV),
Thetos (PL-Sign language).
Other
Kolokacje, a Web crawler and collocation finder (A. Buczyński),
WSDDE, a system for designing and performing Word Sense Disambiguation experiments (R. Młodzki et al.),
Frazeo, a search engine and clusterer of news in Polish (P. Pęzik),
Segment, a rule-based sentence tokenizer supporting the SRX standard (J. Lipski; the Polish rules are available in the LanguageTool project),
Translatica SRX sentence segmentation rules for Polish (LGPL),
Lakon, a system for news summarization (master's thesis by A. Dudczak),
SyMGIZA++, an extension of Giza++ that computes symmetric word alignment models.