Language Tools and Resources for Polish
This page contains a list of publicly available language tools and resources.
Written corpora and corpus-related tools
National Corpus of Polish (under development),
PICLE corpus, the Polish sub-corpus of the International Corpus of Learner English (ICLE),
Poliqarp, a corpus indexing and search engine,
Anotatornia, a system for multi-level manual annotation of corpora,
Smyrna, a simple, light-weight Polish concordancer.
Parallel corpora
ParaSol, a parallel corpus of Slavic and other languages,
PolUKR, a Polish-Ukrainian parallel corpus,
OPUS, an open source parallel corpus (European Parliament, EMEA, KDE, movie subtitles),
"1984", an annotated parallel corpus of George Orwell's "1984" in 15 languages, MULTEXT-East, v.4 (licensed download).
Morphological tools and resources
Morfeusz SGJP, morphological analyser (Z. Saloni, W. Gruszczyński, M. Woliński, R. Wołosz),
Morfologik, morphological analyser (M. Miłkowski, D. Weiss),
SAM, morphological analyser (K. Szafran),
UAM Text Tools (P. Obrębski, Z. Vetulani; see also http://utt.wmi.amu.edu.pl/trac/wiki/),
MULTEXT-East, v.4, morphosyntactic specifications and documentation for 16 languages,
KIPI->MTE, a converter from TaKIPI to MULTEXT-East morphosyntactic format (A. Radziszewski, N. Kotsyba),
MACA, Morphological Analysis Converter and Aggregator (A. Radziszewski, T. Śniatowski),
Lexical analyser and a Polish proof-reader (S. Galus),
Neurosoft Gram (demo of a morphological analyser),
Finite state utilities (J. Daciuk),
Stempel, an algorithmic stemmer for Polish (A. Białecki).
Taggers
TaKIPI, a morphosyntactic tagger for Polish,
PANTERA, a morphosyntactic tagger for Polish,
a prototype Maximum Entropy tagger, developed as part of Radomir Mastalerz's MSc thesis.
Parsers, grammars, treebanks
Świgra, a DCG parser,
Spejd, a shallow parsing and disambiguation system,
Dendrarium, a treebank development system (under development).
Machine-readable dictionaries
plWordNet, Polish WordNet (M. Piasecki),
Polish OpenThesaurus, a crowdsourced Polish thesaurus (M. Miłkowski),
Słownik języka polskiego (formerly "alternatywny"), Polish ispell dictionaries with some definitions and an online display of word forms,
Sample morphosyntactic Polish lexicon, the Polish part of the MULTEXT-East morphosyntactic lexicons,
Słownik składniowy języka polskiego, a syntactic dictionary of Polish (Z. Greń).
Human-readable dictionaries
Speech analysis and synthesis tools
Skrybot, a commercial speech recognition system (L. Pawlaczyk, P. Bosky),
Ivona, a commercial text-to-speech system (Expressivo).
Machine translation demonstrations
Translatica (EN-PL-EN),
Bing Translator (multilingual),
Google Translate (multilingual),
InterTran (multilingual),
LingvoBit (EN-PL-EN),
Systran (EN-PL, PL-FR and several other language pairs),
Esperantilo (an integrated Esperanto editor with MT for EO-PL-DE-EN-SV),
Thetos (Polish to Polish Sign Language).
Other
Kolokacje, a Web crawler and collocation finder (A. Buczyński),
WSDDE, a system for designing and performing Word Sense Disambiguation experiments (R. Młodzki et al.),
Frazeo, a search engine and news clusterer for Polish (P. Pęzik),
Segment, a rule-based sentence tokenizer supporting the SRX standard (J. Lipski; the Polish rules are available in the LanguageTool project),
Lakon, a system for news summarization (master's thesis by A. Dudczak).
