Size: 3328
Size: 7257
Line 2: | Line 2: |
This page contains a list of ''publicly available'' language tools and resources. |
|
Line 8: | Line 10: |
* [[http://www.mimuw.edu.pl/polszczyzna/pl196x/index_en.htm|Polish language of the XX century sixties]], * [[http://ifa.amu.edu.pl/~ifaconc/blog/?page_id=60|PICLE corpus]] (the Polish sub-corpus of the [[http://www.fltr.ucl.ac.be/fltr/germ/etan/cecl/Cecl-Projects/Icle/icle.htm|International Corpus of Learner English]] (ICLE), * [[http://poliqarp.sourceforge.net/|Poliqarp]] – a corpus indexing and search engine, * [[http://nlp.ipipan.waw.pl/Anotatornia/|Anotatornia]] – a system for multi-level manual annotation of corpora. |
* [[Polish language of the XX century sixties]], * [[http://www.ijp-pan.krakow.pl/index2.php?strona=korpus_tekst_star|Old Polish corpus]], * [[http://ifa.amu.edu.pl/~ifaconc/blog/?page_id=60|PICLE corpus]] (the Polish sub-corpus of the [[http://www.fltr.ucl.ac.be/fltr/germ/etan/cecl/Cecl-Projects/Icle/icle.htm|International Corpus of Learner English]] (ICLE)), * [[http://poliqarp.sourceforge.net/|Poliqarp]], a corpus indexing and search engine, * [[http://nlp.ipipan.waw.pl/Anotatornia/|Anotatornia]], a system for multi-level manual annotation of corpora, * [[http://smyrna.danieljanus.pl/|Smyrna]], a simple, light-weight Polish concordancer. |
Line 14: | Line 18: |
* [[http://opus.lingfil.uu.se/index.php|OPUS]] – an open source parallel corpus (European Parliament, EMEA, KDE, movie subtitles), | * [[http://parasol.unibe.ch|ParaSol]], a parallel corpus of Slavic and other languages, * [[http://www.domeczek.pl/~polukr/index.php?option=search|PolUKR]], a Polish-Ukrainian parallel corpus, * [[http://opus.lingfil.uu.se/index.php|OPUS]], an open source parallel corpus (European Parliament, EMEA, KDE, movie subtitles), * [[http://nl.ijs.si/ME/V4|"1984"]], an annotated parallel corpus of George Orwell's "1984" in 15 languages, MULTEXT-East, v.4 (licensed download), |
Line 17: | Line 24: |
* [[http://langtech.jrc.it/JRC-Acquis.html|JRC-Acquis Multilingual Parallel Corpus]], | * [[http://langtech.jrc.it/JRC-Acquis.html|JRC-Acquis Multilingual Parallel Corpus]]. |
Line 20: | Line 27: |
* [[http://sgjp.pl/morfeusz/|Morfeusz SGJP]] – morphological analyser (Z. Saloni, W. Gruszczyński, M. Woliński, R. Wołosz), * [[http://morfologik.blogspot.com/|Morfologik]] – morphological analyser (M. Miłkowski), * [[http://sgalus.republika.pl/indexe.html]] – lexical analyser and a Polish proof-reader (S. Galus), |
* [[http://sgjp.pl/morfeusz/|Morfeusz SGJP]], morphological analyser (Z. Saloni, W. Gruszczyński, M. Woliński, R. Wołosz), * [[http://morfologik.blogspot.com/|Morfologik]], morphological analyser (M. Miłkowski, D. Weiss), * [[ftp://ftp.mimuw.edu.pl/pub/users/polszczyzna/SAM-95/|SAM]], morphological analyser (K. Szafran), * [[http://utt.amu.edu.pl/|UAM Text Tools]] (P. Obrębski, Z. Vetulani; see also [[http://utt.wmi.amu.edu.pl/trac/wiki/]]), * [[http://nl.ijs.si/ME/V4/msd/html|MULTEXT-East, v.4 ]], morphosyntactic specifications and documentation for 16 languages, * [[http://www.domeczek.pl/~polukr/mte-conv|KIPI->MTE]], a converter from TaKIPI to MULTEXT-East morphosyntactic format (A. Radziszewski, N. Kotsyba), * [[http://nlp.pwr.wroc.pl/redmine/projects/libpltagger/wiki|MACA]], Morphological Analysis Converter and Aggregator (A. Radziszewski, T. Śniatowski), * [[http://sgalus.republika.pl/indexe.html|Lexical analyser and a Polish proof-reader]] (S. Galus), |
Line 25: | Line 37: |
* [[http://www.cs.put.poznan.pl/dweiss/xml/projects/lametyzator/index.xml?lang=en|Stemming engine for Polish]] (D. Weiss), | |
Line 29: | Line 40: |
* [[http://nlp.pwr.wroc.pl/takipi/|TaKIPI]] – a morphosyntactic tagger for Polish, * [[http://code.google.com/p/pantera-tagger/|PANTERA]] – a morphosyntactic tagger for Polish, |
* [[http://nlp.pwr.wroc.pl/takipi/|TaKIPI]], a morphosyntactic tagger for Polish, * [[http://code.google.com/p/pantera-tagger/|PANTERA]], a morphosyntactic tagger for Polish, * a prototype [[http://nlp.ipipan.waw.pl/~adamp/msc/mastalerz.radomir/CD.tgz|implementation]] of Maximum Entropy tagging created within Radomir Mastalerz's [[http://nlp.ipipan.waw.pl/~adamp/msc/mastalerz.radomir/1000-MGR-INF-97543.pdf.gz|MSc]]. |
Line 33: | Line 45: |
* [[http://nlp.ipipan.waw.pl/~wolinski/swigra/|Świgra]] – a DCG parser, * [[http://nlp.ipipan.waw.pl/Spejd/|Spejd]] – a shallow parsing and disambiguation system, * [[http://sourceforge.net/projects/dendrarium/|Dendrarium]] – a treebank development system (under development), |
* [[http://nlp.ipipan.waw.pl/~wolinski/swigra/|Świgra]], a DCG parser, * [[Spejd]], a shallow parsing and disambiguation system, * [[http://sourceforge.net/projects/dendrarium/|Dendrarium]], a treebank development system (under development), |
Line 37: | Line 49: |
== Machine-readable dictionaries == * [[http://plwordnet.pwr.wroc.pl/wordnet|plWordNet, Polish WordNet]] (M. Piasecki), * [[http://synonimy.ux.pl/|Polish OpenThesaurus]], a crowdsourced Polish thesaurus (M. Miłkowski), * [[http://www.sjp.pl/|Słownik języka polskiego (d. alternatywny)]], Polish ispell dictionaries, along with some definitions and online form display. * [[http://nl.ijs.si/ME/V4/doc/index.html#sec-lex|Sample morphosyntactic Polish lexicon]], the MULTEXT-East morphosyntactic lexicons, * [[http://www.ispan.waw.pl/zakjez/pracjcz/slowniki/slowniki.html|Słownik składniowy języka polskiego]] (Z. Greń), /* * [[http://home.agh.edu.pl/~bziolko/dokuwiki/lib/exe/fetch.php?media=art:ziolkofull.pdf|N-gram model of Polish]] (B. Ziółko, D. Skurzok) */ == Human-readable dictionaries == * [[http://www.wsjp.pl/|Wielki Słownik Języka Polskiego]], * [[http://pl.wiktionary.org|Wikisłownik]], * [[http://www.slownik-online.pl/index.php|Słownik wyrazów obcych i zwrotów obcojęzycznych Władysława Kopalińskiego]], * [[http://leksykony.interia.pl/synonim|Słownik synonimów i antonimów Piotra Żmigrodzkiego]]. == Speech analysis and synthesis tools == * [[http://skrybot.pl/en/products/skrybot-home-speech-recognition/|Skrybot]], commercial speech recognition system (L. Pawlaczyk, P. Bosky) * [[http://www.ivona.com/|Ivona]], commercial text-to-speech system (Expressivo) /* * [[http://home.agh.edu.pl/~bziolko/dokuwiki/lib/exe/fetch.php?media=art:4154a450.pdf|ASR]], an automatic speech recognition system for Polish (M. Ziółko, J.Gałka, B. Ziółko, T. Jadczyk, D. Skurzok). */ /* * [[http://home.agh.edu.pl/~bziolko/dokuwiki/lib/exe/fetch.php?media=art:anotator.pdf|Anotator]], a fast speech corpora anotator dedicated for Polish and focused on connecting existing resources (B. Ziółko, B. Miga). */ |
|
Line 40: | Line 72: |
* [[http://www.microsofttranslator.com/|Bing Translator]] (multilingual), * [[http://translate.google.com/|Google Translate]] (multilingual), |
|
Line 42: | Line 76: |
* [[http://www.systran.co.uk/|Systran]] (EN-PL, PL-FR and some more). | * [[http://www.systran.co.uk/|Systran]] (EN-PL, PL-FR and some more), * [[http://www.xdobry.de/esperantoedit/index_pl.html|Esperantilo]] (integrated Esperanto editor, with MT for EO-PL-DE-EN-SV) * [[http://thetos.aei.polsl.pl/|Thetos]] (PL-Sign language). |
Line 45: | Line 81: |
* [[http://plwordnet.pwr.wroc.pl/wordnet|plWordNet, Polish WordNet]] (M. Piasecki), * [[http://www.mimuw.edu.pl/polszczyzna/kolokacje/index.htm|Kolokacje]], a Web crawler and collocation finder (A. Buczyński) * [[http://nlp.ipipan.waw.pl/WSDDE/|WSDDE]] – a system for designing and performing Word Sense Disambiguation experiments (forthcoming). |
* [[http://www.mimuw.edu.pl/polszczyzna/kolokacje/index.htm|Kolokacje]], a Web crawler and collocation finder (A. Buczyński), * [[http://nlp.ipipan.waw.pl/WSDDE/|WSDDE]] – a system for designing and performing Word Sense Disambiguation experiments (R. Młodzki ''et al.''), * [[http://frazeo.pl/|Frazeo]], a search engine and clusterer of news in Polish (P. Pęzik), * [[http://segment.sourceforge.net/|Segment]], a rule-based sentence tokenizer supporting SRX standard (J. Lipski; the Polish rules are available in [[http://languagetool.svn.sourceforge.net/viewvc/languagetool/trunk/JLanguageTool/src/resource/segment.srx|LanguageTool project]]), * [[http://www.cs.put.poznan.pl/dweiss/research/lakon/|Lakon]], a system for news summarization (master's thesis by A. Dudczak). |
Language Tools and Resources for Polish
This page contains a list of publicly available language tools and resources.
Written corpora and corpus-related tools
National Corpus of Polish (under development),
PICLE corpus (the Polish sub-corpus of the International Corpus of Learner English (ICLE)),
Poliqarp, a corpus indexing and search engine,
Anotatornia, a system for multi-level manual annotation of corpora,
Smyrna, a simple, light-weight Polish concordancer.
Parallel corpora
ParaSol, a parallel corpus of Slavic and other languages,
PolUKR, a Polish-Ukrainian parallel corpus,
OPUS, an open source parallel corpus (European Parliament, EMEA, KDE, movie subtitles),
"1984", an annotated parallel corpus of George Orwell's "1984" in 15 languages, MULTEXT-East, v.4 (licensed download),
Morphological tools and resources
Morfeusz SGJP, a morphological analyser (Z. Saloni, W. Gruszczyński, M. Woliński, R. Wołosz),
Morfologik, a morphological analyser (M. Miłkowski, D. Weiss),
SAM, a morphological analyser (K. Szafran),
UAM Text Tools (P. Obrębski, Z. Vetulani; see also http://utt.wmi.amu.edu.pl/trac/wiki/),
MULTEXT-East, v.4, morphosyntactic specifications and documentation for 16 languages,
KIPI->MTE, a converter from TaKIPI to MULTEXT-East morphosyntactic format (A. Radziszewski, N. Kotsyba),
MACA, Morphological Analysis Converter and Aggregator (A. Radziszewski, T. Śniatowski),
Lexical analyser and a Polish proof-reader (S. Galus),
Neurosoft Gram (demo of a morphological analyser),
Finite state utilities (J. Daciuk),
Stempel, another stemmer (A. Białecki).
Taggers
TaKIPI, a morphosyntactic tagger for Polish,
PANTERA, a morphosyntactic tagger for Polish,
a prototype implementation of Maximum Entropy tagging created within Radomir Mastalerz's MSc.
Parsers, grammars, treebanks
Świgra, a DCG parser,
Spejd, a shallow parsing and disambiguation system,
Dendrarium, a treebank development system (under development),
Machine-readable dictionaries
plWordNet, Polish WordNet (M. Piasecki),
Polish OpenThesaurus, a crowdsourced Polish thesaurus (M. Miłkowski),
Słownik języka polskiego (d. alternatywny), Polish ispell dictionaries, along with some definitions and online form display,
Sample morphosyntactic Polish lexicon, the MULTEXT-East morphosyntactic lexicons,
Słownik składniowy języka polskiego (Z. Greń),
Human-readable dictionaries
Wielki Słownik Języka Polskiego,
Wikisłownik,
Słownik wyrazów obcych i zwrotów obcojęzycznych Władysława Kopalińskiego,
Słownik synonimów i antonimów Piotra Żmigrodzkiego.
Speech analysis and synthesis tools
Skrybot, a commercial speech recognition system (L. Pawlaczyk, P. Bosky),
Ivona, a commercial text-to-speech system (Expressivo).
Machine translation demonstrations
Translatica (EN-PL-EN),
Bing Translator (multilingual),
Google Translate (multilingual),
InterTran (multilingual),
LingvoBit (EN-PL-EN),
Systran (EN-PL, PL-FR and some more),
Esperantilo (integrated Esperanto editor, with MT for EO-PL-DE-EN-SV),
Thetos (PL-Sign language).
Other
Kolokacje, a Web crawler and collocation finder (A. Buczyński),
WSDDE, a system for designing and performing Word Sense Disambiguation experiments (R. Młodzki et al.),
Frazeo, a search engine and clusterer of news in Polish (P. Pęzik),
Segment, a rule-based sentence tokenizer supporting the SRX standard (J. Lipski; the Polish rules are available in the LanguageTool project),
Lakon, a system for news summarization (master's thesis by A. Dudczak).