| Size: 7315 | Size: 7584 |
| Comment: | Comment: primespeech |
| Deletions are marked like this. | Additions are marked like this. |
| Line 12: | Line 12: |
| * [[http://ifa.amu.edu.pl/~ifaconc/blog/?page_id=60|PICLE corpus]] (the Polish sub-corpus of the [[http://www.fltr.ucl.ac.be/fltr/germ/etan/cecl/Cecl-Projects/Icle/icle.htm|International Corpus of Learner English]] (ICLE)), * [[http://poliqarp.sourceforge.net/|Poliqarp]] – a corpus indexing and search engine, * [[http://nlp.ipipan.waw.pl/Anotatornia/|Anotatornia]] – a system for multi-level manual annotation of corpora, * [[http://smyrna.danieljanus.pl/|Smyrna]] - a simple, light-weight Polish concordancer. |
* [[http://ifa.amu.edu.pl/~ifaconc/blog/?page_id=60|PICLE corpus]], the Polish sub-corpus of the [[http://www.fltr.ucl.ac.be/fltr/germ/etan/cecl/Cecl-Projects/Icle/icle.htm|International Corpus of Learner English]] (ICLE), * [[http://poliqarp.sourceforge.net/|Poliqarp]], a corpus indexing and search engine, * [[http://nlp.ipipan.waw.pl/Anotatornia/|Anotatornia]], a system for multi-level manual annotation of corpora, * [[http://smyrna.danieljanus.pl/|Smyrna]], a simple, light-weight Polish concordancer. |
| Line 18: | Line 18: |
| * [[http://parasol.unibe.ch|ParaSol]] – a parallel corpus of Slavic and other languages, * [[http://www.domeczek.pl/~polukr/index.php?option=search|PolUKR]] – a Polish-Ukrainian parallel corpus, * [[http://opus.lingfil.uu.se/index.php|OPUS]] – an open source parallel corpus (European Parliament, EMEA, KDE, movie subtitles), * [[http://nl.ijs.si/ME/V4|"1984"]] - an annotated parallel corpus of George Orwell's "1984" in 15 languages, MULTEXT-East, v.4 (licensed download), |
* [[http://parasol.unibe.ch|ParaSol]], a parallel corpus of Slavic and other languages, * [[http://www.domeczek.pl/~polukr/index.php?option=search|PolUKR]], a Polish-Ukrainian parallel corpus, * [[http://opus.lingfil.uu.se/index.php|OPUS]], an open source parallel corpus (European Parliament, EMEA, KDE, movie subtitles), * [[http://nl.ijs.si/ME/V4|"1984"]], an annotated parallel corpus of George Orwell's "1984" in 15 languages, MULTEXT-East, v.4 (licensed download), |
| Line 26: | Line 26: |
| == Translation memories == * [[http://mymemory.translated.net/|MyMemory]], freely available multilingual TM, * [[http://www.tausdata.org/|TAUS Data]], a multilingual TM from the members of TAUS Data Association. |
|
| Line 27: | Line 32: |
| * [[http://sgjp.pl/morfeusz/|Morfeusz SGJP]] – morphological analyser (Z. Saloni, W. Gruszczyński, M. Woliński, R. Wołosz), * [[http://morfologik.blogspot.com/|Morfologik]] – morphological analyser (M. Miłkowski, D. Weiss), * [[ftp://ftp.mimuw.edu.pl/pub/users/polszczyzna/SAM-95/|SAM]] – morphological analyser (K. Szafran), |
* [[http://sgjp.pl/morfeusz/|Morfeusz SGJP]], morphological analyser (Z. Saloni, W. Gruszczyński, M. Woliński, R. Wołosz), * [[http://sgjp.pl/siat/|Index a tergo of Polish word forms]] (J. Tokarski, Z. Saloni), * [[http://morfologik.blogspot.com/|Morfologik]], morphological analyser (M. Miłkowski, D. Weiss), * [[ftp://ftp.mimuw.edu.pl/pub/users/polszczyzna/SAM-95/|SAM]], morphological analyser (K. Szafran), |
| Line 31: | Line 37: |
| * [[http://nl.ijs.si/ME/V4/msd/html|MULTEXT-East, v.4 ]] - morphosyntactic specifications and documentation for 16 languages, * [[http://www.domeczek.pl/~polukr/mte-conv|KIPI->MTE]] - a converter from TaKIPI to MULTEXT-East morphosyntactic format (A. Radziszewski, N. Kotsyba), |
* [[http://nl.ijs.si/ME/V4/msd/html|MULTEXT-East, v.4 ]], morphosyntactic specifications and documentation for 16 languages, * [[http://www.domeczek.pl/~polukr/mte-conv|KIPI->MTE]], a converter from TaKIPI to MULTEXT-East morphosyntactic format (A. Radziszewski, N. Kotsyba), |
| Line 40: | Line 46: |
| * [[http://nlp.pwr.wroc.pl/takipi/|TaKIPI]] – a morphosyntactic tagger for Polish, * [[http://code.google.com/p/pantera-tagger/|PANTERA]] – a morphosyntactic tagger for Polish, * a prototype [[http://nlp.ipipan.waw.pl/~adamp/msc/mastalerz.radomir/CD.tgz|implementation]] of Maximum Entropy tagging created within Radomir Mastalerz's [[http://nlp.ipipan.waw.pl/~adamp/msc/mastalerz.radomir/1000-MGR-INF-97543.pdf.gz|MSc]]. |
* [[http://nlp.pwr.wroc.pl/takipi/|TaKIPI]], a morphosyntactic tagger for Polish, * [[http://code.google.com/p/pantera-tagger/|PANTERA]], a morphosyntactic tagger for Polish. |
| Line 45: | Line 50: |
| * [[http://nlp.ipipan.waw.pl/~wolinski/swigra/|Świgra]] – a DCG parser, * [[Spejd]] – a shallow parsing and disambiguation system, * [[http://sourceforge.net/projects/dendrarium/|Dendrarium]] – a treebank development system (under development), |
* [[http://nlp.ipipan.waw.pl/~wolinski/swigra/|Świgra]], a DCG parser, * [[Spejd]], a shallow parsing and disambiguation system, * [[http://sourceforge.net/projects/dendrarium/|Dendrarium]], a treebank development system (under development), |
| Line 52: | Line 57: |
| * [[http://synonimy.ux.pl/|Polish OpenThesaurus]] – a crowdsourced Polish thesaurus (M. Miłkowski), * [[http://www.sjp.pl/|Słownik języka polskiego (d. alternatywny)]] – Polish ispell dictionaries, along with some definitions and online form display. * [[http://nl.ijs.si/ME/V4/doc/index.html#sec-lex|Sample morphosyntactic Polish lexicon]] - the MULTEXT-East morphosyntactic lexicons, |
* [[http://synonimy.ux.pl/|Polish OpenThesaurus]], a crowdsourced Polish thesaurus (M. Miłkowski), * [[http://www.sjp.pl/|Słownik języka polskiego (d. alternatywny)]], Polish ispell dictionaries, along with some definitions and online form display. * [[http://nl.ijs.si/ME/V4/doc/index.html#sec-lex|Sample morphosyntactic Polish lexicon]], the MULTEXT-East morphosyntactic lexicons, |
| Line 65: | Line 70: |
| * [[http://skrybot.pl/en/products/skrybot-home-speech-recognition/|Skrybot]] - commercial speech recognition system (L. Pawlaczyk, P. Bosky) * [[http://www.ivona.com/|Ivona]] - commercial text-to-speech system (Expressivo) /* * [[http://home.agh.edu.pl/~bziolko/dokuwiki/lib/exe/fetch.php?media=art:4154a450.pdf|ASR]] – an automatic speech recognition system for Polish (M. Ziółko, J.Gałka, B. Ziółko, T. Jadczyk, D. Skurzok). */ /* * [[http://home.agh.edu.pl/~bziolko/dokuwiki/lib/exe/fetch.php?media=art:anotator.pdf|Anotator]] – a fast speech corpora anotator dedicated for Polish and focused on connecting existing resources (B. Ziółko, B. Miga). */ |
* [[http://skrybot.pl/en/products/skrybot-home-speech-recognition/|Skrybot]], commercial speech recognition system (L. Pawlaczyk, P. Bosky), * [[http://www.ivona.com/|Ivona]], commercial text-to-speech system (Expressivo) * [[http://www.syntezamowy.pjwstk.edu.pl/index.html|Synteza mowy polskiej]], automatic speech recognition and speech synthesis demos, with background information (K. Szklanny), * [[http://www.primespeech.pl/|PrimeSpeech]], commercial speech recognition systems. /* * [[http://home.agh.edu.pl/~bziolko/dokuwiki/lib/exe/fetch.php?media=art:4154a450.pdf|ASR]], an automatic speech recognition system for Polish (M. Ziółko, J.Gałka, B. Ziółko, T. Jadczyk, D. Skurzok). */ /* * [[http://home.agh.edu.pl/~bziolko/dokuwiki/lib/exe/fetch.php?media=art:anotator.pdf|Anotator]], a fast speech corpora anotator dedicated for Polish and focused on connecting existing resources (B. Ziółko, B. Miga). */ |
| Line 82: | Line 89: |
| * [[http://nlp.ipipan.waw.pl/WSDDE/|WSDDE]] – a system for designing and performing Word Sense Disambiguation experiments (R. Młodzki ''et al.''), | * [[http://nlp.ipipan.waw.pl/WSDDE/|WSDDE]], a system for designing and performing Word Sense Disambiguation experiments (R. Młodzki ''et al.''), |
Language Tools and Resources for Polish
This page contains a list of publicly available language tools and resources.
Written corpora and corpus-related tools
National Corpus of Polish (under development),
PICLE corpus, the Polish sub-corpus of the International Corpus of Learner English (ICLE),
Poliqarp, a corpus indexing and search engine,
Anotatornia, a system for multi-level manual annotation of corpora,
Smyrna, a simple, lightweight Polish concordancer.
Parallel corpora
ParaSol, a parallel corpus of Slavic and other languages,
PolUKR, a Polish-Ukrainian parallel corpus,
OPUS, an open-source parallel corpus (European Parliament, EMEA, KDE, movie subtitles),
"1984", an annotated parallel corpus of George Orwell's "1984" in 15 languages, MULTEXT-East, v.4 (licensed download).
Translation memories
MyMemory, freely available multilingual TM,
TAUS Data, a multilingual TM from the members of TAUS Data Association.
Morphological tools and resources
Morfeusz SGJP, morphological analyser (Z. Saloni, W. Gruszczyński, M. Woliński, R. Wołosz),
Index a tergo of Polish word forms (J. Tokarski, Z. Saloni),
Morfologik, morphological analyser (M. Miłkowski, D. Weiss),
SAM, morphological analyser (K. Szafran),
UAM Text Tools (P. Obrębski, Z. Vetulani; see also http://utt.wmi.amu.edu.pl/trac/wiki/),
MULTEXT-East, v.4, morphosyntactic specifications and documentation for 16 languages,
KIPI->MTE, a converter from TaKIPI to MULTEXT-East morphosyntactic format (A. Radziszewski, N. Kotsyba),
MACA, Morphological Analysis Converter and Aggregator (A. Radziszewski, T. Śniatowski),
Lexical analyser and a Polish proof-reader (S. Galus),
Neurosoft Gram (demo of a morphological analyser),
Finite state utilities (J. Daciuk),
Stempel, another stemmer (A. Białecki).
Taggers
TaKIPI, a morphosyntactic tagger for Polish,
PANTERA, a morphosyntactic tagger for Polish.
Parsers, grammars, treebanks
Świgra, a DCG parser,
Spejd, a shallow parsing and disambiguation system,
Dendrarium, a treebank development system (under development).
Machine-readable dictionaries
plWordNet, Polish WordNet (M. Piasecki),
Polish OpenThesaurus, a crowdsourced Polish thesaurus (M. Miłkowski),
Słownik języka polskiego (d. alternatywny), Polish ispell dictionaries, along with some definitions and online form display,
Sample morphosyntactic Polish lexicon, the MULTEXT-East morphosyntactic lexicons,
Słownik składniowy języka polskiego (Z. Greń).
Human-readable dictionaries
Speech analysis and synthesis tools
Skrybot, commercial speech recognition system (L. Pawlaczyk, P. Bosky),
Ivona, commercial text-to-speech system (Expressivo),
Synteza mowy polskiej, automatic speech recognition and speech synthesis demos, with background information (K. Szklanny),
PrimeSpeech, commercial speech recognition systems.
Machine translation demonstrations
Translatica (EN-PL-EN),
Bing Translator (multilingual),
Google Translate (multilingual),
InterTran (multilingual),
LingvoBit (EN-PL-EN),
Systran (EN-PL, PL-FR and other language pairs),
Esperantilo (integrated Esperanto editor, with MT for EO-PL-DE-EN-SV),
Thetos (PL-Sign language).
Other
Kolokacje, a Web crawler and collocation finder (A. Buczyński),
WSDDE, a system for designing and performing Word Sense Disambiguation experiments (R. Młodzki et al.),
Frazeo, a search engine and clusterer of news in Polish (P. Pęzik),
Segment, a rule-based sentence tokenizer supporting the SRX standard (J. Lipski; the Polish rules are available in the LanguageTool project),
Lakon, a system for news summarization (master's thesis by A. Dudczak).
