Differences between revisions 44 and 47 (spanning 3 versions)

Language Tools and Resources for Polish

This page contains a list of publicly available language tools and resources.

Written corpora and corpus-related tools

National Corpus of Polish (under development),
IPI PAN Corpus,
PWN Corpus,
PELCRA Corpus,
Polish language of the XX century sixties,
Old Polish corpus,
PICLE corpus, the Polish sub-corpus of the International Corpus of Learner English (ICLE),
Poliqarp, a corpus indexing and search engine,
Anotatornia, a system for multi-level manual annotation of corpora,
Smyrna, a simple, light-weight Polish concordancer.

Parallel corpora

ParaSol, a parallel corpus of Slavic and other languages,
PolUKR, a Polish-Ukrainian parallel corpus,
OPUS, an open source parallel corpus (European Parliament, EMEA, KDE, movie subtitles),
"1984", an annotated parallel corpus of George Orwell's "1984" in 15 languages, MULTEXT-East, v.4 (licensed download),
Leeds collection of Internet corpora,
LAGUN corpus,
JRC-Acquis Multilingual Parallel Corpus.

Morphological tools and resources

Morfeusz SGJP, morphological analyser (Z. Saloni, W. Gruszczyński, M. Woliński, R. Wołosz),
Morfologik, morphological analyser (M. Miłkowski, D. Weiss),
SAM, morphological analyser (K. Szafran),
UAM Text Tools (P. Obrębski, Z. Vetulani; see also http://utt.wmi.amu.edu.pl/trac/wiki/),
MULTEXT-East, v.4, morphosyntactic specifications and documentation for 16 languages,
KIPI->MTE, a converter from TaKIPI to MULTEXT-East morphosyntactic format (A. Radziszewski, N. Kotsyba),
MACA, Morphological Analysis Converter and Aggregator (A. Radziszewski, T. Śniatowski),
Lexical analyser and a Polish proof-reader (S. Galus),
Neurosoft Gram (demo of a morphological analyser),
Finite state utilities (J. Daciuk),
Stempel, another stemmer (A. Białecki).

Taggers

TaKIPI, a morphosyntactic tagger for Polish,
PANTERA, a morphosyntactic tagger for Polish,
a prototype implementation of Maximum Entropy tagging created within Radomir Mastalerz's MSc.

Parsers, grammars, treebanks

Świgra, a DCG parser,
Spejd, a shallow parsing and disambiguation system,
Dendrarium, a treebank development system (under development),
A Treebank / Test Suite for Polish.

Machine-readable dictionaries

plWordNet, Polish WordNet (M. Piasecki),
Polish OpenThesaurus, a crowdsourced Polish thesaurus (M. Miłkowski),
Słownik języka polskiego (d. alternatywny), Polish ispell dictionaries, along with some definitions and online form display.
Sample morphosyntactic Polish lexicon, the MULTEXT-East morphosyntactic lexicons,
Słownik składniowy języka polskiego (Z. Greń),

Human-readable dictionaries

Speech analysis and synthesis tools

Skrybot, commercial speech recognition system (L. Pawlaczyk, P. Bosky)
Ivona, commercial text-to-speech system (Expressivo)

Machine translation demonstrations

Translatica (EN-PL-EN),
Bing Translator (multilingual),
Google Translate (multilingual),
InterTran (multilingual),
LingvoBit (EN-PL-EN),
Systran (EN-PL, PL-FR and some more),
Esperantilo (integrated Esperanto editor, with MT for EO-PL-DE-EN-SV)
Thetos (PL-Sign language).

Other

Kolokacje, a Web crawler and collocation finder (A. Buczyński),
WSDDE, a system for designing and performing Word Sense Disambiguation experiments (R. Młodzki et al.),
Frazeo, a search engine and clusterer of news in Polish (P. Pęzik),
Segment, a rule-based sentence tokenizer supporting SRX standard (J. Lipski; the Polish rules are available in LanguageTool project),
Lakon, a system for news summarization (master's thesis by A. Dudczak).

-  ⇤ ← Revision 44 as of 2011-04-18 13:41:58 → 
  Size: 7315
  Editor: NataliaKotsyba
  Comment:
+   ← Revision 47 as of 2011-04-18 14:58:51 → ⇥
  Size: 7253
  Editor: AdamPrzepiorkowski
  Comment:
-Deletions are marked like this.
+Additions are marked like this.
 Line 12:
- * [[http://ifa.amu.edu.pl/~ifaconc/blog/?page_id=60|PICLE corpus]] (the Polish sub-corpus of the [[http://www.fltr.ucl.ac.be/fltr/germ/etan/cecl/Cecl-Projects/Icle/icle.htm|International Corpus of Learner English]] (ICLE)),
 * [[http://poliqarp.sourceforge.net/|Poliqarp]] – a corpus indexing and search engine,
 * [[http://nlp.ipipan.waw.pl/Anotatornia/|Anotatornia]] – a system for multi-level manual annotation of corpora,
 * [[http://smyrna.danieljanus.pl/|Smyrna]] - a simple, light-weight Polish concordancer.
+ * [[http://ifa.amu.edu.pl/~ifaconc/blog/?page_id=60|PICLE corpus]], the Polish sub-corpus of the [[http://www.fltr.ucl.ac.be/fltr/germ/etan/cecl/Cecl-Projects/Icle/icle.htm|International Corpus of Learner English]] (ICLE),
 * [[http://poliqarp.sourceforge.net/|Poliqarp]], a corpus indexing and search engine,
 * [[http://nlp.ipipan.waw.pl/Anotatornia/|Anotatornia]], a system for multi-level manual annotation of corpora,
 * [[http://smyrna.danieljanus.pl/|Smyrna]], a simple, light-weight Polish concordancer.
 Line 18:
- * [[http://parasol.unibe.ch|ParaSol]] – a parallel corpus of Slavic and other languages,
 * [[http://www.domeczek.pl/~polukr/index.php?option=search|PolUKR]] – a Polish-Ukrainian parallel corpus, 
 * [[http://opus.lingfil.uu.se/index.php|OPUS]] – an open source parallel corpus (European Parliament, EMEA, KDE, movie subtitles),
 * [[http://nl.ijs.si/ME/V4|"1984"]] - an annotated parallel corpus of George Orwell's "1984" in 15 languages, MULTEXT-East, v.4 (licensed download),
+ * [[http://parasol.unibe.ch|ParaSol]], a parallel corpus of Slavic and other languages,
 * [[http://www.domeczek.pl/~polukr/index.php?option=search|PolUKR]], a Polish-Ukrainian parallel corpus, 
 * [[http://opus.lingfil.uu.se/index.php|OPUS]], an open source parallel corpus (European Parliament, EMEA, KDE, movie subtitles),
 * [[http://nl.ijs.si/ME/V4|"1984"]], an annotated parallel corpus of George Orwell's "1984" in 15 languages, MULTEXT-East, v.4 (licensed download),
 Line 27:
- * [[http://sgjp.pl/morfeusz/|Morfeusz SGJP]] – morphological analyser (Z. Saloni, W. Gruszczyński, M. Woliński, R. Wołosz),
 * [[http://morfologik.blogspot.com/|Morfologik]] – morphological analyser (M. Miłkowski, D. Weiss),
 * [[ftp://ftp.mimuw.edu.pl/pub/users/polszczyzna/SAM-95/|SAM]] – morphological analyser (K. Szafran),
+ * [[http://sgjp.pl/morfeusz/|Morfeusz SGJP]], morphological analyser (Z. Saloni, W. Gruszczyński, M. Woliński, R. Wołosz),
 * [[http://morfologik.blogspot.com/|Morfologik]], morphological analyser (M. Miłkowski, D. Weiss),
 * [[ftp://ftp.mimuw.edu.pl/pub/users/polszczyzna/SAM-95/|SAM]], morphological analyser (K. Szafran),
 Line 31:
- * [[http://nl.ijs.si/ME/V4/msd/html|MULTEXT-East, v.4 ]] - morphosyntactic specifications and documentation for 16 languages,
 * [[http://www.domeczek.pl/~polukr/mte-conv|KIPI->MTE]] - a converter from TaKIPI to MULTEXT-East morphosyntactic format (A. Radziszewski, N. Kotsyba),
+ * [[http://nl.ijs.si/ME/V4/msd/html|MULTEXT-East, v.4 ]], morphosyntactic specifications and documentation for 16 languages,
 * [[http://www.domeczek.pl/~polukr/mte-conv|KIPI->MTE]], a converter from TaKIPI to MULTEXT-East morphosyntactic format (A. Radziszewski, N. Kotsyba),
 Line 40:
- * [[http://nlp.pwr.wroc.pl/takipi/|TaKIPI]] – a morphosyntactic tagger for Polish,
 * [[http://code.google.com/p/pantera-tagger/|PANTERA]] – a morphosyntactic tagger for Polish,
+ * [[http://nlp.pwr.wroc.pl/takipi/|TaKIPI]], a morphosyntactic tagger for Polish,
 * [[http://code.google.com/p/pantera-tagger/|PANTERA]], a morphosyntactic tagger for Polish,
 Line 45:
- * [[http://nlp.ipipan.waw.pl/~wolinski/swigra/|Świgra]] – a DCG parser,
 * [[Spejd]] – a shallow parsing and disambiguation system,
 * [[http://sourceforge.net/projects/dendrarium/|Dendrarium]] – a treebank development system (under development),
+ * [[http://nlp.ipipan.waw.pl/~wolinski/swigra/|Świgra]], a DCG parser,
 * [[Spejd]], a shallow parsing and disambiguation system,
 * [[http://sourceforge.net/projects/dendrarium/|Dendrarium]], a treebank development system (under development),
 Line 52:
- * [[http://synonimy.ux.pl/|Polish OpenThesaurus]] – a crowdsourced Polish thesaurus (M. Miłkowski),
 * [[http://www.sjp.pl/|Słownik języka polskiego (d. alternatywny)]] – Polish ispell dictionaries, along with some definitions and online form display.
 * [[http://nl.ijs.si/ME/V4/doc/index.html#sec-lex|Sample morphosyntactic Polish lexicon]] - the MULTEXT-East morphosyntactic lexicons,
+ * [[http://synonimy.ux.pl/|Polish OpenThesaurus]], a crowdsourced Polish thesaurus (M. Miłkowski),
 * [[http://www.sjp.pl/|Słownik języka polskiego (d. alternatywny)]], Polish ispell dictionaries, along with some definitions and online form display.
 * [[http://nl.ijs.si/ME/V4/doc/index.html#sec-lex|Sample morphosyntactic Polish lexicon]], the MULTEXT-East morphosyntactic lexicons,
 Line 65:
- * [[http://skrybot.pl/en/products/skrybot-home-speech-recognition/|Skrybot]] - commercial speech recognition system (L. Pawlaczyk, P. Bosky)
 * [[http://www.ivona.com/|Ivona]] - commercial text-to-speech system (Expressivo)
/* * [[http://home.agh.edu.pl/~bziolko/dokuwiki/lib/exe/fetch.php?media=art:4154a450.pdf|ASR]] – an automatic speech recognition system for Polish  (M. Ziółko, J.Gałka, B. Ziółko, T. Jadczyk, D. Skurzok). */
/* * [[http://home.agh.edu.pl/~bziolko/dokuwiki/lib/exe/fetch.php?media=art:anotator.pdf|Anotator]] – a fast speech corpora anotator dedicated for Polish and focused on connecting existing resources (B. Ziółko, B. Miga). */
+ * [[http://skrybot.pl/en/products/skrybot-home-speech-recognition/|Skrybot]], commercial speech recognition system (L. Pawlaczyk, P. Bosky)
 * [[http://www.ivona.com/|Ivona]], commercial text-to-speech system (Expressivo)
/* * [[http://home.agh.edu.pl/~bziolko/dokuwiki/lib/exe/fetch.php?media=art:4154a450.pdf|ASR]], an automatic speech recognition system for Polish  (M. Ziółko, J.Gałka, B. Ziółko, T. Jadczyk, D. Skurzok). */
/* * [[http://home.agh.edu.pl/~bziolko/dokuwiki/lib/exe/fetch.php?media=art:anotator.pdf|Anotator]], a fast speech corpora anotator dedicated for Polish and focused on connecting existing resources (B. Ziółko, B. Miga). */
 Line 82:
- * [[http://nlp.ipipan.waw.pl/WSDDE/|WSDDE]] – a system for designing and performing Word Sense Disambiguation experiments (R. Młodzki ''et al.''),
+ * [[http://nlp.ipipan.waw.pl/WSDDE/|WSDDE]], a system for designing and performing Word Sense Disambiguation experiments (R. Młodzki ''et al.''),

Diff for "LRT"

Menu

Wiki

Language Tools and Resources for Polish

Written corpora and corpus-related tools

Parallel corpora

Morphological tools and resources

Taggers

Parsers, grammars, treebanks

Machine-readable dictionaries

Human-readable dictionaries

Speech analysis and synthesis tools

Machine translation demonstrations

Other