Differences between revisions 26 and 38 (spanning 12 versions)

Language Tools and Resources for Polish

This page contains a list of publicly available language tools and resources.

Written corpora and corpus-related tools

National Corpus of Polish (under development),
IPI PAN Corpus,
PWN Corpus,
PELCRA Corpus,
Polish language of the XX century sixties (a corpus of 1960s Polish),
Old Polish corpus,
PICLE corpus, the Polish sub-corpus of the International Corpus of Learner English (ICLE),
Poliqarp – a corpus indexing and search engine,
Anotatornia – a system for multi-level manual annotation of corpora,
Smyrna – a simple, lightweight Polish concordancer.

Parallel corpora

OPUS – an open-source parallel corpus (European Parliament, EMEA, KDE, movie subtitles),
Leeds collection of Internet corpora,
LAGUN corpus,
JRC-Acquis Multilingual Parallel Corpus.

Morphological tools and resources

Morfeusz SGJP – morphological analyser (Z. Saloni, W. Gruszczyński, M. Woliński, R. Wołosz),
Morfologik – morphological analyser (M. Miłkowski, D. Weiss),
SAM – morphological analyser (K. Szafran),
UAM Text Tools (P. Obrębski, Z. Vetulani; see also http://utt.wmi.amu.edu.pl/trac/wiki/),
MACA – Morphological Analysis Converter and Aggregator (A. Radziszewski, T. Śniatowski),
Lexical analyser and a Polish proof-reader (S. Galus),
Neurosoft Gram (demo of a morphological analyser),
Finite state utilities (J. Daciuk),
Stempel – a stemmer for Polish (A. Białecki).

Taggers

TaKIPI – a morphosyntactic tagger for Polish,
PANTERA – a morphosyntactic tagger for Polish,
A prototype implementation of Maximum Entropy tagging, created as part of Radomir Mastalerz's MSc thesis.

Parsers, grammars, treebanks

Świgra – a DCG parser,
Spejd – a shallow parsing and disambiguation system,
Dendrarium – a treebank development system (under development),
A Treebank / Test Suite for Polish.

Machine-readable dictionaries

plWordNet, Polish WordNet (M. Piasecki),
Polish OpenThesaurus – a crowdsourced Polish thesaurus (M. Miłkowski),
Słownik języka polskiego (formerly "alternatywny") – Polish ispell dictionaries, with some definitions and an online display of inflected forms,
Słownik składniowy języka polskiego, a syntactic dictionary of Polish (Z. Greń).

Human-readable dictionaries

Speech analysis and synthesis tools

Skrybot – a commercial speech recognition system (L. Pawlaczyk, P. Bosky),
Ivona – a commercial text-to-speech system (Expressivo).

Machine translation demonstrations

Translatica (EN-PL-EN),
Bing Translator (multilingual),
Google Translate (multilingual),
InterTran (multilingual),
LingvoBit (EN-PL-EN),
Systran (EN-PL, PL-FR and several other language pairs),
Esperantilo (an integrated Esperanto editor with MT for EO-PL-DE-EN-SV),
Thetos (Polish text to Polish Sign Language).

Other

Kolokacje – a Web crawler and collocation finder (A. Buczyński),
WSDDE – a system for designing and performing Word Sense Disambiguation experiments (R. Młodzki et al.).

-  Revision 26 as of 2011-04-03 09:44:19
  Size: 4764
  Editor: MariuszZiolko
+   Revision 38 as of 2011-04-13 20:02:59
  Size: 6032
  Editor: MarcinMilkowski
-Deletions are marked like this.
+Additions are marked like this.
 Line 2:
+This page contains a list of ''publicly available'' language tools and resources.
-Line 8:
+Line 10:
- * [[http://www.mimuw.edu.pl/polszczyzna/pl196x/index_en.htm|Polish language of the XX century sixties]],
+ * [[Polish language of the XX century sixties]],
-Line 12:
+Line 14:
- * [[http://nlp.ipipan.waw.pl/Anotatornia/|Anotatornia]] – a system for multi-level manual annotation of corpora.
+ * [[http://nlp.ipipan.waw.pl/Anotatornia/|Anotatornia]] – a system for multi-level manual annotation of corpora,
 * [[http://smyrna.danieljanus.pl/|Smyrna]] - a simple, light-weight Polish concordancer.
-Line 22:
+Line 25:
- * [[http://morfologik.blogspot.com/|Morfologik]] – morphological analyser (M. Miłkowski),
+ * [[http://morfologik.blogspot.com/|Morfologik]] – morphological analyser (M. Miłkowski, D. Weiss),
-Line 25:
+Line 28:
+ * [[http://nlp.pwr.wroc.pl/redmine/projects/libpltagger/wiki|MACA]], Morphological Analysis Converter and Aggregator (A. Radziszewski, T. Śniatowski),
-Line 28:
+Line 32:
- * [[http://www.cs.put.poznan.pl/dweiss/xml/projects/lametyzator/index.xml?lang=en|Stemming engine for Polish]] (D. Weiss),
-Line 34:
+Line 37:
- * a prototype [[http://nlp.ipipan.waw.pl/~adamp/msc/mastalerz.radomir/CD.tgz|Maximum Entropy tagger]] created within Radomir Mastalerz's [[http://nlp.ipipan.waw.pl/~adamp/msc/mastalerz.radomir/1000-MGR-INF-97543.pdf.gz|MSc]].
+ * a prototype [[http://nlp.ipipan.waw.pl/~adamp/msc/mastalerz.radomir/CD.tgz|implementation]] of Maximum Entropy tagging created within Radomir Mastalerz's [[http://nlp.ipipan.waw.pl/~adamp/msc/mastalerz.radomir/1000-MGR-INF-97543.pdf.gz|MSc]].
-Line 46:
+Line 49:
- * [[http://home.agh.edu.pl/~bziolko/dokuwiki/lib/exe/fetch.php?media=art:ziolkofull.pdf|N-gram model of Polish]] (B. Ziółko),
+ * [[http://www.ispan.waw.pl/zakjez/pracjcz/slowniki/slowniki.html|Słownik składniowy języka polskiego]] (Z. Greń),
/* * [[http://home.agh.edu.pl/~bziolko/dokuwiki/lib/exe/fetch.php?media=art:ziolkofull.pdf|N-gram model of Polish]] (B. Ziółko, D. Skurzok) */
-Line 50:
+Line 54:
+ * [[http://pl.wiktionary.org|Wikisłownik]],
-Line 52:
+Line 57:
+== Speech analysis and synthesis tools ==
 * [[http://skrybot.pl/en/products/skrybot-home-speech-recognition/|Skrybot]] - commercial speech recognition system (L. Pawlaczyk, P. Bosky)
 * [[http://www.ivona.com/|Ivona]] - commercial text-to-speech system (Expressivo)
/* * [[http://home.agh.edu.pl/~bziolko/dokuwiki/lib/exe/fetch.php?media=art:4154a450.pdf|ASR]] – an automatic speech recognition system for Polish  (M. Ziółko, J.Gałka, B. Ziółko, T. Jadczyk, D. Skurzok). */
/* * [[http://home.agh.edu.pl/~bziolko/dokuwiki/lib/exe/fetch.php?media=art:anotator.pdf|Anotator]] – a fast speech corpora anotator dedicated for Polish and focused on connecting existing resources (B. Ziółko, B. Miga). */
-Line 59:
+Line 70:
- * [[http://www.systran.co.uk/|Systran]] (EN-PL, PL-FR and some more).
+ * [[http://www.systran.co.uk/|Systran]] (EN-PL, PL-FR and some more),
 * [[http://www.xdobry.de/esperantoedit/index_pl.html|Esperantilo]] (integrated Esperanto editor, with MT for EO-PL-DE-EN-SV)
 * [[http://thetos.aei.polsl.pl/|Thetos]] (PL-Sign language).
-Line 64:
+Line 77:

Diff for "LRT"