Differences between revisions 20 and 41 (spanning 21 versions)

Language Tools and Resources for Polish

This page contains a list of publicly available language tools and resources.

Written corpora and corpus-related tools

National Corpus of Polish (under development),
IPI PAN Corpus,
PWN Corpus,
PELCRA Corpus,
Polish language of the XX century sixties,
Old Polish corpus,
PICLE corpus (the Polish sub-corpus of the International Corpus of Learner English (ICLE)),
Poliqarp – a corpus indexing and search engine,
Anotatornia – a system for multi-level manual annotation of corpora,
Smyrna – a simple, lightweight Polish concordancer.

Parallel corpora

ParaSol – a parallel corpus of Slavic and other languages,
OPUS – an open source parallel corpus (European Parliament, EMEA, KDE, movie subtitles),
Leeds collection of Internet corpora,
LAGUN corpus,
JRC-Acquis Multilingual Parallel Corpus.

Morphological tools and resources

Morfeusz SGJP – morphological analyser (Z. Saloni, W. Gruszczyński, M. Woliński, R. Wołosz),
Morfologik – morphological analyser (M. Miłkowski, D. Weiss),
SAM – morphological analyser (K. Szafran),
UAM Text Tools (P. Obrębski, Z. Vetulani; see also http://utt.wmi.amu.edu.pl/trac/wiki/),
MACA, Morphological Analysis Converter and Aggregator (A. Radziszewski, T. Śniatowski),
Lexical analyser and a Polish proof-reader (S. Galus),
Neurosoft Gram (demo of a morphological analyser),
Finite state utilities (J. Daciuk),
Stempel, another stemmer (A. Białecki).

Taggers

TaKIPI – a morphosyntactic tagger for Polish,
PANTERA – a morphosyntactic tagger for Polish,
A prototype implementation of Maximum Entropy tagging, created as part of Radomir Mastalerz's MSc thesis.

Parsers, grammars, treebanks

Świgra – a DCG parser,
Spejd – a shallow parsing and disambiguation system,
Dendrarium – a treebank development system (under development),
A Treebank / Test Suite for Polish.

Machine-readable dictionaries

plWordNet, Polish WordNet (M. Piasecki),
Polish OpenThesaurus – a crowdsourced Polish thesaurus (M. Miłkowski),
Słownik języka polskiego (d. alternatywny) – Polish ispell dictionaries, along with some definitions and online form display,
Słownik składniowy języka polskiego (Z. Greń).

Human-readable dictionaries

Speech analysis and synthesis tools

Skrybot – a commercial speech recognition system (L. Pawlaczyk, P. Bosky),
Ivona – a commercial text-to-speech system (Expressivo).

Machine translation demonstrations

Translatica (EN-PL-EN),
Bing Translator (multilingual),
Google Translate (multilingual),
InterTran (multilingual),
LingvoBit (EN-PL-EN),
Systran (EN-PL, PL-FR and some more),
Esperantilo (integrated Esperanto editor, with MT for EO-PL-DE-EN-SV),
Thetos (PL-Sign language).

Other

Kolokacje, a Web crawler and collocation finder (A. Buczyński),
WSDDE – a system for designing and performing Word Sense Disambiguation experiments (R. Młodzki et al.),
Frazeo, a search engine and clusterer of news in Polish (P. Pęzik),
Segment, a rule-based sentence tokenizer supporting the SRX standard (J. Lipski; the Polish rules are available in the LanguageTool project),
Lakon, a system for news summarization (master's thesis by A. Dudczak).

- Revision 20 as of 2011-03-18 16:35:59
  Size: 4298
  Editor: AdamPrzepiorkowski
+ Revision 41 as of 2011-04-18 11:29:05
  Size: 6634
  Editor: AdamPrzepiorkowski
-Deletions are marked like this.
+Additions are marked like this.
 Line 2:
+This page contains a list of ''publicly available'' language tools and resources.
-Line 8:
+Line 10:
- * [[http://www.mimuw.edu.pl/polszczyzna/pl196x/index_en.htm|Polish language of the XX century sixties]],
+ * [[Polish language of the XX century sixties]],
-Line 12:
+Line 14:
- * [[http://nlp.ipipan.waw.pl/Anotatornia/|Anotatornia]] – a system for multi-level manual annotation of corpora.
+ * [[http://nlp.ipipan.waw.pl/Anotatornia/|Anotatornia]] – a system for multi-level manual annotation of corpora,
 * [[http://smyrna.danieljanus.pl/|Smyrna]] - a simple, light-weight Polish concordancer.
-Line 15:
+Line 18:
+ * [[http://parasol.unibe.ch|ParaSol]] – a parallel corpus of Slavic and other languages,
-Line 18:
+Line 22:
- * [[http://langtech.jrc.it/JRC-Acquis.html|JRC-Acquis Multilingual Parallel Corpus]],
+ * [[http://langtech.jrc.it/JRC-Acquis.html|JRC-Acquis Multilingual Parallel Corpus]].
-Line 22:
+Line 26:
- * [[http://morfologik.blogspot.com/|Morfologik]] – morphological analyser (M. Miłkowski),
+ * [[http://morfologik.blogspot.com/|Morfologik]] – morphological analyser (M. Miłkowski, D. Weiss),
 * [[ftp://ftp.mimuw.edu.pl/pub/users/polszczyzna/SAM-95/|SAM]] – morphological analyser (K. Szafran),
-Line 24:
+Line 29:
+ * [[http://nlp.pwr.wroc.pl/redmine/projects/libpltagger/wiki|MACA]], Morphological Analysis Converter and Aggregator (A. Radziszewski, T. Śniatowski),
-Line 27:
+Line 33:
- * [[http://www.cs.put.poznan.pl/dweiss/xml/projects/lametyzator/index.xml?lang=en|Stemming engine for Polish]] (D. Weiss),
-Line 32:
+Line 37:
- * [[http://code.google.com/p/pantera-tagger/|PANTERA]] – a morphosyntactic tagger for Polish.
+ * [[http://code.google.com/p/pantera-tagger/|PANTERA]] – a morphosyntactic tagger for Polish,
 * a prototype [[http://nlp.ipipan.waw.pl/~adamp/msc/mastalerz.radomir/CD.tgz|implementation]] of Maximum Entropy tagging created within Radomir Mastalerz's [[http://nlp.ipipan.waw.pl/~adamp/msc/mastalerz.radomir/1000-MGR-INF-97543.pdf.gz|MSc]].
-Line 44:
+Line 50:
+ * [[http://www.ispan.waw.pl/zakjez/pracjcz/slowniki/slowniki.html|Słownik składniowy języka polskiego]] (Z. Greń),
/* * [[http://home.agh.edu.pl/~bziolko/dokuwiki/lib/exe/fetch.php?media=art:ziolkofull.pdf|N-gram model of Polish]] (B. Ziółko, D. Skurzok) */
-Line 47:
+Line 55:
+ * [[http://pl.wiktionary.org|Wikisłownik]],
-Line 49:
+Line 58:
+== Speech analysis and synthesis tools ==
 * [[http://skrybot.pl/en/products/skrybot-home-speech-recognition/|Skrybot]] - commercial speech recognition system (L. Pawlaczyk, P. Bosky)
 * [[http://www.ivona.com/|Ivona]] - commercial text-to-speech system (Expressivo)
/* * [[http://home.agh.edu.pl/~bziolko/dokuwiki/lib/exe/fetch.php?media=art:4154a450.pdf|ASR]] – an automatic speech recognition system for Polish  (M. Ziółko, J.Gałka, B. Ziółko, T. Jadczyk, D. Skurzok). */
/* * [[http://home.agh.edu.pl/~bziolko/dokuwiki/lib/exe/fetch.php?media=art:anotator.pdf|Anotator]] – a fast speech corpora anotator dedicated for Polish and focused on connecting existing resources (B. Ziółko, B. Miga). */
-Line 56:
+Line 71:
- * [[http://www.systran.co.uk/|Systran]] (EN-PL, PL-FR and some more).
+ * [[http://www.systran.co.uk/|Systran]] (EN-PL, PL-FR and some more),
 * [[http://www.xdobry.de/esperantoedit/index_pl.html|Esperantilo]] (integrated Esperanto editor, with MT for EO-PL-DE-EN-SV)
 * [[http://thetos.aei.polsl.pl/|Thetos]] (PL-Sign language).
-Line 60:
+Line 77:
- * [[http://nlp.ipipan.waw.pl/WSDDE/|WSDDE]] – a system for designing and performing Word Sense Disambiguation experiments (R. Młodzki ''et al.'').
+ * [[http://nlp.ipipan.waw.pl/WSDDE/|WSDDE]] – a system for designing and performing Word Sense Disambiguation experiments (R. Młodzki ''et al.''),
 * [[http://frazeo.pl/|Frazeo]], a search engine and clusterer of news in Polish (P. Pęzik),
 * [[http://segment.sourceforge.net/|Segment]], a rule-based sentence tokenizer supporting SRX standard (J. Lipski; the Polish rules are available in [[http://languagetool.svn.sourceforge.net/viewvc/languagetool/trunk/JLanguageTool/src/resource/segment.srx|LanguageTool project]]),
 * [[http://www.cs.put.poznan.pl/dweiss/research/lakon/|Lakon]], a system for news summarization (master's thesis by A. Dudczak).

Diff for "LRT"
