Differences between revisions 38 and 39

Language Tools and Resources for Polish

This page contains a list of publicly available language tools and resources.

Written corpora and corpus-related tools

National Corpus of Polish (under development),
IPI PAN Corpus,
PWN Corpus,
PELCRA Corpus,
Polish language of the 1960s,
Old Polish corpus,
PICLE corpus (the Polish sub-corpus of the International Corpus of Learner English (ICLE)),
Poliqarp – a corpus indexing and search engine,
Anotatornia – a system for multi-level manual annotation of corpora,
Smyrna – a simple, lightweight Polish concordancer.

OPUS – an open source parallel corpus (European Parliament, EMEA, KDE, movie subtitles),
Leeds collection of Internet corpora,
LAGUN corpus,
JRC-Acquis Multilingual Parallel Corpus.

Morfeusz SGJP – morphological analyser (Z. Saloni, W. Gruszczyński, M. Woliński, R. Wołosz),
Morfologik – morphological analyser (M. Miłkowski, D. Weiss),
SAM – morphological analyser (K. Szafran),
UAM Text Tools (P. Obrębski, Z. Vetulani; see also http://utt.wmi.amu.edu.pl/trac/wiki/),
MACA, Morphological Analysis Converter and Aggregator (A. Radziszewski, T. Śniatowski),
Lexical analyser and a Polish proof-reader (S. Galus),
Neurosoft Gram (demo of a morphological analyser),
Finite state utilities (J. Daciuk),
Stempel, another stemmer (A. Białecki).

TaKIPI – a morphosyntactic tagger for Polish,
PANTERA – a morphosyntactic tagger for Polish,
a prototype implementation of Maximum Entropy tagging, created as part of Radomir Mastalerz's MSc thesis.

plWordNet, Polish WordNet (M. Piasecki),
Polish OpenThesaurus – a crowdsourced Polish thesaurus (M. Miłkowski),
Słownik języka polskiego (formerly "alternatywny") – Polish ispell dictionaries, along with some definitions and online form display,
Słownik składniowy języka polskiego (Z. Greń).

Kolokacje, a Web crawler and collocation finder (A. Buczyński),
WSDDE – a system for designing and performing Word Sense Disambiguation experiments (R. Młodzki et al.),
Segment, a rule-based sentence tokenizer supporting the SRX standard (J. Lipski; the Polish rules are available in the LanguageTool project),
Lakon, a system for news summarization (master's thesis by A. Dudczak).

- Revision 38 as of 2011-04-13 20:02:59
  Size: 6032
  Editor: MarcinMilkowski
  Comment:
+ Revision 39 as of 2011-04-13 21:46:39
  Size: 6446
  Editor: MarcinMilkowski
  Comment:
-Deletions are marked like this.
+Additions are marked like this.
 Line 76:
- * [[http://nlp.ipipan.waw.pl/WSDDE/|WSDDE]] – a system for designing and performing Word Sense Disambiguation experiments (R. Młodzki ''et al.'').
+ * [[http://nlp.ipipan.waw.pl/WSDDE/|WSDDE]] – a system for designing and performing Word Sense Disambiguation experiments (R. Młodzki ''et al.''),
 * [[http://segment.sourceforge.net/|Segment]], a rule-based sentence tokenizer supporting SRX standard (J. Lipski; the Polish rules are available in [[http://languagetool.svn.sourceforge.net/viewvc/languagetool/trunk/JLanguageTool/src/resource/segment.srx|LanguageTool project]]),
 * [[http://www.cs.put.poznan.pl/dweiss/research/lakon/|Lakon]], a system for news summarization (master's thesis by A. Dudczak).