Language Tools and Resources for Polish
This page contains a list of publicly available language tools and resources.
Written corpora and corpus-related tools
National Corpus of Polish (under development),
PICLE corpus, the Polish sub-corpus of the International Corpus of Learner English (ICLE),
Poliqarp, a corpus indexing and search engine,
Anotatornia, a system for multi-level manual annotation of corpora,
Smyrna, a simple, light-weight Polish concordancer.
Parallel corpora
ParaSol, a parallel corpus of Slavic and other languages,
PolUKR, a Polish-Ukrainian parallel corpus,
OPUS, an open source parallel corpus (European Parliament, EMEA, KDE, movie subtitles),
"1984", an annotated parallel corpus of George Orwell's "1984" in 15 languages, MULTEXT-East, v.4 (licensed download),
InterCorp, a multilingual parallel corpus,
PSI collection of parallel corpora, a growing collection of parallel corpora pairing Polish with other European languages.
Translation memories
MyMemory, a freely available multilingual TM,
TAUS Data, a multilingual TM from the members of TAUS Data Association.
Morphological tools and resources
Morfeusz SGJP, morphological analyser (Z. Saloni, W. Gruszczyński, M. Woliński, R. Wołosz),
Index a tergo of Polish word forms (J. Tokarski, Z. Saloni),
Morfologik, morphological analyser (M. Miłkowski, D. Weiss),
SAM, morphological analyser (K. Szafran),
UAM Text Tools (P. Obrębski, Z. Vetulani; see also http://utt.wmi.amu.edu.pl/trac/wiki/),
MULTEXT-East, v.4, morphosyntactic specifications and documentation for 16 languages,
KIPI->MTE, a converter from TaKIPI to MULTEXT-East morphosyntactic format (A. Radziszewski, N. Kotsyba),
MACA, Morphological Analysis Converter and Aggregator (A. Radziszewski, T. Śniatowski),
Lexical analyser and a Polish proof-reader (S. Galus),
Neurosoft Gram (demo of a morphological analyser),
Baza fleksyjna języka polskiego, inflection database of Polish words (W. Lubaszewski, B. Moskal, P. Pietras, P. Pisarek, T. Rokicka),
Finite state utilities (J. Daciuk),
Stempel, another stemmer (A. Białecki).
Taggers
Parsers, grammars, treebanks
Świgra, a DCG parser,
Spejd, a shallow parsing and disambiguation system,
Dendrarium, a treebank development system (under development),
Analizator syntaktyczny AS (M. Woliński),
Formalny opis składniowy zdań polskich (S. Szpakowicz).
Machine-readable dictionaries
plWordNet, Polish WordNet (M. Piasecki),
Polish OpenThesaurus, a crowdsourced Polish thesaurus (M. Miłkowski),
Słownik języka polskiego (d. alternatywny), Polish ispell dictionaries, along with some definitions and online form display,
Sample morphosyntactic Polish lexicon, the MULTEXT-East morphosyntactic lexicons,
Słownik składniowy języka polskiego (Z. Greń).
Human-readable dictionaries
Speech analysis and synthesis tools
Skrybot, commercial speech recognition system (L. Pawlaczyk, P. Bosky),
Ivona, commercial text-to-speech system (Expressivo),
Acapela, text-to-speech demo,
Synteza mowy polskiej, automatic speech recognition and speech synthesis demos, with background information (K. Szklanny),
System syntezy mowy ciągłej (G. Demenko, S. Grocholewski),
Polish MBROLA database (K. Szklanny, K. Marasek),
PrimeSpeech, commercial speech recognition systems.
Machine translation demonstrations
Translatica (EN-PL-EN, DE-PL-DE, RU-PL-RU), see also Poleng website with an experimental FR-PL-FR version,
Bing Translator (multilingual),
Google Translate (multilingual),
InterTran (multilingual),
LingvoBit (EN-PL-EN),
Systran (EN-PL, PL-FR and some more),
Esperantilo (integrated Esperanto editor, with MT for EO-PL-DE-EN-SV),
Thetos (PL-Sign language).
Other
Kolokacje, a Web crawler and collocation finder (A. Buczyński),
WSDDE, a system for designing and performing Word Sense Disambiguation experiments (R. Młodzki et al.),
Frazeo, a search engine and clusterer of news in Polish (P. Pęzik),
Segment, a rule-based sentence tokenizer supporting the SRX standard (J. Lipski; the Polish rules are available in the LanguageTool project),
Translatica SRX sentence segmentation rules for Polish (LGPL),
Lakon, a system for news summarization (master's thesis by A. Dudczak),
SyMGIZA++, an extension of Giza++ that computes symmetric word alignment models.