Language Tools and Resources for Polish
This page contains a list of publicly available language tools and resources.
Written corpora and corpus-related tools
National Corpus of Polish (under development),
PICLE corpus (the Polish sub-corpus of the International Corpus of Learner English (ICLE)),
Poliqarp – a corpus indexing and search engine,
Anotatornia – a system for multi-level manual annotation of corpora,
Smyrna – a simple, lightweight Polish concordancer.
Parallel corpora
ParaSol – a parallel corpus of Slavic and other languages,
OPUS – an open-source parallel corpus (European Parliament, EMEA, KDE, movie subtitles).
Morphological tools and resources
Morfeusz SGJP – morphological analyser (Z. Saloni, W. Gruszczyński, M. Woliński, R. Wołosz),
Morfologik – morphological analyser (M. Miłkowski, D. Weiss),
SAM – morphological analyser (K. Szafran),
UAM Text Tools (P. Obrębski, Z. Vetulani; see also http://utt.wmi.amu.edu.pl/trac/wiki/),
MACA, Morphological Analysis Converter and Aggregator (A. Radziszewski, T. Śniatowski),
Lexical analyser and a Polish proofreader (S. Galus),
Neurosoft Gram (demo of a morphological analyser),
Finite state utilities (J. Daciuk),
Stempel – a stemmer for Polish (A. Białecki).
Taggers
TaKIPI – a morphosyntactic tagger for Polish,
PANTERA – a morphosyntactic tagger for Polish,
a prototype implementation of Maximum Entropy tagging, created as part of Radomir Mastalerz's MSc thesis.
Parsers, grammars, treebanks
Świgra – a DCG parser,
Spejd – a shallow parsing and disambiguation system,
Dendrarium – a treebank development system (under development).
Machine-readable dictionaries
plWordNet, Polish WordNet (M. Piasecki),
Polish OpenThesaurus – a crowdsourced Polish thesaurus (M. Miłkowski),
Słownik języka polskiego (formerly "alternatywny") – Polish ispell dictionaries, with some definitions and an online display of word forms,
Słownik składniowy języka polskiego (a syntactic dictionary of Polish; Z. Greń).
Human-readable dictionaries
Speech analysis and synthesis tools
Skrybot – a commercial speech recognition system (L. Pawlaczyk, P. Bosky),
Ivona – a commercial text-to-speech system (Expressivo).
Machine translation demonstrations
Translatica (EN-PL-EN),
Bing Translator (multilingual),
Google Translate (multilingual),
InterTran (multilingual),
LingvoBit (EN-PL-EN),
Systran (EN-PL, PL-FR and some more),
Esperantilo (an integrated Esperanto editor with MT for EO-PL-DE-EN-SV),
Thetos (Polish to Polish Sign Language).
Other
Kolokacje – a Web crawler and collocation finder (A. Buczyński),
WSDDE – a system for designing and performing Word Sense Disambiguation experiments (R. Młodzki et al.),
Frazeo – a search engine and clusterer for Polish news (P. Pęzik),
Segment – a rule-based sentence tokenizer supporting the SRX standard (J. Lipski; the Polish rules are available in the LanguageTool project),
Lakon – a news summarization system (master's thesis by A. Dudczak).
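Rule-based sentence splitting in the SRX style, as supported by Segment, works as follows. Each rule pairs a "before break" regular expression with an "after break" regular expression and marks the position between them as either a break or a non-break. Rules are tried in order, and the first one that matches decides. The Python sketch below illustrates only this idea. It is not Segment's API. The `Rule` class, `segment` function and the abbreviation list in `RULES` are hypothetical examples, not the Polish rules shipped with LanguageTool.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    """One SRX-style rule: a break decision plus before/after regexes."""
    is_break: bool
    before: str  # must match the text ending at the candidate position
    after: str   # must match the text starting at the candidate position


# Hypothetical rules for illustration only (not LanguageTool's segment.srx).
RULES = [
    # Exception first: no break after a few common Polish abbreviations.
    Rule(False, r"\b(?:np|tzn|itd|itp|ok|m\.in)\.", r"\s"),
    # General rule: break after sentence-final punctuation followed by more text.
    Rule(True, r"[.!?]+", r"\s+\S"),
]


def segment(text: str, rules: list[Rule]) -> list[str]:
    """Split text into sentences; the first matching rule decides each position."""
    breaks = []
    for pos in range(1, len(text)):
        for rule in rules:
            if re.search(f"(?:{rule.before})$", text[:pos]) and re.match(rule.after, text[pos:]):
                if rule.is_break:
                    breaks.append(pos)
                break  # first matching rule wins, break or no-break
    # Cut the text at the collected break positions.
    segments, start = [], 0
    for pos in breaks:
        segments.append(text[start:pos].strip())
        start = pos
    segments.append(text[start:].strip())
    return [s for s in segments if s]


print(segment("To jest np. test. Drugie zdanie! Trzecie?", RULES))
```

The abbreviation exception is listed before the general rule. Because the first matching rule decides, this order prevents a break after "np." even though the general rule would also match at that position.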