
Diff for "LRT"

Differences between revisions 78 and 115 (spanning 37 versions)
Revision 78 as of 2011-05-19 16:20:12
Size: 10233
Comment:
Revision 115 as of 2011-11-21 14:17:21
Size: 12608
Comment:
Line 6: Line 6:
 * [[http://nkjp.pl/index.php?page=0&lang=1|National Corpus of Polish]] (under development),
 * [[http://nkjp.pl/index.php?page=0&lang=1|National Corpus of Polish]] (NKJP)
  * [[http://nkjp.pl/poliqarp/|Poliqarp search engine for NKJP data]], a search engine for the National Corpus of Polish,
  * [[http://nkjp.uni.lodz.pl/|PELCRA search engine for NKJP data]], a search engine for the National Corpus of Polish,
  * [[http://www.nkjp.uni.lodz.pl/collocations.jsp|Kolokator]], a collocation extraction tool for NKJP data,
  * [[http://nlp.ipipan.waw.pl/TEI4NKJP/|TEI4NKJP]], a collection of XML schemata used in NKJP,
  * [[attachment:NKJP-PodkorpusMilionowy-1.0.tgz]], the manually annotated 1-million-word subcorpus of the NKJP, available under GNU GPL v.3,
  * [[attachment:gramatyka_Spejd_NKJP_RC1.0.zip]], a release candidate of a shallow [[Spejd]] grammar for NKJP, available under GNU GPL v.3,
  * [[Nerf]], a tool for named entity recognition, available under GNU GPL v.3,
Line 11: Line 18:
 * [[Polish language of the XX century sixties]],
 * [[PL196x|Polish language of the 1960s]],
Line 28: Line 35:

== Spoken corpora ==

 * [[http://clip.ipipan.waw.pl/LUNA|The annotated corpus of spoken dialogues]] (LUNA project, corpus data available at the end of the page)
Line 47: Line 58:
 * [[http://getopt.org/stempel/|Stempel]], another stemmer (A. Białecki).
 * [[http://getopt.org/stempel/|Stempel]], another stemmer (A. Białecki),
 * [[http://nlp.pwr.wroc.pl/redmine/projects/joskipi/wiki/|WCCL]], a toolkit for morphosyntactic feature generation (A. Radziszewski, A. Wardyński, T. Śniatowski, P. Kędzia).
Line 51: Line 63:
 * [[http://code.google.com/p/pantera-tagger/|PANTERA]], a morphosyntactic tagger for Polish.
 * [[http://code.google.com/p/pantera-tagger/|PANTERA]], a morphosyntactic tagger for Polish,
 * [[http://nlp.pwr.wroc.pl/redmine/projects/wmbt/wiki|WMBT]], a morphosyntactic tagger for Polish.
Line 58: Line 71:
 * [[http://amos.klf.uw.edu.pl/| Visualisation of parsing tree forests]] (Świdziński's grammar,Świgra, Morfeusz, Bień's syntactic spreadsheets) by Andrzej Zaborowski,
 * [[http://amos.klf.uw.edu.pl/| Visualisation of parsing tree forests]] (Świdziński's grammar, Świgra, Morfeusz, Bień's syntactic spreadsheets) by Andrzej Zaborowski,
Line 68: Line 81:
 * [[http://www.ispan.waw.pl/zakjez/pracjcz/slowniki/slowniki.html|Słownik składniowy języka polskiego]] (Z. Greń).
 * [[http://home.agh.edu.pl/~ziolko/doku.php?id=pl:resources:ngram|N-gram model of Polish]] (AGH DSP)
 * [[http://www.ispan.waw.pl/zakjez/pracjcz/slowniki/slowniki.html|Słownik składniowy języka polskiego]] (Z. Greń),
 * [[http://www.dsp.agh.edu.pl/doku.php?id=pl:resources:ngram|N-gram model of Polish]] (AGH DSP),
 * [[Nowy_slownik_angielsko-polski|Nowy słownik angielsko-polski]] (T. Piotrowski, Z. Saloni),
 * [[https://github.com/apohllo/polish-cyc|Polish OpenCYC]] (A. Pohl),
 * [[http://www.slowniki.org.pl/pol.html|Polish machine-generated dictionaries]], available under a Creative Commons licence (J. Kazojć),
 * [[http://futrega.org/etc/nazwiska.zip|List of all Polish surnames]], licence unknown, see [[http://futrega.org/etc/nazwiska.html|further information on this resource]],
 * [[http://clip.ipipan.waw.pl/Gazetteer|Gazetteer for Polish Named Entities]] (A. Savary, J. Piskorski).
Line 90: Line 107:
 * [[http://home.agh.edu.pl/~ziolko/doku.php?id=pl:resources:ortfon|OrtFon]], phonetic transcriber (AGH DSP).
 * [[http://home.agh.edu.pl/~ziolko/doku.php?id=pl:resources:asr|ASR]], automatic speech recognition system for Polish (AGH DSP).
 * [[http://home.agh.edu.pl/~ziolko/doku.php?id=pl:resources:anotator|Anotator]], a speech corpus annotator dedicated to Polish and focused on connecting existing resources (AGH DSP).
 * [[http://www.dsp.agh.edu.pl/doku.php?id=pl:resources:ortfon|OrtFon]], phonetic transcriber (AGH DSP),
 * [[http://www.dsp.agh.edu.pl/doku.php?id=pl:resources:asr|ASR]], automatic speech recognition system for Polish (AGH DSP),
 * [[http://www.dsp.agh.edu.pl/doku.php?id=pl:resources:anotator|Anotator]], a speech corpus annotator dedicated to Polish and focused on connecting existing resources (AGH DSP),
 * [[http://www.dsp.agh.edu.pl/doku.php?id=pl:resources:korpusmowy|Speech corpus]], a word-annotated Polish speech corpus of around 9 hours (AGH DSP),
 * [[http://www.dsp.agh.edu.pl/doku.php?id=en:resources:korpusav|AV corpus]], an audio-video corpus of Polish speech (AGH DSP),
Line 95: Line 114:
 * [[http://itranslate4.eu/|iTranslate4.eu]] (multiple languages, allows comparing translation engines),
Line 112: Line 132:
 * [[http://psi.amu.edu.pl/en/index.php?title=SyMGIZA%2B%2B|SyMGIZA++]], an extension of Giza++ that computes symmetric word alignment models.
 * [[http://psi.amu.edu.pl/en/index.php?title=SyMGIZA%2B%2B|SyMGIZA++]], an extension of Giza++ that computes symmetric word alignment models,
 * [[http://chopin.ipipan.waw.pl/multiservice/|Multiservice]], a sample interface for running NLP Web services for Polish,
 * [[http://hipisek.pl|Hipisek]], an experimental question answering system (M. Walas).

Language Tools and Resources for Polish

This page contains a list of publicly available language tools and resources.

Parallel corpora

Spoken corpora

Translation memories

  • MyMemory, a freely available multilingual TM,

  • TAUS Data, a multilingual TM from the members of the TAUS Data Association.

Morphological tools and resources

Taggers

  • TaKIPI, a morphosyntactic tagger for Polish,

  • PANTERA, a morphosyntactic tagger for Polish,

  • WMBT, a morphosyntactic tagger for Polish.

Parsers, grammars, treebanks

Machine-readable dictionaries

Human-readable dictionaries

Speech analysis and synthesis tools

  • Skrybot, commercial speech recognition system (L. Pawlaczyk, P. Bosky),

  • Ivona, commercial text-to-speech system (Expressivo),

  • Acapela, a text-to-speech demo,

  • Synteza mowy polskiej, automatic speech recognition and speech synthesis demos, with background information (K. Szklanny),

  • System syntezy mowy ciągłej (G. Demenko, S. Grocholewski),

  • Polish MBROLA database (K. Szklanny, K. Marasek),

  • SynTalk, commercial speech synthesis system (NeuroSoft),

  • PrimeSpeech, commercial speech recognition systems,

  • OrtFon, phonetic transcriber (AGH DSP),

  • ASR, automatic speech recognition system for Polish (AGH DSP),

  • Anotator, a speech corpus annotator dedicated to Polish and focused on connecting existing resources (AGH DSP),

  • Speech corpus, a word-annotated Polish speech corpus of around 9 hours (AGH DSP),

  • AV corpus, an audio-video corpus of Polish speech (AGH DSP).

Machine translation demonstrations

Other

  • Kolokacje, a Web crawler and collocation finder (A. Buczyński),

  • WSDDE, a system for designing and performing Word Sense Disambiguation experiments (R. Młodzki et al.),

  • Frazeo, a search engine and clusterer of news in Polish (P. Pęzik),

  • Segment, a rule-based sentence tokenizer supporting the SRX standard (J. Lipski; the Polish rules are available in the LanguageTool project),

  • Translatica SRX sentence segmentation rules for Polish (LGPL),

  • Lakon, a system for news summarization (master's thesis by A. Dudczak),

  • SyMGIZA++, an extension of Giza++ that computes symmetric word alignment models,

  • Multiservice, a sample interface for running NLP Web services for Polish,

  • Hipisek, an experimental question answering system (M. Walas).