Diff for "LRT"

Differences between revisions 44 and 55 (spanning 11 versions)
Revision 44 as of 2011-04-18 13:41:58
Size: 7315
Comment:
Revision 55 as of 2011-04-20 18:56:38
Size: 8555
Comment:
Within each hunk below, lines from revision 44 (deletions) are listed first, followed by lines from revision 55 (additions).
Line 12: Line 12:
 * [[http://ifa.amu.edu.pl/~ifaconc/blog/?page_id=60|PICLE corpus]] (the Polish sub-corpus of the [[http://www.fltr.ucl.ac.be/fltr/germ/etan/cecl/Cecl-Projects/Icle/icle.htm|International Corpus of Learner English]] (ICLE)),
 * [[http://poliqarp.sourceforge.net/|Poliqarp]] – a corpus indexing and search engine,
 * [[http://nlp.ipipan.waw.pl/Anotatornia/|Anotatornia]] – a system for multi-level manual annotation of corpora,
 * [[http://smyrna.danieljanus.pl/|Smyrna]] - a simple, light-weight Polish concordancer.
 * [[http://ifa.amu.edu.pl/~ifaconc/blog/?page_id=60|PICLE corpus]], the Polish sub-corpus of the [[http://www.fltr.ucl.ac.be/fltr/germ/etan/cecl/Cecl-Projects/Icle/icle.htm|International Corpus of Learner English]] (ICLE),
 * [[http://poliqarp.sourceforge.net/|Poliqarp]], a corpus indexing and search engine,
 * [[http://nlp.ipipan.waw.pl/Anotatornia/|Anotatornia]], a system for multi-level manual annotation of corpora,
 * [[http://smyrna.danieljanus.pl/|Smyrna]], a simple, light-weight Polish concordancer.
Line 18: Line 18:
 * [[http://parasol.unibe.ch|ParaSol]] – a parallel corpus of Slavic and other languages,
 * [[http://www.domeczek.pl/~polukr/index.php?option=search|PolUKR]] – a Polish-Ukrainian parallel corpus,
 * [[http://opus.lingfil.uu.se/index.php|OPUS]] – an open source parallel corpus (European Parliament, EMEA, KDE, movie subtitles),
 * [[http://nl.ijs.si/ME/V4|"1984"]] - an annotated parallel corpus of George Orwell's "1984" in 15 languages, MULTEXT-East, v.4 (licensed download),
 * [[http://parasol.unibe.ch|ParaSol]], a parallel corpus of Slavic and other languages,
 * [[http://www.domeczek.pl/~polukr/index.php?option=search|PolUKR]], a Polish-Ukrainian parallel corpus,
 * [[http://opus.lingfil.uu.se/index.php|OPUS]], an open source parallel corpus (European Parliament, EMEA, KDE, movie subtitles),
 * [[http://nl.ijs.si/ME/V4|"1984"]], an annotated parallel corpus of George Orwell's "1984" in 15 languages, MULTEXT-East, v.4 (licensed download),
 * [[http://www.korpus.cz/intercorp/?req=page:info|InterCorp]], a multilingual parallel corpus,
Line 26: Line 27:
== Translation memories ==

 * [[http://mymemory.translated.net/|MyMemory]], freely available multilingual TM,
 * [[http://www.tausdata.org/|TAUS Data]], a multilingual TM from the members of TAUS Data Association.
Line 27: Line 33:
 * [[http://sgjp.pl/morfeusz/|Morfeusz SGJP]] – morphological analyser (Z. Saloni, W. Gruszczyński, M. Woliński, R. Wołosz),
 * [[http://morfologik.blogspot.com/|Morfologik]] – morphological analyser (M. Miłkowski, D. Weiss),
 * [[ftp://ftp.mimuw.edu.pl/pub/users/polszczyzna/SAM-95/|SAM]] – morphological analyser (K. Szafran),
 * [[http://sgjp.pl/morfeusz/|Morfeusz SGJP]], morphological analyser (Z. Saloni, W. Gruszczyński, M. Woliński, R. Wołosz),
 * [[http://sgjp.pl/siat/|Index a tergo of Polish word forms]] (J. Tokarski, Z. Saloni),
 * [[http://morfologik.blogspot.com/|Morfologik]], morphological analyser (M. Miłkowski, D. Weiss),
 * [[ftp://ftp.mimuw.edu.pl/pub/users/polszczyzna/SAM-95/|SAM]], morphological analyser (K. Szafran),
Line 31: Line 38:
 * [[http://nl.ijs.si/ME/V4/msd/html|MULTEXT-East, v.4 ]] - morphosyntactic specifications and documentation for 16 languages,
 * [[http://www.domeczek.pl/~polukr/mte-conv|KIPI->MTE]] - a converter from TaKIPI to MULTEXT-East morphosyntactic format (A. Radziszewski, N. Kotsyba),
 * [[http://nl.ijs.si/ME/V4/msd/html|MULTEXT-East, v.4 ]], morphosyntactic specifications and documentation for 16 languages,
 * [[http://www.domeczek.pl/~polukr/mte-conv|KIPI->MTE]], a converter from TaKIPI to MULTEXT-East morphosyntactic format (A. Radziszewski, N. Kotsyba),
Line 36: Line 43:
 * [[http://winnie.ics.agh.edu.pl/proj_uk/fleksbaz/|Baza fleksyjna języka polskiego]], inflection database of Polish words (W. Lubaszewski, B. Moskal, P. Pietras, P. Pisarek, T. Rokicka),
Line 40: Line 48:
 * [[http://nlp.pwr.wroc.pl/takipi/|TaKIPI]] – a morphosyntactic tagger for Polish,
 * [[http://code.google.com/p/pantera-tagger/|PANTERA]] – a morphosyntactic tagger for Polish,
 * a prototype [[http://nlp.ipipan.waw.pl/~adamp/msc/mastalerz.radomir/CD.tgz|implementation]] of Maximum Entropy tagging created within Radomir Mastalerz's [[http://nlp.ipipan.waw.pl/~adamp/msc/mastalerz.radomir/1000-MGR-INF-97543.pdf.gz|MSc]].
 * [[http://nlp.pwr.wroc.pl/takipi/|TaKIPI]], a morphosyntactic tagger for Polish,
 * [[http://code.google.com/p/pantera-tagger/|PANTERA]], a morphosyntactic tagger for Polish.
Line 45: Line 52:
 * [[http://nlp.ipipan.waw.pl/~wolinski/swigra/|Świgra]] – a DCG parser,
 * [[Spejd]] – a shallow parsing and disambiguation system,
 * [[http://sourceforge.net/projects/dendrarium/|Dendrarium]] – a treebank development system (under development),
 * [[http://nlp.ipipan.waw.pl/CRIT2/|A Treebank / Test Suite for Polish]].
 * [[http://nlp.ipipan.waw.pl/~wolinski/swigra/|Świgra]], a DCG parser,
 * [[Spejd]], a shallow parsing and disambiguation system,
 * [[http://sourceforge.net/projects/dendrarium/|Dendrarium]], a treebank development system (under development),
 * [[http://nlp.ipipan.waw.pl/CRIT2/|A Treebank / Test Suite for Polish]],
 * [[ftp://ftp.mimuw.edu.pl/pub/People/polszczyzna/AS/index.html|Analizator syntaktyczny AS]] (M. Woliński),
 * [[ftp://ftp.mimuw.edu.pl/pub/People/polszczyzna/Szpakowicz/|Formalny opis składniowy zdań polskich]] (S. Szpakowicz).
Line 52: Line 61:
 * [[http://synonimy.ux.pl/|Polish OpenThesaurus]] – a crowdsourced Polish thesaurus (M. Miłkowski),
 * [[http://www.sjp.pl/|Słownik języka polskiego (d. alternatywny)]] – Polish ispell dictionaries, along with some definitions and online form display.
 * [[http://nl.ijs.si/ME/V4/doc/index.html#sec-lex|Sample morphosyntactic Polish lexicon]] - the MULTEXT-East morphosyntactic lexicons,
 * [[http://synonimy.ux.pl/|Polish OpenThesaurus]], a crowdsourced Polish thesaurus (M. Miłkowski),
 * [[http://www.sjp.pl/|Słownik języka polskiego (d. alternatywny)]], Polish ispell dictionaries, along with some definitions and online form display,
 * [[http://nl.ijs.si/ME/V4/doc/index.html#sec-lex|Sample morphosyntactic Polish lexicon]], the MULTEXT-East morphosyntactic lexicons,
Line 65: Line 74:
 * [[http://skrybot.pl/en/products/skrybot-home-speech-recognition/|Skrybot]] - commercial speech recognition system (L. Pawlaczyk, P. Bosky)
 * [[http://www.ivona.com/|Ivona]] - commercial text-to-speech system (Expressivo)
/* * [[http://home.agh.edu.pl/~bziolko/dokuwiki/lib/exe/fetch.php?media=art:4154a450.pdf|ASR]] – an automatic speech recognition system for Polish (M. Ziółko, J. Gałka, B. Ziółko, T. Jadczyk, D. Skurzok). */
/* * [[http://home.agh.edu.pl/~bziolko/dokuwiki/lib/exe/fetch.php?media=art:anotator.pdf|Anotator]] – a fast speech corpus annotator dedicated to Polish and focused on connecting existing resources (B. Ziółko, B. Miga). */
 * [[http://skrybot.pl/en/products/skrybot-home-speech-recognition/|Skrybot]], commercial speech recognition system (L. Pawlaczyk, P. Bosky),
 * [[http://www.ivona.com/|Ivona]], commercial text-to-speech system (Expressivo),
 * [[http://www.acapela-group.com/text-to-speech-interactive-demo.html|Acapela]], text-to-speech demo,
 * [[http://www.syntezamowy.pjwstk.edu.pl/index.html|Synteza mowy polskiej]], automatic speech recognition and speech synthesis demos, with background information (K. Szklanny),
 * [[http://www.staff.amu.edu.pl/~fonetyka/synteza/index.htm|System syntezy mowy ciągłej]] (G. Demenko, S. Grocholewski),
 * [[http://www.tcts.fpms.ac.be/synthesis/mbrola/|Polish MBROLA database]] (K. Szklanny, K. Marasek),
 * [[http://www.neurosoft.pl/?page_name=Produkty_SynTalk|SynTalk]], commercial speech synthesis system (NeuroSoft),
 * [[http://www.primespeech.pl/|PrimeSpeech]], commercial speech recognition systems.

/* * [[http://home.agh.edu.pl/~bziolko/dokuwiki/lib/exe/fetch.php?media=art:4154a450.pdf|ASR]], an automatic speech recognition system for Polish (M. Ziółko, J. Gałka, B. Ziółko, T. Jadczyk, D. Skurzok). */
/* * [[http://home.agh.edu.pl/~bziolko/dokuwiki/lib/exe/fetch.php?media=art:anotator.pdf|Anotator]], a fast speech corpus annotator dedicated to Polish and focused on connecting existing resources (B. Ziółko, B. Miga). */
Line 82: Line 97:
 * [[http://nlp.ipipan.waw.pl/WSDDE/|WSDDE]] – a system for designing and performing Word Sense Disambiguation experiments (R. Młodzki ''et al.''),
 * [[http://nlp.ipipan.waw.pl/WSDDE/|WSDDE]], a system for designing and performing Word Sense Disambiguation experiments (R. Młodzki ''et al.''),

Language Tools and Resources for Polish

This page contains a list of publicly available language tools and resources.

Parallel corpora

Translation memories

  • MyMemory, freely available multilingual TM,

  • TAUS Data, a multilingual TM from the members of TAUS Data Association.

Morphological tools and resources

Taggers

  • TaKIPI, a morphosyntactic tagger for Polish,

  • PANTERA, a morphosyntactic tagger for Polish.

Parsers, grammars, treebanks

Machine-readable dictionaries

Human-readable dictionaries

Speech analysis and synthesis tools

Machine translation demonstrations

Other

  • Kolokacje, a Web crawler and collocation finder (A. Buczyński),

  • WSDDE, a system for designing and performing Word Sense Disambiguation experiments (R. Młodzki et al.),

  • Frazeo, a search engine and clusterer of news in Polish (P. Pęzik),

  • Segment, a rule-based sentence tokenizer supporting SRX standard (J. Lipski; the Polish rules are available in LanguageTool project),

  • Lakon, a system for news summarization (master's thesis by A. Dudczak).