
Diff for "LRT"

Differences between revisions 137 and 144 (spanning 7 versions)
Revision 137 as of 2012-02-26 11:05:16
Size: 13618
Comment:
Revision 144 as of 2012-03-23 15:12:19
Size: 13402
Comment:
Line 6: Line 6:
 * [[http://nkjp.pl/index.php?page=0&lang=1|National Corpus of Polish]] (NKJP)
  * [[http://nkjp.pl/poliqarp/|Poliqarp search engine for NKJP data]], a search engine for the National Corpus of Polish,
  * [[http://nkjp.uni.lodz.pl/|PELCRA search engine for NKJP data]], a search engine for the National Corpus of Polish,
  * [[http://www.nkjp.uni.lodz.pl/collocations.jsp|Kolokator]], a collocation extraction tool for NKJP data,
  * [[http://nlp.ipipan.waw.pl/TEI4NKJP/|TEI4NKJP]], a collection of XML schemata used in NKJP,
 * [[http://nkjp.pl/index.php?page=0&lang=1|National Corpus of Polish]] (NKJP),
Line 12: Line 8:
  * [[attachment:gramatyka_Spejd_NKJP_RC1.0.zip]], a release candidate of a shallow [[Spejd]] grammar for NKJP, available on GNU GPL v.3,
  * [[Nerf]], a tool for named entity recognition, available on GNU GPL v.3,
  * report errors in:
   * [[https://docs.google.com/spreadsheet/viewform?hl=pl&formkey=dERoLWhzYWNveXlvS09ZMDlRNmcydVE6MQ#gid=0|the 1-million word subcorpus]],
   * [[https://docs.google.com/spreadsheet/viewform?hl=pl&formkey=dDgtbVpRTGFYWEROcGVxSVd6VGdZMGc6MA#gid=0|the full NKJP]],
Line 22: Line 13:
 * [[http://ifa.amu.edu.pl/~ifaconc/blog/?page_id=60|PICLE corpus]], the Polish sub-corpus of the [[http://www.fltr.ucl.ac.be/fltr/germ/etan/cecl/Cecl-Projects/Icle/icle.htm|International Corpus of Learner English]] (ICLE),
 * [[http://ifa.amu.edu.pl/~ifaconc/blog/?page_id=60|PICLE corpus]], the Polish sub-corpus of the [[http://www.fltr.ucl.ac.be/fltr/germ/etan/cecl/Cecl-Projects/Icle/icle.htm|International Corpus of Learner English]] (ICLE),
  * [[http://dl.psnc.pl/activities/projekty/impact/results/| IMPACT ground-truth data]] for selected Polish historical documents from PIONIER Digital Libraries Federation,
Line 25: Line 17:
 * [[http://smyrna.danieljanus.pl/|Smyrna]], a simple, light-weight Polish concordancer,
 * [[http://dl.psnc.pl/activities/projekty/impact/results/| IMPACT ground-truth data]] for selected Polish historical documents from PIONIER Digital Libraries Federation.
 * [[http://smyrna.danieljanus.pl/|Smyrna]], a simple, light-weight Polish concordancer.
Line 37: Line 28:
 * [[http://psi.amu.edu.pl/en/index.php?title=Parallel_Corpora|PSI collection of parallel corpora]], a growing collection of parallel corpora pairing Polish with other european languages.
 * [[http://psi.amu.edu.pl/en/index.php?title=Parallel_Corpora|PSI collection of parallel corpora]], a growing collection of parallel corpora pairing Polish with other european languages,
 * [[http://www.pol-ros.polon.uw.edu.pl/|Polish-Russian Parallel Corpus]]
Line 74: Line 66:
 * [[http://nlp.ipipan.waw.pl/~wolinski/swigra/|Świgra]], a DCG parser,
 * [[http://zil.ipipan.waw.pl/Spejd|Spejd]], a shallow parsing and disambiguation system,
 * [[http://sourceforge.net/projects/dendrarium/|Dendrarium]], a treebank development system (under development),
 * [[http://zil.ipipan.waw.pl/Sk%C5%82adnica|Składnica]], a hybrid constituency/dependency treebank of Polish (under development),
 * Świgra, a DCG parser,
   * [[http://nlp.ipipan.waw.pl/~wolinski/swigra/|version 1.0]] (2005),
   * [[http://zil.ipipan.waw.pl/Sk%C5%82adnica|version 1.5]] as used in Składnica (2011),
 * Spejd, a shallow parsing and disambiguation system,
   * the [[http://zil.ipipan.waw.pl/Spejd|current version]] of the system,
   * Spejd [[attachment:gramatyka_Spejd_NKJP_1.0.zip|grammar of Polish]] (version 1.0), developed by K. Głowińska within [[http://nkjp.pl/|NKJP]], available on GNU GPL v.3,
 * [[http://sourceforge.net/projects/dendrarium/|Dendrarium]], a treebank development system,
Line 132: Line 129:
 * [[http://nlp.ipipan.waw.pl/WSDDE/|WSDDE]], a system for designing and performing Word Sense Disambiguation experiments (R. Młodzki ''et al.''),
 * [[http://zil.ipipan.waw.pl/WSDDE|WSDDE]], a system for designing and performing Word Sense Disambiguation experiments (R. Młodzki ''et al.''),
Line 140: Line 137:
 * [[http://hipisek.pl|Hipisek]], an experimental question answering system (M. Walas).
 * [[https://bitbucket.org/jsbien/ndt|Narzędzia dygitalizacji tekstów]], Poliqarp for !DjVu i inne programy.
 * [[http://hipisek.pl|Hipisek]], an experimental question answering system (M. Walas),
 * [[https://bitbucket.org/jsbien/ndt|Narzędzia dygitalizacji tekstów]], Poliqarp for !DjVu i inne programy,
 * [[Nerf]], a tool for named entity recognition, available on GNU GPL v.3.
 * [[http://psi-toolkit.wmi.amu.edu.pl/index.html|PSI-Toolkit]], a chain of publicly available tools for automatic processing of Polish.

Language Tools and Resources for Polish

This page contains a list of publicly available language tools and resources.

Parallel corpora

Spoken corpora

Translation memories

  • MyMemory, freely available multilingual TM,

  • TAUS Data, a multilingual TM from the members of TAUS Data Association.

Morphological tools and resources

Taggers

  • TaKIPI, a morphosyntactic tagger for Polish,

  • PANTERA, a morphosyntactic tagger for Polish,

  • WMBT, a morphosyntactic tagger for Polish.

Parsers, grammars, treebanks

Machine-readable dictionaries

Human-readable dictionaries

Speech analysis and synthesis tools

  • Skrybot, commercial speech recognition system (L. Pawlaczyk, P. Bosky),

  • Ivona, commercial text-to-speech system (Expressivo),

  • Acapela, text-to-speech demo,

  • Synteza mowy polskiej, automatic speech recognition and speech synthesis demos, with background information (K. Szklanny),

  • System syntezy mowy ciągłej (G. Demenko, S. Grocholewski),

  • Polish MBROLA database (K. Szklanny, K. Marasek),

  • SynTalk, commercial speech synthesis system (NeuroSoft),

  • PrimeSpeech, commercial speech recognition systems,

  • OrtFon, phonetic transcriber (AGH DSP),

  • ASR, automatic speech recognition system for Polish (AGH DSP),

  • Anotator, a speech corpus annotator dedicated to Polish and focused on connecting existing resources (AGH DSP).

Machine translation demonstrations

Other

  • Kolokacje, a Web crawler and collocation finder (A. Buczyński),

  • WSDDE, a system for designing and performing Word Sense Disambiguation experiments (R. Młodzki et al.),

  • Frazeo, a search engine and clusterer of news in Polish (P. Pęzik),

  • Segment, a rule-based sentence tokenizer supporting SRX standard (J. Lipski; the Polish rules are available in LanguageTool project),

  • Toki, a tokenizer supporting SRX standard, C++ library and toolkit (T. Śniatowski and A. Radziszewski),

  • Translatica SRX sentence segmentation rules for Polish (LGPL),

  • Lakon, a system for news summarization (master's thesis by A. Dudczak),

  • SyMGIZA++, an extension of Giza++ that computes symmetric word alignment models,

  • Multiservice, a sample interface for running NLP Web services for Polish,

  • Hipisek, an experimental question answering system (M. Walas),

  • Narzędzia dygitalizacji tekstów, Poliqarp for DjVu and other programs,

  • Nerf, a tool for named entity recognition, available on GNU GPL v.3,

  • PSI-Toolkit, a chain of publicly available tools for automatic processing of Polish.