Revision 225 as of 2013-05-13 15:23:36
Size: 20678
Comment: update SRX link

Language Tools and Resources for Polish

This page contains a list of publicly available language tools and resources.

Parallel corpora

Spoken corpora

Translation memories

  • MyMemory, a freely available multilingual TM,

  • TAUS Data, a multilingual TM from the members of TAUS Data Association.

Morphological tools and resources

Taggers

  • TaKIPI, a morphosyntactic tagger for Polish (Decision Trees),

  • PANTERA, a morphosyntactic tagger for Polish (Transformation-Based Learning),

  • WMBT, a morphosyntactic tagger for Polish (Memory-Based Learning),

  • TaCo, a statistical morphosyntactic tagset converter for positional tagsets (e.g. Polish),

  • WCRFT, a morphosyntactic tagger for Polish (Conditional Random Fields),

  • Concraft, a morphosyntactic disambiguation tool for Polish (Constrained Conditional Random Fields),

  • NKJP model for TnT Tagger, a trained model usable on Morfeusz-segmented text with TnT Tagger.

Parsers, grammars, treebanks

Machine-readable dictionaries

Human-readable dictionaries

Speech analysis and synthesis tools

Machine translation demonstrations

Other

  • Kolokacje, a Web crawler and collocation finder (A. Buczyński),

  • WSDDE, a system for designing and performing Word Sense Disambiguation experiments (R. Młodzki et al.),

  • Frazeo, a search engine and clusterer of news in Polish (P. Pęzik),

  • Segment, a rule-based sentence tokenizer supporting the SRX standard (J. Lipski; the Polish rules are available in the LanguageTool project; see http://zil.ipipan.waw.pl/Segment for short instructions on how to use the tool),

  • Toki, a tokenizer supporting the SRX standard, available as a C++ library and toolkit (T. Śniatowski and A. Radziszewski),

  • Translatica SRX sentence segmentation rules for Polish (LGPL)

  • Lakon, a system for news summarization (master's thesis by A. Dudczak),

  • SyMGIZA++, an extension of Giza++ that computes symmetric word alignment models,

  • Multiservice, a sample interface for running NLP Web services for Polish,

  • Hipisek, an experimental question answering system (M. Walas),

  • Narzędzia dygitalizacji tekstów (text digitization tools), Poliqarp for DjVu and other programs,

  • Nerf, a tool for named entity recognition, available under GNU GPL v.3.

  • Liner2, a named entity recognizer released under the GNU GPL, with models that recognize 5 and 56 categories of proper names (M. Marcińczuk and M. Janicki).

  • PSI-Toolkit, a chain of publicly available tools for automatic processing of Polish.

  • Fextor, a feature extraction framework.

  • LexCSD, a system for semi-automatic sense disambiguation.

  • SuperMatrix, a general tool for lexical semantic knowledge acquisition.

  • WordnetLoom, a wordnet editor application.

  • Toposław, a tool for the creation of electronic inflectional dictionaries of multi-word units.

  • CorpCor, a web-based tool for correcting morphosyntactic annotation in TEI XML encoded corpora (e.g. NCP).

  • Polish Coreference Tools, a suite of Polish coreference resolution tools, created as part of the CORE project.