Diff for "LRT"

Differences between revisions 225 and 226
Revision 225 as of 2013-05-13 15:23:36
Size: 20678
Comment: update SRX link
Revision 226 as of 2013-06-05 21:23:59
Size: 20679
Comment: comma only
Line 30, revision 225:
 * [[http://nlp.pwr.wroc.pl/kpwr|KPWr]] Polish Corpus of Wrocław University of Technology, collection of documents available on Creative Common license annotated with syntactic chunks, proper names, semantic relations, anaphora and word senses,
Line 30, revision 226:
 * [[http://nlp.pwr.wroc.pl/kpwr|KPWr]], Polish Corpus of Wrocław University of Technology, collection of documents available on Creative Common license annotated with syntactic chunks, proper names, semantic relations, anaphora and word senses,

Language Tools and Resources for Polish

This page contains a list of publicly available language tools and resources.

Parallel corpora

Spoken corpora

Translation memories

  • MyMemory, a freely available multilingual TM,

  • TAUS Data, a multilingual TM from the members of TAUS Data Association.

Morphological tools and resources

Taggers

  • TaKIPI, a morphosyntactic tagger for Polish (Decision Trees),

  • PANTERA, a morphosyntactic tagger for Polish (Transformation-Based Learning),

  • WMBT, a morphosyntactic tagger for Polish (Memory-Based Learning),

  • TaCo, a statistical morphosyntactic tagset converter for positional tagsets (e.g. Polish),

  • WCRFT, a morphosyntactic tagger for Polish (Conditional Random Fields),

  • Concraft, a morphosyntactic disambiguation tool for Polish (Constrained Conditional Random Fields),

  • NKJP model for TnT Tagger, a trained model usable on Morfeusz-segmented text with TnT Tagger.

Parsers, grammars, treebanks

Machine-readable dictionaries

Human-readable dictionaries

Speech analysis and synthesis tools

Machine translation demonstrations

Other

  • Kolokacje, a Web crawler and collocation finder (A. Buczyński),

  • WSDDE, a system for designing and performing Word Sense Disambiguation experiments (R. Młodzki et al.),

  • Frazeo, a search engine and clusterer of news in Polish (P. Pęzik),

  • Segment, a rule-based sentence tokenizer supporting the SRX standard (J. Lipski; the Polish rules are available in the LanguageTool project, see here for short instructions on how to use the tool),

  • Toki, a tokenizer supporting the SRX standard, C++ library and toolkit (T. Śniatowski and A. Radziszewski),

  • Translatica SRX sentence segmentation rules for Polish (LGPL),

  • Lakon, a system for news summarization (master's thesis by A. Dudczak),

  • SyMGIZA++, an extension of Giza++ that computes symmetric word alignment models,

  • Multiservice, a sample interface for running NLP Web services for Polish,

  • Hipisek, an experimental question answering system (M. Walas),

  • Narzędzia dygitalizacji tekstów (text digitization tools), Poliqarp for DjVu and other programs,

  • Nerf, a tool for named entity recognition, available under the GNU GPL v3.

  • Liner2, a named entity recognizer released under the GNU GPL, with models that recognize 5 and 56 categories of proper names (M. Marcińczuk and M. Janicki).

  • PSI-Toolkit, a chain of publicly available tools for automatic processing of Polish.

  • Fextor, a feature extraction framework.

  • LexCSD, a system for semi-automatic sense disambiguation.

  • SuperMatrix, a general tool for lexical semantic knowledge acquisition.

  • WordnetLoom, a wordnet editor application.

  • Toposław, a tool for creating electronic inflectional dictionaries of multi-word units.

  • CorpCor, a web-based tool for correcting morphosyntactic annotation in TEI XML encoded corpora (e.g. NCP).

  • Polish Coreference Tools, a suite of Polish coreference resolution tools, created as part of the CORE project.