
Diff for "LRT"

Differences between revisions 228 and 240 (spanning 12 versions)
Revision 228 as of 2013-06-08 20:01:15
Size: 20672
Comment:
Revision 240 as of 2014-02-06 16:49:18
Size: 20254
Editor: PiotrPezik
Comment:
Line 10: Line 10:
 * [[http://nkjp.pl/index.php?page=0&lang=1|National Corpus of Polish]] (NKJP),

   * [[attachment:NKJP-PodkorpusMilionowy-1.1.tgz]], the manually annotated 1-million word subcorpus of the NKJP, available under GNU GPL v.3,
   * [[http://zil.ipipan.waw.pl/DistrNKJP|Distributable version of NKJP]],
   * [[http://zil.ipipan.waw.pl/NKJPNGrams|N-grams extracted from balanced National Corpus of Polish]],
   * [[http://zil.ipipan.waw.pl/NKJP1mEcono|Economy-related subcorpus of the National Corpus of Polish, with manually created sense annotation layer]],
   * [[http://zil.ipipan.waw.pl/TeiAPI|A java library for parsing NKJP-compatible TEI P5 files]].
 * [[NationalCorpusOfPolish|National Corpus of Polish]] (NKJP),
Line 36: Line 29:
 * [[http://argumentacja.pdg.pl/argdbpl/|ArgDB-pl]], a Polish corpus of arguments in natural contexts.
 * [[http://argumentacja.pdg.pl/argdbpl/|ArgDB-pl]], a Polish corpus of arguments in natural contexts,
 * [[http://www.staff.amu.edu.pl/~romang/wiki_errors_pl.php|PIEWiC]], a Polish corpus of errors automatically extracted from Wikipedia revisions.
Line 64: Line 58:
== Morphological tools and resources ==
 * [[http://zil.ipipan.waw.pl/PoliMorf|PoliMorf]], the ultimate inflectional dictionary of Polish (under development),
 * [[http://sgjp.pl/morfeusz/|Morfeusz SGJP]], morphological analyser (Z. Saloni, W. Gruszczyński, M. Woliński, R. Wołosz),
 * [[http://sgjp.pl/siat/|Index a tergo of Polish word forms]] (J. Tokarski, Z. Saloni),
 * [[http://morfologik.blogspot.com/|Morfologik]], morphological analyser (M. Miłkowski, D. Weiss),
 * [[ftp://ftp.mimuw.edu.pl/pub/users/polszczyzna/SAM-95/|SAM]], morphological analyser (K. Szafran),
 * [[http://utt.amu.edu.pl/|UAM Text Tools]] (P. Obrębski, Z. Vetulani; see also [[http://utt.wmi.amu.edu.pl/trac/wiki/]]),
 * [[http://nl.ijs.si/ME/V4/msd/html|MULTEXT-East, v.4 ]], morphosyntactic specifications and documentation for 16 languages,
 * [[http://www.domeczek.pl/~polukr/mte-conv|KIPI->MTE]], a converter from TaKIPI to MULTEXT-East morphosyntactic format (A. Radziszewski, N. Kotsyba),
 * [[http://nlp.pwr.wroc.pl/redmine/projects/libpltagger/wiki|MACA]], Morphological Analysis Converter and Aggregator (A. Radziszewski, T. Śniatowski),
 * [[http://sgalus.republika.pl/indexe.html|Lexical analyser and a Polish proof-reader]] (S. Galus),
 * [[http://gram.neurosoft.pl/|Neurosoft Gram]] (demo of a morphological analyser),
 * [[http://winnie.ics.agh.edu.pl/proj_uk/fleksbaz/|Baza fleksyjna języka polskiego]], inflection database of Polish words (W. Lubaszewski, B. Moskal, P. Pietras, P. Pisarek, T. Rokicka),
 * [[http://www.eti.pg.gda.pl/katedry/kiw/pracownicy/Jan.Daciuk/personal/fsa_polski.html|Finite state utilities]] (J. Daciuk),
 * [[http://getopt.org/stempel/|Stempel]], another stemmer (A. Białecki),
 * [[http://nlp.pwr.wroc.pl/redmine/projects/joskipi/wiki/|WCCL]], toolkit for morphosyntactic feature generation (A. Radziszewski, A. Wardyński, T. Śniatowski, P. Kędzia),
 * [[http://lemmatise.ijs.si/Services|LemmaGen]], Multilingual Open Source Lemmatisation for 11 EU languages, including Polish (M. Jursic, T. Erjavec et al.)

== Taggers ==
 * [[http://nlp.pwr.wroc.pl/takipi/|TaKIPI]], a morphosyntactic tagger for Polish (Decision Trees),
 * [[http://zil.ipipan.waw.pl/PANTERA|PANTERA]], a morphosyntactic tagger for Polish (Transformation-Based Learning),
 * [[http://nlp.pwr.wroc.pl/redmine/projects/wmbt/wiki|WMBT]], a morphosyntactic tagger for Polish (Memory-Based Learning),
 * [[http://zil.ipipan.waw.pl/TaCo|TaCo]], a statistical morphosyntactic tagset converter for positional tagsets (e.g. Polish),
 * [[http://nlp.pwr.wroc.pl/redmine/projects/wcrft/wiki|WCRFT]], a morphosyntactic tagger for Polish (Conditional Random Fields),
 * [[http://zil.ipipan.waw.pl/Concraft|Concraft]], a morphosyntactic disambiguation tool for Polish (Constrained Conditional Random Fields),
 * [[http://zil.ipipan.waw.pl/NKJP%20model%20for%20TnT%20Tagger|NKJP model for TnT Tagger]], a trained model usable on Morfeusz-segmented text with [[http://www.coli.uni-saarland.de/~thorsten/tnt/|TnT Tagger]].

== Parsers, grammars, treebanks ==
 * [[http://zil.ipipan.waw.pl/Sk%C5%82adnica|Składnica]], a hybrid constituency/dependency treebank of Polish (under development),
 * [[http://zil.ipipan.waw.pl/plTAG|TAG grammar of Polish]],
 * Świgra, a DCG parser,
   * [[http://nlp.ipipan.waw.pl/~wolinski/swigra/|version 1.0]] (2005),
   * [[http://zil.ipipan.waw.pl/Sk%C5%82adnica|version 1.5]] as used in Składnica (2011),
 * Spejd, a shallow parsing and disambiguation system,
   * the [[http://zil.ipipan.waw.pl/Spejd|current version]] of the system,
   * Spejd [[attachment:gramatyka_Spejd_NKJP_1.0.zip|grammar of Polish]] (version 1.0), developed by K. Głowińska within [[http://nkjp.pl/|NKJP]], available under GNU GPL v.3,
   * [[http://zil.ipipan.waw.pl/SEJFEK4Spejd|SEJFEK4Spejd]] - a Spejd grammar version of [[http://zil.ipipan.waw.pl/SEJFEK|SEJFEK]] and a converter from dictionary to grammar,
 * [[http://sourceforge.net/projects/dendrarium/|Dendrarium]], a treebank development system,
 * [[http://nlp.ipipan.waw.pl/CRIT2/|A Treebank / Test Suite for Polish]],
 * [[http://amos.klf.uw.edu.pl/| Visualisation of parsing tree forests]] (Świdziński's grammar, Świgra, Morfeusz, Bień's syntactic spreadsheets) by Andrzej Zaborowski,
 * [[ftp://ftp.mimuw.edu.pl/pub/People/polszczyzna/AS/index.html|Analizator syntaktyczny AS]] (M. Woliński),
 * [[ftp://ftp.mimuw.edu.pl/pub/People/polszczyzna/Szpakowicz/|Formalny opis składniowy zdań polskich]] (S. Szpakowicz),
 * [[http://thetos.aei.polsl.pl/las/|Serwer LAS / Linguistic Analysis Server]],
 * [[http://nlp.pwr.wroc.pl/en/tools-and-resources/disaster|Disaster]] (DISAmbiguator and STatistical chunkER) – a Python module for chunking and morphosyntactic disambiguation,
 * [[http://nlp.pwr.wroc.pl/redmine/projects/iobber/wiki|Iobber]], a CRF chunker for Polish.
Line 114: Line 62:
 * [[http://nl.ijs.si/ME/V4/doc/index.html#sec-lex|Sample morphosyntactic Polish lexicon]], the MULTEXT-East morphosyntactic lexicons,
 * [[http://www.ispan.waw.pl/zakjez/pracjcz/slowniki/slowniki.html|Słowniki składniowe języka czeskiego i polskiego]] (Z. Greń),
Line 124: Line 70:
 * [[http://clip.ipipan.waw.pl/Walenty|Walenty]], the Polish Valence Dictionary (F. Skwarski, M. Świdziński, W. Kieraś, E. Hajnicz, A. Patejuk, A. Przepiórkowski, M. Woliński),
 * [[http://zil.ipipan.waw.pl/Walenty|Walenty]], the Polish Valence Dictionary (E. Hajnicz, W. Kieraś, A. Patejuk, A. Przepiórkowski, F. Skwarski, M. Świdziński, M. Woliński),
Line 144: Line 90:

== Morphological tools and resources ==
 * [[http://zil.ipipan.waw.pl/PoliMorf|PoliMorf]], the ultimate inflectional dictionary of Polish (under development),
 * [[http://sgjp.pl/morfeusz/|Morfeusz SGJP]], morphological analyser (Z. Saloni, W. Gruszczyński, M. Woliński, R. Wołosz),
 * [[http://sgjp.pl/siat/|Index a tergo of Polish word forms]] (J. Tokarski, Z. Saloni),
 * [[http://morfologik.blogspot.com/|Morfologik]], morphological analyser (M. Miłkowski, D. Weiss),
 * [[ftp://ftp.mimuw.edu.pl/pub/users/polszczyzna/SAM-95/|SAM]], morphological analyser (K. Szafran),
 * [[http://utt.amu.edu.pl/|UAM Text Tools]] (P. Obrębski, Z. Vetulani; see also [[http://utt.wmi.amu.edu.pl/trac/wiki/]]),
 * [[http://nl.ijs.si/ME/V4/msd/html|MULTEXT-East, v.4 ]], morphosyntactic specifications and documentation for 16 languages,
 * [[http://nl.ijs.si/ME/V4/doc/index.html#sec-lex|Sample morphosyntactic Polish lexicon]], the MULTEXT-East morphosyntactic lexicons,
 * [[http://www.domeczek.pl/~polukr/mte-conv|KIPI->MTE]], a converter from TaKIPI to MULTEXT-East morphosyntactic format (A. Radziszewski, N. Kotsyba),
 * [[http://nlp.pwr.wroc.pl/redmine/projects/libpltagger/wiki|MACA]], Morphological Analysis Converter and Aggregator (A. Radziszewski, T. Śniatowski),
 * [[http://sgalus.republika.pl/indexe.html|Lexical analyser and a Polish proof-reader]] (S. Galus),
 * [[http://gram.neurosoft.pl/|Neurosoft Gram]] (demo of a morphological analyser),
 * [[http://winnie.ics.agh.edu.pl/proj_uk/fleksbaz/|Baza fleksyjna języka polskiego]], inflection database of Polish words (W. Lubaszewski, B. Moskal, P. Pietras, P. Pisarek, T. Rokicka),
 * [[http://www.eti.pg.gda.pl/katedry/kiw/pracownicy/Jan.Daciuk/personal/fsa_polski.html|Finite state utilities]] (J. Daciuk),
 * [[http://getopt.org/stempel/|Stempel]], another stemmer (A. Białecki),
 * [[http://nlp.pwr.wroc.pl/redmine/projects/joskipi/wiki/|WCCL]], toolkit for morphosyntactic feature generation (A. Radziszewski, A. Wardyński, T. Śniatowski, P. Kędzia),
 * [[http://lemmatise.ijs.si/Services|LemmaGen]], Multilingual Open Source Lemmatisation for 11 EU languages, including Polish (M. Jursic, T. Erjavec et al.)

== Taggers ==
 * [[http://nlp.pwr.wroc.pl/takipi/|TaKIPI]], a morphosyntactic tagger for Polish (Decision Trees),
 * [[http://zil.ipipan.waw.pl/PANTERA|PANTERA]], a morphosyntactic tagger for Polish (Transformation-Based Learning),
 * [[http://nlp.pwr.wroc.pl/redmine/projects/wmbt/wiki|WMBT]], a morphosyntactic tagger for Polish (Memory-Based Learning),
 * [[http://zil.ipipan.waw.pl/TaCo|TaCo]], a statistical morphosyntactic tagset converter for positional tagsets (e.g. Polish),
 * [[http://nlp.pwr.wroc.pl/redmine/projects/wcrft/wiki|WCRFT]], a morphosyntactic tagger for Polish (Conditional Random Fields),
 * [[http://zil.ipipan.waw.pl/Concraft|Concraft]], a morphosyntactic disambiguation tool for Polish (Constrained Conditional Random Fields),
 * [[http://zil.ipipan.waw.pl/NKJP%20model%20for%20TnT%20Tagger|NKJP model for TnT Tagger]], a trained model usable on Morfeusz-segmented text with [[http://www.coli.uni-saarland.de/~thorsten/tnt/|TnT Tagger]].
 * [[http://clarin.pelcra.pl/tools/tagger|OpenNLP-based tagger trained on the 1M NKJP corpus]] with a [[http://clarin.pelcra.pl/tools/api/hask/application.wadl|REST API]]

== Parsers, grammars, treebanks ==
 * [[http://zil.ipipan.waw.pl/Sk%C5%82adnica|Składnica]], a hybrid constituency/dependency treebank of Polish (under development),
 * [[http://zil.ipipan.waw.pl/plTAG|TAG grammar of Polish]],
 * Świgra, a DCG parser,
   * [[http://nlp.ipipan.waw.pl/~wolinski/swigra/|version 1.0]] (2005),
   * [[http://zil.ipipan.waw.pl/Sk%C5%82adnica|version 1.5]] as used in Składnica (2011),
 * Spejd, a shallow parsing and disambiguation system,
   * the [[http://zil.ipipan.waw.pl/Spejd|current version]] of the system,
   * Spejd [[attachment:gramatyka_Spejd_NKJP_1.0.zip|grammar of Polish]] (version 1.0), developed by K. Głowińska within [[http://nkjp.pl/|NKJP]], available under GNU GPL v.3,
   * [[http://zil.ipipan.waw.pl/SEJFEK4Spejd|SEJFEK4Spejd]] - a Spejd grammar version of [[http://zil.ipipan.waw.pl/SEJFEK|SEJFEK]] and a converter from dictionary to grammar,
 * [[http://sourceforge.net/projects/dendrarium/|Dendrarium]], a treebank development system,
 * [[http://nlp.ipipan.waw.pl/CRIT2/|A Treebank / Test Suite for Polish]],
 * [[http://amos.klf.uw.edu.pl/| Visualisation of parsing tree forests]] (Świdziński's grammar, Świgra, Morfeusz, Bień's syntactic spreadsheets) by Andrzej Zaborowski,
 * [[ftp://ftp.mimuw.edu.pl/pub/People/polszczyzna/AS/index.html|Analizator syntaktyczny AS]] (M. Woliński),
 * [[ftp://ftp.mimuw.edu.pl/pub/People/polszczyzna/Szpakowicz/|Formalny opis składniowy zdań polskich]] (S. Szpakowicz),
 * [[http://las.aei.polsl.pl/las2/|Serwer LAS / Linguistic Analysis Server]],
 * [[http://nlp.pwr.wroc.pl/en/tools-and-resources/disaster|Disaster]] (DISAmbiguator and STatistical chunkER) – a Python module for chunking and morphosyntactic disambiguation,
 * [[http://nlp.pwr.wroc.pl/redmine/projects/iobber/wiki|Iobber]], a CRF chunker for Polish.
Line 167: Line 161:
 * [[http://thetos.aei.polsl.pl/|Thetos]] (PL-Sign language).
 * [[http://thetos.polsl.pl/|Thetos]] (PL-Sign language).
Line 179: Line 173:
 * [[http://chopin.ipipan.waw.pl/multiservice/|Multiservice]], a sample interface for running NLP Web services for Polish,
 * [[http://glass.ipipan.waw.pl/multiservice/|Multiservice]], a sample interface for running NLP Web services for Polish,

Language Tools and Resources for Polish

This page contains a list of publicly available language tools and resources.

Spoken corpora

Parallel corpora and translation memories

Machine-readable dictionaries

Human-readable dictionaries

Morphological tools and resources

Taggers

Parsers, grammars, treebanks

Speech analysis and synthesis tools

Machine translation demonstrations

Other

  • Kolokacje, a Web crawler and collocation finder (A. Buczyński),

  • WSDDE, a system for designing and performing Word Sense Disambiguation experiments (R. Młodzki et al.),

  • Frazeo, a search engine and clusterer of news in Polish (P. Pęzik),

  • Segment, a rule-based sentence tokenizer supporting the SRX standard (J. Lipski; the Polish rules are available in the LanguageTool project, see here for short instructions on how to use the tool),

  • Toki, a tokenizer supporting the SRX standard, C++ library and toolkit (T. Śniatowski and A. Radziszewski),

  • Translatica SRX sentence segmentation rules for Polish (LGPL),

  • Lakon, a system for news summarization (master's thesis by A. Dudczak),

  • SyMGIZA++, an extension of Giza++ that computes symmetric word alignment models,

  • Multiservice, a sample interface for running NLP Web services for Polish,

  • Hipisek, an experimental question answering system (M. Walas),

  • Text digitisation tools (Narzędzia dygitalizacji tekstów), Poliqarp for DjVu and other programs,

  • Nerf, a tool for named entity recognition, available under GNU GPL v.3.

  • Liner2, a named entity recognizer released under the GNU GPL, with models that recognize 5 and 56 categories of proper names (M. Marcińczuk and M. Janicki).

  • PSI-Toolkit, a chain of publicly available tools for automatic processing of Polish.

  • Fextor, a feature extraction framework.

  • LexCSD, a system for semi-automatic sense disambiguation.

  • SuperMatrix, a general tool for lexical semantic knowledge acquisition.

  • WordnetLoom, a wordnet editor application.

  • Toposław, a tool for the creation of electronic inflectional dictionaries of multi-word units.

  • CorpCor, a web-based tool for correcting morphosyntactic annotation in TEI XML encoded corpora (e.g. NCP).

  • Polish Coreference Tools, a suite of Polish coreference resolution tools, created as part of the CORE project.