
Diff for "LRT"

Differences between revisions 251 and 269 (spanning 18 versions)
Revision 251 as of 2014-05-16 10:29:25
Size: 20697
Comment:
Revision 269 as of 2014-10-21 08:52:54
Size: 22425
Comment:
Line 15: Line 15:
 * [[PSC|Polish Sejm Corpus]],
 * [[http://zil.ipipan.waw.pl/PolishSummariesCorpus|Polish Summaries Corpus]],
Line 27: Line 29:
 * [[http://zil.ipipan.waw.pl/PolishCoreferenceCorpus|Polish Coreference Corpus]], a corpus of Polish coreference relations, created as part of the [[http://zil.ipipan.waw.pl/CORE|CORE project]],
 * [[http://zil.ipipan.waw.pl/PolishCoreferenceCorpus|Polish Coreference Corpus]], a corpus of Polish coreference relations, created as part of the [[http://glass.ipipan.waw.pl/wiki/core/About|CORE project]],
Line 29: Line 31:
 * [[http://www.staff.amu.edu.pl/~romang/wiki_errors_pl.php|PIEWiC]], a Polish corpus of errors automatically extracted from Wikipedia revisions.
 * [[http://www.staff.amu.edu.pl/~romang/wiki_errors_pl.php|PIEWiC]], a Polish corpus of errors automatically extracted from Wikipedia revisions,
 * [[http://korpusy.net/|korpusy.net]], a corpus research-related website (B. Gałkowski).
Line 36: Line 39:
 * [[http://pelcra.pl/new/convers_pl|The PELCRA conversational corpus of Polish]]. TEI P5-encoded transcriptions of 1.4 million words of conversational spoken Polish collected in the years 2001-2011 (within the PELCRA and NKJP projects) available under CC-BY-NC with a [[http://clarin.pelcra.pl/Spokes/|dedicated interface]] currently developed within the CLARIN project
 * [[http://pelcra.pl/new/convers_pl|The PELCRA conversational corpus of Polish]]. TEI P5-encoded transcriptions of 1.4 million words of conversational spoken Polish collected in the years 2001-2011 (within the PELCRA and NKJP projects) available under CC-BY-NC with a [[http://clarin.pelcra.pl/Spokes/|dedicated interface]] currently developed within the CLARIN project,
Line 55: Line 58:
 * [[http://www.tausdata.org/|TAUS Data]], a multilingual TM from the members of TAUS Data Association.
 * [[http://www.tausdata.org/|TAUS Data]], a multilingual TM from the members of TAUS Data Association,
 * [[http://glosbe.com/|Glosbe]], an open source TM.
Line 116: Line 120:
 * [[http://zil.ipipan.waw.pl/NKJP%20model%20for%20TnT%20Tagger|NKJP model for TnT Tagger]], a trained model usable on Morfeusz-segmented text with [[http://www.coli.uni-saarland.de/~thorsten/tnt/|TnT Tagger]].
 * [[http://clarin.pelcra.pl/tools/tagger|OpenNLP-based PoS tagger trained on the 1M NKJP corpus]] with a [[http://clarin.pelcra.pl/tools/api/hask/application.wadl|REST API]]
 * [[http://zil.ipipan.waw.pl/PoliTa|PoliTa]], a morphosyntactic meta-tagger,
 * [[http://zil.ipipan.waw.pl/NKJP%20model%20for%20TnT%20Tagger|NKJP model for TnT Tagger]], a trained model usable on Morfeusz-segmented text with [[http://www.coli.uni-saarland.de/~thorsten/tnt/|TnT Tagger]],
 * [[http://clarin.pelcra.pl/tools/tagger|OpenNLP-based PoS tagger trained on the 1M NKJP corpus]] with a [[http://clarin.pelcra.pl/tools/api/hask/application.wadl|REST API]].
Line 139: Line 144:
== Sentiment analysis ==
 * [[http://zil.ipipan.waw.pl/SlownikWydzwieku|Polish sentiment dictionary]], with sentiment scores computed using supervised methods (A. Wawer),
 * [[http://zil.ipipan.waw.pl/HateSpeech|HateSpeech corpus]], 2000 manually annotated documents representing various types and degrees of offensive language expressed toward minorities,
 * [[http://zil.ipipan.waw.pl/Korpus%20Szczerosci|Sincerity Corpus (Korpus Szczerości)]], a collection of fake and real reviews.

== Coreference ==
 * [[http://zil.ipipan.waw.pl/PolishCoreferenceCorpus|Polish Coreference Corpus]], a 500 M corpus of general nominal coreference in Polish (M. Ogrodniczuk),
 * [[http://zil.ipipan.waw.pl/PolishCoreferenceTools|Polish Coreference Tools]], a suite of Polish coreference resolution tools, created as part of the [[http://zil.ipipan.waw.pl/CORE|CORE project]].
Line 164: Line 178:
 * [[http://las.aei.polsl.pl/PolSum/#/Home|PolSum]] by S. Kulików,
 * [[http://www.cs.put.poznan.pl/dweiss/research/lakon/|Lakon]], a system for news summarization (master's thesis by A. Dudczak).
 * [[http://las.aei.polsl.pl/PolSum/#/Home|PolSum]] (S. Kulików),
 * [[http://www.cs.put.poznan.pl/dweiss/research/lakon/|Lakon]], a system for news summarization (A. Dudczak),
 * [[http://clip.ipipan.waw.pl/Summarizer|Summarizer]] (J. Świetlicka),
 * you can also test Lakon, Open Text Summarizer and Summarizer in [[http://glass.ipipan.waw.pl/multiservice/|Multiservice]]
 * and take a look at the [[http://zil.ipipan.waw.pl/PolishSummariesCorpus|Polish Summaries Corpus]].

== Diacritization ==
 * [[http://www.gzegzolka.com/poliszynel/|Poliszynel]] and [[http://www.spolszcz.pl/|spolszcz.pl]]
 * [[http://www.polszczyzna.info/polonizator|Polonizator]]
 * [[http://slowniki.zoni.pl/?s=ogonki|Polonizer]]
Line 172: Line 194:
 * [[http://nlp.pwr.wroc.pl/redmine/projects/toki/wiki|Toki]], a tokenizer supporting SRX standard, C++ library and toolkit (T. Śniatowski and A. Radziszewski)
 * [[http://poleng.pl/translatica-pl.srx|Translatica SRX sentence segmentation rules for Polish (LGPL)]] 
 * [[http://nlp.pwr.wroc.pl/redmine/projects/toki/wiki|Toki]], a tokenizer supporting SRX standard, C++ library and toolkit (T. Śniatowski and A. Radziszewski),
 * [[http://poleng.pl/translatica-pl.srx|Translatica SRX sentence segmentation rules for Polish (LGPL)]],
Line 175: Line 197:
 * [[http://glass.ipipan.waw.pl/multiservice/|Multiservice]], a sample interface for running NLP Web services for Polish,
 * [[http://glass.ipipan.waw.pl/multiservice/|Multiservice]], a sample interface for running NLP Web services for Polish (see also [[http://glass.ipipan.waw.pl/redmine/projects/multiserwis/wiki/Usage|usage]] and [[http://glass.ipipan.waw.pl/redmine/projects/multiserwis/wiki/InOut|format]]),
Line 178: Line 200:
 * [[http://zil.ipipan.waw.pl/Nerf|Nerf]], a tool for named entity recognition, available on GNU GPL v.3.
 * [[http://nlp.pwr.wroc.pl/en/tools-and-resources/liner2|Liner2]], named entity recognizer released on GNU GPL with models to recognize 5 and 56 categories of proper names (M. Marcińczuk and M. Janicki).
 * [[http://psi-toolkit.wmi.amu.edu.pl/index.html|PSI-Toolkit]], a chain of publicly available tools for automatic processing of Polish.
 * [[http://www.nlp.pwr.wroc.pl/en/tools-and-resources/fextor|Fextor]], a feature extraction framework.
 * [[http://www.nlp.pwr.wroc.pl/en/tools-and-resources/lexcsd|LexCSD]], a system for semi-automatic sense disambiguation.
 * [[http://www.nlp.pwr.wroc.pl/en/tools-and-resources/supermatrix|SuperMatrix]], a general tool for lexical semantic knowledge acquisition
 * [[http://nlp.pwr.wroc.pl/en/tools-and-resources/wordnetloom|WordnetLoom]], an wordnet editor application.
 * [[http://zil.ipipan.waw.pl/Toposlaw|Toposław]], tool for the creation of electronic inflectional dictionaries of multi-word units.
 * [[http://zil.ipipan.waw.pl/CorpCor|CorpCor]], a web-based tool for correcting morphosyntactic annotation in TEI XML encoded corpora (e.g. NCP).
 * [[http://zil.ipipan.waw.pl/PolishCoreferenceTools|Polish Coreference Tools]], a suite of Polish coreference resolution tools, created as part of the [[http://zil.ipipan.waw.pl/CORE|CORE project]].
 * [[http://www.spolszcz.pl/|Online service supporting adding diacritical marks to Polish texts]].
 * [[http://zil.ipipan.waw.pl/Nerf|Nerf]], a tool for named entity recognition, available on GNU GPL v.3,
 * [[http://nlp.pwr.wroc.pl/en/tools-and-resources/liner2|Liner2]], named entity recognizer released on GNU GPL with models to recognize 5 and 56 categories of proper names (M. Marcińczuk and M. Janicki),
 * [[http://psi-toolkit.wmi.amu.edu.pl/index.html|PSI-Toolkit]], a chain of publicly available tools for automatic processing of Polish,
 * [[http://www.nlp.pwr.wroc.pl/en/tools-and-resources/fextor|Fextor]], a feature extraction framework,
 * [[http://www.nlp.pwr.wroc.pl/en/tools-and-resources/lexcsd|LexCSD]], a system for semi-automatic sense disambiguation,
 * [[http://www.nlp.pwr.wroc.pl/en/tools-and-resources/supermatrix|SuperMatrix]], a general tool for lexical semantic knowledge acquisition,
 * [[http://nlp.pwr.wroc.pl/en/tools-and-resources/wordnetloom|WordnetLoom]], an wordnet editor application,
 * [[http://zil.ipipan.waw.pl/Toposlaw|Toposław]], tool for the creation of electronic inflectional dictionaries of multi-word units,
 * [[http://zil.ipipan.waw.pl/CorpCor|CorpCor]], a web-based tool for correcting morphosyntactic annotation in TEI XML encoded corpora (e.g. NKJP),
 * [[http://www.spolszcz.pl/|Spolszcz.pl]], an online service supporting adding diacritical marks to Polish texts.

Language Tools and Resources for Polish

This page contains a list of publicly available language tools and resources.

Spoken corpora

Parallel corpora and translation memories

Machine-readable dictionaries

Human-readable dictionaries

Morphological tools and resources

Taggers

Parsers, grammars, treebanks

Sentiment analysis

Coreference

Speech analysis and synthesis tools

Machine translation demonstrations

Summarizers

Diacritization

Other

  • Kolokacje, a Web crawler and collocation finder (A. Buczyński),

  • WSDDE, a system for designing and performing Word Sense Disambiguation experiments (R. Młodzki et al.),

  • Frazeo, a search engine and clusterer of news in Polish (P. Pęzik),

  • Segment, a rule-based sentence tokenizer supporting SRX standard (J. Lipski; the Polish rules are available in LanguageTool project, see here for short instructions on how to use the tool),

  • Toki, a tokenizer supporting SRX standard, C++ library and toolkit (T. Śniatowski and A. Radziszewski),

  • Translatica SRX sentence segmentation rules for Polish (LGPL),

  • SyMGIZA++, an extension of Giza++ that computes symmetric word alignment models,

  • Multiservice, a sample interface for running NLP Web services for Polish (see also usage and format),

  • Hipisek, an experimental question answering system (M. Walas),

  • Narzędzia dygitalizacji tekstów (text digitization tools), Poliqarp for DjVu and other programs,

  • Nerf, a tool for named entity recognition, available on GNU GPL v.3,

  • Liner2, named entity recognizer released on GNU GPL with models to recognize 5 and 56 categories of proper names (M. Marcińczuk and M. Janicki),

  • PSI-Toolkit, a chain of publicly available tools for automatic processing of Polish,

  • Fextor, a feature extraction framework,

  • LexCSD, a system for semi-automatic sense disambiguation,

  • SuperMatrix, a general tool for lexical semantic knowledge acquisition,

  • WordnetLoom, a wordnet editor application,

  • Toposław, a tool for creating electronic inflectional dictionaries of multi-word units,

  • CorpCor, a web-based tool for correcting morphosyntactic annotation in TEI XML encoded corpora (e.g. NKJP),

  • Spolszcz.pl, an online service for adding diacritical marks to Polish texts.
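
Several entries in this list (Segment, Toki, the Translatica rules) rely on SRX-style sentence segmentation. As a rough illustration of the idea only, the sketch below applies an ordered list of break and no-break rules, each pairing a "before" and an "after" regular expression, and lets the first rule that matches at a position decide. The rule set, the Rule type and the segment function are hypothetical names made up for this example; they are not taken from Segment, Toki, LanguageTool or Translatica, and real SRX files are XML documents with much larger rule sets.

```python
import re
from typing import List, NamedTuple


class Rule(NamedTuple):
    is_break: bool  # True = break rule, False = exception (no-break) rule
    before: str     # regex that must match the text ending at the position
    after: str      # regex that must match the text starting at the position


# Hypothetical toy rules for illustration only (not a real Polish rule set).
RULES = [
    # A few common Polish abbreviations: never break right after them.
    Rule(False, r"\b(?:np|prof|dr|ul)\.", r"\s"),
    # Sentence-final punctuation followed by whitespace: break here.
    Rule(True, r"[.!?]", r"\s+"),
]


def segment(text: str, rules: List[Rule] = RULES) -> List[str]:
    """Split text into sentences; the first matching rule at a position wins."""
    sentences, start = [], 0
    for pos in range(1, len(text)):
        for rule in rules:
            matches_before = re.search(f"(?:{rule.before})\\Z", text[:pos])
            matches_after = re.match(rule.after, text[pos:])
            if matches_before and matches_after:
                if rule.is_break:
                    sentences.append(text[start:pos].strip())
                    start = pos
                break  # later rules are not consulted at this position
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    for sentence in segment("To jest prof. Kowalski. On tu pracuje."):
        print(sentence)
```

Listing the exception rule before the general break rule is what keeps "prof." from ending a sentence. The sketch rescans the text at every position, so it is only suitable for short inputs.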