Diff for "LRT"

Differences between revisions 322 and 324 (spanning 2 versions)
Revision 322 as of 2017-01-20 14:41:05
Size: 25038
Comment:
Revision 324 as of 2017-01-26 10:54:26
Size: 25738
Comment:
Line 9: Line 9:
== Written corpora and corpus-related tools == == Written corpora of contemporary Polish ==
Line 14: Line 14:
 * [[PL196x|Polish language of the 1960s]],
Line 19: Line 18:
  * Now available also as corpora in the Poliqarp for !DjVu [[http://poliqarp.wbl.klf.uw.edu.pl|search engine]],
 * [[http://poliqarp.sourceforge.net/|Poliqarp]], a corpus indexing and search engine (please see also [[http://nlp.ipipan.waw.pl/Poliqarp/|the beta version of Poliqarp 1.1 with statistical extensions]]),
 * [[http://zil.ipipan.waw.pl/Anotatornia|Anotatornia]], a system for multi-level manual annotation of corpora,
 * [[http://nlp.pwr.wroc.pl/en/tools-and-resources/inforex|Inforex]], a web-based system designed for managing and annotating text corpora on the semantic level,
 * [[http://smyrna.danieljanus.pl/|Smyrna]], a simple, light-weight Polish concordancer,
  * Now available also as corpora in the Poliqarp for !DjVu [[http://poliqarp.wbl.klf.uw.edu.pl|search engine]],
Line 32: Line 27:

== Written corpora of historical Polish ==
 * [[PL196x|Polish language of the 1960s]] (I. Kurcz, A. Lewicki, J. Sambor, J. Woronczak, K. Szafran, J. S. Bień, M. Woliński),
 * [[http://chronopress.clarin-pl.eu/|ChronoPress]], corpus of press texts from 1945–1954 (A. Pawłowski),
 * [[http://www.f19.uw.edu.pl/|Microcorpus of Polish: 1830–1918]] (M. Derwojedowa),
 * [[http://korba.edu.pl|KORBA]], electronic corpus of 17th and 18th century Polish texts (W. Gruszczyński),
 * [[http://www.spxvi.edu.pl/korpus/|Corpus of 16th-century Polish]] (IBL PAN),
 * [[https://www.ijp-pan.krakow.pl/publikacje-elektroniczne/korpus-tekstow-staropolskich|Corpus of old Polish (up to 1500)]] (IJP PAN).


== Corpus-related tools and resources ==
 * [[http://poliqarp.sourceforge.net/|Poliqarp]], a corpus indexing and search engine (please see also [[http://nlp.ipipan.waw.pl/Poliqarp/|the beta version of Poliqarp 1.1 with statistical extensions]]),
 * [[http://zil.ipipan.waw.pl/Anotatornia|Anotatornia]], a system for multi-level manual annotation of corpora,
 * [[http://nlp.pwr.wroc.pl/en/tools-and-resources/inforex|Inforex]], a web-based system designed for managing and annotating text corpora on the semantic level,
 * [[http://smyrna.danieljanus.pl/|Smyrna]], a simple, light-weight Polish concordancer,

Language Tools and Resources for Polish

This page contains a list of publicly available language tools and resources.

Written corpora of contemporary Polish

Written corpora of historical Polish

Spoken corpora

Parallel corpora and translation memories

Machine-readable dictionaries

Human-readable dictionaries

Morphological tools and resources

Taggers

Parsers, grammars, treebanks

Sentiment analysis

Coreference

Speech analysis and synthesis tools

Machine translation demonstrations

Summarizers

Diacritization

Named Entity Recognition

  • Nerf, a tool for named entity recognition, available under the GNU GPL v3 (J. Waszczuk),

  • Liner2, a named entity recognizer released under the GNU GPL, with models for recognizing 5 and 56 categories of proper names (M. Marcińczuk and M. Janicki),

  • TIMEX, a model for Liner2 to recognize and normalize temporal expressions (J. Kocoń and M. Marcińczuk).

Aggregating services

Other

  • Mobile plWordNet, a free mobile application for browsing plWordNet (J. Kocoń),

  • Kolokacje, a Web crawler and collocation finder (A. Buczyński),

  • WSDDE, a system for designing and performing Word Sense Disambiguation experiments (R. Młodzki et al.),

  • Frazeo, a search engine and clusterer of news in Polish (P. Pęzik),

  • Segment, a rule-based sentence tokenizer supporting the SRX standard (J. Lipski; the Polish rules are available in the LanguageTool project, see here for short instructions on how to use the tool),

  • Toki, a tokenizer supporting the SRX standard, provided as a C++ library and toolkit (T. Śniatowski and A. Radziszewski),

  • Translatica SRX sentence segmentation rules for Polish (LGPL),

  • SyMGIZA++, an extension of Giza++ that computes symmetric word alignment models,

  • Hipisek, an experimental question answering system (M. Walas),

  • Narzędzia dygitalizacji tekstów (text digitization tools): Poliqarp for DjVu and other programs,

  • PSI-Toolkit, a chain of publicly available tools for automatic processing of Polish,

  • Fextor, a feature extraction framework,

  • LexCSD, a system for semi-automatic sense disambiguation,

  • SuperMatrix, a general tool for lexical semantic knowledge acquisition,

  • WordnetLoom, a wordnet editor application,

  • Toposław, a tool for creating electronic inflectional dictionaries of multi-word units,

  • CorpCor, a web-based tool for correcting morphosyntactic annotation in TEI XML encoded corpora (e.g. NKJP),

  • Stylo 2, stylometry demo.