
Diff for "LRT"

Differences between revisions 395 and 413 (spanning 18 versions)
Revision 395 as of 2019-03-07 18:18:01
Size: 29642
Editor: AgataSavary
Comment:
Revision 413 as of 2020-01-22 10:47:30
Size: 30764
Comment:
Line 14: Line 14:
 * [[PSC|Polish Sejm Corpus]],
 * [[PPC|Polish Parliamentary Corpus]],
Line 28: Line 28:
 * [[http://clip.ipipan.waw.pl/PARSEME-PL|Polish PARSEME corpus]], annotated manually for verbal multiword expressions in 18 languages including Polish, used in the [[http://multiword.sourceforge.net/sharedtask2017|PARSEME shared task 1.0]]; the Polish subcorpus is aligned with automatic dependency annotations in the [[http://universaldependencies.org/guidelines.html|UD]] format (A. Savary),
 * [[http://clip.ipipan.waw.pl/PARSEME-PL|Polish PARSEME corpus]], annotated manually for verbal multiword expressions in 20 languages including Polish, used in the [[http://multiword.sourceforge.net/sharedtask2018/|PARSEME shared task 1.1]]; the Polish subcorpus is aligned with automatic dependency annotations in the [[http://universaldependencies.org/guidelines.html|UD]] format (A. Savary),
Line 31: Line 31:
 * [[https://www.sketchengine.eu/user-guide/user-manual/corpora/by-language/polish-text-corpora/|Polish text corpora]] included in Sketch Engine.
 * [[https://www.sketchengine.eu/user-guide/user-manual/corpora/by-language/polish-text-corpora/|Polish text corpora]] included in Sketch Engine,
 * [[http://synamet.polon.uw.edu.pl/|Microcorpus of Synesthetic Metaphors]].
Line 40: Line 41:
 * [[http://rhssl1.uni-regensburg.de/SlavKo/korpus/poldi|PolDi]], a Polish Diachronic Online Corpus (R. Meyer)
Line 42: Line 44:
 * [[http://korpus19.nlp.ipipan.waw.pl/|Manually annotated and transcribed corpus of the 19th century Polish]], (1830–1918, IPI PAN)
Line 43: Line 46:
 * [[PL196x|Polish language of the 1960s / Frequency corpus]] (I. Kurcz, A. Lewicki, J. Sambor, J. Woronczak, K. Szafran, J. S. Bień, M. Woliński)
 * [[PL196x|Polish language of the 1960s / Frequency corpus]] (I. Kurcz, A. Lewicki, J. Sambor, J. Woronczak, K. Szafran, J. S. Bień, M. Woliński),
 * [[http://historyczne.nlp.ipipan.waw.pl/|Diachronic search engine (prototype)]].
Line 111: Line 115:
 * [[http://sgjp.pl|Słownik gramatyczny języka polskiego]],
Line 128: Line 133:
 * [[http://sgjp.pl|SGJP]], Grammatical Dictionary of Polish (the list of inflected forms is available with [[http://download.sgjp.pl/morfeusz/|Morfeusz]]),
Line 129: Line 135:
 * [[http://sgjp.pl/morfeusz/|Morfeusz SGJP]], morphological analyser (Z. Saloni, W. Gruszczyński, M. Woliński, R. Wołosz),
 * [[http://morfeusz.sgjp.pl/|Morfeusz SGJP]], morphological analyser,
Line 155: Line 161:
 * [[https://github.com/kwrobel-nlp/krnnt|KRNNT]], a morphological tagger for Polish based on recurrent neural networks,
Line 159: Line 166:
 * [[http://zil.ipipan.waw.pl/PDB|PDB 2.0]], a dependency treebank of Polish (A. Wróblewska),
 * [[http://git.nlp.ipipan.waw.pl/alina/PDBUD|PDB-UD]], a version of PDB 2.0 in Universal Dependencies format (A. Wróblewska),
Line 167: Line 176:
 * Świgra, a DCG parser,
   * [[http://nlp.ipipan.waw.pl/~wolinski/swigra/|version 1.0]] (2005),
   * [[http://zil.ipipan.waw.pl/Sk%C5%82adnica|version 1.5]] as used in Składnica (2011),
   * [[http://swigra.nlp.ipipan.waw.pl/|Świgra 2.0 demo]] (2015, please use Firefox).
 * [[http://zil.ipipan.waw.pl/%C5%9Awigra|Świgra]], a DCG parser,
   * [[http://swigra.nlp.ipipan.waw.pl/|On-line demo]],
Line 246: Line 253:
 * [[http://multiword.sourceforge.net/sharedtaskresults2018/|VMWE identifiers]], 13 systems having participated in the PARSEME shared task for automatic identification of verbal MWEs,
Line 248: Line 254:
 * [[http://multiword.sourceforge.net/sharedtaskresults2018/|VMWE identifiers]], systems having participated in the PARSEME shared task for automatic identification of verbal MWEs, 13 out of 17 systems submitted results for Polish,
Line 274: Line 281:
 * [[http://baltoslav.eu/?mova=pl|Baltoslav]], with several script converters (Romanizer, Cyrillizer, IPA Converter etc.)
 * [[http://baltoslav.eu/?mova=pl|Baltoslav]], with several script converters (Romanizer, Cyrillizer, IPA Converter etc.),
 * [[http://zil.ipipan.waw.pl/SpacyPL|SpacyPL]], Polish language models and resources for [[https://spacy.io|Spacy]]
 * [[https://jasnopis.pl/|Jasnopis]], analyzer of text obscurity level
 * [[http://zil.ipipan.waw.pl/Scwad/AIDe|AIDe]], corpus of image description in Polish (A. Wróblewska)

Language Tools and Resources for Polish

This page lists publicly available language tools and resources for Polish.

Written corpora of contemporary Polish

Written corpora of historical Polish

Spoken corpora

Language models

Parallel corpora and translation memories

Machine-readable dictionaries

Human-readable dictionaries

Morphological tools and resources

Taggers

Parsers, grammars, treebanks

Sentiment analysis

Coreference

Speech analysis and synthesis tools

Machine translation demonstrations

Summarizers

Diacritization

Named entity recognition

  • Nerf, a tool for named entity recognition, available under GNU GPL v3 (J. Waszczuk),

  • Liner2, a named entity recognizer released under GNU GPL, with models recognizing 5 and 56 categories of proper names (M. Marcińczuk and M. Janicki),

  • TIMEX, a model for Liner2 to recognize and normalize temporal expressions (J. Kocoń and M. Marcińczuk).

Multiword expression software

  • TermoPL, a multiword expression extraction tool,

  • VMWE identifiers, systems that participated in the PARSEME shared task on automatic identification of verbal MWEs (13 of the 17 systems submitted results for Polish),

  • PARSEME-FR demonstrator, including the ATILF-LLF multiword expression identifier for Polish.

Aggregating services

Other

  • Mobile plWordNet, a free mobile application for browsing plWordNet (J. Kocoń),

  • WSDDE, a system for designing and performing Word Sense Disambiguation experiments (R. Młodzki et al.),

  • Frazeo, a search engine and clusterer for Polish news (P. Pęzik),

  • Segment, a rule-based sentence tokenizer supporting the SRX standard (J. Lipski; the Polish rules are available in the LanguageTool project; see here for brief instructions on using the tool),

  • Toki, a tokenizer supporting the SRX standard, available as a C++ library and toolkit (T. Śniatowski and A. Radziszewski),

  • Translatica SRX sentence segmentation rules for Polish (LGPL),

  • SyMGIZA++, an extension of Giza++ that computes symmetric word alignment models,

  • Hipisek, an experimental question answering system (M. Walas),

  • Narzędzia dygitalizacji tekstów (text digitization tools): Poliqarp for DjVu and other programs,

  • Fextor, a feature extraction framework,

  • LexCSD, a system for semi-automatic sense disambiguation,

  • SuperMatrix, a general tool for lexical semantic knowledge acquisition,

  • WordnetLoom, a wordnet editor application,

  • Toposław, a tool for creating electronic inflectional dictionaries of multi-word units,

  • CorpCor, a web-based tool for correcting morphosyntactic annotation in TEI XML-encoded corpora (e.g. NKJP),

  • Stylo 2, a stylometry demo,

  • DeepEvents, an event extraction tool for Polish based on deep neural networks,

  • Word similarity, an online service calculating the similarity of words based on word embeddings,

  • Baltoslav, with several script converters (Romanizer, Cyrillizer, IPA Converter etc.),

  • SpacyPL, Polish language models and resources for Spacy,

  • Jasnopis, an analyzer of the obscurity level of texts,

  • AIDe, a corpus of image descriptions in Polish (A. Wróblewska).