
Diff for "LRT"

Differences between revisions 398 and 414 (spanning 16 versions)
Revision 398 as of 2019-03-07 18:47:42
Size: 29691
Editor: AgataSavary
Revision 414 as of 2020-01-22 10:50:45
Size: 31156
Line 14: Line 14:
Revision 398:
 * [[PSC|Polish Sejm Corpus]],
Revision 414:
 * [[PPC|Polish Parliamentary Corpus]],
Line 31: Line 31:
Revision 398:
 * [[https://www.sketchengine.eu/user-guide/user-manual/corpora/by-language/polish-text-corpora/|Polish text corpora]] included in Sketch Engine.
Revision 414:
 * [[https://www.sketchengine.eu/user-guide/user-manual/corpora/by-language/polish-text-corpora/|Polish text corpora]] included in Sketch Engine,
 * [[http://synamet.polon.uw.edu.pl/|Microcorpus of Synesthetic Metaphors]].
Line 40: Line 41:
 * [[http://rhssl1.uni-regensburg.de/SlavKo/korpus/poldi|PolDi]], a Polish Diachronic Online Corpus (R. Meyer)
Line 42: Line 44:
 * [[http://korpus19.nlp.ipipan.waw.pl/|Manually annotated and transcribed corpus of the 19th century Polish]], (1830–1918, IPI PAN)
Line 43: Line 46:
Revision 398:
 * [[PL196x|Polish language of the 1960s / Frequency corpus]] (I. Kurcz, A. Lewicki, J. Sambor, J. Woronczak, K. Szafran, J. S. Bień, M. Woliński)
Revision 414:
 * [[PL196x|Polish language of the 1960s / Frequency corpus]] (I. Kurcz, A. Lewicki, J. Sambor, J. Woronczak, K. Szafran, J. S. Bień, M. Woliński),
 * [[http://historyczne.nlp.ipipan.waw.pl/|Diachronic search engine (prototype)]].
Line 111: Line 115:
 * [[http://sgjp.pl|Słownik gramatyczny języka polskiego]],
Line 128: Line 133:
 * [[http://sgjp.pl|SGJP]], Grammatical Dictionary of Polish (the list of inflected forms is available with [[http://download.sgjp.pl/morfeusz/|Morfeusz]]),
Line 129: Line 135:
Revision 398:
 * [[http://sgjp.pl/morfeusz/|Morfeusz SGJP]], morphological analyser (Z. Saloni, W. Gruszczyński, M. Woliński, R. Wołosz),
Revision 414:
 * [[http://morfeusz.sgjp.pl/|Morfeusz SGJP]], morphological analyser,
Line 155: Line 161:
 * [[https://github.com/kwrobel-nlp/krnnt|KRNNT]], a morphological tagger for Polish based on recurrent neural networks,
Line 159: Line 166:
 * [[http://zil.ipipan.waw.pl/PDB|PDB 2.0]], a dependency treebank of Polish (A. Wróblewska),
 * [[http://git.nlp.ipipan.waw.pl/alina/PDBUD|PDB-UD]], a version of PDB 2.0 in Universal Dependencies format (A. Wróblewska),
Line 167: Line 176:
Revision 398:
 * Świgra, a DCG parser,
   * [[http://nlp.ipipan.waw.pl/~wolinski/swigra/|version 1.0]] (2005),
   * [[http://zil.ipipan.waw.pl/Sk%C5%82adnica|version 1.5]] as used in Składnica (2011),
   * [[http://swigra.nlp.ipipan.waw.pl/|Świgra 2.0 demo]] (2015, please use Firefox).
Revision 414:
 * [[http://zil.ipipan.waw.pl/%C5%9Awigra|Świgra]], a DCG parser,
   * [[http://swigra.nlp.ipipan.waw.pl/|On-line demo]],
Line 185: Line 192:

== Semantic resources ==
 * [[http://zil.ipipan.waw.pl/Scwad/CDSCorpus|CDSCorpus]], a dataset of 10k pairs of Polish sentences manually annotated for semantic relatedness and entailment (A. Wróblewska)
 * [[http://git.nlp.ipipan.waw.pl/Scwad/SCWAD-probing-data|Probing datasets]], Polish and English probing datasets for linguistic verification of sentence embeddings (A. Wróblewska)
Line 274: Line 285:
Revision 398:
 * [[http://baltoslav.eu/?mova=pl|Baltoslav]], with several script converters (Romanizer, Cyrillizer, IPA Converter etc.)
Revision 414:
 * [[http://baltoslav.eu/?mova=pl|Baltoslav]], with several script converters (Romanizer, Cyrillizer, IPA Converter etc.),
 * [[http://zil.ipipan.waw.pl/SpacyPL|SpacyPL]], Polish language models and resources for [[https://spacy.io|Spacy]]
 * [[https://jasnopis.pl/|Jasnopis]], analyzer of text obscurity level
 * [[http://zil.ipipan.waw.pl/Scwad/AIDe|AIDe]], corpus of image descriptions in Polish (A. Wróblewska)

Language Tools and Resources for Polish

This page contains a list of publicly available language tools and resources.

Written corpora of contemporary Polish

Written corpora of historical Polish

Spoken corpora

Language models

Parallel corpora and translation memories

Machine-readable dictionaries

Human-readable dictionaries

Morphological tools and resources

Taggers

Parsers, grammars, treebanks

Semantic resources

  • CDSCorpus, a dataset of 10k pairs of Polish sentences manually annotated for semantic relatedness and entailment (A. Wróblewska)

  • Probing datasets, Polish and English probing datasets for linguistic verification of sentence embeddings (A. Wróblewska)

Sentiment analysis

Coreference

Speech analysis and synthesis tools

Machine translation demonstrations

Summarizers

Diacritization

Named entity recognition

  • Nerf, a tool for named entity recognition, available under GNU GPL v.3 (J. Waszczuk),

  • Liner2, a named entity recognizer released under the GNU GPL, with models that recognize 5 and 56 categories of proper names (M. Marcińczuk and M. Janicki),

  • TIMEX, a model for Liner2 to recognize and normalize temporal expressions (J. Kocoń and M. Marcińczuk).

Multiword expression software

  • TermoPL, a multiword expression extraction tool,

  • VMWE identifiers, systems that participated in the PARSEME shared task on automatic identification of verbal MWEs (13 of the 17 systems submitted results for Polish),

  • PARSEME-FR demonstrator, including the ATILF-LLF multiword expression identifier for Polish.

Aggregating services

Other

  • Mobile plWordNet, free mobile application for plWordNet browsing (J. Kocoń),

  • WSDDE, a system for designing and performing Word Sense Disambiguation experiments (R. Młodzki et al.),

  • Frazeo, a search engine and clusterer of news in Polish (P. Pęzik),

  • Segment, a rule-based sentence tokenizer supporting the SRX standard (J. Lipski; the Polish rules are available in the LanguageTool project, see here for short instructions on how to use the tool),

  • Toki, a tokenizer supporting the SRX standard, C++ library and toolkit (T. Śniatowski and A. Radziszewski),

  • Translatica SRX sentence segmentation rules for Polish (LGPL),

  • SyMGIZA++, an extension of Giza++ that computes symmetric word alignment models,

  • Hipisek, an experimental question answering system (M. Walas),

  • Text digitization tools, Poliqarp for DjVu and other programs,

  • Fextor, a feature extraction framework,

  • LexCSD, a system for semi-automatic sense disambiguation,

  • SuperMatrix, a general tool for lexical semantic knowledge acquisition,

  • WordnetLoom, a wordnet editor application,

  • Toposław, a tool for creating electronic inflectional dictionaries of multi-word units,

  • CorpCor, a web-based tool for correcting morphosyntactic annotation in TEI XML encoded corpora (e.g. NKJP),

  • Stylo 2, stylometry demo,

  • DeepEvents, event extraction in Polish, based on deep neural networks,

  • Word similarity, calculation of the similarity of words based on word embeddings, on-line service,

  • Baltoslav, with several script converters (Romanizer, Cyrillizer, IPA Converter etc.),

  • SpacyPL, Polish language models and resources for Spacy

  • Jasnopis, analyzer of text obscurity level

  • AIDe, corpus of image descriptions in Polish (A. Wróblewska)