This is the README file of the Polish XML subcorpus of the PARSEME corpus.

Agata Savary, 17 November 2017

The official release in the [[http://typo.uni-konstanz.de/parseme/index.php/2-general/184-parseme-shared-task-format-of-the-final-annotation|parseme-tsv]] format, aligned with morphological and syntactic annotations in [[http://universaldependencies.org/format.html|CoNLL-U]] format, is available via [[http://hdl.handle.net/11372/LRT-2282|LINDAT/CLARIN]] (see README.md of the Polish subcorpus). This README describes the '''XML version''' of the Polish corpus.

The Polish data stem from:
 * the [[http://clip.ipipan.waw.pl/NationalCorpusOfPolish|National Corpus of Polish]] - all texts from daily newspapers are included, i.e. those whose identifiers start with 130-2, 130-3 or 130-5. The texts with identifiers starting with 130-5 were merged into bigger files for easier file management:
   * from 130-5-000000001 to 130-5-000000099 - merged into 130-5-0000000
   * from 130-5-000000100 to 130-5-000000199 - merged into 130-5-0000001
   * from 130-5-000000200 to 130-5-000000299 - merged into 130-5-0000002
   * etc.
   * from 130-5-000001900 to 130-5-000001999 - merged into PL-NKJP-130-5-0000019
   * from 130-5-000001999 to 130-5-000002000 - merged into PL-NKJP-130-5-0000020
 * the [[http://zil.ipipan.waw.pl/PolishCoreferenceCorpus|Polish Coreference Corpus]] - the 21 "long" texts from this corpus are included (36,000 tokens, Rzeczpospolita newspaper)

VMWEs have been annotated by a single annotator per file. The following [[http://parsemefr.lif.univ-mrs.fr/parseme-st-guidelines/1.0/?page=030_Categories_of_VMWEs|categories]] are used: ID, IReflV, LVC, OTH.

All VMWE annotations were performed by Agata Savary.

The VMWE annotations are distributed under the terms of the [[https://creativecommons.org/licenses/by/4.0/|CC-BY v4]] license.

Contact: agata.savary@univ-tours.fr
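
The merging scheme for the 130-5 texts described above appears to drop the last two digits of the nine-digit text number, so that each merged file groups up to 100 consecutive texts. The following is a minimal sketch of that mapping, inferred only from the listed examples. The function name `merged_file_id` and the optional `prefix` parameter are hypothetical and not part of the corpus distribution. The prefix is left optional because "PL-NKJP-" appears in only some of the examples.

```python
import re

# Hypothetical helper, inferred from the examples in this README;
# it is not shipped with the corpus.
_ID_PATTERN = re.compile(r"130-5-(\d{9})")


def merged_file_id(text_id: str, prefix: str = "") -> str:
    """Map an original 130-5 text identifier to its merged file identifier.

    Assumption: the merged identifier is the original one with the last
    two digits of the nine-digit number removed.
    """
    match = _ID_PATTERN.fullmatch(text_id)
    if match is None:
        raise ValueError(f"not a 130-5 text identifier: {text_id!r}")
    number = match.group(1)
    # Dropping the last two digits groups texts in blocks of 100.
    return f"{prefix}130-5-{number[:-2]}"


if __name__ == "__main__":
    for text_id in ("130-5-000000001", "130-5-000000150", "130-5-000001900"):
        print(text_id, "->", merged_file_id(text_id))
```

Calling `merged_file_id("130-5-000001900", prefix="PL-NKJP-")` reproduces the prefixed form used in the last two examples of the list.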