<?xml version="1.0" encoding="utf-8"?><!DOCTYPE article  PUBLIC '-//OASIS//DTD DocBook XML V4.4//EN'  'http://www.docbook.org/xml/4.4/docbookx.dtd'><article><articleinfo><title>PARSEME-PL</title><revhistory><revision><revnumber>31</revnumber><date>2019-03-07 18:42:55</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>30</revnumber><date>2019-03-07 18:40:18</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>29</revnumber><date>2019-03-07 18:29:24</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>28</revnumber><date>2017-12-06 22:13:07</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>27</revnumber><date>2017-11-17 15:07:42</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>26</revnumber><date>2017-11-17 15:06:10</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>25</revnumber><date>2017-11-17 15:05:59</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>24</revnumber><date>2017-11-17 15:05:30</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>23</revnumber><date>2017-11-17 14:51:13</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>22</revnumber><date>2017-11-17 14:45:54</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>21</revnumber><date>2017-11-17 14:40:03</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>20</revnumber><date>2017-11-17 14:38:15</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>19</revnumber><date>2017-11-17 14:26:43</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>18</revnumber><date>2017-11-17 14:25:59</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>17</revnumber><date>2017-11-17 
14:25:38</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>16</revnumber><date>2017-11-17 14:25:18</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>15</revnumber><date>2017-11-17 14:24:48</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>14</revnumber><date>2017-11-17 14:18:02</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>13</revnumber><date>2017-11-17 14:14:45</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>12</revnumber><date>2017-11-17 14:14:12</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>11</revnumber><date>2017-11-17 14:11:53</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>10</revnumber><date>2017-11-17 14:11:23</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>9</revnumber><date>2017-11-17 14:08:40</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>8</revnumber><date>2017-11-17 14:08:23</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>7</revnumber><date>2017-11-17 14:08:13</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>6</revnumber><date>2017-11-17 14:07:48</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>5</revnumber><date>2017-11-17 13:55:56</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>4</revnumber><date>2017-11-17 13:55:09</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>3</revnumber><date>2017-11-17 13:54:05</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>2</revnumber><date>2017-11-17 13:53:23</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>1</revnumber><date>2017-11-17 
13:52:05</date><authorinitials>AgataSavary</authorinitials></revision></revhistory></articleinfo><section><title>Polish PARSEME corpus</title><para>The PARSEME corpus is a multilingual corpus annotated manually for verbal multiword expressions (VMWEs) in <emphasis role="strong">20 languages</emphasis>, including Polish. It was used in the <ulink url="http://multiword.sourceforge.net/sharedtask2018/">PARSEME shared task</ulink> on automatic identification of verbal multiword expressions. It was created through a collective effort of the IC1207 COST action <ulink url="http://www.parseme.eu">PARSEME</ulink>. In total, it contains 280,838 sentences, 6,072,331 tokens and 79,326 annotated VMWEs. It is openly available via <ulink url="http://hdl.handle.net/11372/LRT-2842">LINDAT/CLARIN</ulink> under various flavours of the Creative Commons licence. </para><para>Reference publications: </para><itemizedlist><listitem><para>Carlos Ramisch, Silvio Ricardo Cordeiro, Agata Savary, Veronika Vincze, Verginica Barbu Mititelu, Archna Bhatia, Maja Buljan, Marie Candito, Polona Gantar, Voula Giouli, Tunga Güngör, Abdelati Hawwari, Uxoa Iñurrieta, Jolanta Kovalevskaitė, Simon Krek, Timm Lichte, Chaya Liebeskind, Johanna Monti, Carla Parra Escartín, Behrang QasemiZadeh, Renata Ramisch, Nathan Schneider, Ivelina Stoyanova, Ashwini Vaidya, Abigail Walsh (2018) <ulink url="https://aclanthology.info/papers/W18-4925/w18-4925">Edition 1.1 of the PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions</ulink>, in the Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018), Santa Fe, USA. 
</para></listitem><listitem><para>Agata Savary, Marie Candito, Verginica Barbu Mititelu, Eduard Bejček, Fabienne Cap, Slavomír Čéplö, Silvio Ricardo Cordeiro, Gülşen Eryiğit, Voula Giouli, Maarten van Gompel, Yaakov HaCohen-Kerner, Jolanta Kovalevskaitė, Simon Krek, Chaya Liebeskind, Johanna Monti, Carla Parra Escartín, Lonneke van der Plas, Behrang QasemiZadeh, Carlos Ramisch, Federico Sangati, Ivelina Stoyanova, Veronika Vincze (2018) <ulink url="http://langsci-press.org/catalog/view/204/1344/1319-1">PARSEME multilingual corpus of verbal multiword expressions</ulink>, in Markantonatou, S., Ramisch, C., Savary, A., Vincze, V. (Eds.) <ulink url="http://langsci-press.org/catalog/book/204">Multiword expressions at length and in depth: Extended papers from the MWE 2017 workshop</ulink>, Language Science Press, Berlin, pp. 87-147. </para></listitem><listitem><para>Savary, A., Ramisch, C., Ricardo Cordeiro, S., Sangati, F., Vincze, V., QasemiZadeh, B., Candito, M., Cap, F., Giouli, V., Stoyanova, I., Doucet, A. (2017): &quot;<ulink url="http://aclweb.org/anthology/W/W17/W17-1704.pdf">The PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions</ulink>&quot;, in the Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), 4 April 2017, Valencia, Spain. (<ulink url="http://aclweb.org/anthology/W/W17/W17-1704.bib">bibtex</ulink>) </para></listitem></itemizedlist><para>The Polish subcorpus contains 27,904 sentences, 638,002 tokens and 5,536 VMWE annotations in 3 categories: </para><itemizedlist><listitem><para>503 verbal idioms (VIDs), e.g. <emphasis>bujać w obłokach</emphasis> (lit. 
to swing in the clouds) 'to fantasise', <emphasis>kości zostały rzucone</emphasis> (lit. dice were cast) 'the die is cast' </para></listitem><listitem><para>2,279 inherently reflexive verbs (IRVs), e.g. <emphasis>śmiać się</emphasis> (lit. to laugh self) 'to laugh', <emphasis>bać się</emphasis> (lit. to fear self) 'to be afraid' </para></listitem><listitem><para>1,833 light verb constructions (LVCs), e.g. <emphasis>odnieść sukces</emphasis> (lit. to carry back a success) 'to be successful', <emphasis>sprawować patronat</emphasis> (lit. to perform patronage) 'to dispense patronage'. </para></listitem></itemizedlist><para>The VMWE annotations are aligned with morphological and syntactic annotations in the <ulink url="http://universaldependencies.org/format.html">CoNLL-U format</ulink>. The morphological data (lemmas, POS tags, morphological features) stem from the original corpora. The syntactic data (dependencies) stem partly from the manual annotation in <ulink url="http://zil.ipipan.waw.pl/Sk%C5%82adnica">Składnica</ulink> and partly from automatic annotation with <ulink url="https://ufal.mff.cuni.cz/udpipe">UDPipe</ulink>. </para><section><title>Author</title><itemizedlist><listitem><para><ulink url="http://www.info.univ-tours.fr/~savary/English/indexgb.html">Agata Savary</ulink> </para></listitem></itemizedlist></section><section><title>License</title><para>The VMWE data are distributed under the terms of the <ulink url="https://creativecommons.org/licenses/by/4.0/">CC-BY v4</ulink> licence. The lemmas, POS tags, morphological features, and dependency relations are distributed under the terms of the <ulink url="https://creativecommons.org/licenses/by-sa/4.0/">CC-BY-SA v4</ulink> and <ulink url="https://www.gnu.org/licenses/gpl.html">GNU GPL v.3</ulink> licences. 
</para></section><section><title>Available resources</title><itemizedlist><listitem><para>PARSEME corpus including Polish at LINDAT/CLARIN </para><itemizedlist><listitem><para><ulink url="http://hdl.handle.net/11372/LRT-2842">version 1.1</ulink> </para></listitem><listitem><para><ulink url="http://hdl.handle.net/11372/LRT-2282">version 1.0</ulink> </para></listitem></itemizedlist></listitem></itemizedlist></section><section><title>Future work</title><itemizedlist><listitem><para>Extending the annotation to other types of MWEs </para></listitem></itemizedlist></section></section></article>