= Corpus of literal readings of Polish verbal multiword expressions =

This dataset contains occurrences of verbal multiword expressions (VMWEs) and their literal readings, drawn from the [[http://clip.ipipan.waw.pl/PARSEME-PL|Polish PARSEME corpus]].

For instance, for the VMWE ''być w stanie'' (lit. 'to be in the state of') 'to be able to', as in:
 * ''Więcej nie '''jestem''' już '''w stanie''' dokonać.'' 'I am no longer able to accomplish more.'
the following is a true literal reading:
 * ''Dwóch rannych żołnierzy __jest w stanie__ krytycznym.'' 'Two wounded soldiers are in critical condition.'
while the following is a false literal reading:
 * ''Wystarczy tę kwotę przyrównać do płacy minimalnej, aby zrozumieć dlaczego __stan__ czytelnictwa __w__ Polsce __jest__ opłakany.'' 'It is enough to compare this amount with the minimum wage to understand why the state of readership in Poland is deplorable.'

The VMWE occurrences have been manually annotated. The literal readings have been extracted automatically using several heuristics, and then manually validated as true or false literal readings.

The dataset contains:
 * 3149 occurrences of VMWEs
 * 72 true literal readings
 * 344 false literal readings

The dataset makes it possible to calculate the idiomaticity rate of Polish VMWEs, i.e. the ratio of a VMWE's idiomatic occurrences to the total number of its idiomatic and literal occurrences in a corpus.

Reference publication:
 * Agata Savary and Silvio Ricardo Cordeiro (2018) [[http://aclweb.org/anthology/W/W17/W17-7610.pdf|Literal readings of multiword expressions: as scarce as hen's teeth]], in the Proceedings of the 16th International Workshop on Treebanks and Linguistic Theories ([[https://ufal.mff.cuni.cz/tlt16/|TLT-16]]), Prague, Czech Republic.

== Authors ==
 * [[http://www.info.univ-tours.fr/~savary/English/indexgb.html|Agata Savary]]
 * [[http://www.lif.univ-mrs.fr/annuaire/personne/1274507|Silvio Ricardo Cordeiro]]

== License ==
The data are distributed under the terms of the [[https://creativecommons.org/licenses/by/4.0/|CC-BY v4]] license.
== Available resources ==
 * a [[attachment:README.tst|README]] file
 * true and false [[attachment:literal-matches-tagged.tsv|literal readings]] of VMWEs
 * all [[attachment:mwes-and-literal-matches.tsv|idiomatic and literal readings]]
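
To illustrate how the idiomaticity rate defined on this page relates to the counts reported for the dataset, here is a minimal Python sketch. The function name `idiomaticity_rate` is hypothetical, and the sketch assumes that only true literal readings count as literal occurrences (false literal readings are left out). It works from the aggregate counts only and does not read the attached TSV files, whose column layout is not described here.

```python
def idiomaticity_rate(idiomatic: int, literal: int) -> float:
    """Ratio of idiomatic occurrences to all (idiomatic + literal) occurrences."""
    total = idiomatic + literal
    if total == 0:
        raise ValueError("no occurrences to compute a rate from")
    return idiomatic / total


# Counts reported on this page for the whole dataset.
VMWE_OCCURRENCES = 3149       # manually annotated idiomatic occurrences
TRUE_LITERAL_READINGS = 72    # manually validated literal occurrences
# Assumption: the 344 false literal readings are neither idiomatic nor
# literal occurrences of a VMWE, so they are excluded from the ratio.

overall_rate = idiomaticity_rate(VMWE_OCCURRENCES, TRUE_LITERAL_READINGS)
print(f"Overall idiomaticity rate: {overall_rate:.4f}")  # prints 0.9776
```

The same function could be applied per expression, given the idiomatic and literal counts for each individual VMWE.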