1 This is the README file for the Polish dataset of literal readings of Polish verbal multiword expressions (VMWEs).
2
3 -----------
4 Description
5
6 This dataset contains occurrences of verbal multiword expressions (VMWEs) and their literal readings, stemming from the Polish subcorpus of the [PARSEME VMWE corpus v. 1.0](http://hdl.handle.net/11372/LRT-2282). The VMWE occurrences have been manually annotated. The literal readings have been automatically extracted following several heuristics, and then manually validated.
7
8 -----------
9 Files
10
11 --
12 mwes-and-literal-matches.tsv
13
14 This file contains all VMWE occurrences and their corresponding literal matches. The literal matches in this file are automatically extracted and have not been manually validated.
15
16 Fields:
17 MWE The canonical form of the VMWE found in the training corpus; it consists of the lemmas of the components of the given VMWE
18 POS-tag The sequence of POS tags of the VMWE components
19 category The category of the VMWE (ID, IReflV, LVC or OTH)
20 idiomatic-or-literal IDIOMATIC if the occurrence was manually annotated as a VMWE in the PARSEME corpus;
21 LITERAL if the occurrence was extracted by one of the heuristics described in (Savary & Cordeiro, 2018)
22 annotation-methods The heuristics used to extract the literal match (cf. Savary & Cordeiro, 2018); if the same match was found by several heuristics, all of them are listed
23 sentence-with-mweoccur The sentence containing the VMWE or the literal match
24 source The sentence identifier in the PARSEME corpus
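
The field list above can be turned into a small reader. The following is a minimal Python sketch, not part of the dataset tooling. It assumes the file is tab-separated, UTF-8 encoded, and has its columns in the order listed above; whether the file starts with a header row is not stated here, so this is left as a parameter. The names FIELDS, read_rows and count_readings are hypothetical, and the sample lines (POS tags, heuristic name, sentence identifiers) are invented for illustration.

```python
import csv
import io
from collections import Counter

# Column names in the order listed above (assumed to match the file's column order).
FIELDS = ["MWE", "POS-tag", "category", "idiomatic-or-literal",
          "annotation-methods", "sentence-with-mweoccur", "source"]

def read_rows(handle, has_header=False):
    """Yield one dict per TSV line, keyed by FIELDS."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for i, row in enumerate(reader):
        if has_header and i == 0:
            continue  # skip the header row if the file has one
        if len(row) != len(FIELDS):
            continue  # skip blank or malformed lines
        yield dict(zip(FIELDS, row))

def count_readings(rows):
    """Count occurrences per (MWE, IDIOMATIC or LITERAL) pair."""
    return Counter((r["MWE"], r["idiomatic-or-literal"]) for r in rows)

# Invented sample lines, for illustration only (not taken from the dataset).
sample = (
    "mieć miejsce\tVERB NOUN\tLVC\tIDIOMATIC\t\tSpotkanie miało miejsce wczoraj .\ts1\n"
    "mieć miejsce\tVERB NOUN\tLVC\tLITERAL\theuristic-A\tMam miejsce w samochodzie .\ts2\n"
)
counts = count_readings(read_rows(io.StringIO(sample)))
```

To run this on the real file, replace the io.StringIO sample with open("mwes-and-literal-matches.tsv", encoding="utf-8", newline="").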
25
26 --
27 literal-matches-tagged.tsv
28
29 This file contains the manually validated subset of the previous file, namely those lines where idiomatic-or-literal=LITERAL. For each line, the literal match (i.e. a candidate for a literal reading) was manually validated as a true or false literal reading.
30
31 Fields:
32 MWE The canonical form of the VMWE found in the training corpus; it consists of the lemmas of the components of the given VMWE
33 POS-tag The sequence of POS tags of the VMWE components
34 category The category of the VMWE (ID, IReflV or LVC)
35 annotation-methods The heuristics used to extract the literal match (cf. Savary & Cordeiro, 2018); if the same match was found by several heuristics, all of them are listed
36 Interpretation 1 TRUE or FALSE literal reading
37 Interpretation 2 The reason for the false literal reading (free text, e.g. "dependencies unchecked")
38 Interpretation 3 The constraints imposed by the VMWE and not respected by the literal reading, or the type of error due to which the false literal reading was found
39 Comment Details about the interpretations (free text)
40 sentence-with-mweoccur The sentence containing the literal match
41 source The sentence identifier in the PARSEME corpus
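
As a hedged illustration of how the validated file might be used, the Python sketch below keeps only the candidates marked as true literal readings. It assumes a tab-separated, UTF-8 file with the ten columns in the order listed above, and that "Interpretation 1" holds the literal strings TRUE or FALSE. The names TAGGED_FIELDS, read_tagged and true_literal_readings are hypothetical, and the sample lines are invented.

```python
import csv
import io

# Column names in the order listed above (assumed to match the file's column order).
TAGGED_FIELDS = ["MWE", "POS-tag", "category", "annotation-methods",
                 "Interpretation 1", "Interpretation 2", "Interpretation 3",
                 "Comment", "sentence-with-mweoccur", "source"]

def read_tagged(handle):
    """Yield one dict per well-formed TSV line, keyed by TAGGED_FIELDS."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) == len(TAGGED_FIELDS):
            yield dict(zip(TAGGED_FIELDS, row))

def true_literal_readings(rows):
    """Keep the rows whose 'Interpretation 1' value is TRUE."""
    # A header row, if present, is dropped here too, since its value is not TRUE.
    return [r for r in rows if r["Interpretation 1"].strip().upper() == "TRUE"]

# Invented sample lines, for illustration only (not taken from the dataset).
sample = (
    "mieć miejsce\tVERB NOUN\tLVC\theuristic-A\tFALSE\tdependencies unchecked\t\t\t"
    "To miejsce ma urok .\ts1\n"
    "mieć miejsce\tVERB NOUN\tLVC\theuristic-A\tTRUE\t\t\t\t"
    "Mam miejsce w samochodzie .\ts2\n"
)
kept = true_literal_readings(read_tagged(io.StringIO(sample)))
```

To run this on the real file, replace the io.StringIO sample with open("literal-matches-tagged.tsv", encoding="utf-8", newline="").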
42
43 -----------
44 Bibliographic reference:
45
46 Agata Savary and Silvio Ricardo Cordeiro (2018) "Literal readings of multiword expressions: as scarce as hen's teeth", in the Proceedings of the 16th International Workshop on Treebanks and Linguistic Theories (TLT-16), Prague, Czech Republic.
47
48 -----------
49 License
50 The data are distributed under the terms of the [CC-BY v4](https://creativecommons.org/licenses/by/4.0/) license.
51
52 -----------
53 Authors:
54
55 Agata Savary (agata.savary@univ-tours.fr)
56 Silvio Ricardo Cordeiro (silvio.cordeiro@lif.univ-mrs.fr)