attachment:README.tst of MweLitRead

Attachment 'README.tst'

This is the README file for the Polish dataset of literal readings of Polish verbal multiword expressions (VMWEs).

-----------
Description

This dataset contains occurrences of verbal multiword expressions (VMWEs) and their literal readings, taken from the Polish subcorpus of the [PARSEME VMWE corpus v. 1.0](http://hdl.handle.net/11372/LRT-2282). The VMWE occurrences were annotated manually. The literal readings were extracted automatically using several heuristics and then validated manually.

-----------
Files

--
mwes-and-literal-matches.tsv

This file contains all VMWE occurrences and their corresponding literal matches. The literal matches were extracted automatically and have not been manually validated.

Fields:
MWE			The canonical form of the VMWE found in the training corpus; it consists of the lemmas of the components of the given VMWE
POS-tag			The sequence of POS tags of the VMWE components
category		The category of the VMWE (ID, IReflV, LVC or OTH)
idiomatic-or-literal	IDIOMATIC if the occurrence was manually annotated as a VMWE in the PARSEME corpus;
			LITERAL if the occurrence was extracted by one of the heuristics described in (Savary & Cordeiro, 2018)
annotation-methods	The heuristics used to extract the literal match (see Savary & Cordeiro, 2018); if the same match was found by several heuristics, all of them are listed
sentence-with-mweoccur	The sentence containing the VMWE or the literal match
source			The sentence identifier in the PARSEME corpus

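As an illustration, the sketch below reads mwes-and-literal-matches.tsv with Python's standard csv module and counts idiomatic and literal occurrences per VMWE category. It assumes (this is not stated above) that the file is UTF-8, tab-separated, that its columns appear in the order listed above, and that a header line, if there is one, repeats these field names. The names MWE_FIELDS, read_tsv and count_by_reading are hypothetical.

```python
import csv
from collections import Counter
from typing import Dict, Iterable, List

# Field names as listed in this README. Assumption: the columns appear in
# exactly this order in mwes-and-literal-matches.tsv.
MWE_FIELDS = [
    "MWE", "POS-tag", "category", "idiomatic-or-literal",
    "annotation-methods", "sentence-with-mweoccur", "source",
]


def read_tsv(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse tab-separated lines into dicts keyed by the README field names."""
    reader = csv.DictReader(
        lines,
        fieldnames=MWE_FIELDS,
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # sentences may contain quote characters
    )
    # Skip a header line, if present (assumption: it repeats the field names).
    return [row for row in reader if row["MWE"] != "MWE"]


def count_by_reading(rows: Iterable[Dict[str, str]]) -> Counter:
    """Count occurrences per (category, IDIOMATIC/LITERAL) pair."""
    return Counter((row["category"], row["idiomatic-or-literal"]) for row in rows)
```

read_tsv accepts any iterable of lines, e.g. a file handle opened with `open("mwes-and-literal-matches.tsv", encoding="utf-8", newline="")`. csv.QUOTE_NONE is used so that quotation marks inside the sentences are kept as ordinary characters rather than treated as CSV quoting.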
--
literal-matches-tagged.tsv

This file contains the manually validated subset of the previous file, namely the lines where idiomatic-or-literal=LITERAL. Each literal match (i.e. candidate for a literal reading) was manually classified as a true or false literal reading.

Fields:
MWE			The canonical form of the VMWE found in the training corpus; it consists of the lemmas of the components of the given VMWE
POS-tag			The sequence of POS tags of the VMWE components
category		The category of the VMWE (ID, IReflV or LVC)
annotation-methods	The heuristics used to extract the literal match (see Savary & Cordeiro, 2018); if the same match was found by several heuristics, all of them are listed
Interpretation 1	TRUE or FALSE literal reading
Interpretation 2	The reason for the false literal reading (free text, e.g. "dependencies unchecked")
Interpretation 3	The constraints imposed by the VMWE and not respected by the literal reading, or the type of error due to which the false literal reading was found
Comment			Details about the interpretations (free text)
sentence-with-mweoccur	The sentence containing the literal match
source			The sentence identifier in the PARSEME corpus

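A similar sketch for literal-matches-tagged.tsv computes, per VMWE category, the share of validated literal candidates marked TRUE in the Interpretation 1 column. The same assumptions apply (UTF-8, tab-separated, columns in the order listed above, optional header repeating the field names). The names TAGGED_FIELDS, read_tagged and true_literal_ratio are hypothetical, and the TRUE/FALSE comparison ignores case and surrounding whitespace.

```python
import csv
from collections import Counter
from typing import Dict, Iterable, List

# Field names as listed in this README. Assumption: the columns appear in
# exactly this order in literal-matches-tagged.tsv.
TAGGED_FIELDS = [
    "MWE", "POS-tag", "category", "annotation-methods",
    "Interpretation 1", "Interpretation 2", "Interpretation 3",
    "Comment", "sentence-with-mweoccur", "source",
]


def read_tagged(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse tab-separated lines into dicts keyed by the README field names."""
    reader = csv.DictReader(
        lines, fieldnames=TAGGED_FIELDS, delimiter="\t", quoting=csv.QUOTE_NONE
    )
    # Skip a header line, if present (assumption: it repeats the field names).
    return [row for row in reader if row["MWE"] != "MWE"]


def true_literal_ratio(rows: Iterable[Dict[str, str]]) -> Dict[str, float]:
    """Fraction of validated literal candidates marked TRUE, per VMWE category."""
    totals: Counter = Counter()
    true_counts: Counter = Counter()
    for row in rows:
        category = row["category"]
        totals[category] += 1
        # "Interpretation 1" holds TRUE or FALSE; a missing value counts as not TRUE.
        if (row["Interpretation 1"] or "").strip().upper() == "TRUE":
            true_counts[category] += 1
    return {cat: true_counts[cat] / totals[cat] for cat in totals}
```

Grouping by category is one possible breakdown; grouping by the annotation-methods column instead would show which heuristics yield more true literal readings.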
-----------
Bibliographic reference:

Agata Savary and Silvio Ricardo Cordeiro (2018) "Literal readings of multiword expressions: as scarce as hen's teeth", in the Proceedings of the 16th International Workshop on Treebanks and Linguistic Theories (TLT-16), Prague, Czech Republic.

-----------
License
The data are distributed under the terms of the [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/) license.

-----------
Authors:

Agata Savary (agata.savary@univ-tours.fr)
Silvio Ricardo Cordeiro (silvio.cordeiro@lif.univ-mrs.fr)