MweLitRead

Corpus of literal readings of Polish verbal multiword expressions

This dataset contains occurrences of verbal multiword expressions (VMWEs) and their literal readings, stemming from the Polish PARSEME corpus. For instance, for the VMWE być w stanie (lit. to be in the state of) 'to be able to' as in:

  • Więcej nie jestem już w stanie dokonać.

the following is a true literal reading:

  • Dwóch rannych żołnierzy jest w stanie krytycznym.

while the following is a false literal reading:

  • Wystarczy tę kwotę przyrównać do płacy minimalnej, aby zrozumieć dlaczego stan czytelnictwa w Polsce jest opłakany.

The VMWE occurrences have been manually annotated. The literal readings have been automatically extracted following several heuristics, and then manually validated as true or false literal readings. The dataset contains:

  • 3149 occurrences of VMWEs
  • 72 true literal readings
  • 344 false literal readings

The dataset makes it possible to calculate the idiomaticity rate of Polish VMWEs, i.e. the ratio of a VMWE's occurrences with idiomatic readings to the total number of its idiomatic and literal occurrences in a corpus.
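As a rough illustration of the idiomaticity rate, the following sketch applies the definition to the corpus-level counts listed above. It assumes that only true literal readings count as literal occurrences and that false literal readings are left out of the ratio; the function name idiomaticity_rate is hypothetical and not part of the dataset.

```python
def idiomaticity_rate(idiomatic: int, literal: int) -> float:
    """Ratio of idiomatic occurrences to all (idiomatic + literal) occurrences."""
    total = idiomatic + literal
    if total == 0:
        raise ValueError("no occurrences to compute a rate from")
    return idiomatic / total


# Corpus-level counts taken from the dataset description.
# Assumption: false literal readings (344) are excluded from the ratio.
VMWE_OCCURRENCES = 3149
TRUE_LITERAL_READINGS = 72

corpus_rate = idiomaticity_rate(VMWE_OCCURRENCES, TRUE_LITERAL_READINGS)
print(f"Idiomaticity rate: {corpus_rate:.4f}")
```

Under these assumptions the corpus-wide rate comes out at roughly 0.98. The same function could be applied to the counts of a single expression, such as być w stanie, to obtain a per-expression rate.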

Reference publication:

Authors

License

The data are distributed under the terms of the CC-BY v4 license.

Available resources