Polish PARSEME corpus
The PARSEME corpus is a multilingual corpus manually annotated for verbal multiword expressions (VMWEs) in 18 languages, including Polish. It was used in the PARSEME shared task (http://multiword.sourceforge.net/sharedtask2017/) on automatic identification of verbal multiword expressions. It was created through a collective effort of the IC1207 COST action PARSEME (http://www.parseme.eu). In total, it contains 274,376 sentences, 5,439,204 tokens and 62,218 annotated VMWEs. It is openly available via LINDAT/CLARIN (http://hdl.handle.net/11372/LRT-2282) under various flavours of the Creative Commons licence.
The Polish subcorpus contains 13,606 sentences, 220,934 tokens and 3,649 VMWE annotations of three types:
383 idioms (IDs), e.g. bujać w obłokach (lit. to swing in the clouds) 'to fantasise', kości zostały rzucone (lit. dice were cast) 'the die is cast'
1,813 inherently reflexive verbs (IReflVs), e.g. śmiać się (lit. to laugh self) 'to laugh', bać się (lit. to fear self) 'to be afraid'
1,453 light verb constructions (LVCs), e.g. odnieść sukces (lit. to carry back a success) 'to be successful', sprawować patronat (lit. to perform patronage) 'to dispense patronage'.
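The Polish figures above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not part of the corpus tooling: the counts are copied from the description above, while the variable and function names are illustrative assumptions. It checks that the three category counts add up to the reported total and derives each category's share and the VMWE density per 1,000 tokens.

```python
# Counts copied from the corpus description above; names are illustrative.
POLISH_VMWE_COUNTS = {"ID": 383, "IReflV": 1813, "LVC": 1453}
POLISH_TOKENS = 220_934
POLISH_VMWES = 3_649


def category_shares(counts):
    """Return each category's fraction of all annotations."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# The per-category counts should add up to the reported total.
total = sum(POLISH_VMWE_COUNTS.values())
assert total == POLISH_VMWES

shares = category_shares(POLISH_VMWE_COUNTS)
density = 1000 * POLISH_VMWES / POLISH_TOKENS

for label, share in shares.items():
    print(f"{label}: {share:.1%}")
print(f"VMWEs per 1,000 tokens: {density:.1f}")
```

Under these figures, inherently reflexive verbs make up roughly half of the Polish annotations, and idioms roughly a tenth.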