Benchmarks
This page documents performance of various NLP systems for Polish.
Morphological analysis
- : Morfeusz, Concraft/WCRFT, Spejd, Dependency Parser, TIMEX/Nerf,
POS tagging
Shallow parsing
Dependency parsing
Deep parsing
Word sense disambiguation
Named entity recognition
Sentiment analysis
Mention detection
Precision, recall and F-measure are calculated on the Polish Coreference Corpus data using two alternative mention detection scores:
- EXACT: score of exact boundary matches (an automatic and a manual mention match if they have exactly the same boundaries; in other words, they consist of the same tokens)
- HEAD: score of head matches (we reduce the automatic and the manual mentions to their single head tokens and compare them).
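The two matching modes above can be sketched as follows. This is only an illustrative sketch, not the evaluation code used to produce the figures on this page: the `Mention` type, the function names and the toy data are all assumptions made for the example, and mentions are compared as sets, so two mentions sharing the same head collapse into one under HEAD matching.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Mention(NamedTuple):
    # Hypothetical representation: token indices covered by the mention
    # and the index of its single head token.
    tokens: Tuple[int, ...]
    head: int


def prf(system_keys: Iterable, gold_keys: Iterable) -> Tuple[float, float, float]:
    """Precision, recall and F-measure over two collections of comparable keys."""
    system_set, gold_set = set(system_keys), set(gold_keys)
    correct = len(system_set & gold_set)
    p = correct / len(system_set) if system_set else 0.0
    r = correct / len(gold_set) if gold_set else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def score_exact(system: List[Mention], gold: List[Mention]):
    # EXACT: mentions match only if they consist of the same tokens.
    return prf((m.tokens for m in system), (m.tokens for m in gold))


def score_head(system: List[Mention], gold: List[Mention]):
    # HEAD: reduce every mention to its head token before comparing.
    return prf((m.head for m in system), (m.head for m in gold))


# Toy data, invented for illustration only.
gold = [Mention((0, 1, 2), 2), Mention((5,), 5)]
system = [Mention((1, 2), 2), Mention((5,), 5), Mention((7, 8), 8)]

print("EXACT:", score_exact(system, gold))
print("HEAD: ", score_head(system, gold))
```

In the toy data the first system mention has the wrong left boundary, so it counts as an error under EXACT but as a match under HEAD, which is why HEAD scores are never lower than EXACT scores for the same output.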
| System name | Short description | Main publication | License | EXACT P | EXACT R | EXACT F | HEAD P | HEAD R | HEAD F |
|---|---|---|---|---|---|---|---|---|---|
| | Collects mention candidates from available sources – morphosyntactical, shallow parsing, named entity and/or zero anaphora detection tools | Ogrodniczuk M., Głowińska K., Kopeć M., Savary A., Zawisławska M. Coreference in Polish: Annotation, Resolution and Evaluation, chapter 10.6. Walter De Gruyter, 2015. | CC BY 3 | 66.79% | 67.21% | 67.00% | 88.29% | 89.41% | 88.85% |
Coreference resolution
As there is still no consensus on a single best coreference resolution metric, the CoNLL measure is used (the average of the MUC, B3 and CEAFE F-measure values). For end-to-end systems, the approach of the CoNLL-2011 shared task is followed, so results are presented for two calculation strategies:
- INTERSECT: consider only correct system mentions (i.e. the intersection between gold and system mentions)
- TRANSFORM: unify system and gold mention sets using the following procedure for twinless mentions (mentions without a corresponding mention in the other set):
  - insert twinless gold mentions into the system mention set as singletons
  - remove twinless singleton system mentions
  - insert twinless non-singleton system mentions into the gold set as singletons.
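The two strategies and the CoNLL average can be sketched as below. This is a minimal sketch under stated assumptions, not the scorer used for the table that follows: clusters are modelled as plain Python sets of hashable mention identifiers, the function names are hypothetical, and the MUC, B3 and CEAFE F-measures are assumed to be computed elsewhere and passed in as numbers.

```python
from typing import Hashable, List, Set, Tuple

Clusters = List[Set[Hashable]]


def mentions_of(clusters: Clusters) -> Set[Hashable]:
    """All mentions that appear in any cluster."""
    return {m for cluster in clusters for m in cluster}


def intersect(gold: Clusters, system: Clusters) -> Tuple[Clusters, Clusters]:
    """INTERSECT: keep only mentions present in both gold and system sets."""
    shared = mentions_of(gold) & mentions_of(system)

    def prune(clusters: Clusters) -> Clusters:
        pruned = [set(c) & shared for c in clusters]
        return [c for c in pruned if c]  # drop clusters left empty

    return prune(gold), prune(system)


def transform(gold: Clusters, system: Clusters) -> Tuple[Clusters, Clusters]:
    """TRANSFORM: unify both mention sets by handling twinless mentions."""
    gold_m, sys_m = mentions_of(gold), mentions_of(system)
    new_gold = [set(c) for c in gold]
    new_system: Clusters = []
    for cluster in system:
        cluster = set(cluster)
        if len(cluster) == 1 and not (cluster & gold_m):
            continue  # remove twinless singleton system mentions
        new_system.append(cluster)
        for m in cluster - gold_m:
            new_gold.append({m})  # twinless non-singleton system mention
    for m in gold_m - sys_m:
        new_system.append({m})  # twinless gold mention becomes a singleton
    return new_gold, new_system


def conll_average(muc_f: float, b3_f: float, ceafe_f: float) -> float:
    """CoNLL measure: arithmetic mean of the three F-measure values."""
    return (muc_f + b3_f + ceafe_f) / 3


# Toy clusters, invented for illustration only.
gold = [{"a", "b"}, {"c"}]
system = [{"a", "x"}, {"y"}, {"b"}]
print(intersect(gold, system))
print(transform(gold, system))
```

After `transform`, both sides contain exactly the same mentions, so the cluster-level metrics compare like with like; `intersect` reaches the same goal by discarding every mention that either side is missing.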
The results are computed on the Polish Coreference Corpus data.
| System name | Short description | Main publication | License | GOLD | EXACT INTERSECT | EXACT TRANSFORM | HEAD INTERSECT | HEAD TRANSFORM |
|---|---|---|---|---|---|---|---|---|
| | Rule-based | Ogrodniczuk M., Kopeć M. End-to-end coreference resolution baseline system for Polish. In Z. Vetulani (ed.), Proceedings of the 5th Language & Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics, pp. 167–171, Poznań, Poland, 2011. | CC BY 3 | 73.40% | 78.54% | 66.55% | 76.27% | 70.11% |
| | Statistical | Kopeć M., Ogrodniczuk M. Creating a Coreference Resolution System for Polish. In Proceedings of the 8th International Conference on Language Resources and Evaluation, LREC 2012, pp. 192–195, ELRA. | CC BY 3 | 78.41% | 80.86% | 68.96% | 78.58% | 72.15% |