DeepER Entity Library

DeepER Entity Library is a database of more than 800,000 entities, each described by its textual representations in Polish (names) and by WordNet synsets. The resource was originally created for deep entity recognition (DeepER) in RAFAEL, a Polish question answering system, by analysing definitions in the Polish Wikipedia [1,2]. A simplified version, containing nominal groups instead of synsets, is also available.

Download main library: entities.txt.gz

Download simplified library: entitiesD.txt.gz

Created and made available by Piotr Przybyła.

Main library

The library contains 809,786 entities with 1,169,452 names (972,592 unique) and 1,264,918 synsets (31,545 unique). Each entity consists of a main name, an article name, a URL, a list of names and a list of synsets; entity #9751, for example, describes Bronisław Komorowski.

Each line of the file corresponds to a single entity and has the following format:

<main_name><tab><article_name><tab><URL><tab><names_number(n)><tab><synsets_number(m)><tab><name_1><tab><name_2>...<tab><name_n><tab><synset_id_1><tab><synset_id_2>...<tab><synset_id_m><tab><synset_repr_1><tab><synset_repr_2>...<tab><synset_repr_m>

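where n is the number of names and m is the number of synsets; the n names are followed by the m synset identifiers and then by the m textual representations of those synsets.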
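
The line format above can be read with a few lines of Python. The following is a minimal sketch, not part of the library itself: the names Entity and parse_entity are hypothetical, synset identifiers are kept as strings because their exact form is not specified here, and every field is assumed to be separated by a single tab.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Entity:
    """One line of the main library (hypothetical container, not an official API)."""
    main_name: str
    article_name: str
    url: str
    names: List[str]
    synsets: List[Tuple[str, str]]  # (synset id, textual representation)


def parse_entity(line: str) -> Entity:
    """Split one tab-separated line into its fields, following the format above."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 5:
        raise ValueError("line has fewer than 5 fields")
    main_name, article_name, url = fields[:3]
    n, m = int(fields[3]), int(fields[4])
    # 5 header fields, n names, m synset ids, m synset representations
    expected = 5 + n + 2 * m
    if len(fields) != expected:
        raise ValueError(f"expected {expected} fields, found {len(fields)}")
    names = fields[5:5 + n]
    synset_ids = fields[5 + n:5 + n + m]
    synset_reprs = fields[5 + n + m:]
    # Assumption: the i-th identifier belongs with the i-th representation.
    return Entity(main_name, article_name, url, names,
                  list(zip(synset_ids, synset_reprs)))
```

Checking the field count against n and m catches truncated or malformed lines early, before they silently shift names into the synset columns.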

Simplified library

Instead of WordNet synsets, the simplified version of the library contains the nominal groups from which the synsets were extracted. For example, the list for Bronisław Komorowski is the following:

Therefore, each line has the following format:

<main_name><tab><article_name><tab><URL><tab><names_number(n)><tab><groups_number(m)><tab><name_1><tab><name_2>...<tab><name_n><tab><group_1><tab><group_2>...<tab><group_m>
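
As an illustration of the simplified format, the sketch below streams entities from the compressed file without unpacking it first. The names SimpleEntity, parse_simplified and read_simplified are hypothetical, and the UTF-8 encoding and the local file path are assumptions rather than documented properties of the download.

```python
import gzip
from typing import Iterator, List, NamedTuple


class SimpleEntity(NamedTuple):
    """One line of the simplified library (hypothetical container)."""
    main_name: str
    article_name: str
    url: str
    names: List[str]
    groups: List[str]  # nominal groups


def parse_simplified(line: str) -> SimpleEntity:
    """Split one tab-separated line of the simplified library into its fields."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 5:
        raise ValueError("line has fewer than 5 fields")
    n, m = int(fields[3]), int(fields[4])
    # 5 header fields, then n names, then m nominal groups
    if len(fields) != 5 + n + m:
        raise ValueError(f"expected {5 + n + m} fields, found {len(fields)}")
    return SimpleEntity(fields[0], fields[1], fields[2],
                        fields[5:5 + n], fields[5 + n:])


def read_simplified(path: str) -> Iterator[SimpleEntity]:
    """Yield entities one by one from a gzip-compressed file.

    Assumption: the file is UTF-8 encoded text with one entity per line.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():  # skip blank lines, if any
                yield parse_simplified(line)
```

A call such as `for entity in read_simplified("entitiesD.txt.gz"): ...` would then process the file in a single pass, keeping only one entity in memory at a time.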

References

[1] Przybyła, P. (2015). Gathering Knowledge for Question Answering Beyond Named Entities. Proceedings of the 20th International Conference on Application of Natural Language to Information Systems (NLDB 2015).

[2] Przybyła, P. (2014). Odpowiadanie na pytania w języku polskim z użyciem głębokiego rozpoznawania nazw. Doctoral thesis, Institute of Computer Science, Polish Academy of Sciences.

[3] Maziarz, M., Piasecki, M., and Szpakowicz, S. (2012). Approaching plWordNet 2.0. Proceedings of the 6th Global Wordnet Conference.

last edited 2015-04-13 11:08:27 by MateuszKopec