= DeepER Entity Library =

DeepER Entity Library is a database containing around 900,000 entities, each described by its textual representations in Polish (names) and `WordNet` synsets. This resource was originally created for deep entity recognition (DeepER) in the Polish question answering system RAFAEL by analysing definitions in Polish Wikipedia [1,2]. A simplified version is also available, containing nominal groups instead of synsets.

'''Download main library''': [[attachment:entities.txt.gz]]

'''Download simplified library''': [[attachment:entitiesD.txt.gz]]

Created and made available by Piotr Przybyła.

== Main library ==

The library contains 809,786 entities with 1,169,452 names (972,592 unique) and 1,264,918 synsets (31,545 unique). Each entity consists of the following elements (illustrated here by entity #9751, describing Bronisław Komorowski):

 * Main name: Bronisław Komorowski,
 * Other names (aliases): Bronisław Maria Komorowski, Komorowski,
 * Description URL: http://pl.wikipedia.org/wiki/?curid=121267,
 * plWordNet 2.1 [3] synsets:
  * (vice-minister, undersecretary),
  * (vice-speaker of the Sejm, the Polish parliament),
  * (politician),
  * (member of a parliament),
  * (speaker of the Sejm),
  * (historian),
  * (minister),
  * (president of a city, mayor).

Each line of the file corresponds to a single entity and has the following format:

{{{
.........
}}}

where:

 * synset_id is the id of the synset in plWordNet 2.1,
 * synset_repr is a human-readable representation of the synset.

== Simplified library ==

Instead of `WordNet` synsets, the simplified version of the library contains the nominal groups from which the synsets were extracted. For example, the list for Bronisław Komorowski is the following:

 * wicemarszałek i marszałek Sejmu
 * minister obrony narodowej
 * wiceminister i minister obrony narodowej
 * marszałek Sejmu RP
 * polski polityk
 * poseł
 * prezydent RP
 * historyk

Therefore, each line has the following format:

{{{
......
}}}

== References ==

[1] Przybyła, P. (2015). Gathering Knowledge for Question Answering Beyond Named Entities. Proceedings of the 20th International Conference on Application of Natural Language to Information Systems (NLDB 2015).

[2] Przybyła, P. (2014). Odpowiadanie na pytania w języku polskim z użyciem głębokiego rozpoznawania nazw. Doctoral thesis, Institute of Computer Science, Polish Academy of Sciences.

[3] Maziarz, M., Piasecki, M., and Szpakowicz, S. (2012). Approaching plWordNet 2.0. Proceedings of the 6th Global Wordnet Conference.
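
== Reading the files (sketch) ==

Since each line of either library file corresponds to a single entity, a downloaded file can be streamed line by line without unpacking it first. The following is a minimal sketch, not part of the original distribution: the function names `iter_entity_lines` and `count_entities` are hypothetical, and UTF-8 encoding is an assumption, as the page does not state the encoding. The sketch deliberately does not split lines into fields, because the field layout is not reproduced here.

```python
import gzip
from typing import Iterator


def iter_entity_lines(path: str) -> Iterator[str]:
    """Yield one raw, non-empty line per entity from a gzipped library file.

    Assumption: the file is UTF-8 encoded (not stated on this page).
    """
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line:  # skip blank lines, if any
                yield line


def count_entities(path: str) -> int:
    """Count entities, relying on the one-entity-per-line layout."""
    return sum(1 for _ in iter_entity_lines(path))
```

Calling `count_entities("entities.txt.gz")` on the downloaded main library gives a quick sanity check against the entity count stated in the Main library section.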