
Gazetteer for Polish Named Entities

The Gazetteer for Polish Named Entities was used within the SProUT platform, first for information extraction from Polish texts and later for the automatic pre-annotation of named entities in the National Corpus of Polish (NKJP). Its construction, contents and use are described in:

The gazetteer contains 153,477 inflected entries of Polish (and some foreign) proper names and named entity components:

The file DOES NOT contain inhabitant names or relational adjectives derived from the names of Polish settlements. These data are owned by the PWN publisher, were used within the NKJP project under a separate licence, and are protected by copyright.
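Because the entries are inflected, each surface form found in a text can be looked up directly, and a single form may have several readings. The sketch below illustrates this kind of lookup. It is a minimal illustration under stated assumptions, not the gazetteer's actual format or the SProUT API: the file layout ("surface form | KEY: value | ..."), the attribute names TYPE, LEMMA and CASE, and the helpers `parse_gazetteer` and `lookup` are all hypothetical. The Polish forms themselves are real inflections of "Warszawa".

```python
from collections import defaultdict

# HYPOTHETICAL sample entries: the real file layout is not documented on this page.
# Assumed format per line: "surface form | KEY: value | KEY: value ..."
SAMPLE = """\
Warszawa | TYPE: city | LEMMA: Warszawa | CASE: nom
Warszawy | TYPE: city | LEMMA: Warszawa | CASE: gen
Warszawie | TYPE: city | LEMMA: Warszawa | CASE: dat
Warszawie | TYPE: city | LEMMA: Warszawa | CASE: loc
"""


def parse_gazetteer(text):
    """Map each inflected surface form to a list of attribute dictionaries."""
    index = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        surface, *fields = [part.strip() for part in line.split("|")]
        attrs = {}
        for field in fields:
            key, _, value = field.partition(":")
            attrs[key.strip()] = value.strip()
        # A surface form can be ambiguous (e.g. dative vs. locative), so keep all readings.
        index[surface].append(attrs)
    return index


def lookup(index, token):
    """Return every reading of a token, or an empty list if it is not listed."""
    return index.get(token, [])  # .get() does not insert keys into the defaultdict


index = parse_gazetteer(SAMPLE)
for reading in lookup(index, "Warszawie"):
    print(reading["LEMMA"], reading["CASE"])
```

Storing a list of readings per surface form, rather than a single record, keeps case ambiguities such as the dative and locative "Warszawie" visible to later processing stages instead of silently discarding one of them.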

Authors

License

The data are available under the 2-clause BSD licence.

Available resources

last edited 2013-07-26 09:31:02 by AgataSavary