Gazetteer for Polish Named Entities

The Gazetteer for Polish Named Entities was used within the SProUT platform, initially for information extraction from Polish texts, and later for the automatic pre-annotation of the National Corpus of Polish (NKJP) at the level of named entities. Its construction, contents and use have been described in:

The gazetteer contains 153,477 inflected entries of Polish (and some foreign) proper names and named entity components:

The file DOES NOT contain inhabitant names or relational adjectives derived from Polish settlement names. These data, owned by the PWN publisher, were used within the NKJP project under a specific licence and are protected by copyright.

Authors

Licence

The data are available under the 2-clause BSD licence.

Available resources