
Diff for "Gazetteer"

Differences between revisions 1 and 5 (spanning 4 versions)
Revision 1 as of 2011-09-29 00:23:57
Size: 1549
Revision 5 as of 2011-09-29 09:33:16
Size: 1607
Editor: AgataSavary
Line 1:
Revision 1: = Polish Gazetteer =
Revision 5: = The Polish Gazetteer =

Line 3:
Revision 1: The Polish Gazetteer is the textual source used within the SProUT (http://sprout.dfki.de/) platform for the automatic pre-annotation of the National Corpus of Polish (NKJP) on the level of named entities. Its construction, contents and use have been described in:
Revision 5: The Polish Gazetteer is the textual source used within the ''[[http://sprout.dfki.de/|SProUT]]'' platform for the automatic pre-annotation of the ''[[http://nkjp.pl/index.php?page=0&lang=1 |National Corpus of Polish]]'' (NKJP) on the level of named entities. Its construction, contents and use have been described in:

The Polish Gazetteer

The Polish Gazetteer is the textual source used within the SProUT platform for the automatic pre-annotation of the National Corpus of Polish (NKJP) on the level of named entities. Its construction, contents and use have been described in:

The file contains 153,477 inflected entries of Polish (and some foreign) proper names and named entity components:

  • forenames and surnames
  • city, country, mountain, region and river names
  • institution names
  • named entity triggers (months, days, positions, etc.)

The file DOES NOT contain inhabitant names or relational adjectives derived from the names of Polish settlements. These data are owned by the PWN publisher and are subject to copyright; they were used within the NKJP project under a separate licence.

The data is available under the 2-clause BSD licence.

Available resources

gazetteer-nkjp-no-pwn.zip
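
As a quick sanity check after downloading the archive, the number of entries can be compared against the figure quoted above. The sketch below is only an illustration: this page does not document the internal layout of the archive, so the function name count_lines_per_member, the one-entry-per-line layout and the UTF-8 encoding are all assumptions.

```python
import io
import zipfile


def count_lines_per_member(archive):
    """Return {member name: number of non-empty lines} for a zip archive.

    `archive` may be a file path or a binary file-like object.
    ASSUMPTION: each gazetteer entry occupies one line. The page does not
    describe the file layout, so treat the counts as a rough check only.
    """
    counts = {}
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # skip directory entries, they hold no data
            with zf.open(info) as raw:
                # ASSUMPTION: UTF-8 text; undecodable bytes are replaced
                # rather than raising, since only lines are counted.
                text = io.TextIOWrapper(raw, encoding="utf-8", errors="replace")
                counts[info.filename] = sum(1 for line in text if line.strip())
    return counts
```

Calling count_lines_per_member("gazetteer-nkjp-no-pwn.zip") on a local copy of the archive returns one count per file in it. If the assumptions hold, the counts should add up to roughly the 153,477 entries stated above; a large mismatch more likely means the layout differs from what is assumed here than that the download is faulty.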