<?xml version="1.0" encoding="utf-8"?><!DOCTYPE article  PUBLIC '-//OASIS//DTD DocBook XML V4.4//EN'  'http://www.docbook.org/xml/4.4/docbookx.dtd'><article><articleinfo><title>Gazetteer</title><revhistory><revision><revnumber>33</revnumber><date>2013-07-26 09:31:02</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>32</revnumber><date>2013-07-25 15:02:33</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>31</revnumber><date>2012-07-20 11:40:39</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>30</revnumber><date>2012-07-19 15:50:49</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>29</revnumber><date>2012-07-11 16:52:10</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>28</revnumber><date>2012-07-11 16:51:35</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>27</revnumber><date>2012-07-11 16:25:11</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>26</revnumber><date>2012-07-11 16:21:28</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>25</revnumber><date>2012-07-11 16:02:07</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>24</revnumber><date>2012-07-11 16:01:42</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>23</revnumber><date>2012-07-11 16:01:13</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>22</revnumber><date>2012-07-11 15:49:17</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>21</revnumber><date>2012-07-11 15:48:18</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>20</revnumber><date>2012-07-11 15:33:39</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>19</revnumber><date>2012-07-11 
15:26:44</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>18</revnumber><date>2012-07-11 15:05:08</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>17</revnumber><date>2012-07-11 14:31:49</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>16</revnumber><date>2012-07-11 14:31:16</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>15</revnumber><date>2012-07-11 14:24:45</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>14</revnumber><date>2012-07-11 14:24:24</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>13</revnumber><date>2012-07-11 14:22:17</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>12</revnumber><date>2012-07-11 14:21:51</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>11</revnumber><date>2012-07-11 14:20:45</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>10</revnumber><date>2011-11-07 13:27:08</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>9</revnumber><date>2011-11-07 13:26:40</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>8</revnumber><date>2011-11-03 14:37:14</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>7</revnumber><date>2011-11-03 14:29:12</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>6</revnumber><date>2011-09-29 09:48:21</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>5</revnumber><date>2011-09-29 09:33:16</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>4</revnumber><date>2011-09-29 09:32:59</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>3</revnumber><date>2011-09-29 
09:31:27</date><authorinitials>AgataSavary</authorinitials></revision><revision><revnumber>2</revnumber><date>2011-09-29 00:26:08</date><authorinitials>MaciejOgrodniczuk</authorinitials></revision><revision><revnumber>1</revnumber><date>2011-09-29 00:23:57</date><authorinitials>MaciejOgrodniczuk</authorinitials></revision></revhistory></articleinfo><section><title>Gazetteer for Polish Named Entities</title><para>The Gazetteer for Polish Named Entities was used within the <emphasis><ulink url="http://sprout.dfki.de/">SProUT</ulink></emphasis> platform, initially for information extraction from Polish texts, and then for the automatic pre-annotation of the <emphasis><ulink url="http://nkjp.pl/index.php?page=0&amp;lang=1">National Corpus of Polish</ulink></emphasis> (NKJP) on the level of named entities. Its construction, contents and use have been described in: </para><itemizedlist><listitem><para>SAVARY, A., PISKORSKI, J. (2011). <emphasis>Language Resources for Named Entity Annotation in the National Corpus of Polish</emphasis>, in Control and Cybernetics 40(2), Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland, pp. 361-391. </para></listitem><listitem><para>SAVARY, A., PISKORSKI, J. (2010). <emphasis><ulink url="http://iis.ipipan.waw.pl/2010/proceedings/iis10-15.pdf">Lexicons and Grammars for Named Entity Annotation in the National Corpus of Polish</ulink></emphasis>, in Proceedings of the 18th International Conference Intelligent Information Systems (IIS'10), Siedlce, Poland. </para></listitem><listitem><para>PISKORSKI, J. (2005). <emphasis>Named-Entity Recognition for Polish with SProUT</emphasis>, in LNCS Vol 3490: Proceedings of IMTCI 2004, Warsaw, Poland. 
</para></listitem></itemizedlist><para>The gazetteer contains 153,477 inflected entries of Polish (and some foreign) proper names and named entity components: </para><itemizedlist><listitem><para>forenames and surnames, </para></listitem><listitem><para>city, country, mountain, region and river names, </para></listitem><listitem><para>institution names, </para></listitem><listitem><para>relational adjectives and inhabitant names derived from country names, </para></listitem><listitem><para>named entity triggers (months, days, positions, etc.). </para></listitem></itemizedlist><para>The file does <emphasis>not</emphasis> contain inhabitant names and relational adjectives derived from the names of Polish settlements. These data, owned by the PWN publisher, were used within the NKJP project under a specific licence and are protected by copyright. </para><section><title>Authors</title><itemizedlist><listitem><para><ulink url="http://www.info.univ-tours.fr/~savary/English/indexgb.html">Agata Savary</ulink> - NKJP version of the gazetteer; LMF format definition </para></listitem><listitem><para><ulink url="http://zil.ipipan.waw.pl/MichalLenart">Michał Lenart</ulink> - LMF conversion and validation </para></listitem><listitem><para><ulink url="http://zil.ipipan.waw.pl/JakubPiskorski">Jakub Piskorski</ulink> - earlier version of the gazetteer used for information extraction from Polish texts </para></listitem></itemizedlist></section><section><title>License</title><para>The data are available under the <ulink url="http://en.wikipedia.org/wiki/BSD_licenses#2-clause_license_.28.22Simplified_BSD_License.22_or_.22FreeBSD_License.22.29">2-clause BSD licence</ulink>. 
</para></section><section><title>Available resources</title><itemizedlist><listitem><para><ulink url="http://clip.ipipan.waw.pl/Gazetteer/Gazetteer?action=AttachFile&amp;do=get&amp;target=gazetteer-nkjp-no-pwn.zip">Text version</ulink> as used with SProUT for NKJP pre-annotation </para></listitem><listitem><para><ulink url="http://clip.ipipan.waw.pl/Gazetteer/Gazetteer?action=AttachFile&amp;do=get&amp;target=polish-ne-gazetteer-LMF-format.pdf">LMF format definition and conversion guidelines</ulink> </para></listitem><listitem><para><ulink url="http://clip.ipipan.waw.pl/Gazetteer/Gazetteer?action=AttachFile&amp;do=get&amp;target=PNEG-LMF-v1.tar.gz">LMF-compliant version</ulink> containing: </para><itemizedlist><listitem><para>LMF format definition and conversion guidelines, </para></listitem><listitem><para>RELAX NG schema, morphosyntax configuration file and validation scripts, </para></listitem><listitem><para>grammatically complete gazetteer entries (9,060 lemmas and 95,359 word forms), </para></listitem><listitem><para>grammatically incomplete gazetteer entries (35,884 lemmas and 40,612 word forms). </para></listitem></itemizedlist></listitem></itemizedlist></section></section></article>