
SpejdLemmatizingGrammar

Lemmatisation of Polish nominal syntactic groups (Spejd grammar)

This is a variant of an NKJP grammar for Spejd. The grammar was created by Łukasz Degórski [1]; Piotr Przybyła added some corrections and improvements while adapting it for entity recognition in the Polish question answering system RAFAEL [2]. Like the NKJP grammar, it is available under the GNU GPL v.3.

Download grammar rules: rules.sr

Using this grammar, Spejd is able to output the lemmas (base forms) of most types of nominal syntactic groups. For example, a group description may look like this (notice the "base" attribute):

<seg xml:id="groups_2.6-s_31"><!--  rule="NGg: Noun + n-Noun (gen)" -->
 <fs type="group">
  <f name="orth">
   <string>zapisu dźwięków</string>
  </f>
  <f name="type">
   <symbol value="NGg"/>
  </f>
  <f name="base">
   <string>zapis dźwięków</string>
  </f>
 </fs>
 <ptr type="head" target="ann_words.xml#words_2.6-s_75"/>
 <ptr type="nonhead" target="ann_words.xml#words_2.6-s_76"/>
</seg>
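
The "base" attribute can be read from such a group description with any XML parser. Below is a minimal Python sketch, assuming the fragment is processed on its own and carries no namespace declaration (in a complete annotation file the elements may be namespaced, in which case the paths would need adjusting). The function name read_group is hypothetical and not part of Spejd.

```python
import xml.etree.ElementTree as ET

# The group description from the example above (rule comment omitted).
SEG = """<seg xml:id="groups_2.6-s_31">
 <fs type="group">
  <f name="orth"><string>zapisu dźwięków</string></f>
  <f name="type"><symbol value="NGg"/></f>
  <f name="base"><string>zapis dźwięków</string></f>
 </fs>
 <ptr type="head" target="ann_words.xml#words_2.6-s_75"/>
 <ptr type="nonhead" target="ann_words.xml#words_2.6-s_76"/>
</seg>"""


def read_group(seg_xml):
    """Return the features of one group as a dict (hypothetical helper)."""
    seg = ET.fromstring(seg_xml)
    features = {}
    for f in seg.findall("./fs/f"):
        string = f.find("string")
        symbol = f.find("symbol")
        if string is not None:
            # Textual features such as "orth" and "base".
            features[f.get("name")] = string.text
        elif symbol is not None:
            # Symbolic features such as "type".
            features[f.get("name")] = symbol.get("value")
    return features


group = read_group(SEG)
print(group["orth"], "->", group["base"])
```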

As Spejd is unable to generate new word forms, some lemmas need to be expressed in a special way. For example:

<seg xml:id="groups_2.5-s_28"><!--  rule="NGa: Noun + Adj" -->
 <fs type="group">
  <f name="orth">
   <string>pochodzenia indyjskiego</string>
  </f>
  <f name="type">
   <symbol value="NGa"/>
  </f>
  <f name="base">
   <string>pochodzenie ADJ(indyjski,n,pos)</string>
  </f>
 </fs>
 <ptr type="head" target="ann_words.xml#words_2.5-s_67"/>
 <ptr type="nonhead" target="ann_words.xml#words_2.5-s_68"/>
</seg>

Here ADJ(indyjski,n,pos) should be replaced by the adjective "indyjski" in the neuter gender and positive grade, i.e. "indyjskie", to create the full lemma, i.e. "pochodzenie indyjskie". The special expressions are the following:

  • ADJ(lemma, gender, grade) for adjectives,
  • GER(lemma, negation) for gerunds,
  • PPAS(lemma, gender, negation) for passive participles,
  • PACT(lemma, gender, negation) for active participles.
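
The special expressions listed above can be located in a "base" string and replaced by generated word forms in a post-processing step. The following Python sketch is only an illustration under stated assumptions: the names parse_expressions, fill_lemma and toy_inflect are hypothetical, and the tiny lookup table stands in for a real morphological generator, which neither Spejd nor this grammar provides.

```python
import re

# Argument names for each special expression, as listed above.
ARGS = {
    "ADJ": ("lemma", "gender", "grade"),
    "GER": ("lemma", "negation"),
    "PPAS": ("lemma", "gender", "negation"),
    "PACT": ("lemma", "gender", "negation"),
}

# Matches e.g. ADJ(indyjski,n,pos); assumes arguments contain no parentheses.
EXPR = re.compile(r"\b(ADJ|GER|PPAS|PACT)\(([^()]*)\)")


def _arguments(match):
    """Turn one regex match into (kind, {argument name: value})."""
    kind = match.group(1)
    values = [v.strip() for v in match.group(2).split(",")]
    if len(values) != len(ARGS[kind]):
        raise ValueError("unexpected number of arguments in " + match.group(0))
    return kind, dict(zip(ARGS[kind], values))


def parse_expressions(base):
    """List all special expressions found in a base string."""
    return [_arguments(m) for m in EXPR.finditer(base)]


def fill_lemma(base, inflect):
    """Replace each special expression with the form returned by inflect."""
    return EXPR.sub(lambda m: inflect(*_arguments(m)), base)


def toy_inflect(kind, args):
    """Stand-in for a morphological generator (assumption, one entry only)."""
    table = {("ADJ", "indyjski", "n", "pos"): "indyjskie"}
    return table[(kind, *args.values())]


base = "pochodzenie ADJ(indyjski,n,pos)"
print(parse_expressions(base))
print(fill_lemma(base, toy_inflect))
```

Base strings without special expressions, such as "zapis dźwięków", pass through fill_lemma unchanged.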

References

[1] Degórski, Ł. (2012). Towards the Lemmatisation of Polish Nominal Syntactic Groups Using a Shallow Grammar. Proceedings of the International Joint Conference on Security and Intelligent Information Systems (S&IIS 2011).

[2] Przybyła, P. (2015). Gathering Knowledge for Question Answering Beyond Named Entities. Proceedings of the 20th International Conference on Application of Natural Language to Information Systems (NLDB 2015).