Information Extraction from Polish free text

Project factsheet

Polish name:

Opracowanie narzędzi do ekstrakcji informacji z tekstów w języku polskim

Project type:

A national Ministry of Science and Higher Education research grant (number 3T11C00727)


20 October 2004 ‒ 19 October 2007

Principal investigator:

Agnieszka Mykowiecka


Institute of Computer Science, Polish Academy of Sciences

Project description


  • few IE efforts have targeted Polish texts, in contrast to the many applications that exist for other languages,
  • existing IE tools could not be directly used for processing Polish.


  • adapting chosen IE tools for processing Polish,
  • collecting some linguistic resources for IE.


  • adapting IE platforms SProUT and (recently) GATE for tokenization and morphological analysis of Polish texts,
  • collecting resources and IE grammars for named entity recognition (NER) in Polish texts,
  • rule-based IE experiments in a selected domain (medical texts),
  • testing methods of terminology extraction on Polish data.