The Linguistic Engineering Group
The Linguistic Engineering (LE) Group is part of the Department of Artificial Intelligence at the [[http://www.ipipan.waw.pl/en/|Institute of Computer Science]], [[http://www.english.pan.pl/|Polish Academy of Sciences]] (ICS PAS).
People
Anna Andrzejczuk, MSc
Leonard Bolc, PhD (Professor Emeritus)
Łukasz Degórski, MSc
Elżbieta Hajnicz, PhD
Łukasz Kobyliński, MSc
Anna Kupść, PhD (on leave)
Małgorzata Marciniak, PhD
Marcin Miłkowski, PhD (part time)
Agnieszka Mykowiecka, PhD
Maciej Ogrodniczuk, PhD
Jakub Piskorski, PhD (Associate)
Adam Przepiórkowski, PhD, Head of the Group
Piotr Rychlik, PhD
Tomek Strzałkowski, PhD, Foreign Associate
Łukasz Szałkiewicz, MSc
Stan Szpakowicz, PhD, Foreign Associate
Aleksander Wawer, MSc
Marcin Woliński, PhD
Alina Wróblewska, MSc
Research
The main research areas of the Group:
- (Polish) corpus linguistics; cf. the IPI PAN Corpus of Polish and the National Corpus of Polish,
- syntactic and semantic parsing of Polish; cf. Spejd and Świgra,
- extraction of linguistic knowledge from corpora,
- information extraction,
- sentiment analysis,
- morphosyntactic system of Polish,
- generative linguistic formalisms, especially HPSG and LFG.
The Group is a member of CLARIN, FLaReNet and META-NET.
Current externally funded projects
CESAR (CEntral and South-east europeAn Resources; part of META-NET) ‒ a European (CIP ICT-PSP) project (grant agreement 271022), 1 February 2011 ‒ 31 January 2013. Polish PI: Adam Przepiórkowski.
SYNAT (Creation of a universal, open repository platform for hosting and communication of networked resources of knowledge for science, education and open knowledge-based society) ‒ a National Centre for Research and Development grant, 16 August 2010 ‒ 16 August 2013. Polish title: Utworzenie uniwersalnej, otwartej, repozytoryjnej platformy hostingowej i komunikacyjnej dla sieciowych zasobów wiedzy dla nauki, edukacji i otwartego społeczeństwa wiedzy. PI: Beata Konikowska.
Nekst (An adaptive system to support problem-solving on the basis of document collections in the Internet) ‒ a national Ministry of Science and Higher Education Innovative Economy Operational Programme (PO IG) grant, 1 January 2010 ‒ 31 December 2013. Polish title: Adaptacyjny system wspomagający rozwiązywanie problemów w oparciu o analizę treści dostępnych źródeł elektronicznych. PI: Jacek Koronacki.
ATLAS (Applied Technology for Language-Aided CMS) ‒ a European (CIP ICT-PSP) project (grant agreement 250467), 1 March 2010 ‒ 28 February 2013. Polish PI: Adam Przepiórkowski.
Construction of a treebank for Polish using automatic syntactic analysis ‒ a national Ministry of Science and Higher Education research grant (number N N104 224735), 14 October 2008 ‒ 13 April 2011. Polish title: Budowa banku drzew składniowych dla języka polskiego z wykorzystaniem automatycznej analizy składniowej. PI: Marcin Woliński.
CLARIN (Common Language Resources and Technology Infrastructure) ‒ a European (ESFRI) infrastructure project, FP7 (contract number 212230), 1 January 2008 ‒ 31 December 2010 (plus a 6-month extension). PI at ICS PAS: Adam Przepiórkowski.
NKJP (National Corpus of Polish) ‒ a national Ministry of Science and Higher Education research/development grant (number R17 003 03), 13 December 2007 ‒ 12 December 2010 (plus a 6-month extension). Polish title: Narodowy Korpus Języka Polskiego. PI: Adam Przepiórkowski.
Some of our past projects
Automatic detection of semantic dependencies within verb argument structures in large treebanks ‒ a national Ministry of Science and Higher Education habilitation grant (number N N516 0165 33), 2 November 2007 ‒ 1 November 2009. Polish title: Automatyczne wykrywanie zależności semantycznych w strukturze argumentowej czasowników w dużych korpusach tekstów anotowanych syntaktycznie. PI: Elżbieta Hajnicz.
LUNA (spoken Language UNderstanding in multilinguAl communication systems) ‒ a European (IST) Specific Targeted Research Project (contract number 033549), 4 September 2006 ‒ 3 September 2009. Polish PI: Agnieszka Mykowiecka.
Spoken language understanding in multilingual communication systems ‒ a Ministry of Science and Higher Education support for the Polish participation in the LUNA project, 1 March 2008 ‒ 1 September 2009. Polish title: Rozumienie mowy w wielojęzycznych systemach komunikacji. PI: Małgorzata Marciniak.
LT4eL (Language Technology for eLearning) ‒ a European (IST) Specific Targeted Research Project (contract number 027391), 1 December 2005 ‒ 31 May 2008. Polish PI: Adam Przepiórkowski.
Automatic extraction of linguistic knowledge from a large corpus of Polish ‒ a national Ministry of Science and Higher Education research grant (number 3T11C00328), 9 March 2005 ‒ 8 March 2008. Polish title: Automatyczna ekstrakcja wiedzy lingwistycznej z dużego korpusu języka polskiego. PI: Adam Przepiórkowski. TaKIPI, the first publicly available tagger of Polish, was originally developed within this project.
Information Extraction from Polish free text ‒ a national Ministry of Science and Higher Education research grant (number 3T11C00727), 20 October 2004 ‒ 19 October 2007. Polish title: Opracowanie narzędzi do ekstrakcji informacji z tekstów w języku polskim. PI: Agnieszka Mykowiecka.
The IPI PAN Corpus of Polish ‒ a national KBN grant (7T11C04320), 1 April 2001 ‒ 31 March 2004. Polish title: Anotowany korpus pisanego języka polskiego z dostępem przez internet (z uwzględnieniem zastosowań w inżynierii lingwistycznej). PI: Adam Przepiórkowski.
A Treebank / Test-Suite of Polish Utterances ‒ an EU CRIT-2 subproject (ICS-MM), 15 October 1997 ‒ 14 October 2000. Coordinator: Leonard Bolc.
An HPSG Grammar of Polish (theory and implementation) ‒ a national KBN grant (8T11C01110), 1 January 1996 ‒ 31 December 1998. Polish title: Zastosowanie metod inżynierii lingwistycznej do automatycznej analizy i syntezy tekstów języka polskiego. PI: Leonard Bolc.
Publicly available tools and resources
Here are some of the tools and resources created within our projects.
Tools (all open source, under GPL):
Świgra – a DCG parser,
Spejd – a shallow parsing and disambiguation system,
TaKIPI – a morphosyntactic tagger for Polish,
PANTERA – a morphosyntactic tagger for Polish,
Poliqarp – a corpus indexing and search engine,
Dendrarium – a treebank development system (under development),
Anotatornia – a system for multi-level manual annotation of corpora (forthcoming),
WSDDE – a system for designing and performing Word Sense Disambiguation experiments (forthcoming).
Resources:
National Corpus of Polish (under development).
Other activities
Links to some other activities of the Group:
Intelligent Information Systems series of conferences.