The Linguistic Engineering Group

The Linguistic Engineering (LE) Group is part of the Department of Artificial Intelligence at the [[http://www.ipipan.waw.pl/en/|Institute of Computer Science]], [[http://www.english.pan.pl/|Polish Academy of Sciences]] (ICS PAS).

People

Anna Andrzejczuk, MSc	anna.andrzejczuk@ipipan.waw.pl
Leonard Bolc, PhD (Professor Emeritus)	leonard.bolc@ipipan.waw.pl
Łukasz Degórski, MSc	ldegorski@bach.ipipan.waw.pl
Elżbieta Hajnicz, PhD	elzbieta.hajnicz@ipipan.waw.pl
Łukasz Kobyliński, MSc	lkobylinski@ipipan.waw.pl
Anna Kupść, PhD (on leave)	anna.kupsc@ipipan.waw.pl
Małgorzata Marciniak, PhD	malgorzata.marciniak@ipipan.waw.pl
Marcin Miłkowski, PhD (part time)	marcin.milkowski@ifispan.waw.pl
Agnieszka Mykowiecka, PhD	agnieszka.mykowiecka@ipipan.waw.pl
Maciej Ogrodniczuk, PhD	maciej.ogrodniczuk@ipipan.waw.pl
Jakub Piskorski, PhD (Associate)	jakub.piskorski@ipipan.waw.pl
Adam Przepiórkowski, PhD, Head of the Group	adam.przepiorkowski@ipipan.waw.pl
Piotr Rychlik, PhD	rychlik@ipipan.waw.pl
Tomek Strzałkowski, PhD, Foreign Associate	tomek@cs.albany.edu
Łukasz Szałkiewicz, MSc	lukasz.szalkiewicz@ipipan.waw.pl
Stan Szpakowicz, PhD, Foreign Associate	szpak@site.uottawa.ca
Aleksander Wawer, MSc	aleksander.wawer@ipipan.waw.pl
Marcin Woliński, PhD	marcin.wolinski@ipipan.waw.pl
Alina Wróblewska, MSc	alina.wroblewska@ipipan.waw.pl

Research

The main research areas of the Group:

(Polish) corpus linguistics; cf. the IPI PAN Corpus of Polish and the National Corpus of Polish,
syntactic and semantic parsing of Polish; cf. Spejd and Świgra,
extraction of linguistic knowledge from corpora,
information extraction,
sentiment analysis,
morphosyntactic system of Polish,
generative linguistic formalisms, esp. HPSG and LFG.

The Group is a member of CLARIN, FLaReNet and META-NET.

Current externally funded projects

CESAR (CEntral and South-east europeAn Resources; part of META-NET) ‒ a European (CIP ICT-PSP) project (grant agreement 271022), 1 February 2011 ‒ 31 January 2013. Polish PI: Adam Przepiórkowski.
SYNAT (Creation of a universal, open repository platform for hosting and communication of networked resources of knowledge for science, education and open knowledge-based society) – a National Centre for Research and Development grant, 16 August 2010 – 16 August 2013. Polish title: Utworzenie uniwersalnej, otwartej, repozytoryjnej platformy hostingowej i komunikacyjnej dla sieciowych zasobów wiedzy dla nauki, edukacji i otwartego społeczeństwa wiedzy. PI: Beata Konikowska.
Nekst (An adaptive system to support problem-solving on the basis of document collections in the Internet) ‒ a national Ministry of Science and Higher Education Innovative Economy Operational Programme (PO IG) grant, 1 January 2010 ‒ 31 December 2013. Polish title: Adaptacyjny system wspomagający rozwiązywanie problemów w oparciu o analizę treści dostępnych źródeł elektronicznych. PI: Jacek Koronacki.
ATLAS (Applied Technology for Language-Aided CMS) ‒ a European (CIP ICT-PSP) project (grant agreement 250467), 1 March 2010 ‒ 28 February 2013. Polish PI: Adam Przepiórkowski.
Construction of a treebank for Polish using automatic syntactic analysis ‒ a national Ministry of Science and Higher Education research grant (number N N104 224735), 14 October 2008 ‒ 13 April 2011. Polish title: Budowa banku drzew składniowych dla języka polskiego z wykorzystaniem automatycznej analizy składniowej. PI: Marcin Woliński.
CLARIN (Common Language Resources and Technology Infrastructure) ‒ a European (ESFRI) infrastructure project, FP7 (contract number 212230), 1 January 2008 ‒ 31 December 2010 (plus a 6-month extension). PI at ICS PAS: Adam Przepiórkowski.
NKJP (National Corpus of Polish) ‒ a national Ministry of Science and Higher Education research/development grant (number R17 003 03), 13 December 2007 ‒ 12 December 2010 (plus a 6-month extension). Polish title: Narodowy Korpus Języka Polskiego. PI: Adam Przepiórkowski.

Some of our past projects

Automatic detection of semantic dependencies within verb argument structures in large treebanks ‒ a national Ministry of Science and Higher Education habilitation grant (number N N516 0165 33), 2 November 2007 ‒ 1 November 2009. Polish title: Automatyczne wykrywanie zależności semantycznych w strukturze argumentowej czasowników w dużych korpusach tekstów anotowanych syntaktycznie. PI: Elżbieta Hajnicz.
LUNA (spoken Language UNderstanding in multilinguAl communication systems) ‒ a European (IST) Specific Targeted Research Project (contract number 033549), 4 September 2006 ‒ 3 September 2009. Polish PI: Agnieszka Mykowiecka.
Spoken language understanding in multilingual communication systems ‒ Ministry of Science and Higher Education support for Polish participation in the LUNA project, 1 March 2008 ‒ 1 September 2009. Polish title: Rozumienie mowy w wielojęzycznych systemach komunikacji. PI: Małgorzata Marciniak.
LT4eL (Language Technology for eLearning) ‒ a European (IST) Specific Targeted Research Project (contract number 027391), 1 December 2005 ‒ 31 May 2008. Polish PI: Adam Przepiórkowski.
Automatic extraction of linguistic knowledge from a large corpus of Polish ‒ a national Ministry of Science and Higher Education research grant (number 3T11C00328), 9 March 2005 ‒ 8 March 2008. Polish title: Automatyczna ekstrakcja wiedzy lingwistycznej z dużego korpusu języka polskiego. PI: Adam Przepiórkowski. TaKIPI, the first publicly available tagger of Polish, was originally developed within this project.
Information Extraction from Polish free text ‒ a national Ministry of Science and Higher Education research grant (number 3T11C00727), 20 October 2004 ‒ 19 October 2007. Polish title: Opracowanie narzędzi do ekstrakcji informacji z tekstów w języku polskim. PI: Agnieszka Mykowiecka.
The IPI PAN Corpus of Polish ‒ a national KBN grant (7T11C04320), 1 April 2001 ‒ 31 March 2004. Polish title: Anotowany korpus pisanego języka polskiego z dostępem przez internet (z uwzględnieniem zastosowań w inżynierii lingwistycznej). PI: Adam Przepiórkowski.
A Treebank / Test-Suite of Polish Utterances ‒ an EU CRIT-2 subproject (ICS-MM), 15 October 1997 ‒ 14 October 2000. Coordinator: Leonard Bolc.
An HPSG Grammar of Polish (theory and implementation) ‒ a national KBN grant (8T11C01110), 1 January 1996 ‒ 31 December 1998. Polish title: Zastosowanie metod inżynierii lingwistycznej do automatycznej analizy i syntezy tekstów języka polskiego. PI: Leonard Bolc.

Publicly available tools and resources

Here are some of the tools and resources created within our projects.

Tools (all open source, under GPL):

Świgra – a DCG parser,
Spejd – a shallow parsing and disambiguation system,
TaKIPI – a morphosyntactic tagger for Polish,
PANTERA – a morphosyntactic tagger for Polish,
Poliqarp – a corpus indexing and search engine,
Dendrarium – a treebank development system (under development),
Anotatornia – a system for multi-level manual annotation of corpora (forthcoming),
WSDDE – a system for designing and performing Word Sense Disambiguation experiments (forthcoming),
etc.

Resources:

IPI PAN Corpus of Polish,
National Corpus of Polish (under development).

Other activities

Links to some other activities of the Group:

NLP Seminar at IPI PAN;
Intelligent Information Systems series of conferences.
