
Diff for "Linguistic Engineering Group"

Differences between revisions 2 and 3
Revision 2 as of 2011-03-07 11:49:54
Size: 11968
Comment:
Revision 3 as of 2011-03-07 13:41:55
Size: 9597
Comment:
Line 43: Line 43:
 * ''[[http://ec.europa.eu/information_society/apps/projects/factsheet/index.cfm?project_ref=271022|CESAR]]'' (''CEntral and South-east europeAn Resources''; part of [[http://www.meta-net.eu/|META-NET]]) ‒ a European ([[http://ec.europa.eu/information_society/activities/ict_psp/index_en.htm|CIP ICT-PSP]]) project (grant agreement 271022), 1 February 2011 ‒ 31 January 2013. Polish PI: Adam Przepiórkowski.
 * ''[[http://www.elka.pw.edu.pl/pol/Media2/Serwis-Informacyjny/Aktualnosci/Projekt-SYNAT-System-Nauki-i-Techniki-realizowany-w-ramach-INFINITI-PASSIM|SYNAT]] (Creation of a universal, open repository platform for hosting and communication of networked resources of knowledge for science, education and open knowledge-based society)'' – a [[http://en.ncbir.pl/|National Centre for Research and Development]] grant, 16 August 2010 – 16 August 2013. Polish title: ''Utworzenie uniwersalnej, otwartej, repozytoryjnej platformy hostingowej i komunikacyjnej dla sieciowych zasobów wiedzy dla nauki, edukacji i otwartego społeczeństwa wiedzy''. PI: Beata Konikowska.
 * ''[[http://www.ipipan.waw.pl/nekst/|Nekst]] (An adaptive system to support problem-solving on the basis of document collections in the Internet)'' ‒ a national [[http://www.eng.nauka.gov.pl/meinen/|Ministry of Science and Higher Education]] Innovative Economy Operational Programme (PO IG) grant, 1 January 2010 ‒ 31 December 2013. Polish title: ''Adaptacyjny system wspomagający rozwiązywanie problemów w oparciu o analizę treści dostępnych źródeł elektronicznych''. PI: Jacek Koronacki.
 * ''[[http://www.atlasproject.eu/|ATLAS]]'' (''Applied Technology for Language-Aided CMS'') ‒ a European ([[http://ec.europa.eu/information_society/activities/ict_psp/index_en.htm|CIP ICT-PSP]]) project (grant agreement 250467), 1 March 2010 ‒ 28 February 2013. Polish PI: Adam Przepiórkowski.
 * ''Construction of a treebank for Polish using automatic syntactic analysis'' ‒ a national [[http://www.eng.nauka.gov.pl/meinen/|Ministry of Science and Higher Education]] research grant (number N N104 224735), 14 October 2008 ‒ 13 April 2011. Polish title: ''Budowa banku drzew składniowych dla języka polskiego z wykorzystaniem automatycznej analizy składniowej''. PI: Marcin Woliński.
 * ''[[http://www.clarin.eu/|CLARIN]] (Common Language Resources and Technology Infrastructure)'' ‒ a European ([[http://cordis.europa.eu/esfri/|ESFRI]]) infrastructure project, FP7 (contract number 212230), 1 January 2008 ‒ 31 December 2010 (plus 6 months extension). PI at ICS PAS: Adam Przepiórkowski.
 * ''[[http://nkjp.pl/|NKJP]] (National Corpus of Polish)'' ‒ a national [[http://www.eng.nauka.gov.pl/meinen/|Ministry of Science and Higher Education]] research/development grant (number R17 003 03), 13 December 2007 ‒ 12 December 2010 (plus 6 months extension). Polish title: ''Narodowy Korpus Języka Polskiego''. PI: Adam Przepiórkowski.
 * [[CESAR]] (CEntral and South-east europeAn Resources),
 * [[SYNAT]] (Creation of a universal, open repository platform for hosting and communication of networked resources of knowledge for science, education and open knowledge-based society),
 * [[NEKST]] (An adaptive system to support problem-solving on the basis of document collections in the Internet),
 * [[ATLAS]] (Applied Technology for Language-Aided CMS),
 * [[Construction of a treebank for Polish using automatic syntactic analysis]],
 * [[CLARIN]] (Common Language Resources and Technology Infrastructure),
 * [[NKJP]] (National Corpus of Polish).

The Linguistic Engineering Group

The Linguistic Engineering (LE) Group is part of the Department of Artificial Intelligence at the Institute of Computer Science, Polish Academy of Sciences (ICS PAS).

People

Anna Andrzejczuk, MSc

anna.andrzejczuk@ipipan.waw.pl

Leonard Bolc, PhD (Professor Emeritus)

leonard.bolc@ipipan.waw.pl

Łukasz Degórski, MSc

ldegorski@bach.ipipan.waw.pl

Elżbieta Hajnicz, PhD

elzbieta.hajnicz@ipipan.waw.pl

Łukasz Kobyliński, MSc

lkobylinski@ipipan.waw.pl

Anna Kupść, PhD (on leave)

anna.kupsc@ipipan.waw.pl

Małgorzata Marciniak, PhD

malgorzata.marciniak@ipipan.waw.pl

Marcin Miłkowski, PhD (part time)

marcin.milkowski@ifispan.waw.pl

Agnieszka Mykowiecka, PhD

agnieszka.mykowiecka@ipipan.waw.pl

Maciej Ogrodniczuk, PhD

maciej.ogrodniczuk@ipipan.waw.pl

Jakub Piskorski, PhD (Associate)

jakub.piskorski@ipipan.waw.pl

Adam Przepiórkowski, PhD, Head of the Group

adam.przepiorkowski@ipipan.waw.pl

Piotr Rychlik, PhD

rychlik@ipipan.waw.pl

Tomek Strzałkowski, PhD, Foreign Associate

tomek@cs.albany.edu

Łukasz Szałkiewicz, MSc

lukasz.szalkiewicz@ipipan.waw.pl

Stan Szpakowicz, PhD, Foreign Associate

szpak@site.uottawa.ca

Aleksander Wawer, MSc

aleksander.wawer@ipipan.waw.pl

Marcin Woliński, PhD

marcin.wolinski@ipipan.waw.pl

Alina Wróblewska, MSc

alina.wroblewska@ipipan.waw.pl

Research

The main research areas of the Group:

  • (Polish) corpus linguistics; cf. the IPI PAN Corpus of Polish and the National Corpus of Polish,

  • syntactic and semantic parsing of Polish; cf. Spejd and Świgra,

  • extraction of linguistic knowledge from corpora,
  • information extraction,
  • sentiment analysis,
  • the morphosyntactic system of Polish,
  • generative linguistic formalisms, esp. HPSG and LFG.

The Group is a member of CLARIN, FLaReNet and META-NET.

Current externally funded projects

  • CESAR (CEntral and South-east europeAn Resources),

  • SYNAT (Creation of a universal, open repository platform for hosting and communication of networked resources of knowledge for science, education and open knowledge-based society),

  • NEKST (An adaptive system to support problem-solving on the basis of document collections in the Internet),

  • ATLAS (Applied Technology for Language-Aided CMS),

  • Construction of a treebank for Polish using automatic syntactic analysis,

  • CLARIN (Common Language Resources and Technology Infrastructure),

  • NKJP (National Corpus of Polish).

Some of our past projects

  • Automatic detection of semantic dependencies within verb argument structures in large treebanks ‒ a national Ministry of Science and Higher Education habilitation grant (number N N516 0165 33), 2 November 2007 ‒ 1 November 2009. Polish title: Automatyczne wykrywanie zależności semantycznych w strukturze argumentowej czasowników w dużych korpusach tekstów anotowanych syntaktycznie. PI: Elżbieta Hajnicz.

  • LUNA (spoken Language UNderstanding in multilinguAl communication systems) ‒ a European (IST) Specific Targeted Research Project (contract number 033549), 4 September 2006 ‒ 3 September 2009. Polish PI: Agnieszka Mykowiecka.

  • Spoken language understanding in multilingual communication systems ‒ Ministry of Science and Higher Education support for Polish participation in the LUNA project, 1 March 2008 ‒ 1 September 2009. Polish title: Rozumienie mowy w wielojęzycznych systemach komunikacji. PI: Małgorzata Marciniak.

  • LT4eL (Language Technology for eLearning) ‒ a European (IST) Specific Targeted Research Project (contract number 027391), 1 December 2005 ‒ 31 May 2008. Polish PI: Adam Przepiórkowski.

  • Automatic extraction of linguistic knowledge from a large corpus of Polish ‒ a national Ministry of Science and Higher Education research grant (number 3T11C00328), 9 March 2005 ‒ 8 March 2008. Polish title: Automatyczna ekstrakcja wiedzy lingwistycznej z dużego korpusu języka polskiego. PI: Adam Przepiórkowski. TaKIPI, the first publicly available tagger of Polish, was originally developed within this project.

  • Information Extraction from Polish free text ‒ a national Ministry of Science and Higher Education research grant (number 3T11C00727), 20 October 2004 ‒ 19 October 2007. Polish title: Opracowanie narzędzi do ekstrakcji informacji z tekstów w języku polskim. PI: Agnieszka Mykowiecka.

  • The IPI PAN Corpus of Polish ‒ a national KBN grant (7T11C04320), 1 April 2001 ‒ 31 March 2004. Polish title: Anotowany korpus pisanego języka polskiego z dostępem przez internet (z uwzględnieniem zastosowań w inżynierii lingwistycznej). PI: Adam Przepiórkowski.

  • A Treebank / Test-Suite of Polish Utterances ‒ an EU CRIT-2 subproject (ICS-MM), 15 October 1997 ‒ 14 October 2000. Coordinator: Leonard Bolc.

  • An HPSG Grammar of Polish (theory and implementation) ‒ a national KBN grant (8T11C01110), 1 January 1996 ‒ 31 December 1998. Polish title: Zastosowanie metod inżynierii lingwistycznej do automatycznej analizy i syntezy tekstów języka polskiego. PI: Leonard Bolc.

Publicly available tools and resources

Here are some of the tools and resources created within our projects.

Tools (all open source, under GPL):

  • Świgra – a DCG parser,

  • Spejd – a shallow parsing and disambiguation system,

  • TaKIPI – a morphosyntactic tagger for Polish,

  • PANTERA – a morphosyntactic tagger for Polish,

  • Poliqarp – a corpus indexing and search engine,

  • Dendrarium – a treebank development system (under development),

  • Anotatornia – a system for multi-level manual annotation of corpora (forthcoming),

  • WSDDE – a system for designing and performing Word Sense Disambiguation experiments (forthcoming),

  • etc.

Resources:

Other activities

Links to some other activities of the Group: