plWordNet 2.0

Project factsheet

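English title:	Semi-automatic construction of lexical resources through the recognition of semantic relations based on morpho-syntactic and semantic data in text corpora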
Polish title:	Półautomatyczna konstrukcja zasobów leksykalnych przez rozpoznawanie relacji semantycznych na podstawie danych morfo-syntaktycznych i semantycznych w korpusach tekstu
Project type:	A national Ministry of Science and Higher Education research grant (number N N516 068637)
Duration:	1 October 2009 ‒ 30 September 2012
Project Web page:	http://nlp.pwr.wroc.pl/projekty/slowosiec2
Principal investigator:	Maciej Piasecki
Institution:	Institute of Applied Informatics, Wrocław University of Technology

Project description

This project is a continuation of the earlier project (3 T11C 018 29) on the construction of plWordNet 1.0, the first publicly available wordnet for Polish. The main goal of this project is to extend and improve the Distributional Semantics and pattern-based methods developed in the earlier project and to build a complex, semi-automatic system that supports linguists working on plWordNet construction.

The second goal is to extend plWordNet 1.0 to 70,000-80,000 lexical units (each a pair of lemma and sense number) and 45,000-55,000 synsets.

The main objective of both projects was to construct a Polish WordNet as economically as possible.

Polish WordNet is a network of lexical-semantic relations, an electronic thesaurus with a structure modelled on that of the Princeton WordNet and those constructed in the EuroWordNet project. Polish WordNet describes the meaning of a lexical unit, which consists of one or more words, by placing it in a network of links that represent relations such as synonymy, hypernymy and meronymy.
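The structure described above can be illustrated with a minimal Python sketch. It is not the plWordNet data model or file format: the class names, the synset identifiers and the three-synset toy vocabulary are all assumptions made for illustration. Lexical units are (lemma, sense number) pairs, synonymy is expressed by membership in the same synset, and other relations are labelled links between synsets.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LexicalUnit:
    """A lexical unit: a lemma paired with a sense number."""
    lemma: str
    sense: int


class ToyWordNet:
    """Hypothetical in-memory wordnet; not the plWordNet implementation."""

    def __init__(self):
        self.synset_of = {}                 # LexicalUnit -> synset id
        self.members = defaultdict(set)     # synset id -> set of LexicalUnit
        self.relations = defaultdict(set)   # (relation, source id) -> target ids

    def add_synset(self, synset_id, units):
        for unit in units:
            self.synset_of[unit] = synset_id
            self.members[synset_id].add(unit)

    def add_relation(self, name, source, target):
        # For "hypernymy", target is the more general synset of source.
        self.relations[(name, source)].add(target)

    def synonyms(self, unit):
        """Other lexical units in the same synset."""
        return self.members[self.synset_of[unit]] - {unit}

    def hypernyms(self, unit):
        """All synsets reachable by following hypernymy links upwards."""
        found = []
        stack = [self.synset_of[unit]]
        while stack:
            current = stack.pop()
            for parent in sorted(self.relations.get(("hypernymy", current), ())):
                if parent not in found:
                    found.append(parent)
                    stack.append(parent)
        return found


# Toy data (illustrative only, not taken from plWordNet).
wn = ToyWordNet()
wn.add_synset("car", [LexicalUnit("samochód", 1), LexicalUnit("auto", 1)])
wn.add_synset("vehicle", [LexicalUnit("pojazd", 1)])
wn.add_synset("wheel", [LexicalUnit("koło", 1)])
wn.add_relation("hypernymy", "car", "vehicle")
wn.add_relation("meronymy", "wheel", "car")

print(wn.synonyms(LexicalUnit("samochód", 1)))
print(wn.hypernyms(LexicalUnit("auto", 1)))
```

Keeping synonymy implicit in synset membership, and storing the remaining relations as labelled edges, follows the general Princeton WordNet layout that the text refers to.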

To reduce the cost of the project, Polish WordNet was built semi-automatically. Lexical relations were automatically recognized in large corpora of Polish (e.g., the IPI PAN Corpus) and suggested to linguists/lexicographers via a graphical interface.
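As a rough illustration of how corpus data can yield suggestions for a linguist to review, the sketch below ranks candidate words by the cosine similarity of their context-count vectors. This is a generic distributional technique, not the method actually used in the project. The function names, the window size and the three-sentence lemmatised corpus are assumptions made for illustration.

```python
import math
from collections import Counter


def context_vectors(sentences, window=2):
    """Count, for every word, the words seen within `window` positions of it."""
    vectors = {}
    for tokens in sentences:
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            context = tokens[start:i] + tokens[i + 1:i + 1 + window]
            vectors.setdefault(word, Counter()).update(context)
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def suggest(word, vectors, top_n=3):
    """Return the top_n most similar words as (word, score) candidates."""
    target = vectors[word]
    scored = [
        (cosine(target, vector), other)
        for other, vector in vectors.items()
        if other != word
    ]
    scored.sort(reverse=True)
    return [(other, score) for score, other in scored[:top_n]]


# Toy lemmatised corpus (illustrative only).
corpus = [
    ["kierowca", "prowadzić", "samochód"],
    ["kierowca", "prowadzić", "auto"],
    ["kot", "pić", "mleko"],
]
vectors = context_vectors(corpus)

# Candidates would be shown to a linguist, who accepts or rejects each one.
for candidate, score in suggest("samochód", vectors):
    print(candidate, round(score, 3))
```

In a semi-automatic workflow, a ranked list of this kind only proposes candidates. The decision to add a relation to the wordnet stays with the linguist.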