Differences between revisions 1 and 7 (spanning 6 versions)

KORBA project

Project factsheet

English name:	Electronic corpus of 17th and 18th century Polish texts
Polish name:	Elektroniczny korpus tekstów polskich z XVII i XVIII w. (do roku 1772)
Project type:	A Ministry of Science and Higher Education National Programme for the Development of Humanities grant 0036/NPRH2/H11/81/2012
Duration:	1 May 2013 ‒ 30 April 2018
Project Web page:	http://nlp.ipipan.waw.pl/wiki/korba (authorization required)
Principal investigator:	Włodzimierz Gruszczyński

Partners

Laboratory of the History of the Polish Language of the 17th and 18th Centuries, Institute of the Polish Language, Polish Academy of Sciences
Linguistic Engineering Group, Institute of Computer Science, Polish Academy of Sciences

Project description

The aim of the project is the creation of a corpus of 17th and 18th century Polish texts (up to 1772) and tools for its processing (searching, filtering, summarizing statistical data, etc.). The entire corpus will feature annotation for text structure and language (all foreign elements, e.g. Latin intrusions, will be distinguished), and a portion of it will also feature morphological annotation. Since the corpus will mark another stage of development of the National Corpus of Polish (Narodowy Korpus Języka Polskiego, NKJP, see: http://nkjp.pl/), we intend to cooperate with the institution and the people behind NKJP, i.e. the Linguistic Engineering Group of the Institute of Computer Science, Polish Academy of Sciences.

The existing corpus of contemporary (20th century) Polish texts was created by a consortium consisting of: the Institute of Computer Science, Polish Academy of Sciences (project coordinator); the Institute of Polish Studies, Polish Academy of Sciences; Polish Scientific Publishers PWN; and the Department of Computational and Corpus Linguistics of the University of Łódź. A corpus of old (pre-1500) Polish texts also exists, but it remains unannotated and is not equipped with a search interface. The creation of a 17th and 18th century Polish corpus constitutes an important step towards extending the scope of the National Corpus of Polish to written texts from all eras. The project is crucial for research on the history of the Polish language: in particular, it accelerates ongoing work on the dictionary of 17th and early 18th century Polish, and it will no doubt also be useful for other forms of historical linguistic studies (e.g. evolution of Polish grammar, regional and social variety of historical Polish), as well as literary studies and editorial work. An additional goal is to initiate a series of publications of a number of (non-literary) texts from the selected period.

Since existing tools designed for the corpus of contemporary Polish will have to be adapted to the historical corpus, the project also contributes to the field of linguistic engineering in Poland as a whole. The corpus will be:

open for use for a variety of purposes;
considerably large (aiming for around 12 million tokens);
structurally annotated, i.e. every concordance will be provided with source information, including page number and in-text status (marginalia, notes, errata, etc.);
lexically annotated in its entirety (the process is expected to be partially automated - individual tokens will be automatically associated with appropriate lexemes);
morphosyntactically annotated to a certain extent (initially aiming for 0.5 million text segments and expanding coverage over time);
freely available online (open access), as in the case of NKJP;
equipped with tools for finding certain linguistic elements or establishing their frequency in text, allowing for searches restricted to a given time period, author, publisher, geographic area, prominence of foreign quotations in text, etc.

-  ⇤ ← Revision 1 as of 2013-04-10 14:18:38 → 
  Size: 1800
  Editor: MaciejOgrodniczuk
  Comment:
+   ← Revision 7 as of 2014-11-05 13:19:56 → ⇥
  Size: 4231
  Editor: MaciejOgrodniczuk
  Comment:
-Deletions are marked like this.
+Additions are marked like this.
 Line 6:
-|| English name:         || An adaptive system to support problem-solving on the basis of document collections in the Internet ||
|| Polish name:          || Adaptacyjny system wspomagający rozwiązywanie problemów w oparciu o analizę treści dostępnych źródeł elektronicznych ||
|| Project type:         || A national [[http://www.eng.nauka.gov.pl/meinen/|Ministry of Science and Higher Education]] Innovative Economy Operational Programme (PO IG) grant ||
|| Duration:             || 1 April 2009 ‒ 10 February 2014 ||
|| Project Web page:     || [[http://www.ipipan.waw.pl/nekst/]] ||
|| Principal investigator: || Jacek Koronacki ||
+|| English name:         || Electronic corpus of 17th and 18th century Polish texts ||
|| Polish name:          || Elektroniczny korpus tekstów polskich z XVII i XVIII w. (do roku 1772) ||
|| Project type:         || A [[http://www.eng.nauka.gov.pl/meinen/|Ministry of Science and Higher Education]] National Programme for the Development of Humanities grant 0036/NPRH2/H11/81/2012 ||
|| Duration:             || 1 May 2013 ‒ 30 April 2018 ||
|| Project Web page:     || [[http://nlp.ipipan.waw.pl/wiki/korba]] (authorization required) ||
|| Principal investigator: || Włodzimierz Gruszczyński ||
 Line 15:
- * [[ZILStart|Institute of Computer Science, Polish Academy of Sciences]]
 * [[http://www.pwr.wroc.pl/|Wrocław University of Technology]]
+ * [[https://www.ijp-pan.krakow.pl/en/organisational-structure/zaklad-historii-jezyka-polskiego/the-history-of-the-polish-language-of-the-17th-and-18th-century-laboratory|Laboratory of the History of the Polish Language of the 17th and 18th Centuries, Institute of the Polish Language, Polish Academy of Sciences]]
 * [[http://nlp.ipipan.waw.pl|Linguistic Engineering Group, Institute of Computer Science, Polish Academy of Sciences]]
 Line 20:
-The aim of the project is to design a system supporting a wide class of problem-solving tasks basing on an analysis of the structure and content of available electronic documents. The analysis concerns knowledge and information represented in text form and selected multimedia content. The system will combine automatic answering to the questions in Polish and automated analysis of opinions together with large-scale, cross-sectional analysis of semantic e-resources, search and visualization of results. The main object of analysis will be text documents. The system will base on new paradigms of analysis of content and content management, linking it with mechanisms of user interaction. As a target, it will be able to handle the collection of all the Polish-language documents on the Internet and will be equipped with mechanisms for bilingual (Polish-English) processing.
+The aim of the project is the creation of a corpus of 17th and 18th century Polish texts (up to 1772) and tools for its processing (searching, filtering, summarizing statistical data, etc.). The entire corpus will feature annotation for text structure and language (all foreign elements, e.g. Latin intrusions, will be distinguished), and a portion of it will also feature morphological annotation. Since the corpus will mark another stage of development of the National Corpus of Polish (Narodowy Korpus Języka Polskiego, NKJP, see: http://nkjp.pl/), we intend to cooperate with the institution and the people behind NKJP, i.e. the Linguistic Engineering Group of the Institute of Computer Science, Polish Academy of Sciences.

The existing corpus of contemporary (20th century) Polish texts was created by a consortium consisting of: the Institute of Computer Science, Polish Academy of Sciences (project coordinator); the Institute of Polish Studies, Polish Academy of Sciences; Polish Scientific Publishers PWN; and the Department of Computational and Corpus Linguistics of the University of Łódź. A corpus of old (pre-1500) Polish texts also exists, but it remains unannotated and is not equipped with a search interface. The creation of a 17th and 18th century Polish corpus constitutes an important step towards extending the scope of the National Corpus of Polish to written texts from all eras. The project is crucial for research on the history of the Polish language: in particular, it accelerates ongoing work on the dictionary of 17th and early 18th century Polish, and it will no doubt also be useful for other forms of historical linguistic studies (e.g. evolution of Polish grammar, regional and social variety of historical Polish), as well as literary studies and editorial work. An additional goal is to initiate a series of publications of a number of (non-literary) texts from the selected period.

Since existing tools designed for the corpus of contemporary Polish will have to be adapted to the historical corpus, the project also contributes to the field of linguistic engineering in Poland as a whole. The corpus will be: 
 1. open for use for a variety of purposes; 
 2. considerably large (aiming for around 12 million tokens); 
 3. structurally annotated, i.e. every concordance will be provided with source information, including page number and in-text status (marginalia, notes, errata, etc.); 
 4. lexically annotated in its entirety (the process is expected to be partially automated - individual tokens will be automatically associated with appropriate lexemes); 
 5. morphosyntactically annotated to a certain extent (initially aiming for 0.5 million text segments and expanding coverage over time); 
 6. freely available online (open access), as in the case of NKJP; 
 7. equipped with tools for finding certain linguistic elements or establishing their frequency in text, allowing for searches restricted to a given time period, author, publisher, geographic area, prominence of foreign quotations in text, etc.
