KORBA 2 project

Project factsheet

English name:

Extension of the "Electronic corpus of 17th and 18th century Polish texts" and its integration with the "Electronic Dictionary of the 17th–18th Century Polish"

Polish name:

Rozbudowa "Elektronicznego Korpusu Tekstów Polskich XVII i XVIII w." i jego integracja z "Elektronicznym słownikiem języka polskiego XVII i XVIII w."

Project type:

A Ministry of Science and Higher Education National Programme for the Development of Humanities grant 11H 18 0413 86


Duration:

2019 – 2023

Project Web page:

http://wiki.nlp.ipipan.waw.pl/korba (authorization required)

Principal investigator:

Włodzimierz Gruszczyński


Project goals

1. Extension of the "Electronic Corpus of Polish Texts of the 17th and 18th Centuries (until 1772)", created in the first stage of the project, and its transformation into the "Electronic Corpus of Polish Texts of the 17th and 18th Centuries":

1.1. supplementing the corpus with texts from 1772 to the end of the 18th century,

1.2. adding new texts to the existing corpus:

1.2.1. important texts that have been previously included only as fragments,

1.2.2. additional texts from the early 18th century,

1.2.3. texts that improve the stylistic balance of the corpus.

2. Improvement of IT tools supporting the corpus:

2.1. the transcriber,

2.2. the morphological analyzer (adapting the tagset to texts from the end of the 18th century),

2.3. the tagger,

2.4. the corpus search engine (adapting it to support the extended corpus).

3. Integration of the corpus with the Electronic Dictionary of the Polish Language of the 17th and 18th centuries: creation of tools for automatic data extraction and for supporting the selection of the best examples.

4. Partial integration with the National Corpus of Polish (NKJP): integrated search.

5. Partial integration with the Digital Library of Polish and Poland-Related Ephemeral Prints from the 16th, 17th and 18th centuries.

6. Creation of a web portal integrating the resources for external users (those outside the editorial team).

7. Performing syntactic, inflectional and lexical-semantic analyses of the corpus material.