Converters for NKJP formats

This page (under construction as of the end of September 2011) collects converters to and from the TEI4NKJP XML format used in the National Corpus of Polish.

Converters from the output of Anotatornia to TEI NKJP

The format evolved during the project, and the final TEI4NKJP differs slightly from the output of Anotatornia (see http://nlp.ipipan.waw.pl/Anotatornia/). To upgrade, use the following scripts:

The scripts were meant to be simple. Fatal error reporting in modify-tei-morphosyntax.pl is straightforward: a line is printed to the output file, so the XML is no longer well-formed. In all cases, the resulting files should be validated.
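
Because a fatal error shows up as output that is no longer well-formed XML, a well-formedness check over the converted files is enough to spot it. The following is a minimal sketch in Python using only the standard library. The function names (is_well_formed, check_files, main) are hypothetical and not part of the NKJP tools. The sketch checks well-formedness only, so it does not replace validation against the TEI4NKJP schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(path):
    """Return (True, None) if the file parses as XML, else (False, message)."""
    try:
        ET.parse(path)
    except ET.ParseError as err:
        # A stray error line written by a converter would typically end up here.
        return False, str(err)
    return True, None


def check_files(paths):
    """Return a list of (path, message) pairs for files that fail to parse."""
    failures = []
    for path in paths:
        ok, message = is_well_formed(path)
        if not ok:
            failures.append((path, message))
    return failures


def main(paths):
    """Print one line per broken file and return the number of broken files."""
    failures = check_files(paths)
    for path, message in failures:
        print(f"{path}: not well-formed: {message}")
    return len(failures)
```

To use it from the command line, one option is to pass the converted files as arguments and call main(sys.argv[1:]) after importing sys. A nonzero return value means at least one file needs attention.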