Revision 7 as of 2011-09-28 11:43:40

Converters for NKJP formats

This page (under construction as of the end of September 2011) collects converters to and from TEI4NKJP, the XML format used in the National Corpus of Polish (NKJP).

Converters from Anotatornia output to TEI4NKJP

The format evolved during the project, so the final TEI4NKJP differs slightly from the output of Anotatornia (see http://nlp.ipipan.waw.pl/Anotatornia/). To 'upgrade' Anotatornia output, use the following scripts:

The scripts were meant to be simple. Fatal error reporting in modify-tei-morphosyntax.pl is correspondingly crude: on a fatal error, the script prints a line to the output file, which makes the XML no longer well-formed. A failed conversion therefore shows up as a parse error in the output. In all cases, the resulting files should be validated.
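
Because a fatal error leaves the output not well-formed, a plain XML parse is enough to catch failed conversions before full validation. The following is a minimal sketch in Python, not part of the converter scripts; the function names is_well_formed and find_broken_files are hypothetical. It checks well-formedness only and does not replace validation against the TEI4NKJP schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(path):
    """Return (True, None) if the file parses as XML, else (False, parser message)."""
    try:
        ET.parse(path)
    except ET.ParseError as err:
        # Stray text after the root element, such as an error line written
        # into the output, is reported here as a parse error.
        return False, str(err)
    return True, None


def find_broken_files(paths):
    """Return a list of (path, message) pairs for files that fail to parse."""
    broken = []
    for path in paths:
        ok, message = is_well_formed(path)
        if not ok:
            broken.append((path, message))
    return broken
```

Calling find_broken_files on the list of converted files returns an empty list when every file parses; any file it reports should be converted again after fixing its input.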