1 Spejd 0.8.4
2
3 Copyright (C) IPI PAN, 2007-2010. All rights reserved.
4 Available under the terms of the GNU General Public License;
5 see the file doc/gpl.txt for details.
6
7 ABOUT
8
9 Spejd is a shallow parser that performs syntactic parsing and
10 morphological disambiguation simultaneously. It was developed at the
11 Institute of Computer Science, Polish Academy of Sciences, Warsaw.
12
13 Spejd homepage:
14 http://nlp.ipipan.waw.pl/Spejd/
15
16 Recent releases:
17 0.8.4: bugfix release
18 0.8.3: bugfix release
19 0.8.2: bugfix release
20 0.8.1:
21
22 Compared to the previous release, major changes in this version include:
23 - An integrated plain text processing module based on the morphological
24 analyzer Morfologik (http://morfologik.blogspot.com/). This module requires
25 appropriately encoded input, as defined by the inputEncoding config parameter.
26 The plain text module is enabled by the inputType parameter (auto or txt).
27 - Parallel processing (benefits are immediate on multicore CPUs).
28 The number of processing threads is defined by the maxThreads parameter.
29 - A simple spelling correction module, which restores missing Polish
30 diacritics. Possible transformations are listed in ogonkifier.ini.
31 - Changes listed in doc/changes0_5.txt.
32
33 REQUIREMENTS
34
35 Sun Java Runtime Environment version 1.5 or higher.
36
37 Note: it may be possible to run the program on an alternative Java
38 implementation, but because of differences in regular expression
39 implementations, we cannot guarantee its behaviour.
40
41 INSTALLATION
42
43 Unzip the file spade.zip; no further installation is needed.
44
45 SYNOPSIS
46
47 java -jar spejd.jar path [options]
48
49 where:
50
51 - path - a single file or a folder with XML CES (see doc/xcesAnaIPI.dtd)
52 or plain text files (.txt, encoding defined by inputEncoding parameter)
53 to parse; the parser looks for files matching a pattern defined in
54 config.ini (inputFiles parameter) and recursively checks subdirectories.
55
56 - options - an optional list of assignments var=value; var has to be one
57 of the variables from config.ini; values passed as invocation
58 arguments override the default values from the file.
59
60 Examples:
61
62 java -jar spejd.jar corpus nullAgreement=1
63 java -jar spejd.jar corpus rules=rules2.sr logDir=log2
64 java -jar spejd.jar corpus discardDeleted=true outputSuffix=.sh2.xml
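
The plain text mode and parallel processing described under ABOUT can be
selected with the same var=value overrides. The following is a minimal sketch:
spejd_cmd is a hypothetical helper (not part of Spejd) that only prints the
command line so it can be inspected before running, and UTF-8 and 4 are
assumed example values for inputEncoding and maxThreads.

```shell
# Hypothetical helper, not shipped with Spejd: prints a Spejd command line
# built from a path and any number of var=value overrides.
spejd_cmd() {
    path=$1
    shift
    printf 'java -jar spejd.jar %s' "$path"
    for assignment in "$@"; do
        printf ' %s' "$assignment"
    done
    printf '\n'
}

# Plain text corpus; the encoding and thread count are assumed example values.
spejd_cmd corpus inputType=txt inputEncoding=UTF-8 maxThreads=4
```

Once the printed command looks right, it can be run directly from the
directory that contains spejd.jar.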
65
66 RESULTS
67
68 For XML input, a new file filenameSh.xml is created in each directory in
69 which a filename.xml(.gz) has been found. It is a copy of the
70 corresponding .xml file with additional annotation: token
71 identifiers, disambiguation attributes, syntactic words and groups.
72 For plain text input, a new XML file (with a name ending in Sh.xml)
73 is created for each .txt file.
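
The naming rule for XML input can be sketched as a small shell function.
expected_output is a hypothetical helper, not part of Spejd; it assumes the
default output suffix (the outputSuffix parameter shown in the examples above
changes it) and covers only .xml and .xml.gz inputs.

```shell
# Hypothetical helper: derives the expected output name for an XML input file,
# assuming the default naming (filename.xml or filename.xml.gz -> filenameSh.xml).
expected_output() {
    name=${1%.gz}      # drop an optional .gz suffix
    name=${name%.xml}  # drop the .xml extension
    printf '%sSh.xml\n' "$name"
}

expected_output doc/morph.xml
```

For the sample input doc/morph.xml this gives doc/morphSh.xml, matching the
example files listed in the EXAMPLE section.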
74
75 A few additional files are generated in the logs subdirectory of the spade
76 directory:
77
78 rules.compiled - a compiled set of rules
79
80 rules.matched.csv - rule statistics: for each rule, the number
81 of completed (evaluated to true) matches, the number of matches,
82 matching time, evaluation time and total time
83
84 tagdict.ini - tag dictionary, translating the tagset defined in the
85 configuration file to the internal positional tagset
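
The rule statistics can help locate expensive rules. The sketch below rests on
assumptions this README does not confirm: that rules.matched.csv is
comma-separated with one rule per line, and that the fields appear in the
order listed above, with the rule name first and total time sixth.
slowest_rules is a hypothetical helper, not part of Spejd.

```shell
# Hypothetical helper. Assumed layout (not confirmed by this README):
#   rule,completed,matches,matching_time,evaluation_time,total_time
# Prints the N rules with the largest total time (default N = 5).
slowest_rules() {
    sort -t, -k6,6 -n -r "$1" | head -n "${2:-5}"
}

# Example call, guarded so that nothing happens when the file is absent.
if [ -f logs/rules.matched.csv ]; then
    slowest_rules logs/rules.matched.csv 10
fi
```

If the actual file uses a different separator or column order, the -t and -k
arguments need to be adjusted accordingly.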
86
87 DOCUMENTATION
88
89 doc/spade.pdf - a paper about Spejd
90 doc/xcesAnaIPI.dtd - DTD of the input format
91 api/ - technical documentation
92
93 EXAMPLE
94
95 ./sample-morfeusz.cfg - example Morfeusz tagset file
96 ./sample-morfologik.cfg - example Morfologik tagset file (for plain text input)
97 ./rules.sr - example set of rules
98 doc/morph.xml - example XML input to the parser
99 doc/morphSh.xml - example output
100 doc/display.* - stylesheets and example output
101
102 WHAT'S NEW IN THIS VERSION
103
104
105
106 FOR DEVELOPERS
107
108 Please feel free to play around with the sources, modify them and post
109 patches on Spejd's bug tracker at SourceForge (linked from the homepage)!
110 See api/ for a brief introduction to the code structure.