Warning: this release should be considered unstable, as the major changes it introduces have not been tested thoroughly.
Poliqarp 1.3 lifts the limits on corpus size: it should be possible to build and process any reasonable corpus of up to 2G segments. Unfortunately, the binary corpus format had to be changed.
You can check the version of your corpus by inspecting its *.cdf file.
Sakura, the underlying library, no longer supports the old format. However, a conversion utility, bpupgrade, is provided. Note that it modifies corpora in place, so please back up your data first!
The name indexer was found to be too generic. It has been renamed to bpindexer.
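Because bpupgrade works in place, a copy of the corpus files should exist before it is run. The following is only a minimal sketch of such a backup step; it assumes that all files of a corpus share one base name and differ only in extension (the *.cdf file being one of them), and the function name backup_corpus is hypothetical, not part of Poliqarp.

```python
import glob
import shutil
from pathlib import Path


def backup_corpus(corpus_base, backup_dir):
    """Copy every file named <corpus_base>.* into backup_dir.

    Assumption: a corpus is a set of files sharing one base name
    (for example mycorpus.cdf and its siblings). Returns the list
    of paths of the copies that were made.
    """
    base = Path(corpus_base)
    target = Path(backup_dir)
    target.mkdir(parents=True, exist_ok=True)
    copies = []
    # glob.escape guards against special characters in the base name.
    for src in sorted(base.parent.glob(glob.escape(base.name) + ".*")):
        if src.is_file():
            # copy2 also preserves timestamps and permission bits.
            copies.append(Path(shutil.copy2(src, target / src.name)))
    return copies
```

Run the conversion only after checking that the returned list contains every file you expect to see.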
The build system has been completely rewritten. As a result, it is now possible to do a parallel build (see the -j option of GNU make).
Poliqarp 1.3.2 introduces experimental support for variables in the query language. For example, queries like [pos=adj & case=$1] [pos=subst & case=$1] should work as expected.
Moreover, as of Poliqarp 1.3.3, you can query for a space before a segment. For example, [] [base=śmy & space=0] will find a segment followed by śmy with no space between them.
Up to 1.3.2, Poliqarp used hard-coded heuristics to handle queries such as przyszedłem intuitively in Polish. The mechanism was inflexible, buggy, and incorrect for any corpus other than a Polish one. In Poliqarp 1.3.3 it was replaced with a flexible, configurable mechanism.
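The intended meaning of the two example queries can be illustrated with a toy in-memory model. This is not how Poliqarp evaluates queries. It assumes that a variable such as $1 must take the same value at every position where it occurs, and that space is 0 exactly when nothing separates a segment from the previous one. The Segment type, the function names and the tag values are all hypothetical.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    orth: str
    base: str
    pos: str
    case: str
    space: int  # assumed: 1 if whitespace precedes the segment, else 0


def agreeing_pairs(segments):
    """Toy model of [pos=adj & case=$1] [pos=subst & case=$1]."""
    return [
        (a, b)
        for a, b in zip(segments, segments[1:])
        if a.pos == "adj" and b.pos == "subst" and a.case == b.case
    ]


def glued_pairs(segments, base):
    """Toy model of [] [base=<base> & space=0]."""
    return [
        (a, b)
        for a, b in zip(segments, segments[1:])
        if b.base == base and b.space == 0
    ]


# Hypothetical annotation of "dobrego psa długośmy".
SAMPLE = [
    Segment("dobrego", "dobry", "adj", "gen", 1),
    Segment("psa", "pies", "subst", "gen", 1),
    Segment("długo", "długo", "adv", "", 1),
    Segment("śmy", "śmy", "aglt", "", 0),
]
```

With this sample, agreeing_pairs(SAMPLE) yields the pair dobrego psa, and glued_pairs(SAMPLE, "śmy") yields the pair długo śmy.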
To (approximately) restore the old query semantics, you will need to add the following section to your corpus configuration file:
[query-rewrite-rules]
default = "^(by)(śmy|ście|ś|m)$" "[orth='$<$1$2$>'$i] | [orth='$<$1'$i][orth='$2'$i&space=0]"
default = "^(.+)(by)(śmy|ście|ś|m)$" "[orth='$<$1$2$3$>'$i] | [orth='$<$1'$i][orth='$2'$i&space=0][orth='$3'$i&space=0]"
default = "^(.+)(by|ście|śmy|eś|em|ń)$" "[orth='$<$1$2$>'$i] | [orth='$<$1'$i][orth='$2'$i&space=0]"
default = "^(.+)(m)$" "[orth='$<$1$2$>'$i] | [orth='$<$1'$i][orth='$2'$i&space=0]"
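To see how the patterns in the rules above split a word into the pieces referred to as $1, $2 and $3, they can be tried out with Python's re module. This is only a rough sketch: Poliqarp's regular expression dialect may differ from Python's, the sketch does not model the replacement templates, and it makes no claim about which rule Poliqarp applies when several patterns match. The function name split_candidates is hypothetical.

```python
import re

# Patterns copied from the rules above, in the same order.
PATTERNS = [
    r"^(by)(śmy|ście|ś|m)$",
    r"^(.+)(by)(śmy|ście|ś|m)$",
    r"^(.+)(by|ście|śmy|eś|em|ń)$",
    r"^(.+)(m)$",
]


def split_candidates(word):
    """Return the captured groups for every pattern that matches word."""
    results = []
    for pattern in PATTERNS:
        match = re.match(pattern, word)
        if match:
            results.append(match.groups())
    return results


for word in ("przyszedłem", "zrobilibyśmy"):
    print(word, split_candidates(word))
```

For przyszedłem, the third pattern captures przyszedł and em, and the fourth captures przyszedłe and m. For zrobilibyśmy, the second pattern captures zrobili, by and śmy.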