Poliqarp 1.3 changes
====================

**Warning**: this release should be considered unstable, as the major
changes it introduces have not been tested thoroughly.

New binary format
-----------------
Poliqarp 1.3 lifts the earlier limitations on corpus size: it should now be
possible to build and process any reasonable corpus of up to 2G segments.
Unfortunately, the binary corpus format had to be changed to achieve this.

You can check the version of your corpus by inspecting its ``*.cdf`` file:

* the absence of a ``*.cdf`` file indicates the old format;
* the string ``version = 1`` indicates the old format;
* the string ``version = 2`` indicates the new one.
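
The checks above can be sketched in Python. This is only an illustration and
not part of Poliqarp: the helper name ``corpus_format`` is hypothetical, and
the sketch assumes that the ``*.cdf`` file is plain text that carries the
version as ``version = N``.

.. code:: python

  import re
  from pathlib import Path

  # Assumption: the version is stored as text of the form "version = N".
  # Nothing else about the layout of the file is assumed.
  _VERSION_RE = re.compile(r"\bversion\s*=\s*(\d+)")


  def corpus_format(cdf_path):
      """Return "old", "new" or "unknown" for the given *.cdf path."""
      path = Path(cdf_path)
      if not path.is_file():
          # No *.cdf file at all indicates the old format.
          return "old"
      text = path.read_text(encoding="utf-8", errors="replace")
      match = _VERSION_RE.search(text)
      if match is None:
          # The file exists but has no recognizable version string.
          return "unknown"
      version = int(match.group(1))
      if version == 1:
          return "old"
      if version == 2:
          return "new"
      return "unknown"

A file that exists but contains no recognizable version string is reported
as ``"unknown"``, because the rules above do not cover that case.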

Sakura, the underlying library, no longer supports the old format. However,
a conversion utility, ``bpupgrade``, is provided. Note that it modifies
corpora in place, so please back up your data first!
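
One possible way to make such a backup is sketched below in Python. The
helper ``backup_corpus`` is hypothetical, and the sketch assumes that all
files of a corpus share a common base name followed by a dot (for example
``mycorpus.cdf``); please verify this for your own data. The sketch does
not run ``bpupgrade`` itself.

.. code:: python

  import glob
  import os
  import shutil


  def backup_corpus(corpus_base, backup_dir):
      """Copy every file named <corpus_base>.* into backup_dir.

      Returns the sorted list of copied file names.
      """
      os.makedirs(backup_dir, exist_ok=True)
      copied = []
      # glob.escape protects base names that contain characters
      # such as "[" from being read as glob patterns.
      pattern = glob.escape(corpus_base) + ".*"
      for src in sorted(glob.glob(pattern)):
          if os.path.isfile(src):
              # copy2 also preserves timestamps where possible.
              shutil.copy2(src, backup_dir)
              copied.append(os.path.basename(src))
      return copied

The returned list can be compared with the contents of the corpus directory
before the conversion is started.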

Renames
-------
The name ``indexer`` was found to be too generic, so the tool has been
renamed to ``bpindexer``.

Build system
------------
The build system has been completely rewritten. As a result, it is now possible
to do a parallel build (see the ``-j`` option of GNU make).

Query language
--------------
Poliqarp 1.3.2 introduces experimental support for variables in the query
language. For example, queries like
``[pos=adj & case=$1] [pos=subst & case=$1]`` should work as expected.
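
One plausible reading of the example query is that ``$1`` must take the same
value in both segments. The Python sketch below illustrates that reading on
toy data. It is not Poliqarp code: the function name, the dictionary layout
and the sample segments are all hypothetical.

.. code:: python

  def agreeing_pairs(segments):
      """Yield each index i where segments[i] is tagged "adj",
      segments[i + 1] is tagged "subst", and both have the same case."""
      for i in range(len(segments) - 1):
          first, second = segments[i], segments[i + 1]
          if (first["pos"] == "adj"
                  and second["pos"] == "subst"
                  and first["case"] == second["case"]):
              yield i


  # Hypothetical toy data; real corpora carry richer annotation.
  segments = [
      {"orth": "nowy", "pos": "adj", "case": "nom"},
      {"orth": "dom", "pos": "subst", "case": "nom"},
      {"orth": "starego", "pos": "adj", "case": "gen"},
      {"orth": "psu", "pos": "subst", "case": "dat"},
  ]

  matches = list(agreeing_pairs(segments))

Here ``matches`` holds only the index of the first pair, because the second
pair disagrees in case.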

Moreover, as of Poliqarp 1.3.3, you can query for a space before a segment.
For example, ``[] [base=śmy & space=0]`` will find any segment immediately
followed by ``śmy``, with no space in between.

Parser cleanup
--------------
Up to 1.3.2, Poliqarp used hard-coded heuristics to handle queries like
``przyszedłem`` intuitively in Polish. The mechanism was inflexible, buggy,
and incorrect for any corpora other than Polish ones. In Poliqarp 1.3.3 it
was replaced with a flexible, configurable one.

To (approximately) restore the old query semantics, you will need to add the
following section to your corpus configuration file::

  [query-rewrite-rules]
  default = "^(by)(śmy|ście|ś|m)$"     "[orth='$<$1$2$>'$i]   | [orth='$<$1'$i][orth='$2'$i&space=0]"
  default = "^(.+)(by)(śmy|ście|ś|m)$" "[orth='$<$1$2$3$>'$i] | [orth='$<$1'$i][orth='$2'$i&space=0][orth='$3'$i&space=0]"
  default = "^(.+)(by|ście|śmy|eś|em|ń)$" "[orth='$<$1$2$>'$i] | [orth='$<$1'$i][orth='$2'$i&space=0]"
  default = "^(.+)(m)$"                   "[orth='$<$1$2$>'$i] | [orth='$<$1'$i][orth='$2'$i&space=0]"
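
To see how the regular expressions in these rules split a word, the patterns
can be tried out in Python. This is only a sketch: Python's ``re`` module
stands in for whatever regular expression engine Poliqarp uses, the helper
``split_word`` is hypothetical, stopping at the first matching pattern is an
assumption, and the templates on the right-hand side of the rules are not
interpreted at all.

.. code:: python

  import re

  # Left-hand sides copied from the configuration section above.
  PATTERNS = [
      r"^(by)(śmy|ście|ś|m)$",
      r"^(.+)(by)(śmy|ście|ś|m)$",
      r"^(.+)(by|ście|śmy|eś|em|ń)$",
      r"^(.+)(m)$",
  ]


  def split_word(word):
      """Return the groups of the first pattern matching word, or None.

      Assumption: patterns are tried in order and the first match wins.
      """
      for pattern in PATTERNS:
          match = re.match(pattern, word)
          if match:
              return match.groups()
      return None


  parts = split_word("przyszedłem")  # ("przyszedł", "em")

Under these assumptions ``przyszedłem`` is split by the third pattern into
``przyszedł`` and ``em``, while a word such as ``kot`` matches none of the
patterns.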

.. vim:tw=76 ft=rst
