Things to do
============

Corpus library (``sakura``)
---------------------------

* make the library i18n-aware

* add an API to retrieve the tagset used by a corpus along with its description
  (language-aware description of grammatical categories and classes)

* design and implement a mechanism for speeding up the execution of regular
  expressions on a fixed set of strings

* move the support for statistical queries and collocations from the GUI
  into the library
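
One possible approach to the regular-expression item above, sketched in
Python purely for illustration (the library itself is written in C, and the
names ``FixedStringSet`` and ``matching_ids`` are hypothetical, not part of
any existing API): since the set of strings is fixed, each pattern can be
evaluated once per distinct string, and the resulting set of matching IDs
can be cached and reused for every later lookup.

.. code-block:: python

   import re


   class FixedStringSet:
       """A fixed vocabulary whose regex matches are computed once per pattern."""

       def __init__(self, strings):
           # Deduplicate while preserving first-seen order; IDs are list indices.
           self.strings = list(dict.fromkeys(strings))
           self._cache = {}

       def matching_ids(self, pattern):
           """Return the frozenset of IDs whose string fully matches `pattern`."""
           ids = self._cache.get(pattern)
           if ids is None:
               regex = re.compile(pattern)
               ids = frozenset(
                   i for i, s in enumerate(self.strings) if regex.fullmatch(s)
               )
               # Remember the result so the pattern is never re-run on this set.
               self._cache[pattern] = ids
           return ids


   # Hypothetical vocabulary of tag names; the duplicate "adj" is stored once.
   vocab = FixedStringSet(["subst", "adj", "adv", "verb", "adj"])
   hits = vocab.matching_ids(r"ad.")
   print(sorted(vocab.strings[i] for i in hits))

With such a cache, testing a corpus position against a pattern reduces to a
set-membership check on the ID of the string at that position; whether this
trade of memory for speed suits ``sakura`` is an open design question.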

Server (``poliqarpd``)
----------------------

* extend the protocol to support the above-mentioned improvements,
  notably:  

  - i18n
  - comprehensive error messages
  - tagset description
  - statistics

* test it to death and beyond; thoroughly check for race conditions, 
  deadlocks and memory leaks; consider using Netcat and Expect tools
  for writing a testsuite

* try to port it to as many POSIX-compliant platforms as possible
  (this also pertains to sakura). I guess VMS would be the ultimate 
  test ;-)

Corpus builder (``bp``)
-----------------------

* re-think the design and implementation, notably the following issues:

  - do we really need ``foolog``? if we do, at least use it consistently
  - ditto for viewports (they do look nice but provide an unnecessary
    layer of obscurity)
  - the abstraction of ``bp_parser`` seems to be leaky; either make it solid
    or get rid of it altogether

* replace the Lisp VM-based header parser with something understandable by
  more than one person in the world

* perhaps integrate with the indexer?

GUI
---

* make it more accessible and conformant to UI design guidelines

* support the new features of sakura when they're done

* implement a graphical query editor (something along the lines of
  KDE's ``kregexpeditor``)

* implement a tagset browser

* implement a decent help system

WWW UI
------

* improve accessibility and usability (as above); heed JSB's suggestions

* improve browser compatibility: fix rendering issues under M$IE, while
  retaining XHTML conformance

* add permalinks

The entire codebase
-------------------

* clean up the code (most notably ``sakura/query.c`` and
  ``gui/ipipan/poliqarp/gui/Application.java``)

* comment the code more thoroughly; ideally it should be possible to produce
  full documentation of all public APIs using doxygen (for C) and Javadoc
  (for Java)

Miscellaneous
-------------

* write more clients; it would be particularly nice to have a command-line 
  utility that would allow for executing queries in batch mode

* write proper user documentation (DocBook?)

* write another corpus-processing library as a proof of concept (it could,
  for example, be a dummy library, or a simple library working on untagged
  texts, or a C interface to Lukasz Degorski's 'Pescador' concordancer)
  
If you would like to take on any of these tasks, or help with one, you are
encouraged to do so! Current bugs and tasks are tracked at
<http://poliqarp.sf.net/bugs/>.
