The Polish Automatic Collocations Dictionary (ACD) has been created by Lexical Computing Ltd. and has been made available to the research community as part of the CESAR initiative.
The ACD contains, for 30,000 dictionary headwords of the language, the major grammatical relations that the headword occurs in, and the collocates it occurs with in each grammatical relation. The 30,000 headwords have been selected by Lexical Computing on the basis of corpus frequency. They are largely nouns, verbs and adjectives, with grammatical words usually excluded.
There are between three and fifty collocates per headword, depending on the frequency and behavior of the particular headword.
Each collocate is also a hyperlink to the corpus instances of the headword together with that collocate, in the Sketch Engine, at http://www.sketchengine.co.uk. Users who have a Sketch Engine account can follow the links to see the corpus instances and make further explorations of the source corpus.
The ACD and Sketch Grammar are available to download and use under the Creative Commons CC-BY-SA licence as specified at http://creativecommons.org/licenses/by-sa/3.0/.
The ACD is available in UTF-8-encoded XML, with a simple DTD.
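
As a rough illustration of how the headword, grammatical relation and collocate structure described above might be read from the XML, the following Python sketch parses a small inline sample with the standard library. The element and attribute names (acd, entry, gramrel, collocate, headword, name, href), the sample words and the example.org links are all invented for this sketch and are not taken from the ACD's DTD, so they would need to be replaced with the names the DTD actually defines.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: every element name, attribute name and value below is
# invented for illustration and is NOT taken from the actual ACD DTD.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<acd>
  <entry headword="kawa" pos="noun">
    <gramrel name="adj_modifier">
      <collocate href="http://example.org/1">czarny</collocate>
      <collocate href="http://example.org/2">mocny</collocate>
    </gramrel>
  </entry>
</acd>
"""


def load_collocations(xml_bytes):
    """Return {headword: {grammatical relation: [collocates]}}."""
    root = ET.fromstring(xml_bytes)
    result = {}
    for entry in root.iter("entry"):
        relations = {}
        for gramrel in entry.iter("gramrel"):
            # Collect the text of each collocate under this relation.
            relations[gramrel.get("name")] = [
                collocate.text for collocate in gramrel.iter("collocate")
            ]
        result[entry.get("headword")] = relations
    return result


# Pass bytes so the parser honours the UTF-8 encoding declaration.
collocations = load_collocations(SAMPLE.encode("utf-8"))
```

To read a downloaded file rather than an inline string, ET.parse(path).getroot() could take the place of ET.fromstring. Note that ElementTree does not validate a document against its DTD, so a validating parser would be needed if validation matters.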