morfologik.fsa.dictionary
Class DictionaryStemmer

java.lang.Object
  extended by morfologik.fsa.dictionary.DictionaryStemmer
All Implemented Interfaces:
IStemmer

public final class DictionaryStemmer
extends java.lang.Object
implements IStemmer

This class implements a dictionary lookup over an FSA dictionary.

Please note that FSAs in Jan Daciuk's implementation operate on bytes, not Unicode characters. Objects of this class must therefore always be constructed with an encoding that is used to convert Java strings to byte arrays and back. The dictionary for this class should be created using Jan Daciuk's FSA package.

See Also:
FSA package Web site
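
The byte-versus-character distinction described above can be illustrated with plain JDK calls. The following is a minimal sketch that does not use any morfologik classes; the class name EncodingRoundTrip and its methods are hypothetical and exist only for this example. It shows that the same word maps to different byte sequences under different encodings, so the encoding used for lookups has to match the one used when the automaton was built.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of morfologik: shows why a byte-based
// automaton needs a fixed encoding for converting strings to bytes and back.
final class EncodingRoundTrip {
    // Encodes a Java string into bytes using the named charset.
    static byte[] toBytes(String word, String encoding) {
        return word.getBytes(Charset.forName(encoding));
    }

    // Decodes bytes back into a Java string using the named charset.
    static String fromBytes(byte[] bytes, String encoding) {
        return new String(bytes, Charset.forName(encoding));
    }

    public static void main(String[] args) {
        String word = "kr\u00f3l"; // Polish "król"

        byte[] utf8 = toBytes(word, "UTF-8");        // 5 bytes: the accented letter takes two
        byte[] latin1 = toBytes(word, "ISO-8859-1"); // 4 bytes: one byte per character

        System.out.println(utf8.length + " vs " + latin1.length);

        // Decoding with the matching encoding restores the word;
        // decoding with a different one yields a different string.
        System.out.println(word.equals(fromBytes(utf8, "UTF-8")));
        System.out.println(word.equals(fromBytes(utf8, "ISO-8859-1")));
    }
}
```

Note that Charset.forName reports an unknown encoding with an unchecked exception, whereas the constructor documented below declares java.io.UnsupportedEncodingException.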

Constructor Summary
DictionaryStemmer(Dictionary dictionary)
          Creates a new object of this class using the given dictionary's FSA for word lookups and its encoding for converting characters to bytes.
 
Method Summary
 java.lang.String[] stem(java.lang.String word)
          Returns an array of potential base forms (stems) of the word, or null if the word is not found in the dictionary.
 java.lang.String[] stemAndForm(java.lang.String word)
          Returns an array of pairs of the form: String stem1, String form1, String stem2, String form2, ...
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DictionaryStemmer

public DictionaryStemmer(Dictionary dictionary)
                  throws java.io.UnsupportedEncodingException,
                         java.lang.IllegalArgumentException

Creates a new object of this class using the given dictionary's FSA for word lookups and its encoding for converting characters to bytes.

Throws:
java.io.UnsupportedEncodingException - if the given encoding is not found in the system.
java.lang.IllegalArgumentException - if the FSA's root node cannot be acquired (the dictionary is empty).
Method Detail

stem

public java.lang.String[] stem(java.lang.String word)
Description copied from interface: IStemmer
Returns an array of potential base forms (stems) of the word, or null if the word is not found in the dictionary.

Specified by:
stem in interface IStemmer
See Also:
IStemmer.stem(String)
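
Because stem returns null rather than an empty array when the word is not found, callers need a null check before iterating. A minimal sketch of one way to normalize the result follows; the class StemResults and its method orEmpty are hypothetical helpers, not part of this package.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of morfologik.
final class StemResults {
    // Wraps the array returned by stem(String) in a list, mapping the
    // null "word not found" result to an empty list.
    static List<String> orEmpty(String[] stems) {
        if (stems == null) {
            return Collections.<String>emptyList();
        }
        return Arrays.asList(stems);
    }
}
```

Assuming a stemmer instance is available, the call site could then read: for (String s : StemResults.orEmpty(stemmer.stem(word))) { ... }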

stemAndForm

public java.lang.String[] stemAndForm(java.lang.String word)
Description copied from interface: IStemmer

Returns an array of pairs of the form:

 String stem1, String form1, String stem2, String form2, ...
 
or null if the word is not found in the dictionary.

The form tag is a simple string and depends on what was saved in the automaton (it may be nonsensical or even null).

Specified by:
stemAndForm in interface IStemmer
See Also:
IStemmer.stemAndForm(String)
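
The flat array returned by stemAndForm interleaves stems and form tags, so it is usually easier to consume after regrouping it into pairs. The following is a minimal sketch under the layout described above; the class StemFormPairs and its method toPairs are hypothetical helpers, not part of this package. A null result (word not found) becomes an empty list, and a null form tag is passed through unchanged.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of morfologik.
final class StemFormPairs {
    // Regroups { stem1, form1, stem2, form2, ... } into a list of
    // two-element arrays { stem, form }. The form element may be null.
    static List<String[]> toPairs(String[] flat) {
        List<String[]> pairs = new ArrayList<String[]>();
        if (flat == null) {
            return pairs; // word not found in the dictionary
        }
        if (flat.length % 2 != 0) {
            throw new IllegalArgumentException("Expected an even number of elements: " + flat.length);
        }
        for (int i = 0; i < flat.length; i += 2) {
            pairs.add(new String[] { flat[i], flat[i + 1] });
        }
        return pairs;
    }
}
```

The even-length check is a defensive assumption of this sketch; the documentation above does not state how the method behaves for malformed dictionary entries.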