java.lang.Object
  extended by morfologik.fsa.dictionary.DictionaryStemmer
public final class DictionaryStemmer
extends java.lang.Object
implements IStemmer
This class implements a dictionary lookup over an FSA dictionary.
Please note that FSAs in Jan Daciuk's implementation operate on bytes, not Unicode characters. Therefore, objects of this class must always be constructed with an encoding, which is used to convert Java strings to byte arrays and vice versa. The dictionary for this class should be created using Jan Daciuk's FSA package.
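The byte-versus-character distinction can be illustrated with plain JDK calls. The sketch below is not the morfologik API: the class `EncodingSketch` and its methods are hypothetical. It resolves an encoding name once, throwing `java.io.UnsupportedEncodingException` for an unknown name (mirroring the exception documented for the constructor), and then round-trips a word through a byte array. ISO-8859-1 is used only because every JDK is guaranteed to support it.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

// Hypothetical helper, not part of morfologik: shows why an encoding is needed.
public class EncodingSketch {
    private final Charset charset;

    // Resolves the encoding name up front; unknown names are rejected.
    public EncodingSketch(String encodingName) throws UnsupportedEncodingException {
        try {
            if (!Charset.isSupported(encodingName)) {
                throw new UnsupportedEncodingException(encodingName);
            }
        } catch (IllegalCharsetNameException e) {
            throw new UnsupportedEncodingException(encodingName);
        }
        this.charset = Charset.forName(encodingName);
    }

    // Converts a Java string into the bytes an automaton would be walked with.
    public byte[] toBytes(String word) {
        return word.getBytes(charset);
    }

    // Converts bytes read back from an automaton into a Java string.
    public String fromBytes(byte[] bytes) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        EncodingSketch latin1 = new EncodingSketch("ISO-8859-1");
        EncodingSketch utf8 = new EncodingSketch("UTF-8");
        String word = "caf\u00e9"; // "café"
        // The same word produces byte sequences of different lengths.
        System.out.println(latin1.toBytes(word).length); // 4
        System.out.println(utf8.toBytes(word).length);   // 5
        System.out.println(utf8.fromBytes(utf8.toBytes(word)).equals(word)); // true
    }
}
```

Because the same string maps to different byte sequences under different encodings, a lookup only succeeds when strings are converted with the encoding the automaton was built with.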
Constructor Summary | |
---|---|
DictionaryStemmer(Dictionary dictionary) | Creates a new object of this class using the given dictionary: its FSA is used for word lookups and its encoding for converting characters to bytes. |
Method Summary | |
---|---|
java.lang.String[] | stem(java.lang.String word): Returns an array of potential base forms (stems) of the word, or null if the word is not found in the dictionary. |
java.lang.String[] | stemAndForm(java.lang.String word): Returns an array of pairs of the form: String stem1, String form1, String stem2, String form2, ... |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public DictionaryStemmer(Dictionary dictionary) throws java.io.UnsupportedEncodingException, java.lang.IllegalArgumentException
Creates a new object of this class using the given dictionary: its FSA is used for word lookups and its encoding for converting characters to bytes.
Throws:
java.io.UnsupportedEncodingException - if the dictionary's encoding is not supported by the system.
java.lang.IllegalArgumentException - if the FSA's root node cannot be acquired (the dictionary is empty).
Method Detail |
---|
public java.lang.String[] stem(java.lang.String word)
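Description copied from interface: IStemmer
Returns an array of potential base forms (stems) of the word, or null if the word is not found in the dictionary.
Specified by:
stem in interface IStemmer
See Also:
IStemmer.stem(String)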
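The contract of `stem` (an array of candidate stems, or `null` for an unknown word) can be mimicked without an automaton. In this minimal sketch a `java.util.Map` stands in for the FSA; the class `MapStemmer` and its `add` method are hypothetical and are not part of morfologik. The Polish form "lata" is used because it has two potential base forms, "lato" and "rok".

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in, not part of morfologik: a Map replaces the FSA.
public class MapStemmer {
    // Inflected form -> list of candidate stems.
    private final Map<String, List<String>> entries = new HashMap<>();

    public void add(String form, String... stems) {
        entries.put(form, List.of(stems));
    }

    // Mirrors the documented contract: candidate stems, or null if unknown.
    public String[] stem(String word) {
        List<String> stems = entries.get(word);
        return stems == null ? null : stems.toArray(new String[0]);
    }

    public static void main(String[] args) {
        MapStemmer stemmer = new MapStemmer();
        // An ambiguous form may have several potential base forms.
        stemmer.add("lata", "lato", "rok");
        String[] stems = stemmer.stem("lata");
        System.out.println(String.join(", ", stems)); // lato, rok
        // Unknown words yield null, so callers must check before iterating.
        System.out.println(stemmer.stem("xyz") == null); // true
    }
}
```

Since an unknown word yields `null` rather than an empty array, callers of `stem` need an explicit null check before iterating over the result.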
public java.lang.String[] stemAndForm(java.lang.String word)
Description copied from interface: IStemmer
Returns an array of pairs of the form: String stem1, String form1, String stem2, String form2, ... or null if the word is not found in the dictionary.
The form tag is a simple string and depends on what was saved in the automaton (it may be nonsensical or even null).
Specified by:
stemAndForm in interface IStemmer
See Also:
IStemmer.stemAndForm(String)
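Because `stemAndForm` returns a flat array in which even indices hold stems and odd indices hold the corresponding form tags, a caller would typically walk it two elements at a time. The sketch below uses a hard-coded array in place of a real dictionary result; the class `StemAndFormPairs`, its `describePairs` helper, and the tag string are hypothetical and only illustrate the layout. It handles both a `null` result (word not found) and a `null` tag, as the documentation warns.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical consumer of a stemAndForm-style result, not part of morfologik.
public class StemAndFormPairs {
    // Formats each (stem, tag) pair from the flat array.
    static List<String> describePairs(String[] result) {
        List<String> out = new ArrayList<>();
        if (result == null) {
            return out; // word not found in the dictionary
        }
        // Even index: stem; following odd index: its form tag.
        for (int i = 0; i + 1 < result.length; i += 2) {
            String stem = result[i];
            String tag = result[i + 1];
            // The tag may be null or meaningless, depending on the automaton.
            out.add(stem + " [" + (tag == null ? "no tag" : tag) + "]");
        }
        return out;
    }

    public static void main(String[] args) {
        // Illustrative result for "lata"; real tags depend on the dictionary.
        String[] result = {"lato", "subst:pl:nom:n", "rok", null};
        for (String line : describePairs(result)) {
            System.out.println(line);
        }
        System.out.println(describePairs(null).isEmpty()); // true
    }
}
```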