morfologik.fsa.core
Class FSA

java.lang.Object
  extended by morfologik.fsa.core.FSA
Direct Known Subclasses:
FSAVer5Impl

public abstract class FSA
extends java.lang.Object

Finite State Automaton (FSA) traversal implementation; the abstract base class for all FSA versions.

This class implements Finite State Automaton traversal as described in Jan Daciuk's Incremental Construction of Finite-State Automata and Transducers, and Their Use in the Natural Language Processing (PhD thesis, Technical University of Gdansk).

This is a Java port of the original fsa class, implemented by Jan Daciuk in his FSA package. The design has been substantially reworked, however, to fit the specifics of the Java language and its coding conventions.


Nested Class Summary
static interface FSA.Arc
          An arc (a labelled transition between two nodes) of the FSA.
static interface FSA.Node
          A node of the FSA.
 
Field Summary
protected  byte filler
          The meaning of this field is not clear (see the documentation of the original FSA package).
static int FSA_FLEXIBLE
          These flags control the internal representation of an FSA.
static int FSA_LARGE_DICTIONARIES
           
static int FSA_NEXTBIT
           
static int FSA_STOPBIT
           
static int FSA_TAILS
           
static int FSA_WEIGHTED
           
protected  byte gotoLength
          Size of a transition's destination node "address".
protected  byte version
          Dictionary version (derived from the combination of flags).
static byte VERSION_5
          Version number for version 5 of the automaton.
 
Constructor Summary
protected FSA(java.io.InputStream fsaStream, java.lang.String dictionaryEncoding)
          Creates a new automaton by reading it from an input stream.
 
Method Summary
 char getAnnotationSeparator()
           
 char getFillerCharacter()
           
 int getFlags()
          Returns a set of flags for this FSA instance.
static FSA getInstance(java.io.File fsaFile, java.lang.String dictionaryEncoding)
          Attempts to instantiate an appropriate FSA implementation for the version found in the given file.
static FSA getInstance(java.io.InputStream fsaStream, java.lang.String dictionaryEncoding)
          Attempts to instantiate an appropriate FSA implementation for the version found in the given input stream.
abstract  int getNumberOfArcs()
          Returns the number of arcs in this automaton.
abstract  int getNumberOfNodes()
          Returns the number of nodes in this automaton.
abstract  FSA.Node getStartNode()
          Returns the start node of this automaton.
 FSATraversalHelper getTraversalHelper()
           
 int getVersion()
          Returns the version number of this FSA.
protected  byte[] readFully(java.io.InputStream stream)
          Reads all bytes from an input stream.
protected  void readHeader(java.io.DataInput in, long fileSize)
          Reads an FSA header from a stream.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

FSA_FLEXIBLE

public static final int FSA_FLEXIBLE
These flags control the internal representation of an FSA: they indicate how transitions (arcs) and nodes are stored. For details, see the documentation of the original FSA package.

See Also:
Constant Field Values

FSA_STOPBIT

public static final int FSA_STOPBIT
See Also:
Constant Field Values

FSA_NEXTBIT

public static final int FSA_NEXTBIT
See Also:
Constant Field Values

FSA_TAILS

public static final int FSA_TAILS
See Also:
Constant Field Values

FSA_WEIGHTED

public static final int FSA_WEIGHTED
See Also:
Constant Field Values

FSA_LARGE_DICTIONARIES

public static final int FSA_LARGE_DICTIONARIES
See Also:
Constant Field Values

VERSION_5

public static final byte VERSION_5
Version number for version 5 of the automaton.

See Also:
Constant Field Values

version

protected byte version
Dictionary version (derived from the combination of flags).


filler

protected byte filler
The meaning of this field is not clear (see the documentation of the original FSA package).


gotoLength

protected byte gotoLength
Size of a transition's destination node "address". Depending on the combination of flags used to build the FSA, this field may have a different interpretation or may not be used at all.

Constructor Detail

FSA

protected FSA(java.io.InputStream fsaStream,
              java.lang.String dictionaryEncoding)
       throws java.io.IOException
Creates a new automaton by reading it from an input stream.

Parameters:
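fsaStream - An input stream containing the FSA automaton.
dictionaryEncoding - The character encoding used to convert bytes read from the automaton (such as the filler and the annotation separator) into characters.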
Throws:
java.io.IOException - If the dictionary file cannot be read or its version is not supported.
Method Detail

getVersion

public final int getVersion()
Returns the version number of this FSA.

The version number is derived from the combination of flags and is identical to the version number used in Jan Daciuk's FSA package.


getFlags

public final int getFlags()
Returns the set of flags for this FSA instance. Each flag is represented by a unique bit in the returned integer, so to check whether the dictionary was built with the FSA_FLEXIBLE flag, perform a bitwise AND: boolean isFlexible = ((dict.getFlags() & FSA.FSA_FLEXIBLE) != 0);
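The bitwise test above generalizes to decoding every flag at once. The sketch below is self-contained, so it redeclares the six flag constants locally; the bit values (1, 2, 4, ...) are assumptions for illustration only. Real code should use FSA.FSA_FLEXIBLE and the other constants directly (their actual values are on the Constant Field Values page) and take the flag word from getFlags(). FlagDecoder is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: decodes an FSA flag word with bitwise AND tests. */
class FlagDecoder {
    // ASSUMPTION: illustrative bit values; use the real FSA.FSA_* constants in practice.
    static final int FSA_FLEXIBLE = 1 << 0;
    static final int FSA_STOPBIT = 1 << 1;
    static final int FSA_NEXTBIT = 1 << 2;
    static final int FSA_TAILS = 1 << 3;
    static final int FSA_WEIGHTED = 1 << 4;
    static final int FSA_LARGE_DICTIONARIES = 1 << 5;

    static final int[] FLAGS = {
        FSA_FLEXIBLE, FSA_STOPBIT, FSA_NEXTBIT, FSA_TAILS, FSA_WEIGHTED, FSA_LARGE_DICTIONARIES
    };
    static final String[] NAMES = {
        "FSA_FLEXIBLE", "FSA_STOPBIT", "FSA_NEXTBIT", "FSA_TAILS", "FSA_WEIGHTED", "FSA_LARGE_DICTIONARIES"
    };

    /** Returns true if the given flag bit is set in the flag word. */
    static boolean isSet(int flags, int flag) {
        return (flags & flag) != 0;
    }

    /** Returns the names of all known flags set in the flag word, in declaration order. */
    static List<String> describe(int flags) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < FLAGS.length; i++) {
            if (isSet(flags, FLAGS[i])) {
                result.add(NAMES[i]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // In real code the flag word would come from dict.getFlags().
        int flags = FSA_FLEXIBLE | FSA_STOPBIT | FSA_NEXTBIT;
        System.out.println("flexible: " + isSet(flags, FSA_FLEXIBLE)); // flexible: true
        System.out.println(describe(flags)); // [FSA_FLEXIBLE, FSA_STOPBIT, FSA_NEXTBIT]
    }
}
```

Because each flag occupies its own bit, flags can be combined with | when building a flag word and tested independently with &.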


getAnnotationSeparator

public final char getAnnotationSeparator()
Returns:
The annotation separator character, converted to a char according to the encoding passed to the constructor of this class.
Since:
1.0.5

getFillerCharacter

public final char getFillerCharacter()
Returns:
The filler character, converted to a char according to the encoding passed to the constructor of this class.
Since:
1.0.5
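A minimal sketch of the byte-to-character conversion described for getAnnotationSeparator() and getFillerCharacter(), assuming the dictionary uses a single-byte encoding such as ISO-8859-2. HeaderCharDecoder.toChar() is hypothetical and not necessarily how this class performs the conversion internally.

```java
import java.nio.charset.Charset;

/** Hypothetical helper: decodes one header byte using the dictionary encoding. */
class HeaderCharDecoder {
    /**
     * Decodes a single byte with the named charset. Assumes a single-byte
     * encoding; Charset.forName throws an unchecked exception for unknown names.
     */
    static char toChar(byte b, String dictionaryEncoding) {
        String s = new String(new byte[] { b }, Charset.forName(dictionaryEncoding));
        return s.charAt(0);
    }

    public static void main(String[] args) {
        byte b = (byte) 0xB1;
        // The same byte maps to different characters under different encodings.
        System.out.printf("U+%04X%n", (int) toChar(b, "ISO-8859-2")); // U+0105 (a with ogonek)
        System.out.printf("U+%04X%n", (int) toChar(b, "ISO-8859-1")); // U+00B1 (plus-minus sign)
    }
}
```

This is why the encoding passed to the constructor matters: the separator and filler are stored as raw bytes and only become characters once an encoding is chosen.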

getNumberOfArcs

public abstract int getNumberOfArcs()
Returns the number of arcs in this automaton. Depending on the representation of the automaton, this method may take a long time to finish.


getNumberOfNodes

public abstract int getNumberOfNodes()
Returns the number of nodes in this automaton. Depending on the representation of the automaton, this method may take a long time to finish.
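One possible reason these counts can be slow is that, in a compact representation, they may require visiting the whole automaton. The sketch below counts reachable nodes and arcs with a breadth-first traversal over a hypothetical in-memory automaton (ToyNode, ToyArc and AutomatonCounter are not part of this API); an identity set ensures that a node shared by several paths is counted once.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

/** Hypothetical node: just a list of outgoing arcs. */
class ToyNode {
    final List<ToyArc> arcs = new ArrayList<>();
}

/** Hypothetical arc: a byte label and a (possibly null) target node. */
class ToyArc {
    final byte label;
    final ToyNode target;
    ToyArc(byte label, ToyNode target) { this.label = label; this.target = target; }
}

class AutomatonCounter {
    /** Visits every node reachable from start exactly once; returns {nodes, arcs}. */
    static int[] count(ToyNode start) {
        Set<ToyNode> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<ToyNode> queue = new ArrayDeque<>();
        int arcs = 0;
        if (start != null) {
            seen.add(start);
            queue.add(start);
        }
        while (!queue.isEmpty()) {
            ToyNode node = queue.poll();
            for (ToyArc arc : node.arcs) {
                arcs++; // every arc of a visited node is counted once
                if (arc.target != null && seen.add(arc.target)) {
                    queue.add(arc.target); // enqueue only nodes not seen before
                }
            }
        }
        return new int[] { seen.size(), arcs };
    }

    public static void main(String[] args) {
        ToyNode start = new ToyNode(), a = new ToyNode(), b = new ToyNode(), end = new ToyNode();
        start.arcs.add(new ToyArc((byte) 'a', a));
        start.arcs.add(new ToyArc((byte) 'b', b));
        a.arcs.add(new ToyArc((byte) 'c', end));
        b.arcs.add(new ToyArc((byte) 'c', end)); // 'end' is shared by two paths
        int[] c = count(start);
        System.out.println("nodes=" + c[0] + ", arcs=" + c[1]); // nodes=4, arcs=4
    }
}
```

The cost is linear in the size of the automaton, which is acceptable for an occasional statistic but not for a per-lookup operation.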


getStartNode

public abstract FSA.Node getStartNode()
Returns the start node of this automaton. May return null if the start node is also an end node.


getTraversalHelper

public FSATraversalHelper getTraversalHelper()
Returns:
Returns an object which can be used to traverse a finite state automaton.
Since:
1.0.5
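The kind of walk a traversal helper performs can be sketched on a toy automaton: follow one arc per input character from the start node and accept if the last arc taken is final. TNode, TArc and Traversal.accepts() are hypothetical and are not the FSATraversalHelper API. The sketch assumes that the "final" mark sits on arcs rather than nodes, treats the empty word as not accepted, and guards against a null node, which getStartNode() is documented as possibly returning.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical node: outgoing arcs indexed by their label. */
class TNode {
    final Map<Character, TArc> arcs = new HashMap<>();
}

/** Hypothetical arc: a target node (may be null) and a "word may end here" bit. */
class TArc {
    final TNode target;
    final boolean isFinal;
    TArc(TNode target, boolean isFinal) { this.target = target; this.isFinal = isFinal; }
}

class Traversal {
    /** Returns true if every character follows an arc and the last arc is final. */
    static boolean accepts(TNode start, String word) {
        if (word.isEmpty()) {
            return false; // ASSUMPTION: the empty word is not accepted
        }
        TNode node = start;
        TArc arc = null;
        for (int i = 0; i < word.length(); i++) {
            if (node == null) {
                return false; // no node to continue from (includes a null start node)
            }
            arc = node.arcs.get(word.charAt(i));
            if (arc == null) {
                return false; // no transition for this character
            }
            node = arc.target;
        }
        return arc.isFinal;
    }

    /** Adds an arc labelled c from 'from' to 'to' and returns 'to'. */
    static TNode add(TNode from, char c, boolean isFinal, TNode to) {
        from.arcs.put(c, new TArc(to, isFinal));
        return to;
    }

    public static void main(String[] args) {
        TNode start = new TNode();
        TNode n1 = add(start, 'a', false, new TNode());
        TNode n2 = add(n1, 'b', true, new TNode());
        add(n2, 'c', true, null);
        System.out.println(accepts(start, "ab"));  // true
        System.out.println(accepts(start, "a"));   // false
        System.out.println(accepts(start, "abc")); // true
    }
}
```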

getInstance

public static FSA getInstance(java.io.File fsaFile,
                              java.lang.String dictionaryEncoding)
                       throws java.io.IOException
Attempts to instantiate an appropriate FSA implementation for the version found in the given file.

Throws:
java.io.IOException - If no corresponding FSA parser is found for the file's version, or if the file cannot be opened.

getInstance

public static FSA getInstance(java.io.InputStream fsaStream,
                              java.lang.String dictionaryEncoding)
                       throws java.io.IOException
Attempts to instantiate an appropriate FSA implementation for the version found in the given input stream.

Throws:
java.io.IOException - If no corresponding FSA parser is found for the stream's version, or if the stream cannot be read.
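A minimal sketch of the version dispatch these two factory methods perform, under stated assumptions: the version byte is assumed to follow a 4-byte signature, VERSION_5 is given an illustrative value, and VersionedFSA and Ver5FSA are hypothetical stand-ins for FSA and FSAVer5Impl. The File overload opens the file and delegates to the stream overload, so version handling lives in one place. InputStream.readAllBytes() requires Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical stand-in for the abstract FSA class. */
abstract class VersionedFSA {
    abstract int version();
}

/** Hypothetical stand-in for FSAVer5Impl. */
class Ver5FSA extends VersionedFSA {
    final byte[] data;
    Ver5FSA(byte[] data) { this.data = data; }
    int version() { return 5; }
}

class FSAFactory {
    static final byte VERSION_5 = 5;     // ASSUMPTION: illustrative value
    static final int VERSION_OFFSET = 4; // ASSUMPTION: version byte follows a 4-byte signature

    /** Opens the file and delegates to the stream overload. */
    static VersionedFSA getInstance(File fsaFile) throws IOException {
        try (InputStream in = new FileInputStream(fsaFile)) {
            return getInstance(in);
        }
    }

    /** Picks an implementation based on the version byte in the header. */
    static VersionedFSA getInstance(InputStream fsaStream) throws IOException {
        byte[] data = fsaStream.readAllBytes();
        if (data.length <= VERSION_OFFSET) {
            throw new IOException("Truncated FSA header");
        }
        byte version = data[VERSION_OFFSET];
        switch (version) {
            case VERSION_5:
                return new Ver5FSA(data);
            default:
                throw new IOException("Unsupported FSA version: " + version);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] header = { '\\', 'f', 's', 'a', 5, '_', '+', 0 };
        System.out.println(getInstance(new ByteArrayInputStream(header)).version()); // 5
    }
}
```

Supporting a new format version would then mean adding one case to the switch and one subclass, without changing callers.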

readHeader

protected void readHeader(java.io.DataInput in,
                          long fileSize)
                   throws java.io.IOException
Reads an FSA header from a stream.

Throws:
java.io.IOException - If the stream is not a dictionary, or if the version is not supported.
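A sketch of what reading the header might involve. It assumes a Daciuk-style layout (a 4-byte "\fsa" signature followed by one byte each for the version, filler, annotation separator and goto length) and a single supported version. FSAHeader is a hypothetical holder whose fields mirror the protected fields of FSA, and using fileSize only to reject inputs too short to hold a header is an assumption about that parameter's role.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical header holder; field names mirror the protected fields of FSA. */
class FSAHeader {
    // ASSUMPTION: 4-byte signature, then version, filler, annotation, gotoLength.
    static final byte[] SIGNATURE = "\\fsa".getBytes(StandardCharsets.US_ASCII);
    static final int HEADER_SIZE = SIGNATURE.length + 4;

    byte version, filler, annotation, gotoLength;

    /** Parses the header, rejecting non-dictionaries and unsupported versions. */
    static FSAHeader read(DataInput in, long fileSize, byte supportedVersion) throws IOException {
        if (fileSize < HEADER_SIZE) {
            throw new IOException("Input too short to be an FSA dictionary");
        }
        byte[] sig = new byte[SIGNATURE.length];
        in.readFully(sig);
        if (!Arrays.equals(sig, SIGNATURE)) {
            throw new IOException("Not an FSA dictionary (bad signature)");
        }
        FSAHeader h = new FSAHeader();
        h.version = in.readByte();
        h.filler = in.readByte();
        h.annotation = in.readByte();
        h.gotoLength = in.readByte();
        if (h.version != supportedVersion) {
            throw new IOException("Unsupported FSA version: " + h.version);
        }
        return h;
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = { '\\', 'f', 's', 'a', 5, '_', '+', 3 };
        FSAHeader h = read(new DataInputStream(new ByteArrayInputStream(bytes)), bytes.length, (byte) 5);
        // prints: version=5 filler=_ annotation=+ gotoLength=3
        System.out.println("version=" + h.version + " filler=" + (char) h.filler
            + " annotation=" + (char) h.annotation + " gotoLength=" + h.gotoLength);
    }
}
```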

readFully

protected byte[] readFully(java.io.InputStream stream)
                    throws java.io.IOException
Reads all bytes from an input stream.

Parameters:
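stream - The input stream to read from.
Returns:
An array containing all bytes read from the stream.
Throws:
java.io.IOException - If an I/O error occurs while reading the stream.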
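A straightforward way to read a stream to its end is to copy it through a fixed-size buffer into a growable byte array, as sketched below; StreamUtil is a hypothetical name, and the actual implementation of readFully may differ. On Java 9 and later, InputStream.readAllBytes() provides the same behavior. Like readAllBytes(), the sketch does not close the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical utility class illustrating readFully. */
class StreamUtil {
    /** Reads all remaining bytes from the stream into a new array. */
    static byte[] readFully(InputStream stream) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        // read() returns -1 at end of stream; otherwise the number of bytes read.
        while ((n = stream.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[20000]; // larger than one buffer, so several reads occur
        System.out.println(readFully(new ByteArrayInputStream(data)).length); // 20000
    }
}
```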