morfologik.fsa.core
Class FSAVer5Impl

java.lang.Object
  extended by morfologik.fsa.core.FSA
      extended by morfologik.fsa.core.FSAVer5Impl

public final class FSAVer5Impl
extends FSA

FSA (Finite State Automaton) dictionary traversal implementation for version 5 of the FSA automaton.

Version 5 indicates the dictionary was built with these flags: FSA.FSA_FLEXIBLE, FSA.FSA_STOPBIT and FSA.FSA_NEXTBIT. The internal representation of the FSA must therefore follow this description (please note this format describes only a single transition (arc), not the entire dictionary file).

 Byte
       +-+-+-+-+-+-+-+-+\
     0 | | | | | | | | | +------ label
       +-+-+-+-+-+-+-+-+/

                  +------------- node pointed to is next
                  | +----------- the last arc of the node
                  | | +--------- the arc is final
                  | | |
             +-----------+
             |    | | |  |
         ___+___  | | |  |
        /       \ | | |  |
       MSB           LSB |
        7 6 5 4 3 2 1 0  |
       +-+-+-+-+-+-+-+-+ |
     1 | | | | | | | | | \ \
       +-+-+-+-+-+-+-+-+  \ \  LSB
       +-+-+-+-+-+-+-+-+     +
     2 | | | | | | | | |     |
       +-+-+-+-+-+-+-+-+     |
     3 | | | | | | | | |     +----- target node address (in bytes)
       +-+-+-+-+-+-+-+-+     |      (not present except for the byte
       : : : : : : : : :     |       with flags if the node pointed to
       +-+-+-+-+-+-+-+-+     +       is next)
   gtl | | | | | | | | |    /  MSB
       +-+-+-+-+-+-+-+-+   /
 gtl+1                           (gtl = gotoLength)
 

The traversal of such FSA could be made extremely fast using pointers only (or integer indices over a byte array in case of Java). However, for the sake of clarity, this class uses explicit Node/ Arc objects.


Field Summary
protected  byte[] arcs
          An array of bytes with the internal representation of the automaton.
protected  int arcSize
          Size of a single arc (in bytes).
protected static int gotoOffset
          An offset in the arc structure, where the address field begins.
 
Fields inherited from class morfologik.fsa.core.FSA
filler, FSA_FLEXIBLE, FSA_LARGE_DICTIONARIES, FSA_NEXTBIT, FSA_STOPBIT, FSA_TAILS, FSA_WEIGHTED, gotoLength, version, VERSION_5
 
Constructor Summary
FSAVer5Impl(java.io.InputStream fsaStream, java.lang.String dictionaryEncoding)
          Creates a new automaton reading it from a file in FSA format, version 5.
 
Method Summary
 int getNumberOfArcs()
          Returns the number of arcs in this automaton.
 int getNumberOfNodes()
          Returns the number of nodes in this automaton.
 FSA.Node getStartNode()
          Returns the start node of this automaton.
protected  int gotoFieldToOffset(int start, int n)
          Returns an integer offset from bitpacked representation.
protected  void readHeader(java.io.DataInput in, long fileSize)
          Reads a FSA header from a stream.
 
Methods inherited from class morfologik.fsa.core.FSA
getAnnotationSeparator, getFillerCharacter, getFlags, getInstance, getInstance, getTraversalHelper, getVersion, readFully
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

arcSize

protected int arcSize
Size of a single arc (in bytes).


gotoOffset

protected static final int gotoOffset
An offset in the arc structure, where the address field begins. For this version of the automaton, this is a constant value.

See Also:
Constant Field Values

arcs

protected byte[] arcs
An array of bytes with the internal representation of the automaton. Please see the documentation of this class for more information on how this structure is organized.

Constructor Detail

FSAVer5Impl

public FSAVer5Impl(java.io.InputStream fsaStream,
                   java.lang.String dictionaryEncoding)
            throws java.io.IOException
Creates a new automaton reading it from a file in FSA format, version 5.

Throws:
java.io.IOException
Method Detail

getNumberOfArcs

public int getNumberOfArcs()
Returns the number of arcs in this automaton. This method performs a full scan of all arcs in this automaton.

Specified by:
getNumberOfArcs in class FSA

getNumberOfNodes

public int getNumberOfNodes()
Returns the number of nodes in this automaton. This method performs a full scan of all arcs in this automaton.

Specified by:
getNumberOfNodes in class FSA

getStartNode

public FSA.Node getStartNode()
Returns the start node of this automaton. May return null if the start node is also an end node.

Specified by:
getStartNode in class FSA

readHeader

protected void readHeader(java.io.DataInput in,
                          long fileSize)
                   throws java.io.IOException
Reads a FSA header from a stream.

Overrides:
readHeader in class FSA
Throws:
java.io.IOException - If the stream is not a dictionary, or if the version is not supported.

gotoFieldToOffset

protected final int gotoFieldToOffset(int start,
                                      int n)
Returns an integer offset from bitpacked representation.