State machines are old technology, but they have many uses even for modern programmers. Probably the most common reason to build a state machine on .NET is emulating coroutines in a recursive IEnumerator, such as one that must return a FileSystemInfo for every file and directory under a starting directory. Since this only takes three states, that's exactly what the sample code for this article does.
The basic idea is that you execute a state machine iteratively, when you need to extract the next value from an input stream, or when a new input comes in. The Current state controls which code is executed on each iteration; the Current state's handler may or may not change the Current state. If the Current state handler doesn't change the Current state, the same handler is invoked each time the state machine is iterated, until something changes.
The canonical way to implement a state machine is via an enum containing the name of every state, and a giant switch statement containing the handler for each enum. This compiles to a nice efficient jump table, but it's hard to maintain. When you add a new state, you have to change both the enum and the switch statement; plus, tracing execution means lots of jumping around the switch statement.
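To make the enum and switch approach concrete, here is a minimal sketch. It is not taken from the article's sample code: the class name TwoArrayEnumerator and its three states are hypothetical, chosen only to show the pattern of one enum value and one switch case per state.

```csharp
using System.Collections;

// Hypothetical example: yields every item of `first`, then every item of `second`.
public class TwoArrayEnumerator : IEnumerator
{
    private enum State { First, Second, Done }

    private readonly object[] first;
    private readonly object[] second;
    private State currentState;
    private int index;
    private object currentValue;

    public TwoArrayEnumerator(object[] first, object[] second)
    {
        this.first = first;
        this.second = second;
        Reset();
    }

    public void Reset()
    {
        currentState = State.First;
        index = 0;
        currentValue = null;
    }

    public object Current { get { return currentValue; } }

    public bool MoveNext()
    {
        // Keep cycling the machine until a state delivers a value or finishes.
        while (true)
        {
            switch (currentState)
            {
                case State.First:
                    if (index < first.Length)
                    {
                        currentValue = first[index++];
                        return true;             // deliver; stay in this state
                    }
                    currentState = State.Second; // exhausted: change state
                    index = 0;
                    break;

                case State.Second:
                    if (index < second.Length)
                    {
                        currentValue = second[index++];
                        return true;
                    }
                    currentState = State.Done;
                    break;

                default:                         // State.Done
                    return false;
            }
        }
    }
}
```

Even at three states, adding a fourth means touching both the enum and the switch, which is the maintenance cost described above.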
After writing an IEnumerator as an enum and switch state machine, I got to wondering: if an object is just a state packet, could you swap state by changing the CurrentState object, instead of the CurrentEnum value? Each state's handler would stand by itself as a standard method, which is easier to read than a long switch statement. Adding a new state just means adding a new class, which is easier than editing both the enum and the switch. And when a handler changes the CurrentState, you can use F12 to jump to the implementation of the next state, which is easier than having to find it by hand in a long switch statement.
The interface between IEnumerator and a state machine is pretty obvious: Reset() sets CurrentState to an initial state. MoveNext() iterates the state machine, either setting an internal CurrentValue field and returning true, or returning false to indicate that there are no (more) items. And the Current property exposes the internal CurrentValue field.
Each state handler needs to be able to change the enumerator's private CurrentState field. That is, we need a circular reference where the state machine refers to the current state object, and the state handler has a reference back to the state machine.
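The circular reference can be sketched in its most tightly coupled form as follows. This is an illustration under stated assumptions, not the article's sample code: CountingEnumerator, Counting, and Finished are hypothetical names, and the private abstract State base class is just one convenient way to hold the back-reference.

```csharp
using System.Collections;

// Hypothetical example: counts 1..limit, then stops.
public class CountingEnumerator : IEnumerator
{
    // Base class for the nested states; each state holds a reference
    // back to the enumerator that owns it.
    private abstract class State
    {
        protected readonly CountingEnumerator Owner;
        protected State(CountingEnumerator owner) { Owner = owner; }
        public abstract bool MoveNext();
    }

    private sealed class Counting : State
    {
        private int next = 1;
        public Counting(CountingEnumerator owner) : base(owner) { }

        public override bool MoveNext()
        {
            if (next > Owner.limit)
            {
                // Nested classes can touch the owner's private fields,
                // so the handler swaps the current state directly.
                Owner.currentState = new Finished(Owner);
                return Owner.currentState.MoveNext();
            }
            Owner.currentValue = next++;
            return true;
        }
    }

    private sealed class Finished : State
    {
        public Finished(CountingEnumerator owner) : base(owner) { }
        public override bool MoveNext() { return false; }
    }

    private readonly int limit;
    private State currentState;
    private object currentValue;

    public CountingEnumerator(int limit)
    {
        this.limit = limit;
        Reset();
    }

    public void Reset()
    {
        currentState = new Counting(this);
        currentValue = null;
    }

    public object Current { get { return currentValue; } }

    public bool MoveNext() { return currentState.MoveNext(); }
}
```

The enumerator points at its current state, and every state points back at the enumerator, so a handler can both set the value and replace itself.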
A Simple Implementation
A simple implementation gives the Enumerator object a private reference to a CurrentState object; each CurrentState object is an instance of a private class, nested within the Enumerator class, that has a reference to the Enumerator instance so that it can change the CurrentState.
The only problem, here, is that each different enumeration needs to build the state machine infrastructure all over again: reusability demands looser coupling. Similarly, you could implement the current state handler as a virtual method, but this requires that all state classes descend from a common, abstract ancestor. While this is probably fine in at least 99.9% of all cases, using interfaces presents a bit more flexibility. Thus, StateMachines.cs includes an abstract Enumerator class which expects state handler classes to support IStateHandler:

    public interface IStateHandler
    {
        bool MoveNext(IStateMachine StateMachine);
    }

The abstract Enumerator class supports IEnumerator via a generic state machine:

    private IStateHandler CurrentState;
    private object CurrentValue;

    protected abstract IStateHandler StartState();

    public void Reset()
    {
        CurrentState = /*abstract*/ StartState();
        CurrentValue = null;
    }

    public object Current
    {
        get { return CurrentValue; }
    }

    public bool MoveNext()
    {
        return CurrentState.MoveNext(this);
    }

Reset() sets an initial state handler, and MoveNext() iterates the state machine by calling the current state's handler. Now, sometimes you have to cycle the state machine through a few iterations before you get to an output state. For example, a given directory may have subdirectories but no files, or you may have just processed the last subdirectory of a last subdirectory, and so on. In an enum and switch implementation, this might be done with code like this:

    bool Deliver = false;
    do
        switch (CurrentState)
        {
            // one case per state; an output state sets Deliver = true
        }
    while (! Deliver);

This loops until a state handler sets Deliver to true. In my loosely-coupled, object-oriented implementation, the abstract Enumerator class supports the IStateMachine interface, which provides state classes with a couple of utilities: bool Yield() sets the internal CurrentValue (and, optionally, the CurrentState) and returns true; bool Recurse() sets the CurrentState, and recursively returns whatever the new state's IStateHandler.MoveNext() implementation returns:

    bool IStateMachine.Yield(object NewValue)
    {
        CurrentValue = NewValue;
        return true;
    }

    bool IStateMachine.Yield(object NewValue, IStateHandler NextState)
    {
        CurrentValue = NewValue;
        CurrentState = NextState;
        return true;
    }

    bool IStateMachine.Recurse(IStateHandler NextState)
    {
        if (NextState == null)
            return false; // no next state, no next item
        CurrentState = NextState;
        return NextState.MoveNext(this); // recurse
    }

That is, return Yield() sets a new CurrentValue and suspends the state machine; return Recurse() passes control to a new state handler (which may itself either Yield() or Recurse()) and passes the result back out to the "outer process."
The actual state classes are nested two deep in the AllFiles class, which implements IEnumerable by returning a new instance of the nested private class DirectoryEnumerator : Enumerator. DirectoryEnumerator overrides StartState() and contains the actual state handler classes, ShowDirectory, ShowFiles, and ShowSubdirectories.
The start state is ShowDirectory, with a null next state, and the DirectoryInfo passed to the AllFiles constructor:

    protected override IStateHandler StartState()
    {
        return new ShowDirectory(null, RootDirectory);
    }

Thus, calling Enumerator.MoveNext() after a Reset() calls the ShowDirectory state handler:

    public bool MoveNext(IStateMachine StateMachine)
    {
        IStateHandler subdirectories =
            new ShowSubdirectories(NextState, Directory.GetDirectories());
        IStateHandler files =
            new ShowFiles(subdirectories, Directory.GetFiles());
        // set Current to this directory; next time, scan files
        return StateMachine.Yield(Directory, files);
    }

This first creates a ShowSubdirectories state that "returns" to the ShowDirectory state's NextState. Then, it creates a ShowFiles state that "forwards" to the ShowSubdirectories state. Finally, it sets the CurrentValue to Directory and returns true.
The ShowFiles constructor calls GetEnumerator() on its second, IEnumerable parameter, and ShowFiles.MoveNext() either steps this array enumerator or passes control to the next, ShowSubdirectories state:

    public bool MoveNext(IStateMachine StateMachine)
    {
        if (ArrayEnumerator.MoveNext())
            return StateMachine.Yield((FileInfo) ArrayEnumerator.Current);
        else
            return StateMachine.Recurse(NextState);
    }

The ShowSubdirectories constructor similarly calls GetEnumerator() on its second, IEnumerable parameter, and ShowSubdirectories.MoveNext() either steps this array enumerator and recursively sets a new ShowDirectory state to iterate the subdirectory, or "returns" to the parent directory state, which may be null:

    public bool MoveNext(IStateMachine StateMachine)
    {
        if (ArrayEnumerator.MoveNext())
            // at least one more subdirectory - recurse
            return StateMachine.Recurse(
                new ShowDirectory(this,
                    (DirectoryInfo) ArrayEnumerator.Current));
        else
            // no (more) subdirectories - pop
            return StateMachine.Recurse(NextState);
    }

As you can see, ShowSubdirectories is a strictly internal state that never yields control to the "outer process." It either invokes the ShowDirectory state with itself as the parent state, or it returns control to its own parent.
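To see the pieces working together without touching the file system, the fragments above can be assembled into one compilable sketch that walks an in-memory tree. Several parts are assumptions rather than the article's code: the Node, TreeEnumerator, ShowNode, and ShowChildren names are hypothetical stand-ins for DirectoryInfo, DirectoryEnumerator, ShowDirectory, and ShowSubdirectories; there is no ShowFiles equivalent; and the lazy Reset() in MoveNext() is my own choice, made to avoid calling an abstract method from a constructor.

```csharp
using System.Collections;

public interface IStateHandler { bool MoveNext(IStateMachine stateMachine); }

public interface IStateMachine
{
    bool Yield(object newValue);
    bool Yield(object newValue, IStateHandler nextState);
    bool Recurse(IStateHandler nextState);
}

public abstract class Enumerator : IEnumerator, IStateMachine
{
    private IStateHandler currentState;
    private object currentValue;

    protected abstract IStateHandler StartState();

    public void Reset() { currentState = StartState(); currentValue = null; }
    public object Current { get { return currentValue; } }

    public bool MoveNext()
    {
        if (currentState == null) Reset();   // assumption: start lazily on first use
        return currentState.MoveNext(this);
    }

    bool IStateMachine.Yield(object newValue) { currentValue = newValue; return true; }

    bool IStateMachine.Yield(object newValue, IStateHandler nextState)
    {
        currentValue = newValue;
        currentState = nextState;
        return true;
    }

    bool IStateMachine.Recurse(IStateHandler nextState)
    {
        if (nextState == null) return false;  // no next state, no next item
        currentState = nextState;
        return nextState.MoveNext(this);      // recurse
    }
}

// Hypothetical stand-in for a directory: a name plus child nodes.
public class Node
{
    public readonly string Name;
    public readonly Node[] Children;
    public Node(string name, params Node[] children) { Name = name; Children = children; }
}

public class TreeEnumerator : Enumerator
{
    private readonly Node root;
    public TreeEnumerator(Node root) { this.root = root; }

    protected override IStateHandler StartState() { return new ShowNode(null, root); }

    // Output state: yields this node, then hands over to its children.
    private class ShowNode : IStateHandler
    {
        private readonly IStateHandler nextState;
        private readonly Node node;
        public ShowNode(IStateHandler nextState, Node node)
        {
            this.nextState = nextState;
            this.node = node;
        }
        public bool MoveNext(IStateMachine stateMachine)
        {
            return stateMachine.Yield(node.Name, new ShowChildren(nextState, node.Children));
        }
    }

    // Internal state: never yields; either descends into a child or pops.
    private class ShowChildren : IStateHandler
    {
        private readonly IStateHandler nextState;
        private readonly IEnumerator children;
        public ShowChildren(IStateHandler nextState, IEnumerable children)
        {
            this.nextState = nextState;
            this.children = children.GetEnumerator();
        }
        public bool MoveNext(IStateMachine stateMachine)
        {
            if (children.MoveNext())
                return stateMachine.Recurse(new ShowNode(this, (Node) children.Current));
            return stateMachine.Recurse(nextState);
        }
    }
}
```

For a tree built as new Node("A", new Node("B", new Node("D")), new Node("C")), repeated calls to MoveNext() yield A, B, D, C and then return false, the same pre-order walk the directory version performs.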
Now You Can Take
State machines often require more state data than simply which handler to invoke next. For example, the whole point of IEnumerator is the CurrentValue. In a one-off state machine, state handlers might be nested within the class that implements IEnumerator, and can thus freely access private fields and private properties. In a more loosely-coupled implementation, you can do as I've done, and build access to the extra state variables into the handlers' interface to the state machine.
One thing you don't need to do is to build an explicit state stack, as you often need to do in an enum and switch implementation. Each state class contains a NextState field, which can be a "return address" (as in the ShowDirectory and ShowSubdirectories states) or a "forwarding address" (as in the ShowFiles state). In effect, the state classes maintain a sort of distributed stack, in the linked list of ShowDirectory state objects.
Overall, I find that object-oriented state machines are a lot easier to read and debug than their enum and switch equivalents. While there's a certain amount of programming overhead involved in creating a class for each state, this seems to me to be roughly comparable to the amount of overhead involved in updating both an enum and a switch.
While implementing a state machine via state objects is a bit more expensive than using a switch statement, I don't think this matters all that much since most uses of state machines focus on their ability to maintain context across multiple invocations, and speed is not a primary concern.