Maybe you haven't worked with state machines since your college computer science courses. Jon Shemitz offers a reason to dust off the technique with .NET: object-oriented state machines can be easier to read and debug than their enum and switch counterparts.
State machines are old technology, but they have many uses even for modern programmers. Probably the most common reason to build a state machine on .NET is emulating coroutines in a recursive IEnumerator, such as one that must return a FileSystemInfo for every file and directory under a starting directory. Since this only takes three states, that's exactly what the sample code for this article (StateMachines.zip) does. In Whidbey, iterators will eliminate the need to emulate coroutines, but Whidbey may still be as much as a year out, and even when it arrives, there will still be uses for state machines in .NET programming.
The basic idea is that you execute a state machine iteratively, when you need to extract the next value from an input stream, or when a new input comes in. The Current state controls which code is executed on each iteration; the Current state's handler may or may not change the Current state. If the Current state handler doesn't change the Current state, the same handler is invoked each time the state machine is iterated, until something changes.
The canonical way to implement a state machine is via an enum containing the name of every state, and a giant switch statement containing the handler for each enum. This compiles to a nice efficient jump table, but it's hard to maintain. When you add a new state, you have to change both the enum and the switch statement; plus, tracing execution means lots of jumping around the switch statement.
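To make the enum and switch pattern concrete, here is a minimal sketch. It is not taken from the article's sample code: the EnumSwitchEnumerator class, its states, and the values it yields are all hypothetical, chosen only to show one enum member per state and one switch case per handler.

```csharp
using System;
using System.Collections;

// Hypothetical example (not from StateMachines.zip):
// yields "begin", then 1..Limit, then "end".
public class EnumSwitchEnumerator : IEnumerator
{
    private enum State { Begin, Counting, End, Done }

    private readonly int Limit;
    private State CurrentState;
    private int Counter;
    private object CurrentValue;

    public EnumSwitchEnumerator(int limit)
    {
        Limit = limit;
        Reset();
    }

    public void Reset()
    {
        CurrentState = State.Begin;
        Counter = 0;
        CurrentValue = null;
    }

    public object Current
    {
        get { return CurrentValue; }
    }

    public bool MoveNext()
    {
        // Keep cycling until some handler delivers a value or reports the end.
        while (true)
        {
            switch (CurrentState)
            {
                case State.Begin:
                    CurrentValue = "begin";
                    CurrentState = State.Counting;
                    return true;

                case State.Counting:
                    if (Counter < Limit)
                    {
                        CurrentValue = ++Counter;   // stay in the same state
                        return true;
                    }
                    CurrentState = State.End;       // nothing to deliver: iterate again
                    break;

                case State.End:
                    CurrentValue = "end";
                    CurrentState = State.Done;
                    return true;

                default:                            // State.Done
                    return false;
            }
        }
    }
}
```

Even at this size, adding a state means touching both the State enum and the switch, and following a transition means hunting for the matching case label.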
After writing an IEnumerator as an enum and switch state machine, I got to wondering: if an object is just a state packet, could you swap state by changing the CurrentState object, instead of the CurrentEnum value? Each state's handler would stand by itself as a standard method, which is easier to read than a long switch statement. Adding a new state just means adding a new class, which is easier than editing both the enum and the switch. And when a handler changes the CurrentState, you can use F12 to jump to the implementation of the next state, which is easier than having to find it by hand in a long switch statement.
The interface between IEnumerator and a state machine is pretty obvious: Reset() sets CurrentState to an initial state. MoveNext() iterates the state machine, either setting an internal CurrentValue field and returning true, or returning false to indicate that there are no (more) items. And the Current property exposes the internal CurrentValue field.
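Seen from the caller's side, that contract looks like the following sketch. The EnumeratorClient helper is hypothetical and only illustrates how a client (or a foreach loop) drives MoveNext() and Current; it works with any IEnumerator, state machine or not.

```csharp
using System;
using System.Collections;

// Hypothetical helper, for illustration only.
public static class EnumeratorClient
{
    // Collects everything an IEnumerator produces, much as foreach would.
    public static ArrayList Drain(IEnumerator e)
    {
        ArrayList items = new ArrayList();
        while (e.MoveNext())        // false means "no (more) items"
            items.Add(e.Current);   // Current is meaningful only after MoveNext() returned true
        return items;
    }
}
```

Each call to MoveNext() is one iteration of the state machine; everything the machine needs to remember between calls has to live in its fields.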
Each state handler needs to be able to change the enumerator's private CurrentState field. That is, we need a circular reference where the state machine refers to the current state object, and the state handler has a reference back to the state machine.
A Simple Implementation
A simple implementation gives the Enumerator object a private reference to a CurrentState object; each CurrentState object is an instance of a private class, nested within the Enumerator class, that has a reference to the Enumerator instance so that it can change the CurrentState.
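A minimal sketch of this tightly coupled arrangement might look like the following. Everything here is hypothetical (the PingPongEnumerator class, its Ping and Pong states, and the abstract State ancestor are not from the sample code); the point is only the circular reference between the enumerator and its nested state objects.

```csharp
using System;
using System.Collections;

// Hypothetical one-off enumerator: yields "ping", "pong" for a given number of rounds.
public class PingPongEnumerator : IEnumerator
{
    private readonly int Rounds;
    private State CurrentState;     // the state object currently in charge
    private object CurrentValue;

    public PingPongEnumerator(int rounds)
    {
        Rounds = rounds;
        Reset();
    }

    public void Reset()
    {
        CurrentState = new Ping(this, Rounds);
        CurrentValue = null;
    }

    public object Current { get { return CurrentValue; } }

    public bool MoveNext() { return CurrentState.MoveNext(); }

    // Nested classes may touch the enumerator's private fields through Owner.
    private abstract class State
    {
        protected readonly PingPongEnumerator Owner;    // back-reference to the machine
        protected State(PingPongEnumerator owner) { Owner = owner; }
        public abstract bool MoveNext();
    }

    private class Ping : State
    {
        private readonly int Remaining;
        public Ping(PingPongEnumerator owner, int remaining) : base(owner) { Remaining = remaining; }

        public override bool MoveNext()
        {
            if (Remaining == 0)
                return false;                           // no more rounds
            Owner.CurrentValue = "ping";
            Owner.CurrentState = new Pong(Owner, Remaining);
            return true;
        }
    }

    private class Pong : State
    {
        private readonly int Remaining;
        public Pong(PingPongEnumerator owner, int remaining) : base(owner) { Remaining = remaining; }

        public override bool MoveNext()
        {
            Owner.CurrentValue = "pong";
            Owner.CurrentState = new Ping(Owner, Remaining - 1);
            return true;
        }
    }
}
```

The states here are welded to one particular enumerator class, so none of this plumbing can be reused by a different enumeration.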
The only problem here is that each different enumeration needs to build the state machine infrastructure all over again: reusability demands looser coupling.
Similarly, you could implement the current state handler as a virtual method, but this requires that all state classes descend from a common, abstract ancestor. While this is probably fine in at least 99.9% of all cases, using interfaces offers a bit more flexibility.
Thus, StateMachines.cs includes an abstract Enumerator class which expects state handler classes to support IStateHandler:
    public interface IStateHandler
    {
        bool MoveNext(IStateMachine StateMachine);
    }
The abstract Enumerator class supports IEnumerator via a generic state machine:
    private IStateHandler CurrentState;
    private object CurrentValue;
    protected abstract IStateHandler StartState();
    public void Reset()
    {
        CurrentState = /*abstract*/ StartState();
        CurrentValue = null;
    }
    public object Current
    {
        get { return CurrentValue; }
    }
    public bool MoveNext()
    {
        return CurrentState.MoveNext(this);
    }
Reset() sets an initial state handler, and MoveNext() iterates the state machine by calling the current state's handler. Now, sometimes you have to cycle the state machine through a few iterations before you get to an output state. For example, a given directory may have subdirectories but no files, or you may have just processed the last subdirectory of a last subdirectory, and so on. In an enum and switch implementation, this might be done with code like this:
    bool Deliver = false;
    do
        switch (CurrentState)
        {
            // one case per state; a handler sets Deliver = true
            // once it has a value to hand back
        }
    while (!Deliver);
This loops until a state handler sets Deliver to true.
In my loosely-coupled, object-oriented implementation, the abstract Enumerator class supports the IStateMachine interface, which provides state classes with a couple of utilities: bool Yield() sets the internal CurrentValue (and, optionally, the CurrentState) and returns true; bool Recurse() sets the CurrentState, and recursively returns whatever the new state's IStateHandler.MoveNext() implementation returns:
    bool IStateMachine.Yield(object NewValue)
    {
        CurrentValue = NewValue;
        return true;
    }
    bool IStateMachine.Yield(object NewValue, IStateHandler NextState)
    {
        CurrentValue = NewValue;
        CurrentState = NextState;
        return true;
    }
    bool IStateMachine.Recurse(IStateHandler NextState)
    {
        if (NextState == null)
            return false; // no next state, no next item
        CurrentState = NextState;
        return NextState.MoveNext(this); // recurse
    }
That is, return Yield() sets a new CurrentValue and suspends the state machine; return Recurse() passes control to a new state handler (which may itself either Yield() or Recurse()) and passes the result back out to the "outer process."
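The pieces shown so far can be assembled into a small runnable sketch. Two things in it are assumptions rather than quotes from the sample code: the IStateMachine declaration is inferred from the three explicit implementations above, and MoveNext() lazily calls Reset() so the sketch works without a constructor. The Countdown state and the CountdownDemo enumerator are purely hypothetical, there only to exercise Yield() and Recurse().

```csharp
using System;
using System.Collections;

public interface IStateHandler
{
    bool MoveNext(IStateMachine StateMachine);
}

// Assumption: inferred from the explicit implementations shown in the article.
public interface IStateMachine
{
    bool Yield(object NewValue);
    bool Yield(object NewValue, IStateHandler NextState);
    bool Recurse(IStateHandler NextState);
}

public abstract class Enumerator : IEnumerator, IStateMachine
{
    private IStateHandler CurrentState;
    private object CurrentValue;

    protected abstract IStateHandler StartState();

    public void Reset()
    {
        CurrentState = StartState();
        CurrentValue = null;
    }

    public object Current { get { return CurrentValue; } }

    public bool MoveNext()
    {
        if (CurrentState == null)
            Reset();                        // assumption: start lazily on first use
        return CurrentState.MoveNext(this);
    }

    bool IStateMachine.Yield(object NewValue)
    {
        CurrentValue = NewValue;
        return true;
    }

    bool IStateMachine.Yield(object NewValue, IStateHandler NextState)
    {
        CurrentValue = NewValue;
        CurrentState = NextState;
        return true;
    }

    bool IStateMachine.Recurse(IStateHandler NextState)
    {
        if (NextState == null)
            return false;                   // no next state, no next item
        CurrentState = NextState;
        return NextState.MoveNext(this);    // recurse
    }
}

// Hypothetical state: yields From, From - 1, ..., 1, then hands over to Next.
public class Countdown : IStateHandler
{
    private int Remaining;
    private readonly IStateHandler Next;

    public Countdown(int from, IStateHandler next)
    {
        Remaining = from;
        Next = next;
    }

    public bool MoveNext(IStateMachine StateMachine)
    {
        if (Remaining > 0)
            return StateMachine.Yield(Remaining--);   // deliver a value, keep this state
        return StateMachine.Recurse(Next);            // exhausted: pass control on
    }
}

// Hypothetical enumerator: the middle state has nothing to yield and only recurses.
public class CountdownDemo : Enumerator
{
    protected override IStateHandler StartState()
    {
        return new Countdown(2, new Countdown(0, new Countdown(1, null)));
    }
}
```

Enumerating a CountdownDemo produces 2, 1, 1: the empty middle Countdown never yields, so a single MoveNext() call passes through it via Recurse() before the last state delivers its value.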
The actual state classes are nested two deep in the AllFiles class, which implements IEnumerable by returning a new instance of the nested private class DirectoryEnumerator, which descends from Enumerator. DirectoryEnumerator overrides StartState() and contains the actual state handler classes, ShowDirectory, ShowFiles, and ShowSubdirectories.
The start state is ShowDirectory, with a null next state, and the DirectoryInfo passed to the AllFiles constructor:
    protected override IStateHandler StartState()
    {
        return new ShowDirectory(null, RootDirectory);
    }
Thus, calling Enumerator.MoveNext() after a Reset() calls the ShowDirectory state handler:
    public bool MoveNext(IStateMachine StateMachine)
    {
        IStateHandler subdirectories =
            new ShowSubdirectories(NextState, Directory.GetDirectories());
        IStateHandler files =
            new ShowFiles(subdirectories, Directory.GetFiles());
        // set Current to this directory; next time, scan files
        return StateMachine.Yield(Directory, files);
    }
This first creates a ShowSubdirectories state that "returns" to the ShowDirectory state's NextState. Then, it creates a ShowFiles state that "forwards" to the ShowSubdirectories state. Finally, it sets the CurrentValue to Directory and returns true.
The ShowFiles constructor calls GetEnumerator() on its second, IEnumerable parameter, and ShowFiles.MoveNext() either steps this array enumerator or passes control to the next, ShowSubdirectories state:
    public bool MoveNext(IStateMachine StateMachine)
    {
        if (ArrayEnumerator.MoveNext())
            return StateMachine.Yield((FileInfo) ArrayEnumerator.Current);
        else
            return StateMachine.Recurse(NextState);
    }
The ShowSubdirectories constructor similarly calls GetEnumerator() on its second, IEnumerable parameter, and ShowSubdirectories.MoveNext() either steps this array enumerator and recursively sets a new ShowDirectory state to iterate the subdirectory, or "returns" to the parent directory state, which may be null:
    public bool MoveNext(IStateMachine StateMachine)
    {
        if (ArrayEnumerator.MoveNext())
            // at least one more subdirectory - recurse
            return StateMachine.Recurse(
                new ShowDirectory(this,
                    (DirectoryInfo) ArrayEnumerator.Current));
        else
            // no (more) subdirectories - pop
            return StateMachine.Recurse(NextState);
    }
As you can see, ShowSubdirectories is a strictly internal state that never yields control to the "outer process." It either invokes the ShowDirectory state with itself as the parent state, or it returns control to its own parent.
State machines often require more state data than simply which handler to invoke next. For example, the whole point of IEnumerator is the CurrentValue. In a one-off state machine, state handlers might be nested within the class that implements IEnumerator, and can thus freely access private fields and private properties. In a more loosely-coupled implementation, you can do as I've done, and build access to the extra state variables into the handlers' interface to the state machine.