A Minimal Turing Machine

Epistemic status: likely.

Tagged in: programming, turing machines, mathematics

Now that this blog has lain dormant for many months, there can’t possibly anyone left to object to a complete change of pace.  This was my plan all along — honest.

One of the many things that has distracted my attention from blogging is that I have, at long last, gotten into a rather entrenched habit of programming.  I do enough now that I take it completely for granted — it’s downright uncanny to look back at some of my posts and see me wanting to start programming again.

But that’s enough useless preambling (worse, useless, navel-gazing preambling), let us move onto the actual subject of this post: Turing Machines.


I don’t think I have to explain to anyone reading this what a Turing Machine is.  I don’t want to, at any rate.

I have, recently, taken an interest in meta-mathematics (or rather, re-kindled my interest — it’s always been a hobby horse of mine).  It has occasionally crossed my mind that I could use my adequate programming ability to help myself understand these results — what better way to prove something than to build it yourself?  Proofs are programs and programs are proofs, after all.


Let’s get things moving.

My first instinct upon deciding to implement this is that I don’t have a precise model of what moving parts constitute a Turing Machine specifically.  To be sure, I checked the Wikipedia page, and saw:

Following Hopcroft and Ullman (1979, p. 148), a (one-tape) Turing machine can be formally defined as a 7-tuple M = ⟨ Q, Γ, b, Σ, δ, q0, F⟩ where

  • is a finite, non-empty set of states
  • Γ is a finite, non-empty set of tape alphabet symbols
  • b ∈ Γ is the blank symbol (the only symbol allowed to occur on the tape infinitely often at any step during the computation)
  • Σ ⊆ Γ is the set of input symbols
  • δ : (Q∖F)×Γ → Q×Γ×{L,R}  is a partial function called the transition function, where L is left shift, R is right shift. (A relatively uncommon variant allows “no shift”, say N, as a third element of the latter set.) If δ is not defined on the current state and the current tape symbol, then the machine halts.
  • q0 ∈ Q is the initial state
  • F ⊆ Q is the set of final or accepting states. The initial tape contents is said to be accepted by M if it eventually halts in a state from F.

Anything that operates according to these specifications is a Turing machine.

Having read a few computer science papers, I’m familiar with this interesting style of description mathematicians prefer.  Verbose and yet austere at the same time.

For me, it helps to translate these one-letter greek names into something a little more conventionally meaningful.

enum alphabet {

enum shift {

typedef int state;

struct transition {
    state         new_state;
    enum alphabet new_symbol;
    enum shift    shift;

struct turing {
    struct state *states;
    enum alphabet *input;
    struct state *initial;
    struct state *final;
    struct transition control[][alphabet_len];
  • enum alphabet is the alphabet, Γ (“gamma”).  Here we have three symbols, blank is the blank symbol, and symbol0 and symbol1 are as you expect.  You can ostensibly to this using only two symbols, but we aren’t going to be using this code anyway.   Lastly, alphabet_length is the length of the alphabet.  C hands out values sequentially to enums, hence symbol0 and symbol1 shall be 0 and 1, blank shall be 2, and then alphabet_len is 3 — this is useful when we want an array to hold a value for each possible symbol, as we do in struct state
  • enum shift is the directions the tape head can move in: left, right, and stay (“no shift” in the article).  The Wikipedia page doesn’t make this set a first class citizen (most likely as there are only two, maybe three elements), but it makes the most sense, here.
  • state is how we represent states.  They are just indexes to the .control array.
  • struct transition is how state transitions are represented.  new_state is the state transitioned to, new_symbol is the symbol written onto the tape when this transition is activated, and shift is how the tape head should shift after writing the symbol.
  • Finally, the most important element is struct turing itself.  I think much of this speaks for itself — .states is Q, .input is Σ (“sigma”), .initial is q0, .final is F.  .control is our version of the δ (“delta”) transition function.  It’s an array indexed by states.  The elements are alphabet_len many transitions, one for each alphabet symbol.  Yes, this is a function in the article, but when mathematicians say ‘function’ what they mean is ‘mapping’, and an array is the most literal translation of that.


Mathematical notation is rather compact, if it blows up to that.  But my translation above suggests what  I decided as soon as I saw it — these mathematicians have no sense of minimalism.

So the question becomes, what does a minimal Turing Machine look like?  Is the complexity of the usual definition necessary, or incidental?

We aren’t looking for the smallest Turing-complete formalism, that would take us too far afield.  So, if we stick to something that vaguely resembles this construct, what seems most important?

  • states and transitions
  • a tape

That’s about it.

A Turing Machine should be able to have arbitrarily many states, and an arbitrarily long tape.  That’s not possible, but our implementation should be bound only by physical limits.

The first consequence at that is that the tape should grow as the machine uses it, and the most natural data structure for representing this is the doubly-linked list, where each cell of the tape holds the alphabet symbol and points the next and previous cell.  If one pointer is the null pointer, we know we are at the last cell at either end of the tape.  Adding a new cell to the tape then becomes as simple as letting the first or last node of the tape point to the new cell.

At first, I considered something similar for states themselves — implement them as a binary graph, each state pointing to both of its transitions.  However, the number of states doesn’t grow as we run the machine, so a linked data structure doesn’t offer many advantages.  Plus, it’s actually simpler to use a transition table than a graph anyway — something I learned failing to implement the former and settling on the later.

I mentioned earlier that you don’t need more than two symbols to have a functioning Turing machine — this adds up to just a single bit to represent symbols. It doesn’t get more minimal than that.

There is one problem with that: modern computers, most notably the x86{,_64} CPU you most likely are reading this on, only have addressing as coarse as a single byte.  And if we stored symbols as bytes, then we’re wasting seven bits.

Actually, because of alignment, you’re wasting up to seven bytes.  Most modern computers have datatypes aligned to their size in octets — a four byte datatype must be written/read to/from addresses that are multiples of four.  On x86, pointers are four bytes (32 bits), and on x64 pointers are eight bytes (64 bits).  A byte is, well, a single byte, so they can be read from any address.

Tying this all together, a struct like this

struct silly_tape {
    char symbol;
    struct silly_tape *next;

really looks like

struct silly_tape_exposed {
    char symbol;
    char _slop[7];
    struct silly_tape_exposed *next;

(assuming you’re on x64)

Structs are aligned to their highest-aligned member, so .symbol will be on an address divisible by 8. But .next must also be on a address divisible by 8, so there are seven bytes of padding added to make .next properly aligned.

Can we do better than this?


The way around this is a bit of trickery which isn’t quite standards-conforming, but works on most sane platform.  This is storing the pointer as a special type called uintptr_t, read as ‘unsigned integer pointer type’ (there’s also intptr_t, but nobody likes signed types).

Because pointers must be aligned to four or eight bytes, the lower two bits are going to be unset no matter what.  Visually:

Hex Address: 0x606034
In Binary:   11000000110000000110100

In binary, 100 is four, so pointer addresses are going to be four times something — the lower two bits are always zero.

Since uintptr_t is just an integer, we can mask in the lower bits (ptr & sym) to “tag” the pointers with metadata (in this case, the symbol of the alphabet), and then mask out those numbers (ptr & ~3) before we follow the pointer.  Note that there must be four or less symbols if you’re going to try this, otherwise you’ll clobber some address bits on x86.

On small caveat you perhaps noticed — even if we’re supporting 32-bit platforms, there’s still an extra bit we aren’t using.  Why not use a four symbol Turing machine? It halves the size of tapes, and even though it doubles the size of state transitions, you’ll tend to have more of the former than the later.

Honestly, this was an oversight on my part.  That said, I do like a conceptual simplicity of a two-symbol Turing machine, though I may rework my design to use four symbols in the future.


This is one other bit of theory to mention.  As it currently stands, our tape needs two pointers, one pointer leftward, one pointing rightward. Also known as a ‘doubly-linked list’.

There is a bit of redundancy in this scheme, though. If A points to B, then we already know B points back to A.  So a tape of n cells requires n*2 space, and yet only contains n+1 information (n-1 information from links between cells, 2 information from the two cells at the ends).  That isn’t very good.

We aren’t the first ones to think of this, however.  You’ll occasionally see mention of xor-linked lists, which take advantage of this redundancy.  Links are stored as the exclusive-or of the (integer representation of) the left and right pointers.  As you walk the list, you must keep track of the last cell you saw.  Then, you can xor it again, and get the next pointer you’re looking for.

Applying this to our (so far hypothetical) implementation, and we reduce the size of a cell to just four/eight bytes.  That isn’t bad.

Caveat: you’ll see it said that xor-linked lists are a clever trick that you should be wary of using, as it’s most likely premature optimization & more complex than just doing things the straight-forward way.

My excuse is twofold,

  1. Optimization is about eliminating bottlenecks.  If you have a 70 byte doubly linked struct and decide to switch to xor-links, you now have a 62 byte struct.  That’s only about a 12% decrease in size, and so the links shouldn’t have be the first place you looked.  The difference is, here our data is literally neglible, and this trick reduces the size of the tape by half.  This is a much more respectable increase.
  2. Xor-linked lists are fun to implement!  This project is educational, after all.


The last concern is instructions.  The article had instructions as a (new state, symbol to write, direction to move) triplet.  There is a justification, as far as I can tell, for not including ‘no write’ or ‘no move’ pairs.  ‘No write’ can be easily emulated by writing the symbol that is already present.  ‘No move’ is pointless — it hands control off to another state, but there is no need for this, as you can copy the relevant transition from that state and avoid the indirection.

This means we only need two bits for instructions.

One departure from this model in our implementation is we replace the write instruction with ignore/invert, which either leaves the cell unchanged, or writes the opposite symbol in it.  This simplifies code generation, allowing us to initialize all transitions to the halting transition by assigning a single value to them all.


Here is our scheme:

  • There are two alphabet symbols, 0 & 1.
  • The tape is a xor-linked list. The lower bit of the link is the symbol of the current cell of the tape.
  • A state is an index into the transition matrix.
  • The initial state is 0.
  • The final, halting state is -1.
  • The transition matrix itself is a array of 2 (state, instruction) pairs.

A few other things worth mentioning:

  • We could get massive space savings by unrolling the tape lists.  This can be accomplished by letting the lowest order bit determine whether this should be interpreted as an address or a pure bitvector.  If it’s a bitvector, then we access the next address in memory and preform the same check.  If we unroll a tape five times, then we have 256 tape cells in 2% of the space.  This is excellent, but just complex enough to not make it into this first implementation.  This also doesn’t play nice with xor-linked lists, but that isn’t too big of a loss.
  • Turing machines effectively require three symbols — the main way of getting a two-symbol Turing machine to work is to encode those three symbols — 11 becomes 1, 10 becomes 0, & 0 is b.  This would be a nice idea if you don’t plan on unrolling your tape lists, as it’s free space.  If you do unroll them, the burden of space is just moved from the machine to the implementation.  This also applies to the four symbol machine mentioned earlier.
Those interested in a complete implementation of this idea can read the code in the minimal branch of this git repo.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s