[06]

Formal language theory doesn't have to be boring

My fourth semester includes Theory of Computation and Formal Language & Automata. Most people treat these as pure exam topics.

Memorise the pumping lemma, draw the DFA, pass the test, forget it. That seems like a waste. These ideas show up in real code constantly.

DFAs are everywhere

A deterministic finite automaton is just a state machine that consumes input and ends in either an accepting or rejecting state. That's also:

  • A regex engine
  • A URL router
  • A game character's behaviour controller
  • A form validation pipeline
  • A TCP connection state tracker

When I'm designing a multi-step flow in an app — say, an onboarding sequence where certain steps unlock others — I think in DFA terms. States, transitions, terminal states. It makes the design clearer and the bugs more obvious.

NFAs and the subset construction

The non-deterministic finite automaton is where it gets interesting. An NFA can be in multiple states simultaneously — you can think of it as exploring all possible paths at once.

The subset construction algorithm converts any NFA to a DFA. In practice this means: any non-deterministic system can be made deterministic, at the cost of exponential state explosion in the worst case.

I think about this when I'm dealing with concurrent async operations. Multiple things happening at once (NFA) can always be modelled as a single thing tracking all the combinations of what might have happened (DFA). Sometimes that's the right design. Sometimes the state explosion makes it impractical.

The pumping lemma

The pumping lemma is a proof technique for showing that a language is not regular — i.e., it can't be recognised by any finite automaton.

The classic example: balanced parentheses. ((())) is valid. ((( is not. No DFA can recognise this language because DFAs have finite memory and balanced parens require counting nesting depth, which is unbounded.

This shows up in real life: regex cannot parse HTML. Not "shouldn't" — fundamentally cannot, because HTML has nested structure that's not regular. The right tool is a pushdown automaton (what parser generators use) or a full parser with a stack.

Every time someone writes a regex to parse HTML and it breaks on an edge case, they've rediscovered the pumping lemma.

What Peter Linz's textbook gets right

It builds intuition before formalism. The proofs are rigorous but the motivation is always clear. I use the Reverse Feynman Technique with it — try to explain the concept without looking at the book, then check where my explanation breaks down. The gaps in my explanation are the gaps in my understanding.