# Finite automata 1: Automatic sets

In one of my recent courses I had to learn about finite automata theory. The main purpose was to understand a generalisation of the celebrated Skolem-Mahler-Lech theorem in positive characteristic. Let $\mathbb N_0$ denote the set of non-negative integers and $K$ be a field. For a linear recurrence $f:\mathbb N_0\to K$, define its zero set to be $\{n\in\mathbb N_0 : f(n)=0\}$. Then the Skolem-Mahler-Lech theorem states the following:

Theorem (Skolem-Mahler-Lech, 1953). The zero set of a linear recurrence $f:\mathbb N_0\to\mathbb K$, where $K$ is a field of characteristic $0$, is a finite union of arithmetic progressions along with a finite set.

Derksen in his 2005 paper was able to provide the correct analogue of this result in positive characteristic for the first time:

Theorem (Derksen, 2007). The zero set of a linear recurrence $f:\mathbb N_0\to K$, where $K$ is a field of characteristic $p>0$, is a $p$-automatic set.

To understand this theorem we need to know about finite automata. In this and the next few posts I want to write about some interesting elementary results about automatic sets. Since my knowledge of computer science is epsilon(!), I will mostly touch on some very basic mathematical aspects.

A deterministic finite-state automaton (DFA) is a directed multigraph $\Gamma$ satisfying certain properties. Firstly, its vertex-set $Q$ (which we call the set of states) contains a special vertex (or state) $q_0$, called the initial state. We also have a finite non-empty set $\Sigma$ of symbols. Each edge of $\Gamma$ is labelled with a symbol from $\Sigma$. As a result, we have a transition function $\delta:Q\times\Sigma\to Q$ such that if an edge from $q_1$ to $q_2$ is labelled with the symbol $a\in\Sigma$, then we write $\delta(q_1,a)=q_2$. Finally, we have a set $F\subseteq Q$ (possibly empty) called the set of accepting or final states.

So $\Gamma$ can be formally written as a quintuple $(\Sigma,Q,q_0,\delta,F)$.

Let $\Sigma^*$ be the free monoid on $\Sigma$. We can extend the transition function $\delta$ to a map $\delta:Q\times\Sigma^*\to Q$ by setting $\delta(q_0,wa)=\delta(\delta(q_0,a),w)$ for all $a\in\Sigma$ and $w\in\Sigma^*$. Thus, to $\Gamma$ we can associate a language

$\mathcal L_\Gamma:=\{w\in\Sigma^*:\delta(q_0,w)\in F\}\subseteq\Sigma^*$.

We call such a language produced from a DFA a regular language. If $w\in\mathcal L_\Gamma$ for some $w\in\Sigma^*$, we say that $w$ is accepted by $\Gamma$.

Let $\mathbb N$ denote the set of the positive integers, and let $k>1$ be an integer. Define $\Sigma_k:=\{0,1,\dots,k-1\}$. A subset $S\subset\mathbb N$ is said to be $k$-automatic if there exists a DFA $\Gamma=(\Sigma_k,Q,q_0,\delta,F)$ such that $S$ is precisely the set of those $j\in\mathbb N$ whose base-$k$ representation, when regarded as a string in $\Sigma^*$, is accepted by $\Gamma$. If $S\subseteq\mathbb N$ is $k$-automatic for some $k>1$, we say $S$ is an automatic set.

For $w=w_nw_{n-1}\cdots w_0\in\Sigma_k^*$ with each $w_i\in\Sigma_k$ we define

$[w]_k:=w_nk^n+w_{n-1}k^{n-1}+\cdots+w_0.$

In this case we also define $|w|:=n+1$ to be the length of $w$.

So $S\subseteq\mathbb N$ is $k$-automatic precisely when $S=\{[w]_k:w\in\mathcal L_\Gamma\}$ for some DFA $\Gamma$.

Note that we can set $\delta(q,0)$ to be in $Q\backslash F$ for every $q\in Q$ (if $F=Q$, just create a new state). Thus we may without loss of generality assume that all our DFAs only accept words without leading zeros.

A basic result about regular languages is the pumping lemma:

Lemma (Pumping lemma). If $\mathcal L$ is an infinite regular language, then there exist words $a,b,c$ with $|b|\ge 1$ such that $ab^nc\in\mathcal L$ for all $n\ge 0$.

Proof. Since $\mathcal L$ is infinite, there are words of arbitrarily large length in $\mathcal L$. Let $w=w_nw_{n-1}\cdots w_0\in\mathcal L$ be a word of sufficient length. Then there exist $0\le i such that $w_i\cdots w_0=w_j\cdots w_0\in\mathcal L$. Taking $a=w_n\cdots w_j$, $b=w_{j-1}\cdots w_i$, $c=w_{i-1}\cdots w_0$ shows that $ab^nc\in\mathcal L$ for all $n\ge 0$. Moreover, since $i, it follows that $|b|\ge 1$. $\square$

In the following posts we are going to use the pumping lemma, together with other machinery, to derive some interesting results on the automaticity of certain sets.