Skip to content

Formal Languages and Automata (Intro)

Why Formal Languages and Automata?

  • A study of abstract models of computers and computation.
  • Highly used in computer-related career. (e.g., RegEx, network protocols, electronic circuits, etc. )

How to do that?

  • Context-free grammars are used to describe the syntax of essentially every programming language.
  • Automata theory gives you the tools to developing solutions to real problems.

Limit of Computation

Software developing is often confronted with the problem of complexity. Formal languages and automata theory can help you to understand the limits of computation. Here are two important concepts:

  • Undecidable things – no program whatever can do it.
  • Intractable things – there are programs, but no fast programs.

Preliminaries

Elementary Mathematical Knowledge

  • Basic set theory: union, intersection, difference, complement, DeMorgans Laws, subsets, set cardinality (the size of set), Cartesian product, power set (the set of all subsets), etc. (Refer to Set Theory in mathematical logic.)
  • Relations: reflexive, symmetric, transitive, equivalence relations, functions, partial orders, total order, linear order, etc. (Refer to Relations in mathematical logic.)
  • Complexity: O and Ω notations, etc.
  • Graph theory: directed and undirected graphs, reachability, trees, etc.
  • proof techniques: proof by induction, proof by contradiction, etc.

Languages

A language is a set of strings. A string is a sequence of symbols. Symbols are defined over an alphabet. Alphabet is a set of symbols, often denoted by Σ. For example, Σ={0,1} is the binary alphabet.

We have the following string operations:

  • Concatenation: w1w2 is the concatenation of strings w1 and w2.
  • Power: wn is the concatenation of n copies of string w. E.g., (abc)3=abcabcabc.
  • Reversal: wR is the reversal of string w. E.g., (abc)R=cba.
  • Get Length: |w| is the length of string w. Obviously, we have|w1w2|=|w1|+|w2|

We need to define the empty string λ(ε), which is the string with length 0. It shows up in many places in formal languages. Note that: 1. when we concate strings with empty string, we don't actually generate new string. 2. when we calculate the power of a string, we regard w0=λ.

Substring of a string is a subsequence of consecutive characters in that string. For example, "abc" is a substring of "abcde". From this definition, we can easily define the prefix and suffix of a string. For example, "ab" is a prefix of "abcde", and "de" is a suffix of "abcde".

Question

Given the equation 011x=x011, find x.

Solution:

  • If x=λ, then ok.
  • If |x|=1, then no solution.
  • If |x|=2, then no solution.
  • If |x|>3, then x=011y. Hence, 011x=011y011. So, x=y011. Therefore, 011y=y011.

We have the following operations on Alphabets:

  • -operation: Σ is the set of all strings over Σ, including the empty string λ.
  • +-operation: Σ+ is the set of all strings over Σ except the empty string λ.

Language is a set of strings, is any subset of Σ.

We have the following operations on Languages:

  • Usual set operations: union, intersection, difference, complement, etc.
  • Concatenation: L1L2={w1w2w1L1,w2L2}. Like concatenation in symbols, we write Ln to denote the concatenation of n copies of L.

Note that...

Suppose that L={anbn|n>0}, then the L2 is NOT {anbnanbn|n>0}, but {anbnambm|n,m>0}.

  • Star-Closure: L=i=0Li. It is the set of all strings that can be formed by concatenating zero or more strings from L, or may say, the "dictionary" of L.