Skip to content

Regular Expressions RegEx

Regular expressions describe languages based on algebra operations. The language defined by regular expression w is L(w), which is exactly regular language.

Regular Expression Definition (Bases)

A regular expression is defined as follows:

  • Basis1: If a is a symbol in the alphabet, then a is a regular expression and L(a) represents the language {a}.
  • Basis2: ϵ is a regular expression and L(ϵ) represents the language {ϵ}.
  • Basis3: is a regular expression and L() represents the language .

Regular Expression Operators (Inductions)

The regular expression operators are defined as follows:

  • Induction 1 (Union): If R1 and R2 are regular expressions, then R1+R2 is also a regular expression. R1+R2 is the set of strings that are in R1 or R2.
  • Induction 2 (Concatenation): If R1 and R2 are regular expressions, then R1R2 is also a regular expression. R1R2 is the set of strings that are the concatenation of a string in R1 followed by a string in R2.
  • Induction 3 (Kleene Star): If R is a regular expression, then R is also a regular expression. R is the set of strings that are the concatenation of zero or more strings in R.

Precedence of Operators

The precedence of operators is as follows:

  • Kleene Star has the highest precedence.
  • Concatenation has the second highest precedence. The concatenation operator is implicit, so it is not necessary to write it.
  • Union has the lowest precedence.

Regular Expression Examples

  • Example 1: L((0+1)) represents the set of all binary strings.
  • Example 2: L((0+10)(1+ε)) represents the set of all binary strings that have no two consecutive 1s.

Equivalence of Regular Expressions and Finite Automata

Regular expressions and finite automata are equivalent. A regular expression can be easily converted to a NFA by using Thompson's construction method. For any regular expression R, there exists an NFA N(R) such that L(R)=L(N(R)). We can construct the corresponding NFA by following these rules:

  • Basic Rules

    • If the regular expression is a symbol a, then it is converted to an NFA with two states.

    R is only a symbol 'a'

    • If the regular expression R is ε, then it is converted to an NFA with two state, linked by an ϵ-transition.

    R is epsilon

    • If the regular expression is , then it is converted to an NFA with two states, but without any transitions.
  • Inductive Rules

    • If the regular expression is s+t, then the NFA is like a choice between the NFAs for s and t.

    R is s + t

    • If the regular expression is st, then the NFA sequentially combines the NFAs for s and t.

    R is s.t

    • If the regular expression is s, then the NFA for s is a loop that can be taken zero or more times.

    R is s*