Regular Expressions
Regular expressions describe languages based on algebra operations. The language defined by regular expression
Regular Expression Definition (Bases)
A regular expression is defined as follows:
- Basis1: If
is a symbol in the alphabet, then is a regular expression and represents the language . - Basis2:
is a regular expression and represents the language . - Basis3:
is a regular expression and represents the language .
Regular Expression Operators (Inductions)
The regular expression operators are defined as follows:
- Induction 1 (Union): If
and are regular expressions, then is also a regular expression. is the set of strings that are in or . - Induction 2 (Concatenation): If R1 and R2 are regular expressions, then
is also a regular expression. is the set of strings that are the concatenation of a string in followed by a string in . - Induction 3 (Kleene Star): If
is a regular expression, then is also a regular expression. is the set of strings that are the concatenation of zero or more strings in .
Precedence of Operators
The precedence of operators is as follows:
- Kleene Star has the highest precedence.
- Concatenation has the second highest precedence. The concatenation operator is implicit, so it is not necessary to write it.
- Union has the lowest precedence.
Regular Expression Examples
- Example 1:
represents the set of all binary strings. - Example 2:
represents the set of all binary strings that have no two consecutive s.
Equivalence of Regular Expressions and Finite Automata
Regular expressions and finite automata are equivalent. A regular expression can be easily converted to a NFA by using Thompson's construction method. For any regular expression
Basic Rules
- If the regular expression is a symbol
, then it is converted to an NFA with two states.
R is only a symbol 'a'
- If the regular expression
is , then it is converted to an NFA with two state, linked by an -transition.
R is epsilon
- If the regular expression is
, then it is converted to an NFA with two states, but without any transitions.
- If the regular expression is a symbol
Inductive Rules
- If the regular expression is
, then the NFA is like a choice between the NFAs for and .
R is s + t
- If the regular expression is
, then the NFA sequentially combines the NFAs for and .
R is s.t
- If the regular expression is
, then the NFA for is a loop that can be taken zero or more times.
R is s*
- If the regular expression is
DFA to
First, we need to define the concept of
The idea of the algorithm is to find the corresponding regular expression for the
Let
- Basis: For
, is the sum of symbols of the arc that directly connect state to state . If there is no arc, then . But if , then add . - Induction: For
, the regular expression is the sum of the regular expressions . We can comprehend this formula as follows:- The
paths from to can be divided into two parts: the paths that do not go through state and the paths that go through state . - The
paths that do not go through state are the same as the paths from to , i.e., . - The
paths that go through state must visit state at least once. Thus, the regular expression for these paths is . The middle term allows the path to loop in state .
- The
Finally, the regular expression for the language of the DFA is
Summary
Each of the three types of automata (DFA, NFA,