Context-Free Grammar
A Context-Free Grammar (CFG) is a notation for describing languages. It is more powerful than regular languages, but still cannot describe all languages. For example, a CFG can describe the language of balanced parentheses, but cannot describe the language of palindromes.
The basic idea of a CFG is to describe a language by using sets of denotations, which are called nonterminal symbols. These nonterminal symbols are combined using production rules to generate strings of terminal symbols. The terminal symbols are the actual symbols of the language. Context-free means that the production rules are applied regardless of the context in which the nonterminal symbols appear.
For example, the language of
Here,
conventions
- Nonterminal symbols are usually written as uppercase letters, such as
. - Terminal symbols are usually written as lowercase letters, such as
.
Formal Definition
A CFG is defined by a 4-tuple
is a finite set of nonterminal symbols. is a finite set of terminal symbols. is a finite set of production rules, each of the form , where and . is the start symbol, which is a nonterminal symbol.
Sentential Forms
A sentential form is a string of terminal and nonterminal symbols. The start symbol
A context-free language is the set of all strings that can be generated by a CFG. As we mentioned earlier, not all languages can be described by a CFG. There is a intuitive idea: context-free languages can count only two things at a time, not three.
BNF Notation
Grammars for programming languages are often written in BNF. In BNF, we write variables in angle brackets, and use ::=
to denote a production rule. Terminals are often multi-character strings indicated by boldface or underlining.
Kleene Star
The Kleene star in BNF can be translated to a CFG by using a new nonterminal symbol. For example, the BNF rule:
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<integer> ::= <digit> | <digit> <integer>
2
can be translated to the CFG:
Optional Elements
Optional elements in BNF can be translated to a CFG by using a new nonterminal symbol. For example, the BNF rule:
<if-statement> ::= if <expression> then <statement> [else <statement>]
can be translated to the CFG:
Leftmost and Rightmost Derivation
A leftmost derivation is a sequence of sentential forms where the leftmost nonterminal symbol is always the one that is replaced, which is denoted by
A rightmost derivation is similar, but the rightmost nonterminal symbol is always the one that is replaced, which is denoted by