A Companion to Chomsky. Группа авторов
upper Q EndSet"/>, and will therefore be treated as intersubstitutable.
These examples bring out an important intuition that is likely familiar to linguists: designing a grammar amounts to choosing “what to remember” about intermediate subexpressions, and what can be ignored. It is exactly this move of ignoring distinctions between subexpressions that allows a finitely specified grammar to generate unboundedly many expressions. A device that never allowed itself to “forget,” or ignore, some aspects of an expression's internals, would be one where the applicability of a grammatical rule is dependent on the entire surrounding context, and would simply amount to a finite list of expressions. The automaton in Figure 5.6 is of this uninteresting sort: each state serves to remember an entire prefix, so no two distinct prefixes are “grouped together” by virtue of sharing a forward set, and the automaton simply encodes an arbitrary finite set of strings (namely
Table 5.4 Equivalence classes induced by the grammar in Figure 5.5
Second‐last symbol is not a | Second‐last symbol is a | |
---|---|---|
Last symbol is not a |
FORWARD SET: |
FORWARD SET: |
EXAMPLE STRINGS: |
EXAMPLE STRINGS: ab, aab, bab | |
Last symbol is a |
FORWARD SET: |
FORWARD SET: |
EXAMPLE STRINGS: a, ab, aba, bba | EXAMPLE STRINGS: aa, aaa, baa |
Figure 5.6 A boring finite state automaton where states stand in a one‐to‐one correspondence with prefixes
Given the central role of this partitioning of strings into classes, a natural question that arises is what sorts of partitionings are possible. How large can the number of classes induced by a finite‐state grammar get? There is no upper limit, but the number of classes must be finite.14 This follows from the fact that the number of states is finite: each equivalence class is identified by a subset of the set of states, and if the set of states is finite then it has only finitely many subsets.
The requirement that there only be finitely many equivalence classes allows us to succinctly identify the limitations of FSGs. Consider, for example, trying to construct an FSG for
There is something natural about this idea: a grammar (or a mind) can only contain finitely many rules, and each rule specifies some allowable “use” for finitely many classes of expression, picked out by states or nonterminal symbols. Even if we somehow sliced up the space of all possible expressions into infinitely many equivalence classes, any finite grammar would not contain enough rules to define uses for all of those classes.
5.4 Type 2 Grammars: Context‐Free Grammars
A Type 2 or context‐free grammar (CFG) allows the right‐hand side of a rule to be any mixture of nonterminal and terminal symbols. A classic example, which generates the
1 (2) start symbol: SSaSbSab
2 (3)SaSbaaSbbaaaSbbbaaaaSbbbbaaaaabbbbb
It is helpful to consider exactly how this CFG manages to do what no FSG can. What made this pattern impossible for an FSG is that the endless collection of prefixes a, aa, aaa, etc. cannot all be “kept separate” by a function that maps each prefix to a subset of a finite number of states. But a CFG still has only a finite number of rules, which specify the behaviour of a finite number of nonterminal symbols; one way or another, a CFG must boil down to a collection of rules that specify finitely many ways in which subexpressions of finitely many different classes can be combined. How can this finiteness be reconciled with the fact that a CFG can generate the
The