A Companion to Chomsky. Группа авторов

A Companion to Chomsky

upper Q EndSet"/>, and will therefore be treated as intersubstitutable.

These examples bring out an important intuition that is likely familiar to linguists: designing a grammar amounts to choosing “what to remember” about intermediate subexpressions, and what can be ignored. It is exactly this move of ignoring distinctions between subexpressions that allows a finitely specified grammar to generate unboundedly many expressions. A device that never allowed itself to “forget,” or ignore, some aspects of an expression's internals, would be one where the applicability of a grammatical rule is dependent on the entire surrounding context, and would simply amount to a finite list of expressions. The automaton in Figure 5.6 is of this uninteresting sort: each state serves to remember an entire prefix, so no two distinct prefixes are “grouped together” by virtue of sharing a forward set, and the automaton simply encodes an arbitrary finite set of strings (namely StartSet sans-serif-italic b comma sans-serif-italic a b comma sans-serif-italic a a a comma sans-serif-italic a a b comma sans-serif-italic a b a comma sans-serif-italic b b a comma sans-serif-italic b b b EndSet ). This set exhibits no interesting patterns because the automaton has no interesting structure.

Table 5.4 Equivalence classes induced by the grammar in Figure 5.5

	Second‐last symbol is not a	Second‐last symbol is a
Last symbol is not a	FORWARD SET:	FORWARD SET:
	EXAMPLE STRINGS: , b, bb, abb	EXAMPLE STRINGS: ab, aab, bab
Last symbol is a	FORWARD SET:	FORWARD SET:
	EXAMPLE STRINGS: a, ab, aba, bba	EXAMPLE STRINGS: aa, aaa, baa

Figure 5.6 A boring finite state automaton where states stand in a one‐to‐one correspondence with prefixes

Given the central role of this partitioning of strings into classes, a natural question that arises is what sorts of partitionings are possible. How large can the number of classes induced by a finite‐state grammar get? There is no upper limit, but the number of classes must be finite.¹⁴ This follows from the fact that the number of states is finite: each equivalence class is identified by a subset of the set of states, and if the set of states is finite then it has only finitely many subsets.

The requirement that there only be finitely many equivalence classes allows us to succinctly identify the limitations of FSGs. Consider, for example, trying to construct an FSG for sans-serif-italic a Superscript n Baseline sans-serif-italic b Superscript n (for n greater-than-or-equal-to 1 ). First note that the prefix a is not intersubstitutable with aa, since one but not the other will lead to a well‐formed string when followed by bb. Similarly, neither of these prefixes is intersubstitutable with aaa. So the three strings a, aa, and aaa need to be split up into three distinct equivalence classes. In fact, as we go further down the list to aaaa, aaaaa, aaaaaa and so on, we will never find two such strings of as that are intersubstitutable. No finite number of equivalence classes will be enough to track all the relevant distinctions. But for any finite number of states, there are only finitely many different possible forward sets; any FSG will, eventually, assign two different strings of as – say, sans-serif-italic a Superscript 73 and – the same forward set, and will therefore incorrectly treat them as intersubstitutable prefixes (e.g. incorrectly generating and in addition to and ).

There is something natural about this idea: a grammar (or a mind) can only contain finitely many rules, and each rule specifies some allowable “use” for finitely many classes of expression, picked out by states or nonterminal symbols. Even if we somehow sliced up the space of all possible expressions into infinitely many equivalence classes, any finite grammar would not contain enough rules to define uses for all of those classes.

5.4 Type 2 Grammars: Context‐Free Grammars

A Type 2 or context‐free grammar (CFG) allows the right‐hand side of a rule to be any mixture of nonterminal and terminal symbols. A classic example, which generates the sans-serif-italic a Superscript n Baseline sans-serif-italic b Superscript n (for n greater-than-or-equal-to 1 ) stringset, is shown in (2). A derivation using this grammar is in (3).

1 (2) start symbol: SSaSbSab

2 (3)SaSbaaSbbaaaSbbbaaaaSbbbbaaaaabbbbb

It is helpful to consider exactly how this CFG manages to do what no FSG can. What made this pattern impossible for an FSG is that the endless collection of prefixes a, aa, aaa, etc. cannot all be “kept separate” by a function that maps each prefix to a subset of a finite number of states. But a CFG still has only a finite number of rules, which specify the behaviour of a finite number of nonterminal symbols; one way or another, a CFG must boil down to a collection of rules that specify finitely many ways in which subexpressions of finitely many different classes can be combined. How can this finiteness be reconciled with the fact that a CFG can generate the sans-serif-italic a Superscript n Baseline sans-serif-italic b Superscript n pattern?

The

Скачать книгу