A Companion to Chomsky. Группа авторов

A Companion to Chomsky

an immediate constituent analysis for the generated string. For example, there is no natural way to assign such a tree structure to the string aaaa on the basis of the derivation shown in Figure 5.1. This is roughly because the grammar contains rules that do not simply replace a single symbol with a string (Chomsky and Miller, 1963, p. 294).

Another computational device that had attracted much attention in the 1950s was the finite‐state automaton (FSA) (e.g. Rabin and Scott, 1959). Chomsky (1956) investigated the generative capacity of FSAs and concluded that it was insufficient for natural language, and showed that a certain sort of phrase structure grammar formalizing the notion of immediate constituent analysis could handle at least some cases that FSAs could not.¹¹

The classes of rewriting grammars investigated in Chomsky 1959 are therefore motivated by the interest in “devices with more generative capacity than finite automata but that are more structured (and, presumably, have less generative capacity) than arbitrary Turing machines” (Chomsky, 1963, p. 360), the idea being that “the intermediate systems are those that assign a phrase structure description to the resulting sentence” (Chomsky, 1959, p. 139). A sequence of three increasingly strict restrictions on rewriting grammars (Chomsky, 1959, p. 142) produces a hierarchy of classes of grammars, the broadest of which corresponds to Turing machines and the narrowest of which corresponds to finite‐state automata.

These three restrictions are shown in Table 5.1. In this table, upper A stands for any nonterminal symbol, and phi and psi stand for any (possibly empty) strings of terminal and nonterminal symbols. Notice that each of these restrictions requires that a rule replaces a single symbol with some non‐empty string, in order to allow the possibility of constructing a tree expressing immediate‐constituent structure from any derivation.¹²

Table 5.1 Three increasing restrictions on rewriting grammars

Restriction	Required form of rules
Restriction 1, “context‐sensitive”		where	is a non‐empty string
Restriction 2, “context‐free”		where	is a non‐empty string
Restriction 3, “finite‐state”	or	where	is a terminal symbol
		and	is a nonterminal symbol

To be precise, these are restrictions on the form that individual rules can take, and a grammar is of a certain type if and only if all of its rules satisfy the corresponding restriction. But notice that rules that satisfy Restriction 2 necessarily also satisfy Restriction 1 (by choosing phi and psi to be the empty string), and similarly rules that satisfy Restriction 3 also satisfy both of the others. So these restrictions on the form of rules give us four classes of grammars:

1 The Type 0 grammars are simply all unrestricted rewriting grammars.The Type 1 grammars are those satisfying Restriction 1.The Type 2 grammars are those satisfying Restriction 2 (and therefore also Restriction 1).The Type 3 grammars are those satisfying Restriction 3 (and therefore also Restrictions 1 and 2).

The grammar in Figure 5.1 is a Type 0 grammar and does not belong to any of the more restricted classes, since rules such as “a B right-arrow B a” do not satisfy Restriction 1. It turns out, however, that there is a Type 1 grammar that generates this same powers‐of‐two stringset.¹³ This raises the question of how these classes of grammars relate to generability of stringsets: are there stringsets that can be generated by a Type 0 grammar but not by any Type 1 grammar? Or is the extra rule‐writing freedom that comes with Type 0 grammars actually inconsequential from the perspective of generable stringsets?

It turns out that all four classes of grammars are distinct in their generative capacity: at each boundary, there are stringsets that can be generated by a Type grammar that cannot be generated by any Type left-parenthesis n plus 1 right-parenthesis grammar. So there are, for example, Type 0 grammars that generate stringsets that no Type 1 grammar can generate – even if the Type 0 grammar in Figure 5.1 is not one of them.

5.3 Type 3 Grammars: Finite State Grammars

As mentioned above, Type 3 grammars were understood from the outset to be a reformulation of finite‐state automata in the setting of rewriting grammars, so I will generally describe them as “finite‐state grammars” (FSGs).

Figure 5.2 shows an example of a FSG in rewrite‐rule format, and also the equivalent finite‐state automaton in a standard graphical representation. A finite‐state automaton works with a specified finite set of states (indicated with circles in the diagram) and a specified set of (terminal) symbols. One of the states is the designated start state (indicated with an unlabeled arrow), and some of the states are designated accepting states (indicated with a double circle). The workings of the automaton are specified as a set of transitions, each associating an ordered pair of states with a symbol, and represented graphically by an arrow.

A finite‐state automaton generates a string if and only if there is a connected sequence of transitions leading from the start state to an accepting state,

Скачать книгу