A Companion to Chomsky. Группа авторов

A Companion to Chomsky - Группа авторов


Скачать книгу
StartSet normal upper A comma normal upper V EndSet chased StartSet NP comma VP EndSet chased dogs empty-set dogs dogs sleep chased big dogs chased

      This freedom to split strings into pieces in arbitrary ways underlies a CFG's ability to create classes of intersubstitutable infixes. For example, consider the first tree in (5): the fact that we can split the last word off dogs dogs chase sleep but then split the first word off dogs dogs chase leads to the end result that the overall string is split into an infix sans-serif-italic d o g s c h a s e and its surroundings sans-serif-italic d o g s bar bar sans-serif-italic s l e e p. The nonterminal symbol RC serves as a hinge or pivot that links together these two pieces – dogs chase, which can appear “inside” RC, and sans-serif-italic d o g s bar bar sans-serif-italic s l e e p, which can appear “outside” it – just as the state B in Figure 5.2, for example, links together someone ran and and ran really quickly.

      In an important sense, what distinguishes CFGs from FSGs is not directly an issue of hierarchical tree structure or constituency – the tree in Figure 5.3 is hierarchical, expressing part‐whole relationships among constituents, in at least one sense – but rather, the distinction between being able to make infix‐circumfix splits and being restricted to prefix‐suffix splits. Specifically, the stringsets that can be generated by a CFG but not by any FSG are those for which every CFG is “self‐embedding” (Chomsky, 1959, p. 167), i.e. those where some string can be substituted for a proper infix of itself, such as the way aabb can be substituted for ab in the sans-serif-italic a Superscript n Baseline sans-serif-italic b Superscript n stringset. The relevant equivalence classes are still classes of strings, and are based on the way substrings fit into larger strings.

      What sorts of stringsets cannot be generated by the infix‐based mechanisms of a CFG? The comparison between the stringsets upper L Subscript pal and upper L Subscript repeat defined in (6) is instructive (Chomsky and Miller, 1963, pp. 285–287). Some notation: w Superscript upper R is the reverse of a string w, and StartSet sans-serif-italic a comma sans-serif-italic b EndSet Superscript plus is the set of all non‐empty strings using the symbols a and b. So upper L Subscript pal is the set of even‐length palindromes, and upper L Subscript repeat is the set of strings whose second half simply repeats the first half.

      1 (6)

      There is a CFG that generates upper L Subscript pal, shown in (7).

      1 (7)S aSaS bSbS aaS bb

      It may seem at first surprising, then, that no CFG can generate upper L Subscript repeat, since it is simply upper L Subscript pal without the


Скачать книгу