Like forward sets, the inside set of a string (relative to a particular CFG) can be computed recursively from the inside sets of the string's parts. Recall that the forward set of a string , for example, can be computed by considering the states in the forward set of , and asking which of these states have an outgoing transition emitting . The situation for inside sets is similar. Considering only rules of the form “” for ease of exposition: to compute the inside set of , we must consider the inside set of and ask whether any of its member nonterminals can be combined with any nonterminal in the inside set of ; but also consider combinations drawing one nonterminal from the inside set of and one from that of , plus combinations drawing one from the inside set of and one from that of . (The inside set of a length‐one string , we can assume, is just the set of nonterminals for which the grammar contains the rule .)15 The upshot is that to work out the inside set of , it suffices to know the inside sets of all its substrings. This is more complex than the situation for forward sets because there are multiple “splits” to consider (as opposed to only splitting into and ); but it is analogous in the important sense that inside sets tell us everything there is to know about how the grammar treats the relevant subexpression.
This freedom to split strings into pieces in arbitrary ways underlies a CFG's ability to create classes of intersubstitutable infixes. For example, consider the first tree in (5): the fact that we can split the last word off dogs dogs chase sleep but then split the first word off dogs dogs chase leads to the end result that the overall string is split into an infix and its surroundings . The nonterminal symbol RC serves as a hinge or pivot that links together these two pieces – dogs chase, which can appear “inside” RC, and , which can appear “outside” it – just as the state B in Figure 5.2, for example, links together someone ran and and ran really quickly.
In an important sense, what distinguishes CFGs from FSGs is not directly an issue of hierarchical tree structure or constituency – the tree in Figure 5.3 is hierarchical, expressing part‐whole relationships among constituents, in at least one sense – but rather, the distinction between being able to make infix‐circumfix splits and being restricted to prefix‐suffix splits. Specifically, the stringsets that can be generated by a CFG but not by any FSG are those for which every CFG is “self‐embedding” (Chomsky, 1959, p. 167), i.e. those where some string can be substituted for a proper infix of itself, such as the way aabb can be substituted for ab in the stringset. The relevant equivalence classes are still classes of strings, and are based on the way substrings fit into larger strings.
What sorts of stringsets cannot be generated by the infix‐based mechanisms of a CFG? The comparison between the stringsets and defined in (6) is instructive (Chomsky and Miller, 1963, pp. 285–287). Some notation: is the reverse of a string , and is the set of all non‐empty strings using the symbols a and b. So is the set of even‐length palindromes, and is the set of strings whose second half simply repeats the first half.
1 (6)
There is a CFG that generates , shown in (7).
1 (7)S aSaS bSbS aaS bb
It may seem at first surprising, then, that no CFG can generate , since it is simply without the