Statistical Relational Artificial Intelligence. Luc De Raedt
Figure 3.2: Grounding of the grades model for 3 people (Sam, Chris, and Kim) and 2 courses (c1 and c2).
The basic principle used by these models is that of parameter sharing: the instances of the parameterized factors created by substituting constants for logical variables share the same probabilistic parameters. This is a consequence of exchangeability: the variables can denote any individual, and so need to be the same for each instance (before we have observed anything to distinguish the individuals). The various languages differ in how to specify the conditional probabilities of the parameterized random variables given their parents, or the other parameters of the probabilistic model.
Often in plate models [Buntine, 1994, Jordan, 2010] the numerical parameters are made explicit, to emphasize that the probabilistic parameters do not depend on the individuals.
Figure 3.3: Plate representation of the grades model, with shared parameters explicit.
Example 3.4 Figure 3.3 shows the plate model of Fig. 3.1 with the numerical parameters made explicit. Parameter θi specifies the prior probability of int(S) that is shared among the students, parameter θd specifies the prior probability of diff(C), and θg specifies the numerical parameters of the conditional probability P(grade(S, C) | int(S), diff(C)). The figure makes explicit that these parameters do not depend on the individuals.
There are some issues that relational models must face, beyond the non-relational case. We highlight one such issue now.
Suppose there is a directed model where for some parameterized random variable X, the parents of X include a logical variable that does not appear in X. In the grounding, X has an unbounded number of parents, and we need some way to represent the conditional probability of a variable given an unbounded number of parents; such methods are called aggregation functions. While aggregation does occur in the non-relational case, it cannot be avoided in the directed relational case.
Example 3.5 Consider the problem of determining guilt in a shooting. Figure 3.4 depicts an example where someone shot person Y if there exists a person X who shot Y. In this model, someone_shot(Y) has the number of parents equal to the population size. In this case, the appropriate aggregator is a logical “or.” The PRV someone_shot(Y) has the meaning ∃X shot(X, Y), and it may be appropriate to write it in this way. Whether X shot Y depends on whether X had motive, opportunity and a gun.
The instances of shot(X, Y) for each X are dependent because they share a common ancestor, has_gun(X). The instances of shot(X, Y) are dependent for some Y if someone_shot(Y) is observed.
Figure 3.4: Plate representation of determining guilt from Example 3.5.
Often there is much more structure about how the instances interact than can be represented in a model that assumes that the instances are only dependent given shared ancestors or observed shared descendants, as in the following two examples.
Example 3.6 Consider the case of diagnosing students’ performance in adding multi-digit numbers of the form
A student, given values for the digits xi and yi, provides values for the zi digits. The aim is to determine whether the students have mastered addition from observations of their performance.
Whether a student gets the correct carry depends on the previous x, y and carry, and whether the student knows how to carry. This dependency can be seen in Fig. 3.5. Here x(D, P) is a parameterized random variable representing digit D of the first summand for problem P. There is a random variable in the grounding for each digit D and each problem P. A ground instance, such as x(3, problem57), is a random variable that may represent the third digit from the right of the topmost number of problem 57. Similarly, there is a z-variable for each digit D, problem P, student S, and time T. The plate notation can be read as duplicating the random variable for each tuple of individual the plate is parameterized by. Whether the student knows how to carry or knows how to add depends on the student and the time, but not on the digit or the problem.
For each problem, student and time, the carry for digit D, namely c(D, P, S, T), depends, in part, on c(D − 1, P, S, T), that is, on the carry from the previous digit. There is a special case for the first digit. Similarly, whether a student knows how to add or carry at any time depends on whether they knew how to add or carry at the previous time. This is depicted by cycles in the plate notation, but it results in an acyclic grounding. The plate notation does not convey all of the information about the dependencies.
Figure 3.5: Belief network with plates for multidigit addition.
Example 3.7 Consider determining the probability that two authors are collaborators, which may depend on whether they have written papers in common, or even whether they have written papers apart from each other. That is, collaborates(A1, A2) may depend on the existence of, and properties of any paper P such that author(A1, P) ∧ author(A2, P) is true. A representation has to be able to specify what happens when there is more than one paper that they are co-authors of. The existential quantification used in Example 3.5 does not seem to be appropriate here. It may also depend on other co-authors who may have collaborated with each of them. Any such rules are only applicable if A1 ≠ A2, that is if A1 and A2 are different people. Examples such as these require more sophisticated languages than the plate models specified above.
The first such languages either had explicit languages for constructing the ground network [Breese, 1992, Goldman and Charniak, 1990] or defined templates for prior and conditional probabilities [Horsch and Poole, 1990] directly in terms of tables, and required a combination function (such as noisy-and or noisy-or) when the number of parents depends on the number of individuals. These template models turn out not to be very flexible as they cannot represent the subtleties involved in how one random variable can depend on others.
To represent such examples, it is useful to be able to specify how the logical variables interact, as is done in logic programs. The independent choice logic (ICL) [Poole, 1997, 2008] (originally called probabilistic Horn abduction (PHA) [Poole, 1991, 1993b]) allows for arbitrary (acyclic) logic programs (including negation as failure) to be used to represent the dependency. The conditional probability parameters are represented as independent probabilistic inputs to the logic program. A logic program that represents Example 3.6 is given in Chapter 14 of [Poole and Mackworth, 2010]. This mix of probabilities and logic programs also forms the foundations for PRISM [Sato and Kameya, 1997, 2008], which has concentrated on learning, and for Problog [De Raedt et al., 2007], a project to build an efficient and flexible language.
Other languages are based on entity-relationship