Statistical Relational Artificial Intelligence. Luc De Raedt
A Bayesian network can be seen as a Markov random field in which each conditional probability is a factor. Thus any inference algorithm developed for Markov networks can be applied to Bayesian networks; for this reason, many inference algorithms are designed for Markov networks. Bayesian networks, however, allow for specialized algorithms, such as recursively pruning variables that are not observed, not queried, and have no children. Conversely, any Markov network can be represented as a Bayesian network: scale each factor to the range [0, 1] (by multiplying by a constant), create a new variable tk for each factor fk with P(tk = true | xk) = fk(xk), and condition on tk = true.
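The correspondence can be checked by enumeration. The following sketch (with made-up probabilities, not taken from the alarm network) treats each conditional probability table of a two-variable Bayesian network B → A as a factor of a Markov random field, and confirms that the unnormalized product of factors is already a normalized distribution identical to the Bayesian network's joint.

```python
from itertools import product

# Hypothetical CPTs for a tiny network B -> A (numbers are illustrative only).
p_b = {True: 0.1, False: 0.9}                               # P(B)
p_a_given_b = {(True, True): 0.8, (False, True): 0.2,
               (True, False): 0.05, (False, False): 0.95}   # P(A | B), keyed (a, b)

def bn_joint(a, b):
    """Joint distribution of the Bayesian network: P(B) * P(A | B)."""
    return p_b[b] * p_a_given_b[(a, b)]

# The same network as a Markov random field: one factor per CPT.
factors = [lambda a, b: p_b[b], lambda a, b: p_a_given_b[(a, b)]]

def mrf_unnormalized(a, b):
    """Product of all factors (before normalization)."""
    prod = 1.0
    for f in factors:
        prod *= f(a, b)
    return prod

# Because every factor is a conditional probability, the partition function
# Z equals 1 and the two joints coincide on every assignment.
Z = sum(mrf_unnormalized(a, b) for a, b in product([True, False], repeat=2))
assert abs(Z - 1.0) < 1e-12
for a, b in product([True, False], repeat=2):
    assert abs(bn_joint(a, b) - mrf_unnormalized(a, b) / Z) < 1e-12
```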
An alternative way to depict a Markov random field is in terms of a factor graph. A factor graph is a bipartite graph, which contains a variable node for each random variable xi and a factor node for each factor fi. There is an edge between a variable node and a factor node if and only if the variable appears in the factor. This is often the preferred representation for describing probabilistic inference methods that send messages between variables and factors. The factor graph for the alarm Bayesian network is shown in Fig. 2.3.
Figure 2.2: The Markov network for the burglary alarm Network.
Figure 2.3: The factor graph representing the burglary alarm Bayesian network.
A Markov network can be obtained from a factor graph by adding arcs between all of the neighbors of each factor node and dropping the factor nodes. A factor graph can be obtained from a Markov network by creating a factor for each clique in the Markov network. However, mapping to a Markov network loses information. For example, the Markov random field with a single factor f0(X, Y, Z) has the same Markov network as the Markov random field with factors {f1(X, Y), f2(Y, Z), f3(X, Z)}, but they have different factor graphs. The network with a single factor can represent more distributions than the one with only pairwise factors, e.g., the distribution where the variables are independent of each other except that they have even parity.
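The even-parity distribution mentioned above can be built explicitly. This sketch defines a single factor f0(X, Y, Z) that is nonzero only on assignments with an even number of true variables, and verifies by enumeration that every pair of variables is marginally independent and uniform, even though the joint distribution assigns probability zero to all odd-parity states (a constraint no product of pairwise factors can express).

```python
from itertools import product

def f0(x, y, z):
    """Single three-way factor: 1 on even-parity assignments, 0 otherwise."""
    return 1.0 if (x + y + z) % 2 == 0 else 0.0

states = list(product([0, 1], repeat=3))
Z = sum(f0(*s) for s in states)          # exactly 4 even-parity states
p = {s: f0(*s) / Z for s in states}      # normalized joint distribution

# Every pair of variables is uniform and independent: each pairwise
# marginal equals 1/4 = 1/2 * 1/2.
for i, j in [(0, 1), (0, 2), (1, 2)]:
    for vi, vj in product([0, 1], repeat=2):
        pij = sum(p[s] for s in states if s[i] == vi and s[j] == vj)
        assert abs(pij - 0.25) < 1e-12

# Yet the three variables are jointly constrained: odd-parity states
# have probability zero.
assert p[(1, 0, 0)] == 0.0
assert p[(0, 1, 1)] == 0.25
```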
Figure 2.4: A logic program defining grandparent.
Figure 2.5: A logic program defining nat.
2.2 FIRST-ORDER LOGIC AND LOGIC PROGRAMMING
The probabilistic models introduced in the previous section have a fixed finite number of random variables. Boolean random variables correspond to propositions in logic, and propositions can also represent more complex random variables. First-order logic allows for an unbounded number of propositions by including logical variables, constants, and functions. The same ideas can be used for random variables. To understand this, we now introduce first-order logic and its directed variant, logic programs.
To introduce logic and logic programs, consider Figs. 2.4 and 2.5, which contain two programs, grandparent and nat. In these programs grandparent/2, parent/2, and nat/1 are predicates (with their arity, the number of arguments, listed explicitly). The strings jef, paul, ann, and 0 are constants, whereas X, Y, and Z are logical variables, which we will just call variables when there is no confusion with random variables. All constants and variables are also terms. In addition, there exist structured terms such as s(X), which contains the functor s/1 of arity 1 and the term X. Constants are often considered as functors of arity 0. A first-order alphabet Σ is a set of predicate symbols, constant symbols, and functor symbols.
Atoms are predicate symbols followed by the necessary number of terms, e.g., parent(jef, paul), nat(s(X)), parent(X, Z), etc. A literal is an atom (called a positive literal), e.g., nat(s(X)), or the negation of an atom (called a negative literal), e.g., ¬nat(s(X)).
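This syntax can be mirrored in a short data-structure sketch. The representation below is our own choice, not taken from the book: constants are strings, logical variables are wrapped in a Var class, and structured terms and atoms are (functor, arg1, ..., argN) tuples.

```python
# Hypothetical term representation: constants as strings, variables as Var
# objects, structured terms and atoms as (functor, args...) tuples.
class Var:
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return self.name

def s(t):
    """Build the structured term s(t) with the functor s/1."""
    return ("s", t)

X = Var("X")
zero = "0"                       # constants are functors of arity 0

nat_of_x = ("nat", s(X))         # the atom nat(s(X)), containing a variable
nat_atom = ("nat", s(s(zero)))   # the ground atom nat(s(s(0)))
assert nat_atom == ("nat", ("s", ("s", "0")))
```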
First-order logical formulas are recursively constructed from atoms using logical connectives (¬, ⋁ and ⋀) and quantifiers (∀ and ∃). If f1 and f2 are formulas then the following are formulas, too:
• ¬f1 (negation), which is true iff f1 is not true;
• f1 ⋀ f2 (conjunction), which is true iff both f1 and f2 are true;
• f1 ⋁ f2 (disjunction), which is true iff either f1 or f2 is true;
• f1 → f2, or f2 ← f1 (implication), which is true iff f1 is false or f2 is true;
• f1 ↔ f2 (equivalence), which is shorthand for (f1 → f2) ∧ (f2 → f1);
• ∀X f1 (universal quantification), which is true iff f1 is true for every individual that X could refer to; and
• ∃X f1 (existential quantification), which is true iff f1 is true for at least one individual.
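Over a finite domain, these connectives and quantifiers can be evaluated by enumeration. The interpretation below is invented for illustration (it is not from the book): given a set of true ground atoms, the sketch checks the formulas ∀X (human(X) → male(X) ∨ female(X)) and ∃X female(X).

```python
# Hypothetical finite interpretation: a domain plus the set of true ground atoms.
domain = ["jef", "paul", "ann"]
true_atoms = {("human", "jef"), ("human", "paul"), ("human", "ann"),
              ("male", "jef"), ("male", "paul"), ("female", "ann")}

def holds(pred, x):
    """An atom is true in the interpretation iff it is among the true atoms."""
    return (pred, x) in true_atoms

# forall X: human(X) -> (male(X) or female(X))
# implication f1 -> f2 is evaluated as (not f1) or f2
forall_formula = all((not holds("human", x)) or holds("male", x) or holds("female", x)
                     for x in domain)

# exists X: female(X)
exists_formula = any(holds("female", x) for x in domain)

assert forall_formula
assert exists_formula
```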
An important subset of first-order logic is that of clausal logic. A clausal theory consists of a set of clauses. A clause is a formula of the form

A1 ∨ · · · ∨ An ← B1 ∧ · · · ∧ Bm        (2.3)

where the Aj and the Bi are atoms. All variables in clauses are understood to be universally quantified. Clause (2.3) is logically equivalent to

A1 ∨ · · · ∨ An ∨ ¬B1 ∨ · · · ∨ ¬Bm
Example 2.2 Two example clauses are:

male(X) ∨ female(X) ← human(X)
false ← male(X) ∧ female(X)

The first clause states that for all X, if X is human, then X is either male or female. The second clause states that for all X, X cannot be both male and female, that is, no one is both male and female. Here, false is an atom that is always false; it can be seen as the empty disjunction (the case n = 0).
A practical subset of clausal logic is that of logic programs, which contain no uncertainty: what is given up is the ability to state that a ∨ b is true without saying which one of a or b is true. Logic programs are of interest because they have an intuitive declarative and procedural interpretation [Kowalski, 2014].
A definite clause is a clause with exactly one atom in its conclusion (that is, n = 1). Following the Prolog tradition, definite clauses are usually written with the conjunction represented by a comma and "if" represented by ":-".
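A definite-clause program can be evaluated bottom-up by repeatedly deriving the heads of rules whose bodies are satisfied. The sketch below mimics the grandparent rule grandparent(X, Y) :- parent(X, Z), parent(Z, Y) in Python, assuming (as an illustration, not necessarily the facts of Fig. 2.4) the facts parent(jef, paul) and parent(paul, ann).

```python
# Assumed facts (hypothetical, for illustration): parent(jef, paul), parent(paul, ann).
parent = {("jef", "paul"), ("paul", "ann")}

# One forward-chaining step for the rule
#   grandparent(X, Y) :- parent(X, Z), parent(Z, Y).
# The shared variable Z is implemented by joining on the middle argument.
grandparent = {(x, y)
               for (x, z1) in parent
               for (z2, y) in parent
               if z1 == z2}

assert grandparent == {("jef", "ann")}
```

Because the body of the rule mentions only the parent predicate, a single derivation step suffices here; a recursive program such as nat would require iterating this step until no new atoms are derived (a fixpoint).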