Data Science in Theory and Practice. Maria Cristina Mariani

$$\cdots\,(\mathbf{X} - \mu_{\mathbf{X}})^{t}\,(u^{T})^{t}\,] = u^{T}\,\mathrm{Cov}(\mathbf{X})\,u.$$

      Since the variance is always nonnegative, the covariance matrix must be nonnegative definite (or positive semidefinite). We recall that a square symmetric matrix $A \in \mathbb{R}^{n \times n}$ is positive semidefinite if $u^{t} A u \geq 0$ for all $u \in \mathbb{R}^{n}$. The difference between positive semidefinite and positive definite (strict inequality for every $u \neq 0$) is in fact important in the context of random variables, since you may be able to construct a linear combination $u^{T}\mathbf{X}$ which is not always constant but whose variance is equal to zero.
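
      As a small illustration of the quadratic form $u^{T}\,\mathrm{Cov}(\mathbf{X})\,u$, the sketch below evaluates it for a hypothetical random vector $(Z, 1 - Z)$ with $\mathrm{Var}(Z) = 1$. The matrix, the vectors, and the helper name `quadratic_form` are assumptions made for this example only. The covariance matrix is positive semidefinite but not positive definite, because one direction has zero variance.

```python
def quadratic_form(u, cov):
    """Return u^T * cov * u for a vector u and a square matrix cov (nested lists)."""
    n = len(u)
    return sum(u[i] * cov[i][j] * u[j] for i in range(n) for j in range(n))


# Assumed example: covariance matrix of (Z, 1 - Z) when Var(Z) = 1.
# Var(Z) = 1, Var(1 - Z) = 1, Cov(Z, 1 - Z) = -1.
cov = [[1.0, -1.0],
       [-1.0, 1.0]]

# u = (1, 1): the combination Z + (1 - Z) is constant, so its variance is 0.
var_sum = quadratic_form([1.0, 1.0], cov)

# u = (1, -1): the combination Z - (1 - Z) = 2Z - 1 has variance 4.
var_diff = quadratic_form([1.0, -1.0], cov)

print(var_sum, var_diff)  # 0.0 4.0
```

      Both values are nonnegative, as the argument above requires, and the first one shows that a nonzero $u$ can still give zero variance.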

      The covariance matrix is discussed in detail in Chapter 3.

      We now present examples of multivariate distributions.

      2.3.1 The Dirichlet Distribution

      Before we discuss the Dirichlet distribution, we define the Beta distribution.

      Definition 2.22 (Beta distribution) A random variable $X$ is said to have a Beta distribution with parameters $\alpha$ and $\beta$ if it has a pdf $f(x)$ defined as:

$$f(x) = \begin{cases} \dfrac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, x^{\alpha - 1} (1 - x)^{\beta - 1}, & \text{if } 0 < x < 1, \\ 0, & \text{otherwise,} \end{cases}$$

      where $\alpha > 0$ and $\beta > 0$.
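
      The pdf in Definition 2.22 can be evaluated directly. The following is a minimal sketch using only the Python standard library. The function name `beta_pdf`, the choice to work on the log scale with `math.lgamma`, and the midpoint-rule check are assumptions of this example, not part of the definition.

```python
import math


def beta_pdf(x, alpha, beta):
    """Evaluate the Beta(alpha, beta) pdf at x, following Definition 2.22."""
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    if not 0 < x < 1:
        return 0.0  # the density is zero outside the open interval (0, 1)
    # log of Gamma(alpha + beta) / (Gamma(alpha) * Gamma(beta))
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    log_kernel = (alpha - 1) * math.log(x) + (beta - 1) * math.log(1 - x)
    return math.exp(log_norm + log_kernel)


# For alpha = 2, beta = 3 the normalizing factor is Gamma(5) / (Gamma(2) Gamma(3)) = 12,
# so f(0.5) = 12 * 0.5 * 0.25 = 1.5 (up to floating-point rounding).
value = beta_pdf(0.5, 2.0, 3.0)

# Midpoint-rule check that the density integrates to approximately 1.
steps = 10_000
area = sum(beta_pdf((k + 0.5) / steps, 2.0, 3.0) for k in range(steps)) / steps
```

      Working with logarithms avoids overflow in the Gamma function when the parameters are large.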

      The Dirichlet distribution $\mathrm{Dir}(\boldsymbol{\alpha})$, named after Johann Peter Gustav Lejeune Dirichlet (1805–1859), is a multivariate distribution parameterized by a vector $\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_n)$ of positive parameters.

      Specifically, the joint density of an $n$-dimensional random vector $\mathbf{X} \sim \mathrm{Dir}(\boldsymbol{\alpha})$ is defined as:

$$f(x_1, \ldots, x_n) = \frac{1}{\mathbf{B}(\boldsymbol{\alpha})} \left( \prod_{i=1}^{n} x_i^{\alpha_i - 1}\, \mathbf{1}_{\{x_i > 0\}} \right) \mathbf{1}_{\{x_1 + \cdots + x_n = 1\}},$$

      where $\mathbf{1}_{\{x_1 + \cdots + x_n = 1\}}$ is an indicator function. In general, the indicator function of a subset $A$ of a set $X$ is the map

$$\mathbf{1}_{A} : X \to \{0, 1\}$$

      defined as

$$\mathbf{1}_{A}(x) = \begin{cases} 1, & \text{if } x \in A, \\ 0, & \text{if } x \notin A. \end{cases}$$

      The components of the random vector $\mathbf{X}$ are thus always positive and satisfy $X_1 + \cdots + X_n = 1$. The normalizing constant $\mathbf{B}(\boldsymbol{\alpha})$ is the multinomial beta function, which is defined as:

$$\mathbf{B}(\boldsymbol{\alpha}) = \frac{\prod_{i=1}^{n} \Gamma(\alpha_i)}{\Gamma\left(\sum_{i=1}^{n} \alpha_i\right)} = \frac{\prod_{i=1}^{n} \Gamma(\alpha_i)}{\Gamma(\alpha_0)},$$

      where we used the notation $\alpha_0 = \sum_{i=1}^{n} \alpha_i$ and $\Gamma(x) = \int_{0}^{\infty} t^{x-1} e^{-t}\, dt$ for the Gamma function.
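
      The density and its normalizing constant can be computed together. The sketch below is one possible implementation under stated assumptions: the names `log_multinomial_beta` and `dirichlet_pdf` are hypothetical, and the tolerance `tol` used to test the constraint $x_1 + \cdots + x_n = 1$ in floating-point arithmetic is a choice of this example.

```python
import math


def log_multinomial_beta(alpha):
    """Return log B(alpha) = sum_i log Gamma(alpha_i) - log Gamma(alpha_0)."""
    return sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))


def dirichlet_pdf(x, alpha, tol=1e-9):
    """Evaluate the Dir(alpha) density at the point x = (x_1, ..., x_n)."""
    if len(x) != len(alpha):
        raise ValueError("x and alpha must have the same length")
    if any(a <= 0 for a in alpha):
        raise ValueError("all alpha_i must be positive")
    # Indicator functions: every x_i > 0 and the components sum to 1.
    if any(xi <= 0 for xi in x) or abs(sum(x) - 1.0) > tol:
        return 0.0
    log_kernel = sum((a - 1) * math.log(xi) for a, xi in zip(alpha, x))
    return math.exp(log_kernel - log_multinomial_beta(alpha))


# With alpha = (1, 1, 1), B(alpha) = Gamma(1)^3 / Gamma(3) = 1/2, so the
# density equals 2 (up to rounding) at every point with positive components summing to 1.
flat = dirichlet_pdf([0.2, 0.3, 0.5], [1.0, 1.0, 1.0])

# With n = 2 the formula reduces to the Beta pdf: alpha = (2, 3) at x = (0.5, 0.5)
# gives 12 * 0.5 * 0.25 = 1.5 (up to rounding).
two_dim = dirichlet_pdf([0.5, 0.5], [2.0, 3.0])
```

      The second evaluation illustrates why the Beta distribution is introduced first: for $n = 2$ the Dirichlet density in $x_1$ has the same form as the Beta pdf with parameters $\alpha_1$ and $\alpha_2$.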

      Because the Dirichlet distribution generates $n$ positive numbers that always sum to 1, it is extremely useful for creating candidate probabilities for $n$ possible outcomes. This distribution is very popular and is closely related to the multinomial distribution, which needs $n$ numbers summing to 1 to model the probabilities of its outcomes. The multinomial distribution is defined in Section 2.3.2.
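
      To make this use concrete, the sketch below draws one probability vector from a Dirichlet distribution and then uses it to simulate categorical outcomes. It relies on a standard construction that is not stated in the text above: if $Y_1, \ldots, Y_n$ are independent Gamma variables with shapes $\alpha_1, \ldots, \alpha_n$ and a common scale, then the normalized vector $(Y_1, \ldots, Y_n) / (Y_1 + \cdots + Y_n)$ follows $\mathrm{Dir}(\boldsymbol{\alpha})$. The function name `sample_dirichlet`, the parameter values, and the seed are assumptions of the example.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dir(alpha) vector by normalizing independent Gamma(alpha_i, 1) draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]


rng = random.Random(0)  # fixed seed so the run is reproducible

# Candidate probabilities for n = 3 possible outcomes.
probs = sample_dirichlet([2.0, 3.0, 5.0], rng)

# Use the sampled probabilities to simulate 100 trials with 3 outcomes each,
# then count how often each outcome occurred.
outcomes = rng.choices(range(len(probs)), weights=probs, k=100)
counts = [outcomes.count(i) for i in range(len(probs))]

print(probs, sum(probs))
print(counts)
```

      The components of `probs` are positive and sum to 1 up to floating-point rounding, and the counts always add up to the number of trials.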

      With the notation

