Data Science in Theory and Practice. Maria Cristina Mariani

$\mathbf{X}_1$ given $\mathbf{X}_2 = \mathbf{a}$, for some vector $\mathbf{a}$, is

$$
\mathbf{X}_1 \mid \mathbf{X}_2 = \mathbf{a} \sim MVN_r\left(\mu_1 - \Sigma_{12}\Sigma_{22}^{-1}(\mu_2 - \mathbf{a}),\; \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\right).
$$

      Furthermore, the vectors $\mathbf{X}_2$ and $\mathbf{X}_1 - \Sigma_{12}\Sigma_{22}^{-1}\mathbf{X}_2$ are independent. Finally, any affine transformation $AX + b$, where $A$ is a $k \times k$ matrix and $b$ is a $k$-dimensional constant vector, is also multivariate normal, with mean vector $A\mu + b$ and covariance matrix $A\Sigma A^T$. Please refer to the texts by Axler (2015) and Johnson and Wichern (2014) for more details on the multinomial and multivariate normal distributions.
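      The conditional-distribution formula and the affine-transformation property can be checked numerically. The following is a minimal sketch in Python with NumPy, not code from the text: the function names `conditional_mvn` and `affine_mvn` and all numerical values are illustrative assumptions, and the covariance matrix is assumed to be symmetric positive definite with $\mathbf{X}_1$ taken as the first $r$ components.

```python
import numpy as np


def conditional_mvn(mu, sigma, r, a):
    """Mean and covariance of X1 | X2 = a, where X1 holds the first r components.

    Hypothetical helper: assumes sigma is symmetric positive definite.
    """
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    a = np.asarray(a, dtype=float)

    # Partition the mean vector and the covariance matrix.
    mu1, mu2 = mu[:r], mu[r:]
    s11 = sigma[:r, :r]
    s21, s22 = sigma[r:, :r], sigma[r:, r:]

    # w = Sigma12 Sigma22^{-1}. Solving Sigma22 Z = Sigma21 and transposing
    # avoids forming the inverse explicitly (valid because sigma is symmetric).
    w = np.linalg.solve(s22, s21).T

    cond_mean = mu1 - w @ (mu2 - a)
    cond_cov = s11 - w @ s21
    return cond_mean, cond_cov


def affine_mvn(mu, sigma, A, b):
    """Mean and covariance of AX + b when X ~ MVN(mu, sigma)."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    return A @ mu + b, A @ sigma @ A.T


# Illustrative values only (k = 3, r = 1); they do not come from the text.
mu = [1.0, 2.0, 3.0]
sigma = [[2.0, 1.0, 0.0],
         [1.0, 2.0, 1.0],
         [0.0, 1.0, 2.0]]

cond_mean, cond_cov = conditional_mvn(mu, sigma, r=1, a=[5.0, 3.0])

A = [[1.0, 1.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 2.0]]
b = [0.0, 0.0, 1.0]
new_mean, new_cov = affine_mvn(mu, sigma, A, b)

print(cond_mean, cond_cov)
print(new_mean)
print(new_cov)
```

      Using `np.linalg.solve` rather than `np.linalg.inv` is a design choice made here for numerical stability; either computes the same quantities in exact arithmetic.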

      1 If and are two matrices, prove the following properties of the trace of a matrix..., for a any constant .

      2 If and are two matrices, prove the following properties of the determinant of a matrix.det = det .det = det det = det .

      3 LetFind .Find .Find .Find .

      4 LetFind .Find .Compare and .

      5 LetFind .Find .

      6 Show that the real symmetric matrixis positive definite for any non‐zero column vector.

      7 Prove that if and are positive definite matrices then so is .

      8 For what values of is the following matrix positive semidefinite?

      9 Decide whether the following matrices are positive definite, negative definite, or neither. Please explain your reasoning.

      10 For random variables and , show thatThe variance is the variance of the random variable , while the same holds for the random variable .

      3.1 Introduction

      Multivariate analysis is the statistical analysis of several variables at once. It applies when multiple measurements are made on each experimental unit and the relationships among those measurements, together with their structure, are important for understanding the experiment. Experimental units are the objects to which treatments are applied. Many problems in the life sciences are multivariate in nature; however, the analysis of large multivariate data sets remains a major challenge in many research fields. Multivariate techniques have wide application, including in the behavioral and biological sciences, finance, geophysics, medicine, ecology, and many other fields. The material in this chapter forms the basis for the discussion later in this text.

      We begin with the formal definition of multivariate analysis.

      Definition 3.1 (Multivariate analysis) Multivariate analysis consists of a collection of techniques that can be used when several measurements are made on each experimental unit.

      These measurements (i.e. data) must frequently be arranged and displayed in various ways. We now discuss the concepts underlying the first steps of data organization.

      Multivariate data arise whenever an investigator, practitioner, or researcher seeks to study some physical phenomenon and selects a number $p \geq 1$ of variables to record. We will use the notation $x_{jk}$ to indicate the particular value of the $k$th variable that is observed on the $j$th unit (i.e. subject). Hence, $n$ measurements on $p$ variables can be displayed as a rectangular array, called the data matrix $\mathbf{X}$, of $n$ rows and $p$ columns:

$$
\mathbf{X} = \begin{pmatrix}
x_{1,1} & x_{1,2} & \cdots & x_{1,k} & \cdots & x_{1,p} \\
x_{2,1} & x_{2,2} & \cdots & x_{2,k} & \cdots & x_{2,p} \\
\vdots & \vdots & & \vdots & \vdots & \vdots \\
x_{j,1} & x_{j,2} & \cdots & x_{j,k} & \cdots & x_{j,p} \\
\vdots & \vdots & & \vdots & \vdots & \vdots \\
x_{n,1} & x_{n,2} & \cdots & x_{n,k} & \cdots & x_{n,p}
\end{pmatrix}.
$$
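      The row-and-column convention can be sketched in a few lines of plain Python. This is only an illustration: the values are arbitrary placeholders unrelated to any example in the text, and the helper names `entry` and `variable` are hypothetical. Rows correspond to units and columns to variables, and the helpers use the 1-based indices $j$ and $k$ of the notation above.

```python
# Placeholder data matrix: n = 3 units (rows), p = 2 variables (columns).
# The numbers are arbitrary and chosen only to make the indexing visible.
X = [
    [10.0, 1.0],
    [20.0, 2.0],
    [30.0, 3.0],
]

n = len(X)      # number of units (rows)
p = len(X[0])   # number of variables (columns)


def entry(X, j, k):
    """Return the value of the k-th variable on the j-th unit (1-based)."""
    return X[j - 1][k - 1]


def variable(X, k):
    """Return all n observations of the k-th variable (column k, 1-based)."""
    return [row[k - 1] for row in X]


print(n, p)
print(entry(X, 2, 1))
print(variable(X, 2))
```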

      Example 3.1 (A data array) A selection of three receipts from Bestbuy was obtained in order to investigate the nature of movie sales. Each receipt provided, among other things, the number of movies sold and the total amount of each sale. Let the first variable be total dollar sales and the second variable be number of movies sold. Then we can take the corresponding numbers on the receipts as three measurements on two variables. From the above description, we obtain the tabular form of the data as follows:
