Data Science in Theory and Practice. Maria Cristina Mariani

\[
\Sigma = \begin{pmatrix}
\sigma_{1,1} & \sigma_{1,2} & \cdots & \sigma_{1,p} \\
\sigma_{2,1} & \sigma_{2,2} & \cdots & \sigma_{2,p} \\
\vdots & \vdots & & \vdots \\
\sigma_{p,1} & \sigma_{p,2} & \cdots & \sigma_{p,p}
\end{pmatrix}.
\]

      The notation $\Sigma$ for the covariance matrix is widely used and seems natural because $\Sigma$ is the uppercase version of $\sigma$.

      Example 3.3 Consider the following data matrix introduced in Example 3.1:

\[
\mathbf{X} = \begin{pmatrix} 48 & 3 \\ 22 & 1 \\ 50 & 2 \end{pmatrix}.
\]

      Each receipt yields a pair of measurements: total dollar sales and number of movies sold. Since there are three receipts, we have three observations on each variable. The sample means are $\bar{x}_1 = (48 + 22 + 50)/3 = 40$ and $\bar{x}_2 = (3 + 1 + 2)/3 = 2$. We find the sample variances and covariance, the entries of $\mathbf{S}_n$, as follows:

\[
\begin{aligned}
s_{11} &= \tfrac{1}{2} \sum_{j=1}^{3} (x_{j1} - \bar{x}_1)^2 \\
       &= \tfrac{1}{2} \left( (48 - 40)^2 + (22 - 40)^2 + (50 - 40)^2 \right) = 244, \\
s_{22} &= \tfrac{1}{2} \sum_{j=1}^{3} (x_{j2} - \bar{x}_2)^2 \\
       &= \tfrac{1}{2} \left( (3 - 2)^2 + (1 - 2)^2 + (2 - 2)^2 \right) = 1, \\
s_{12} &= \tfrac{1}{2} \sum_{j=1}^{3} (x_{j1} - \bar{x}_1)(x_{j2} - \bar{x}_2) \\
       &= \tfrac{1}{2} \left( (48 - 40)(3 - 2) + (22 - 40)(1 - 2) + (50 - 40)(2 - 2) \right) = 13, \\
s_{21} &= s_{12}.
\end{aligned}
\]

      Therefore,

\[
\mathbf{S}_n = \begin{pmatrix} 244 & 13 \\ 13 & 1 \end{pmatrix}.
\]
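      The arithmetic in Example 3.3 can be checked with a short script. The following is a minimal sketch in plain Python; the function name `sample_covariance` is a hypothetical helper chosen for illustration and does not come from the text. It assumes the divisor $n - 1$ (here $1/2$, since $n = 3$), as in the computation above.

```python
# Data matrix from Example 3.3: each row is one receipt,
# columns are (total dollar sales, number of movies sold).
X = [[48, 3], [22, 1], [50, 2]]


def sample_covariance(data):
    """Return the p x p sample covariance matrix with divisor n - 1.

    Hypothetical helper for illustration; assumes at least two rows.
    """
    n = len(data)
    p = len(data[0])
    # Column means x-bar_1, ..., x-bar_p.
    means = [sum(row[i] for row in data) / n for i in range(p)]
    # Entry (i, k) is the average cross-product of centered columns i and k.
    return [
        [
            sum((row[i] - means[i]) * (row[k] - means[k]) for row in data) / (n - 1)
            for k in range(p)
        ]
        for i in range(p)
    ]


S_n = sample_covariance(X)
print(S_n)  # [[244.0, 13.0], [13.0, 1.0]]
```

      If NumPy is available, `numpy.cov(X, rowvar=False)` uses the same $n - 1$ divisor by default and should return the same matrix.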

      A correlation matrix is a table of the correlation coefficients between variables. Correlation measures whether, and how strongly, pairs of variables are linearly related. The sample correlation between the $i$th and $k$th variables is defined as

\[
r_{ik} = \frac{s_{ik}}{\sqrt{s_{ii}}\,\sqrt{s_{kk}}}
       = \frac{\sum_{j=1}^{n} (x_{ji} - \bar{x}_i)(x_{jk} - \bar{x}_k)}
              {\sqrt{\sum_{j=1}^{n} (x_{ji} - \bar{x}_i)^2}\,\sqrt{\sum_{j=1}^{n} (x_{jk} - \bar{x}_k)^2}}
\tag{3.7}
\]

      for $i = 1, 2, \ldots, p$ and $k = 1, 2, \ldots, p$, where

\[
\begin{aligned}
s_{ik} &= \frac{1}{n-1} \sum_{j=1}^{n} (x_{ji} - \bar{x}_i)(x_{jk} - \bar{x}_k), \quad i = 1, 2, \ldots, p \text{ and } k = 1, 2, \ldots, p, \\
s_{ii} &= \frac{1}{n-1} \sum_{j=1}^{n} (x_{ji} - \bar{x}_i)^2, \quad i = 1, 2, \ldots, p, \\
s_{kk} &= \frac{1}{n-1} \sum_{j=1}^{n} (x_{jk} - \bar{x}_k)^2, \quad k = 1, 2, \ldots, p.
\end{aligned}
\]

      The factor $1/(n-1)$ appears once in the numerator and once, as $\sqrt{1/(n-1)}\,\sqrt{1/(n-1)}$, in the denominator, so it cancels and leaves the second form in (3.7). We note that the sample correlation is symmetric since $r_{ik} = r_{ki}$ for all $i$ and $k$.
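      As an illustration of (3.7), the sketch below computes the sample correlation matrix for the data of Example 3.3 directly from the centered sums. The function name `sample_correlation` is a hypothetical helper, not something defined in the text, and the code assumes every variable has nonzero variance so that the denominator is never zero.

```python
import math

# Data matrix from Example 3.3: each row is one receipt,
# columns are (total dollar sales, number of movies sold).
X = [[48, 3], [22, 1], [50, 2]]


def sample_correlation(data):
    """Return the p x p sample correlation matrix following (3.7).

    Hypothetical helper for illustration; assumes no column is constant.
    """
    n = len(data)
    p = len(data[0])
    means = [sum(row[i] for row in data) / n for i in range(p)]
    # Centered cross-product sums. No 1/(n - 1) factor is applied,
    # because it cancels between numerator and denominator.
    cross = [
        [
            sum((row[i] - means[i]) * (row[k] - means[k]) for row in data)
            for k in range(p)
        ]
        for i in range(p)
    ]
    return [
        [
            cross[i][k] / (math.sqrt(cross[i][i]) * math.sqrt(cross[k][k]))
            for k in range(p)
        ]
        for i in range(p)
    ]


R = sample_correlation(X)
for row in R:
    print([round(value, 4) for value in row])
```

      For these data the off-diagonal entry is $r_{12} = 13 / (\sqrt{244}\,\sqrt{1}) \approx 0.832$, and the diagonal entries equal 1 up to floating-point rounding.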

      The

