Computational Statistics in Data Science. Group of authors

… $z_{1-\alpha/2}\,\dfrac{s}{\sqrt{m}}$

      Confidence intervals are notoriously difficult to understand at first, so a standard Monte Carlo experiment in an introductory statistics course repeats the above experiment many times and illustrates that, on average, a proportion of about $1 - \alpha$ of such confidence intervals contain the true mean. That is, for $t = 1, \dots, n$, we generate $X_{t1}, \dots, X_{tm} \sim N(\theta, \sigma^2)$, calculate the mean $\bar{X}_t$ and the sample variance $s_t^2$, and define $p_t$ to be

$$p_t = I\left\{ \bar{X}_t - z_{1-\alpha/2}\,\frac{s_t}{\sqrt{m}} < \theta < \bar{X}_t + z_{1-\alpha/2}\,\frac{s_t}{\sqrt{m}} \right\}$$

      where $I\{\cdot\}$ is the indicator function. By the law of large numbers, $\bar{p} = n^{-1} \sum_{t=1}^{n} p_t \to 1 - \alpha$ with probability 1 as $n \to \infty$, and the following CLT holds:

$$\sqrt{n}\,\bigl(\bar{p} - (1 - \alpha)\bigr) \xrightarrow{d} N\bigl(0,\; \alpha(1 - \alpha)\bigr)$$
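The coverage experiment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the values of `theta`, `sigma`, `m`, `n`, and the seed are arbitrary choices not taken from the text, and the function name `coverage_indicators` is hypothetical. Because the interval uses the normal quantile with an estimated standard deviation, the observed coverage for finite $m$ is only approximately $1 - \alpha$.

```python
import math
import random
from statistics import NormalDist


def coverage_indicators(n, m, theta=0.0, sigma=1.0, alpha=0.05, seed=1):
    """Return p_1, ..., p_n: p_t is 1 if the t-th interval covers theta, else 0."""
    rng = random.Random(seed)                 # assumed seed, for reproducibility
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{1 - alpha/2}
    indicators = []
    for _ in range(n):
        sample = [rng.gauss(theta, sigma) for _ in range(m)]
        x_bar = math.fsum(sample) / m
        # Sample variance s_t^2 with the (m - 1) divisor.
        s2 = math.fsum((x - x_bar) ** 2 for x in sample) / (m - 1)
        half_width = z * math.sqrt(s2 / m)
        indicators.append(int(x_bar - half_width < theta < x_bar + half_width))
    return indicators


# Illustrative sizes (assumptions, not values from the text).
n, m, alpha = 2000, 200, 0.05
p = coverage_indicators(n, m, alpha=alpha)
p_bar = sum(p) / n
# Standard error of p_bar when the p_t are treated as Bernoulli(1 - alpha).
std_err = math.sqrt(alpha * (1 - alpha) / n)
print(f"p_bar = {p_bar:.4f} (target {1 - alpha:.2f}), standard error ~ {std_err:.4f}")
```

Rerunning with a different seed should move `p_bar` by an amount comparable to the printed standard error.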

      In conducting this experiment, we must choose the Monte Carlo sample size $n$. A reasonable requirement is that our estimator $\bar{p}$ be accurate to the second significant digit after rounding. That is, we may allow a margin of error of 0.005. Requiring the standard error of $\bar{p}$, namely $\sqrt{\alpha(1 - \alpha)/n}$, to be smaller than 0.005 implies that $n$ must be chosen so that

$$n > \frac{\alpha(1 - \alpha)}{0.005^2}$$

      That is, for, say, a 95% confidence interval ($\alpha = 0.05$), an accurate Monte Carlo study in this simple example requires at least $0.05 \times 0.95 / 0.005^2 = 1900$ Monte Carlo samples. A higher precision would require an even larger simulation size! This is an example of an absolute precision stopping rule (Section 5) and is unique since the limiting variance is known. For further discussion of this example, see Frey [8].
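The sample size arithmetic can be checked with a small helper. This is a minimal sketch; the function name `min_mc_sample_size` is hypothetical, and the conversion through `Fraction(str(...))` is a convenience assumed here only to avoid floating-point rounding at the boundary.

```python
import math
from fractions import Fraction


def min_mc_sample_size(alpha, margin):
    """Smallest n with sqrt(alpha * (1 - alpha) / n) <= margin."""
    # Exact rational arithmetic so that boundary cases round correctly.
    a = Fraction(str(alpha))
    e = Fraction(str(margin))
    return math.ceil(a * (1 - a) / e ** 2)


print(min_mc_sample_size(0.05, 0.005))   # 95% interval, margin 0.005
print(min_mc_sample_size(0.05, 0.0005))  # one more digit of precision
```

Shrinking the margin by a factor of 10 multiplies the required sample size by 100, which is why higher precision is so much more expensive.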

      2.1 Expectations

      The most common quantity of interest in Monte Carlo simulations is the expectation of a function of the target distribution. Let $\|\cdot\|$ denote the Euclidean norm, and let $h : \mathcal{X} \to \mathbb{R}^p$, so that interest is in estimating

$$\theta_h = \int h(x)\, F(\mathrm{d}x)$$

      where we assume $\mathrm{E}_F \|h(X)\| < \infty$. If $h$ is the identity, then the mean of the target is of interest. Alternatively, $h$ can be chosen so that moments or other quantities are of interest. A Monte Carlo estimator of $\theta_h$ is

$$\hat{\theta}_h = \frac{1}{n} \sum_{t=1}^{n} h(X_t)$$

      For IID and MCMC sampling, the ergodic theorem implies that $\hat{\theta}_h \xrightarrow{\text{a.s.}} \theta_h$ as $n \to \infty$. The Monte Carlo average $\hat{\theta}_h$ is naturally unbiased as long as either the samples are IID or the Markov chain is stationary.
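As a small illustration of the estimator $\hat{\theta}_h$, the sketch below averages a vector-valued $h$ over IID draws. The target $N(1, 2^2)$, the choice $h(x) = (x, x^2)$, the sample size, and the helper name `mc_expectation` are all assumptions made for this example; for that target, $\theta_h = (1, 5)$.

```python
import random


def mc_expectation(h, draws):
    """Componentwise average of h(X_t) over a nonempty sequence of draws."""
    totals = None
    n = 0
    for x in draws:
        hx = h(x)                      # h returns a tuple of length p
        if totals is None:
            totals = [0.0] * len(hx)
        for j, value in enumerate(hx):
            totals[j] += value
        n += 1
    return [total / n for total in totals]


rng = random.Random(0)                                   # assumed seed
draws = [rng.gauss(1.0, 2.0) for _ in range(100_000)]    # IID draws from N(1, 2^2)
theta_hat = mc_expectation(lambda x: (x, x * x), draws)
print(theta_hat)  # expected to be close to [1, 5], up to Monte Carlo error
```

The same function applies unchanged to the output of an MCMC sampler, since it only needs the sequence of draws.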

      2.2 Quantiles

      Quantiles are of particular interest when constructing credible intervals from Bayesian posterior distributions or making boxplots from Monte Carlo simulations. In this section, we assume that $h$ is one-dimensional (i.e., $p = 1$). Extensions to $p > 1$ are straightforward but notationally involved [10]. For …