The Statistical Analysis of Doubly Truncated Data. Prof Carla Moreira


and Meier, 1958), which corrects for the fact that some of the recorded values of $X$ are smaller than the true ones. With truncated data, every value in the sample is a true observation of $X$; however, the distribution of the observed values may be shifted with respect to the true one, because a value is only recorded when the truncation event occurs. This difference between truncation and censoring means that specific methods are needed to estimate the target distribution under random truncation. Indeed, Woodroofe (1985) provides a thorough analysis of one‐sided truncation, presenting the estimator originally proposed by Lynden‐Bell (1971) as the nonparametric maximum likelihood estimator (NPMLE) of the probability distribution in that setting. The estimator in Woodroofe (1985) is a particular case of the estimator for doubly truncated data, which is the focus of this book.
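To make the one‐sided case concrete, here is a minimal sketch of the Lynden‐Bell product‐limit estimator in its left‐truncation form. It assumes uncensored pairs $(u_i, x_i)$ with $u_i \le x_i$, and the function name `lynden_bell_survival` is hypothetical. The defining ingredient is the risk set $\{i : u_i \le t \le x_i\}$: a unit only counts at $t$ if its truncation time has already been passed.

```python
from typing import List, Tuple


def lynden_bell_survival(u: List[float], x: List[float]) -> List[Tuple[float, float]]:
    """Hypothetical helper: Lynden-Bell product-limit estimate of S(t) = P(X > t)
    from left-truncated pairs (u_i, x_i) with u_i <= x_i (no censoring)."""
    assert all(ui <= xi for ui, xi in zip(u, x)), "each pair must satisfy u_i <= x_i"
    surv = 1.0
    steps = []
    for t in sorted(set(x)):
        # Risk set at t: units whose truncation time is passed (u_i <= t)
        # and whose observed value is still at least t (x_i >= t).
        at_risk = sum(1 for ui, xi in zip(u, x) if ui <= t <= xi)
        # Number of observed values exactly equal to t.
        events = sum(1 for xi in x if xi == t)
        surv *= 1.0 - events / at_risk
        steps.append((t, surv))
    return steps


# Example with genuine left truncation: the third unit only enters at 1.5.
print(lynden_bell_survival(u=[0.0, 0.5, 1.5], x=[1.0, 2.0, 3.0]))
```

When every $u_i$ lies below the smallest $x_i$, the risk set reduces to the ordinary one and the estimate coincides with the empirical survival function.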

1.3 Double Truncation

A variable of interest $X$ is said to be doubly truncated by a couple of random variables $(U, V)$ if $X$ can be observed only when $U \le X \le V$ occurs. In such a case, $U$ and $V$ are called left‐ and right‐truncation variables, respectively. Double truncation reduces to left‐truncation when $V$ degenerates at $+\infty$, while it corresponds to right‐truncation when $U = -\infty$. This book is focused on the problem of estimating the distribution of $X$, and other related curves, from a set of iid triplets $(U_i, X_i, V_i)$, $i = 1, \dots, n$, with the distribution of $(U, X, V)$ given $U \le X \le V$.
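The NPMLE for doubly truncated triplets can be sketched with a fixed‐point (self‐consistency) iteration in the style of Efron and Petrosian (1999). This is a minimal illustration under stated assumptions, not necessarily the algorithm used later in the book: it assumes uncensored observed triplets with $u_i \le x_i \le v_i$, and the function name, starting values and stopping rule are hypothetical. Each mass $f_j$ placed on an observed $x_j$ is updated to be proportional to $1 / \sum_i J_{ij}/F_i$, where $J_{ij} = 1\{u_i \le x_j \le v_i\}$ and $F_i = \sum_j J_{ij} f_j$ estimates the probability that the window of unit $i$ captures $X$.

```python
from typing import List


def npmle_double_truncation(u: List[float], x: List[float], v: List[float],
                            tol: float = 1e-10, max_iter: int = 10_000) -> List[float]:
    """Hypothetical helper: NPMLE probability masses on the observed x_j,
    computed by a self-consistency (fixed-point) iteration."""
    n = len(x)
    # J[i][j] = 1 if x_j lies inside the truncation window [u_i, v_i] of unit i.
    J = [[1.0 if u[i] <= x[j] <= v[i] else 0.0 for j in range(n)] for i in range(n)]
    f = [1.0 / n] * n  # start from the empirical (uniform) masses
    for _ in range(max_iter):
        # F_i: current estimate of P(u_i <= X <= v_i); positive since J[i][i] = 1.
        F = [sum(J[i][j] * f[j] for j in range(n)) for i in range(n)]
        # Update f_j proportional to 1 / sum_i J_ij / F_i, then renormalise.
        new = [1.0 / sum(J[i][j] / F[i] for i in range(n)) for j in range(n)]
        total = sum(new)
        new = [g / total for g in new]
        if max(abs(a - b) for a, b in zip(new, f)) < tol:
            return new
        f = new
    return f


inf = float("inf")
masses = npmle_double_truncation(u=[0.0, 0.5, 1.5], x=[1.0, 2.0, 3.0], v=[inf] * 3)
print(masses)  # approximately [0.5, 0.25, 0.25]
```

Setting every $v_i = +\infty$ turns this into the left‐truncated problem; in the small example above the masses agree with the jumps of the Lynden‐Bell product‐limit estimator, (0.5, 0.25, 0.25).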

There are several scenarios where double truncation appears in practice. One setting leading to double truncation is interval sampling, where the sample is restricted to the individuals whose event falls between two specific dates $d_0$ and $d_1$ (Zhu and Wang, 2012). Then, the right‐truncation time is $V = d_1 - B$, where $B$ denotes the date of onset for the time‐to‐event, and the left‐truncation time is $U = V - \tau$, where $\tau = d_1 - d_0$ is the interval width. The Childhood Cancer Data in Section 1.4.1 is an example of data obtained through interval sampling.
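The interval‐sampling mechanism can be simulated directly. In this setup the right‐truncation time is $V = d_1 - B$ and the left‐truncation time is $U = V - (d_1 - d_0)$, so the condition $U \le X \le V$ is the same as the event date $B + X$ falling in $[d_0, d_1]$. The onset‐date distribution (uniform on $[0, d_1]$), the time‐to‐event distribution (exponential with mean 5) and the function name `interval_sample` are hypothetical choices made only for illustration.

```python
import random
from typing import List, Tuple


def interval_sample(n_target: int, d0: float, d1: float,
                    seed: int = 0) -> List[Tuple[float, float, float]]:
    """Hypothetical helper: draw units until n_target of them have their
    event date b + x inside [d0, d1]; return the observed (u, x, v) triplets."""
    rng = random.Random(seed)
    tau = d1 - d0  # interval width
    sample = []
    while len(sample) < n_target:
        b = rng.uniform(0.0, d1)         # assumed onset-date distribution
        x = rng.expovariate(1.0 / 5.0)  # assumed time-to-event, mean 5
        v = d1 - b                       # right-truncation time
        u = v - tau                      # left-truncation time (= d0 - b)
        if u <= x <= v:                  # equivalent to d0 <= b + x <= d1
            sample.append((u, x, v))
    return sample


triplets = interval_sample(5, d0=10.0, d1=15.0, seed=3)
for u, x, v in triplets:
    print(f"u={u:.2f}  x={x:.2f}  v={v:.2f}  v-u={v - u:.2f}")
```

Every simulated triplet has $V - U = d_1 - d_0$, so the observed pairs $(U, V)$ all lie on a single line.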

With interval sampling, the variable $V - U$ is degenerate at $\tau$. This also occurs in other sampling schemes, in which $U$ and $V$ are certain subject‐specific event dates. An illustrative example is given by the Parkinson's Disease Data (see Section 1.4.5), where $V$ is the individual age at blood sampling. When $V - U$ is constant, the couple $(U, V)$ falls on a line, and its joint density does not exist, even though each truncation variable may be continuous.

In other situations, the truncation variables $U$ and $V$ are not linked through the linear equation $V = U + \tau$. For example, $U$ and $V$ could represent random observation limits beyond which the variable of interest $X$ cannot be sampled or detected. Such situations arise in astronomy, as illustrated in Section 1.4.4.

With random double truncation, both large and small values of $X$ are, in principle, observed with relatively small probability. However, the actual observational bias for $X$ varies from application to application, depending on the joint distribution of $(U, V)$. We will see, for example, that the probability of sampling a value $x$, namely $G(x) = P(U \le x \le V)$, may be roughly constant, inducing no observational bias, or roughly decreasing, indicating that the right‐truncation bias dominates the left‐truncation bias.
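The contrast between a flat and a decreasing sampling probability can be checked by Monte Carlo. This sketch assumes that $(U, V)$ is independent of $X$, so that $G(x) = P(U \le x \le V)$ is the probability of sampling a value $x$; both truncation scenarios and the helper names are hypothetical. In scenario A the window is $[W - 1, W + 1]$ with $W$ uniform on $[-5, 5]$, giving $G(x) = 0.2$ for every $x \in [-4, 4]$; in scenario B the left limit never binds and $V$ is uniform on $[0, 1]$, giving $G(x) = 1 - x$ on $[0, 1]$.

```python
import random
from typing import List, Tuple


def sampling_probability(x: float, pairs: List[Tuple[float, float]]) -> float:
    """Hypothetical helper: Monte Carlo estimate of G(x) = P(U <= x <= V)."""
    return sum(1 for u, v in pairs if u <= x <= v) / len(pairs)


def simulate_scenarios(n: int = 100_000, seed: int = 1):
    """Hypothetical helper: draw (U, V) pairs for two illustrative scenarios."""
    rng = random.Random(seed)
    # Scenario A: window [W - 1, W + 1], W ~ Uniform(-5, 5) -> G(x) = 0.2 (flat).
    flat = []
    for _ in range(n):
        w = rng.uniform(-5.0, 5.0)
        flat.append((w - 1.0, w + 1.0))
    # Scenario B: U fixed at -1, V ~ Uniform(0, 1) -> G(x) = 1 - x (decreasing).
    right_heavy = [(-1.0, rng.uniform(0.0, 1.0)) for _ in range(n)]
    return flat, right_heavy


flat, right_heavy = simulate_scenarios()
for x in (0.1, 0.5, 0.9):
    g_a = sampling_probability(x, flat)
    g_b = sampling_probability(x, right_heavy)
    print(f"x={x}: G_A={g_a:.3f}  G_B={g_b:.3f}")
```

Scenario A samples every value in $[0, 1]$ with the same probability, so the observed values are not biased even though most units are truncated; scenario B favours small values of $X$.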

Another relevant issue is the identifiability of the distribution of $X$. Intuitively, with doubly truncated data it is only possible to estimate the distribution of $X$ conditional on $a_U \le X \le b_V$, where $a_U$ and