The Statistical Analysis of Doubly Truncated Data. Prof Carla Moreira
and Meier, 1958), which corrects for the fact that some of the recorded values of the variable of interest are smaller than the true ones. With truncated data, every value in the sample corresponds to a true observation of that variable; however, the distribution of the observed values may be shifted with respect to the true one because of the truncation event. This difference between truncation and censoring suggests that specific methods should be employed to estimate the target distribution under random truncation. Indeed, Woodroofe (1985) provides a deep analysis of one-sided truncation, introducing the original idea of Lynden-Bell (1971) as a nonparametric maximum likelihood estimator (NPMLE) of the probability distribution in that setting. The estimator in Woodroofe (1985) is a particular case of the estimator corresponding to doubly truncated data, on which this book is focused.
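As a rough illustration of this shift, the following Python sketch simulates left-truncated data and compares the ordinary empirical distribution function with a product-limit estimate of the Lynden-Bell type. The uniform distributions, the sample size, and the helper name `lynden_bell_cdf` are assumptions made only for this example; they are not taken from the references above. The sketch also assumes that the truncation variable is independent of the variable of interest and that there are no ties.

```python
import numpy as np


def lynden_bell_cdf(u, x, grid):
    """Product-limit estimate of the CDF from left-truncated pairs (u_i <= x_i).

    Hypothetical helper for illustration; assumes no ties among the x_i.
    """
    u_sorted = np.sort(np.asarray(u, dtype=float))
    x = np.sort(np.asarray(x, dtype=float))
    # Risk set size at x_i: #{j : u_j <= x_i <= x_j}.
    # Because u_j <= x_j, this equals #{j : u_j <= x_i} - #{j : x_j < x_i}.
    n_u_le = np.searchsorted(u_sorted, x, side="right")
    n_x_lt = np.searchsorted(x, x, side="left")
    risk = n_u_le - n_x_lt
    # Survival just after each ordered x_i: running product of (1 - 1/risk).
    surv = np.concatenate(([1.0], np.cumprod(1.0 - 1.0 / risk)))
    # The number of observations <= each grid point selects the factor to use.
    idx = np.searchsorted(x, np.asarray(grid, dtype=float), side="right")
    return 1.0 - surv[idx]


# Assumed toy population: variable of interest uniform on (0, 1),
# left-truncation variable uniform on (-1, 2), drawn independently.
rng = np.random.default_rng(0)
n_pop = 20000
x_pop = rng.uniform(0.0, 1.0, n_pop)
u_pop = rng.uniform(-1.0, 2.0, n_pop)

# Left truncation: a pair is recorded only when u <= x.
keep = u_pop <= x_pop
u_obs, x_obs = u_pop[keep], x_pop[keep]

# The true CDF at 0.5 is 0.5; the naive proportion ignores the truncation.
naive = np.mean(x_obs <= 0.5)
lb = lynden_bell_cdf(u_obs, x_obs, [0.5])[0]
print(f"naive ECDF at 0.5: {naive:.3f}")
print(f"product-limit estimate at 0.5: {lb:.3f}")
```

In this toy setting small values are less likely to be recorded, so the naive proportion should fall below the true value 0.5, while the product-limit estimate reweights each observation by the size of its risk set and should land close to it.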
1.3 Double Truncation
A variable of interest $X^*$ is said to be doubly truncated by a couple of random variables $(U^*, V^*)$ if the observation of $X^*$ is possible only when $U^* \le X^* \le V^*$ occurs. In such a case, $U^*$ and $V^*$ are called the left- and right-truncation variables, respectively. Double truncation reduces to left-truncation when $V^*$ degenerates at $\infty$, while it corresponds to right-truncation when $U^* = -\infty$. This book is focused on the problem of estimating the distribution of $X^*$, and other related curves, from a set of iid triplets $(U_i, X_i, V_i)$, $1 \le i \le n$, with the distribution of $(U^*, X^*, V^*)$ given $U^* \le X^* \le V^*$.

There are several scenarios where double truncation appears in practice. One setting leading to double truncation is that of interval sampling, where the sample is restricted to the individuals with event between two specific dates $d_0$
and $d_1$ (Zhu and Wang, 2012). Then, the right-truncation time is $V^* = d_1 - d^*$, where $d^*$ denotes the date of onset for the time-to-event, and the left-truncation time is $U^* = V^* - \phi$, where $\phi = d_1 - d_0$ is the interval width. The Childhood Cancer Data in Section 1.4.1 is an example of data obtained through interval sampling.

With interval sampling the variable $V^* - U^*$
is degenerated at $\phi$. This occurs in other sampling schemes too, in which $d_0$ and $d_1$ are certain subject-specific event dates. An illustrative example is given by the Parkinson's Disease Data (see Section 1.4.5), where $V^*$ is the individual's age at blood sampling. When $V^* - U^*$ is constant, the couple $(U^*, V^*)$ falls on a line, and its joint density does not exist, even though the truncating variables may be continuous.

In other situations, the truncating variables $U^*$
and $V^*$ are not linked through the linear equation $V^* = U^* + \phi$. For example, $U^*$ and $V^*$ could represent some random observation limits beyond which the variable of interest cannot be sampled or detected. Situations like this occur, for example, in Astronomy, as illustrated in Section 1.4.4.

With random double truncation, both large and small values of $X^*$
are observed in principle with a relatively small probability. However, the real observational bias for $X^*$ varies from application to application, depending on the joint distribution of $(U^*, V^*)$. We will see, for example, that the probability of sampling a value $X^* = x$, namely $P(U^* \le x \le V^*)$, may be roughly constant, inducing no observational bias; or that it may be roughly decreasing, indicating the dominance of the right-truncation bias relative to the left-truncation bias.

Another issue of relevance is that of the identifiability of the distribution of $X^*$. Intuitively, it is clear that with doubly truncated data it is only possible to estimate the distribution of $X^*$ conditional on $a_U \le X^* \le b_V$, where $a_U$ and $b_V$