The Statistical Analysis of Doubly Truncated Data. Prof Carla Moreira
This dataset is used in Chapters 2, 3 and 5 and is available in the DTDA package as ChildCancer.
1.4.2 AIDS Blood Transfusion Data
Kalbfleisch and Lawless (1989) reported 494 cases of transfusion‐related AIDS, corresponding to individuals diagnosed prior to 1 July 1986. The variable of ultimate interest is the induction or incubation time $X^*$, that is, the time elapsed from HIV infection to AIDS. Importantly, HIV was unknown before 1982; this implies that cases developing AIDS prior to that date were not reported. Let $V^*$ denote the time from HIV infection to 1 July 1986 (in months), and introduce $U^* = V^* - 54$, the time from HIV infection to 1 January 1982 (54 months before 1 July 1986); then, due to the interval sampling, only triplets $(U^*, X^*, V^*)$ satisfying $U^* \le X^* \le V^*$ were observed (Bilker and Wang, 1996). We restrict our analysis to the cases with consistent data, for which the infection could be attributed to a single transfusion or a short series of transfusions. This dataset is fully reported in Kalbfleisch and Lawless (1989), p. 361. The observed values of $X^*$ range from 0.5 to 89 (months), while $U^*$ ranges from … to 45.5. This suggests that the lower limit of the support of $U^*$ is about …, while the upper limit of the support of $V^*$ is about $45.5 + 54 = 99.5$. As discussed in Chapter 2, in such a case the distribution of the incubation time is identifiable on the interval … (months). The AIDS Blood Transfusion Data also include information on the age of the individual at infection; see Table 1.2.
Table 1.2 Descriptive statistics for the AIDS Blood Transfusion Data: sample size ($n$) and mean (standard deviation, SD) of the incubation time (months) by age at infection.
Age group | n | Mean (SD)
---|---|---
< 30 years | 56 | 27.09 (18.28)
30–60 years | 104 | 33.80 (18.95)
> 60 years | 135 | 32.46 (16.74)
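As a small arithmetic check on Table 1.2, the group summaries can be combined into the overall sample size and the overall mean of the observed incubation times by weighting each group mean by its group size. This is an illustrative sketch, not a computation taken from the text; the group means are rounded to two decimals, so the result is approximate. Note also that, under double truncation, this observed mean reflects only the cases that satisfy the sampling condition and should not be read as the population mean incubation time.

```python
# Group sizes and mean incubation times (months) transcribed from Table 1.2
table_1_2 = {
    "under 30 years": (56, 27.09),
    "30-60 years": (104, 33.80),
    "over 60 years": (135, 32.46),
}

n_total = sum(n for n, _ in table_1_2.values())
# Size-weighted average of the group means equals the mean of all observed incubation times
pooled_mean = sum(n * mean for n, mean in table_1_2.values()) / n_total

print(n_total)                # 295
print(round(pooled_mean, 2))  # 31.91
```

The figure of about 31.91 months is purely descriptive; correcting it for the interval sampling requires estimators that account for the double truncation.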
This dataset is used in Chapters 2, 3, 4 and 5 and can be obtained from AIDS.DT in the DTDA package.
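The interval sampling of the AIDS Blood Transfusion Data can be sketched in a few lines. The Python code below is an illustration only and is not part of the DTDA package: with $V^*$ the months from HIV infection to 1 July 1986 and $U^* = V^* - 54$ the months from infection to 1 January 1982, a case is observed only if $U^* \le X^* \le V^*$. The function names and the distributions used in `simulate_interval_sample` (exponential incubation times with mean 40 months, uniform $V^*$ on 0 to 100 months) are hypothetical assumptions chosen for illustration and do not reproduce the real data.

```python
import random

WINDOW = 54.0  # months from 1 January 1982 to 1 July 1986


def left_truncation(v: float) -> float:
    """U = V - 54: months from HIV infection to 1 January 1982
    (negative if the infection occurred after that date)."""
    return v - WINDOW


def is_observed(x: float, v: float) -> bool:
    """True if the incubation time x satisfies U <= x <= V, i.e. AIDS was
    diagnosed between 1 January 1982 and 1 July 1986."""
    return left_truncation(v) <= x <= v


def simulate_interval_sample(n: int, seed: int = 1) -> list[tuple[float, float, float]]:
    """Draw n population pairs (X, V) and keep the observable triplets (U, X, V).

    Hypothetical distributions, for illustration only:
    X ~ Exponential(mean 40 months), V ~ Uniform(0, 100) months.
    """
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        x = rng.expovariate(1 / 40)
        v = rng.uniform(0, 100)
        if is_observed(x, v):
            sample.append((left_truncation(v), x, v))
    return sample


if __name__ == "__main__":
    # Infection 60 months before 1 July 1986 (July 1981): U = 6, V = 60
    print(is_observed(3, 60))   # False: AIDS in late 1981, before HIV was known
    print(is_observed(30, 60))  # True: diagnosed inside the window
    print(is_observed(70, 60))  # False: diagnosed after 1 July 1986
    # Upper support limit of V implied by the largest observed U (45.5 months)
    print(45.5 + WINDOW)        # 99.5
    obs = simulate_interval_sample(10_000)
    print(len(obs), "of 10000 simulated cases are observable")
```

Every retained triplet satisfies $V^* - U^* = 54$, so the sampling window has the same length for all individuals; what varies is where that window falls on each individual's time scale.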
1.4.3 Equipment‐S Rounded Failure Time Data
Companies are often interested in estimating the time to failure of their devices after installation. To do this, maintenance departments may record the failures occurring between two specific calendar dates for the units installed in the field. Because of this interval sampling, the field lifetime distribution is doubly truncated. The Equipment‐S data (Ye and Tang, 2016) concern failures of a certain device (details are not given due to confidentiality issues) recorded between 1996 and 2011, a 15‐year observational window. The dates of installation and failure, rounded to years, were obtained by digitizing Figure 2 of that paper; this dataset is therefore a discrete version of the original data in Ye and Tang (2016). In this example the right‐truncation time $V^*$ is the number of years between installation and 2011, while the left‐truncation time is just $U^* = V^* - 15$, the number of years between installation and 1996. Table 1.3 summarizes the Equipment‐S failure times.
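The truncation times of the Equipment‐S data follow directly from the installation year. The Python sketch below, whose function names are hypothetical, computes $V^* = 2011 - \text{installation year}$ and $U^* = V^* - 15$, and checks that the condition $U^* \le X^* \le V^*$, with $X^*$ the lifetime in years, is the same as requiring the failure to fall in the 1996 to 2011 window.

```python
WINDOW_START, WINDOW_END = 1996, 2011  # calendar years of the observational window


def truncation_times(install_year: int) -> tuple[int, int]:
    """Return (U, V) in years for a unit installed in install_year.

    V = years from installation to the end of the window (2011);
    U = V - 15 = years from installation to the start of the window (1996),
    negative for units installed after 1996.
    """
    v = WINDOW_END - install_year
    u = v - (WINDOW_END - WINDOW_START)
    return u, v


def is_recorded(install_year: int, failure_year: int) -> bool:
    """A failure is recorded only if U <= X <= V, with X the lifetime in years."""
    u, v = truncation_times(install_year)
    x = failure_year - install_year
    return u <= x <= v


if __name__ == "__main__":
    print(truncation_times(1977))   # (19, 34)
    print(is_recorded(1977, 1990))  # False: failed before 1996
    print(is_recorded(1977, 2000))  # True: failed inside the window
    print(truncation_times(2000))   # (-4, 11): left truncation not binding
    # U <= X <= V is equivalent to 1996 <= failure_year <= 2011
    assert all(
        is_recorded(i, f) == (WINDOW_START <= f <= WINDOW_END)
        for i in range(1970, 2011)
        for f in range(i, 2016)
    )
```

For instance, a unit installed in 1977 gives $V^* = 34$ and $U^* = 19$, so for such a unit only lifetimes between 19 and 34 years could have been recorded; for units installed after 1996 the left truncation time is negative and does not restrict the observable lifetimes.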
The observable range for the Equipment‐S failure times goes from zero to 34 years, which is the maximum observed value for the right‐truncating variable