The Statistical Analysis of Doubly Truncated Data. Prof Carla Moreira

The Statistical Analysis of Doubly Truncated Data

Wiley also publishes its books in a variety of electronic formats and by print‐on‐demand. Some content that appears in standard print versions of this book may not be available in other formats.

Limit of Liability/Disclaimer of Warranty: While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

Library of Congress Cataloging‐in‐Publication Data applied for

[ISBN: 9781119951377]

Cover Design: Wiley

To María Soledad, Marcos, Paula and Miguel, for their love, support and inspiration.

To Teo, Sabela and Andrés, for their infinite patience.

Preface

This book is the result of a long‐standing collaboration among the three authors, which began when Carla Moreira was a PhD student under the supervision of Jacobo de Uña‐Álvarez. Carla successfully defended her thesis, entitled ‘The Statistical Analysis of Doubly Truncated Data: New Methods, Software Development, and Biomedical Applications’, at the Universidade de Vigo in July 2010. At that time, only a small number of people seemed to be aware of the importance of random double truncation. Research papers on this topic were scarce before 2010, with the contribution by Bradley Efron and Vahe Petrosian in 1999 as the most relevant one. And, of course, no software was available. So, for us, it was a risky and exciting research exercise to embrace such an initiative.

We launched version 1.1 of our R package DTDA in September 2009. To our knowledge, this was the first software library implementing the Efron–Petrosian estimator. The package included Efron and Petrosian's data on quasar luminosities, and we are very thankful to both scientists for sharing them. DTDA has been downloaded more than 45 thousand times to date. We have taken the opportunity of writing this book to update and enhance DTDA, feeding it with new illustrative real datasets and enabling new functions and capabilities. We are confident that the update of the package and the guidance provided by this book will substantially increase the number of applications involving doubly truncated data, and also raise awareness of the implications of double truncation for inferential procedures.

Over these years, several researchers have collaborated with us in the fascinating adventure of investigating double truncation. Among them, we would like to mention Ingrid Van Keilegom, Micha Mandel, Rebecca Betensky, Luis Meira‐Machado and Roel Braekers. We have enjoyed co‐authoring a number of research papers with them. We also learned a lot about double truncation by studying real data problems posed by applied researchers; here we thank María José Bento, David Keith Simon, Zhi‐Sheng Ye, Ana Cristina Santos and Henrique Barros for fruitful discussions and cooperation.

Nowadays, there is a considerable statistical community doing research on exploratory and inferential methods for doubly truncated data, partly motivated by new emerging applications in Biomedicine, Economics and Engineering, among other fields. At the time of writing, activity in this area of research is more intense than ever before, as is evident from the number of papers on the topic published in the last couple of years. And the interest in double truncation is growing faster and faster!

This book aims to serve as a companion for those interested in learning about doubly truncated data analysis and inference, presenting a wide range of tools for estimating distribution and regression models. All the methods presented in this book are accompanied by real data and simulated examples and, at the end of each chapter, the reader will find the do‐it‐yourself code, mostly based on the DTDA package. This book is not meant merely to be read: its main purpose is to invite the reader to think, explore and experience.

This volume is also self‐contained, providing a general overview of the main results. Further technical details and some omitted proofs can be consulted in the original references. It is also our intention to leave several take‐home messages. First, that the correction of the potential sampling bias arising from double truncation may be critical in estimation and inference. Second, that, even though the Efron–Petrosian estimator is conceptually complicated and its asymptotic theory may be overwhelming, its practical application is relatively simple thanks to the available software packages and the good performance of resampling algorithms. Third, that external information on the sampling bias should be used whenever available, since the Efron–Petrosian estimator may be very noisy or even non‐existent, particularly when the sample size is small to moderate.

We sincerely hope that the reader will enjoy (and experience!) the book, at least as much as we have enjoyed writing it! Comments and suggestions from readers on this edition are welcome; please send them to [email protected] to help us improve the book.

Parts of this book were written while the authors were supported by the Grants MTM2017‐89422‐P (MINECO/AEI/FEDER, UE) (first author), UIDB/00013/2020 and UIDP/00013/2020 (second author), and MTM2016‐76969‐P (MINECO/AEI/FEDER, UE) (third author). This support is acknowledged.

May 2021

Jacobo de Uña‐Álvarez, Carla Moreira and Rosa M. Crujeiras

Vigo, V. N. Famalicão and Santiago de Compostela

1 Introduction

1.1 Random Truncation

Random truncation generally refers to a situation in which a number of individuals of the target population cannot be sampled because a certain random event precludes them. When this random event is unrelated to the variables of interest, standard statistical methods apply, with the only inconvenience of using a smaller sample size. In many practical cases, however, the truncation event is related to the variables under study, and specific methods to overcome the sampling bias must be considered.
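The contrast between the two situations can be illustrated with a small simulation. The following is only a minimal sketch in plain Python (it does not use the DTDA package), and every modelling choice in it is an assumption made for illustration: the target variable `x` is exponential with mean 1, the unrelated event is a fair coin flip, and the related event keeps `x` only when it falls inside a random interval `[u, v]` with `u` uniform on (0, 2) and `v = u + 2`.

```python
import random
import statistics


def simulate(n_population=50_000, seed=2021):
    """Draw a population and two subsamples produced by different truncation events."""
    rng = random.Random(seed)
    population, unrelated, related = [], [], []
    for _ in range(n_population):
        x = rng.expovariate(1.0)  # hypothetical time-to-event with mean 1
        population.append(x)

        # Event unrelated to x: each individual is kept with probability 1/2.
        if rng.random() < 0.5:
            unrelated.append(x)

        # Event related to x: x is sampled only if it falls inside a random
        # interval [u, v] (assumed here: u ~ Uniform(0, 2), v = u + 2).
        u = rng.uniform(0.0, 2.0)
        v = u + 2.0
        if u <= x <= v:
            related.append(x)
    return population, unrelated, related


if __name__ == "__main__":
    population, unrelated, related = simulate()
    samples = [("population", population), ("unrelated", unrelated), ("related", related)]
    for label, sample in samples:
        # Naive sample means, with no correction for the sampling mechanism.
        print(f"{label:>12}: n = {len(sample):6d}, mean = {statistics.mean(sample):.3f}")
```

Under these assumptions, the first subsample should reproduce the population mean with roughly half the sample size, whereas the second under-represents both very short and very long times, so its naive mean drifts away from the target value.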

This book is focused on random truncation phenomena that arise (usually, but not only) when sampling time‐to‐event data. That is, the variable of interest is the time $X$ elapsed from a well‐defined origin to another well‐defined end point. In this setting, a truncated sample of $X$ is a set of independent and identically distributed (iid) random variables $X_1, \ldots, X_n$ with the conditional distribution of $X$ given […], where […]
