Individual Participant Data Meta-Analysis. Various authors
of 999. Tumour stage was collected as a categorical variable with a single code for each stage and sub‐stage, and with a missing data code of 9. This afforded the greatest flexibility for subsequent analysis, as the sub‐stages could be used as supplied or collapsed into broader stage categories as needed.
While trial eligibility criteria indicate which participants a trial intends to recruit, it is worth allowing for a wider range of values in the data dictionary, because the recruitment of some ineligible participants may be inevitable. This could arise, for example, if eligibility depends on a positive diagnostic test and false positives are identified at subsequent review or by a later diagnostic procedure. In the aforementioned cervical cancer IPD meta‐analysis, women with stage IVb disease were not eligible for any of the included trials. However, some were randomised in error because initial clinical staging did not identify their disease as stage IVb, whereas subsequent surgical staging did, so the data dictionary allowed for that possibility.
If a participant characteristic is recorded on different scales across trials, it may be possible to convert the values to a common scale. In the cervical cancer IPD meta‐analysis, the included trials recorded performance status on different scales, so the data dictionary made clear that all were permitted, and these were later converted to a common scale for the meta‐analysis.
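As an illustration of collapsing sub‐stages into broader categories, the following Python sketch maps the TumStage codes from the data dictionary excerpt in Table 4.1 to broad stages I to IV. The function name `collapse_stage` is hypothetical, and treating an undefined code as an error (rather than as missing) is an assumption about how coding problems might be flagged for query with the trial team.

```python
from typing import Optional

# TumStage codes as listed in Table 4.1 (1 = Stage Ia ... 8 = Stage IVb, 9 = unknown)
SUBSTAGE_LABELS = {
    1: "Ia", 2: "Ib", 3: "IIa", 4: "IIb",
    5: "IIIa", 6: "IIIb", 7: "IVa", 8: "IVb",
}
MISSING_STAGE = 9


def collapse_stage(code: int) -> Optional[str]:
    """Map a TumStage sub-stage code to its broad stage (I to IV).

    Returns None for the missing data code. Raises ValueError for any code
    not defined in the data dictionary (a hypothetical checking rule), so
    that coding errors are flagged rather than passed through silently.
    """
    if code == MISSING_STAGE:
        return None
    if code not in SUBSTAGE_LABELS:
        raise ValueError(f"undefined TumStage code: {code}")
    # Strip the trailing sub-stage letter, e.g. "IIIb" -> "III"
    return SUBSTAGE_LABELS[code].rstrip("ab")
```

For example, `[collapse_stage(c) for c in (1, 4, 6, 8, 9)]` gives `["I", "II", "III", "IV", None]`, while the sub‐stage labels themselves remain available if a finer analysis is wanted.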
Table 4.1 Excerpt from a data dictionary developed for an IPD meta‐analysis of chemoradiation for cervical cancer.93
Source: Claire Vale and Jayne Tierney.
Variable | Variable name | Format | Definition |
---|---|---|---|
Age at randomisation | Age | Numeric | Age in years; 999 = unknown |
Tumour stage | TumStage | Numeric | Tumour stage categories: 1 = Stage Ia; 2 = Stage Ib; 3 = Stage IIa; 4 = Stage IIb; 5 = Stage IIIa; 6 = Stage IIIb; 7 = Stage IVa; 8 = Stage IVb; 9 = unknown |
Performance status | PerfStat | Numeric | Provide the data as defined in the trial and supply full details of the system used |
Survival status | SurvStat | Numeric | 0 = Alive; 1 = Dead |
Date of death or last follow‐up | DOLF | Date | dd/mm/yy format; unknown day = ‐‐/mm/yy; unknown month = ‐‐/‐‐/yy; unknown date = ‐‐/‐‐/‐‐ |
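Partial dates such as those allowed for DOLF in Table 4.1 need handling before any time‐to‐event calculation. The sketch below parses the dd/mm/yy format, with dashes for unknown parts, and records how precise each supplied date was. The function name `parse_dolf`, the century pivot for two‐digit years and the mid‐point imputation (15th of the month for an unknown day, 1 July for an unknown month) are all illustrative assumptions, not rules from the data dictionary.

```python
from datetime import date
from typing import Optional, Tuple

# Accept both the ASCII hyphen and the Unicode hyphen (U+2010) as "unknown" markers
HYPHENS = {"-", "\u2010"}


def _is_unknown(part: str) -> bool:
    return len(part) > 0 and all(ch in HYPHENS for ch in part)


def parse_dolf(value: str, pivot: int = 30) -> Tuple[Optional[date], str]:
    """Parse a DOLF string in dd/mm/yy form with '--' for unknown parts.

    Returns (imputed_date, precision), where precision is "day", "month",
    "year" or "missing". ASSUMPTIONS (not from the data dictionary):
    two-digit years <= pivot are read as 20yy, others as 19yy; an unknown
    day is imputed as the 15th and an unknown month as 1 July.
    """
    parts = value.strip().split("/")
    if len(parts) != 3:
        raise ValueError(f"expected dd/mm/yy, got {value!r}")
    dd, mm, yy = parts
    if _is_unknown(yy):
        return None, "missing"
    year = int(yy)
    year += 2000 if year <= pivot else 1900
    if _is_unknown(mm):
        return date(year, 7, 1), "year"
    month = int(mm)
    if _is_unknown(dd):
        return date(year, month, 15), "month"
    return date(year, month, int(dd)), "day"
```

Keeping the precision flag alongside the imputed date means that imputed values can be identified later, for example during data checking or in sensitivity analyses.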
The data dictionary should use accepted coding conventions wherever possible, not only to make it easier for trial teams to provide data, but also to avoid errors. For example, for binary and time‐to‐event outcomes, 0 is most commonly used to indicate that no event has occurred, and 1 to indicate that an event has happened. For time‐to‐event outcomes such as survival in cancer, or time free of seizures in epilepsy, it is important to collect the three component variables that make up the outcome for each participant (Table 4.1). These comprise: a variable that indicates whether an event has happened (e.g. a death or a seizure); another that gives the date the event happened (e.g. date of death or date of seizure); and a third that gives the date the participant was last assessed for the outcome of interest (e.g. the date last seen in clinic). If an event has not occurred, this last variable allows the participant to be included in the analysis and censored at that time‐point. Together with the date of randomisation, these variables allow the time to event for each participant to be calculated, and provide the greatest flexibility for data checking (Section 4.5), risk of bias assessment (Section 4.6) and analysis (Part 2). Alternatively, the date of event and date of last follow‐up (censoring time) can be collected as a single composite variable. As a bare minimum, an indicator variable for the occurrence of an event (yes/no) and the time to event (or censoring) will suffice. Indeed, this may be all that some trial teams are able to provide, for example, if they are based in a country or institution bound by stringent data protection regulations, or if the data are downloaded from a repository that prohibits the supply of exact dates in order to help preserve participant confidentiality.
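The derivation of time to event from its components can be sketched in Python as follows. This is a minimal sketch, assuming the event indicator, the event date, the date last assessed and the date of randomisation are available for each participant; the `SurvivalRecord` class, its field names and the use of days as the time unit are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass
class SurvivalRecord:
    # Hypothetical field names for one participant's components
    rand_date: date               # date of randomisation
    event: int                    # 0 = no event (e.g. alive), 1 = event (e.g. dead)
    event_date: Optional[date]    # date of event; None if no event
    last_fu_date: date            # date last assessed for the outcome


def time_to_event(rec: SurvivalRecord) -> Tuple[int, int]:
    """Return (time in days, event indicator) for one participant.

    Participants with an event contribute the time to that event;
    those without an event are censored at the date last assessed.
    """
    if rec.event == 1:
        if rec.event_date is None:
            raise ValueError("event recorded but no event date supplied")
        end = rec.event_date
    elif rec.event == 0:
        end = rec.last_fu_date
    else:
        raise ValueError(f"undefined event code: {rec.event}")
    days = (end - rec.rand_date).days
    if days < 0:
        # An end date before randomisation would need querying with the trial team
        raise ValueError("end date precedes randomisation")
    return days, rec.event
```

A participant randomised on 1 January 2000 who died on 1 March 2000 yields `(60, 1)`, whereas one still alive when last seen on 1 January 2001 yields `(366, 0)`, that is, censored at 366 days.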
Special care is needed to avoid ambiguity in the data dictionary; otherwise, the ambiguity will carry through into the IPD supplied by each trial and, in turn, into the IPD meta‐analysis database. For example, for an IPD meta‐analysis of the effects of anti‐platelet therapy for pre‐eclampsia in pregnancy,97 the data dictionary suggested that severe maternal morbidity be coded as a single variable. Unintentionally, this did not allow more than one type of morbidity to be recorded for an individual woman, which could occur, for example, if she had eclampsia followed by a stroke (Table 4.2). In the same meta‐analysis, a missing data code of 9 was used for gestation at randomisation, which meant that (although unlikely) any woman randomised at 9 weeks’ gestation could mistakenly be regarded as having missing gestation information (Table 4.2). Thus, an unambiguous missing data code such as 99 or, even better, a negative integer such as –9 would have been preferable. Furthermore, it is prudent to distinguish between different types of missing data, such as missing for the participant (e.g. –9 or 9), not applicable to the participant (e.g. –8 or 8) or not collected by the trial (e.g. –7 or 7). For example, in an IPD meta‐analysis of progesterone for pre‐term birth,79 if a baby was stillborn, certain baby outcomes were coded as 8 to signify that they could not be collected, and as 9 to indicate a true missing value. Although this could be inferred from the birth data, coding the IPD in this way made it easier to calculate the proportions of missing data and to cross‐check.
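To illustrate why distinct negative codes help, the sketch below tallies, for a single variable, the proportion of records that are observed versus each type of missing value. The code values follow the examples given above (-9, -8 and -7), but the function `summarise_missing` and its category labels are hypothetical. Note that a genuine value of 9 (for example, 9 weeks’ gestation) is correctly counted as observed.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical missing data codes following the scheme suggested above;
# negative integers cannot collide with genuine values such as 9 weeks' gestation.
MISSING_CODES = {
    -9: "missing for participant",
    -8: "not applicable",
    -7: "not collected by trial",
}


def summarise_missing(values: Iterable[int]) -> Dict[str, float]:
    """Return the proportion of records in each category (observed or missing type)."""
    vals = list(values)
    if not vals:
        return {}
    # Any value that is not a missing data code is treated as observed
    counts = Counter(MISSING_CODES.get(v, "observed") for v in vals)
    return {label: n / len(vals) for label, n in counts.items()}
```

For gestation values `[9, 12, -9, 14, -9, -7, 20, 9]`, this gives 0.625 observed, 0.25 missing for the participant and 0.125 not collected by the trial.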
Table 4.2 Excerpt from a data dictionary developed for an IPD meta‐analysis of the effects of anti‐platelet agents for the prevention of pre‐eclampsia in pregnancy.97
Source: Lesley Stewart and Lisa Askie, based on the data dictionary used by Askie et al.97
Variable | Definition | Issue |
---|---|---|
Severe maternal morbidity | 1 = none; 2 = stroke; 3 = renal failure; 4 = liver failure; 5 = pulmonary oedema; 6 = disseminated intravascular coagulation; 7 = HELLP syndrome; 8 = eclampsia; 9 = not recorded | Collection as a single variable did not allow more than one morbidity to be recorded for the same woman |
Gestation at randomisation | Gestation in completed weeks; 9 = unknown | A woman could be randomised at 9 weeks’ gestation, so the missing data code was ambiguous |
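One way to avoid the single‐variable problem shown in Table 4.2 is to collect one binary (0/1) indicator per morbidity. The sketch below expands a list of the table’s morbidity codes into such indicators. The function name, the variable labels and the use of -9 for "not recorded" are hypothetical choices for illustration, not taken from the original data dictionary.

```python
from typing import Dict, Iterable

# Morbidity categories from Table 4.2; codes 1 ("none") and 9 ("not recorded")
# are handled separately rather than as morbidities.
MORBIDITIES = {
    2: "stroke",
    3: "renal_failure",
    4: "liver_failure",
    5: "pulmonary_oedema",
    6: "dic",  # disseminated intravascular coagulation
    7: "hellp",
    8: "eclampsia",
}
NOT_RECORDED = -9  # hypothetical code; avoids overloading 9


def morbidity_indicators(codes: Iterable[int], recorded: bool = True) -> Dict[str, int]:
    """Expand one woman's morbidity codes into a 0/1 variable per morbidity.

    Unlike a single categorical variable, this lets eclampsia followed by
    a stroke be recorded as two separate morbidities.
    """
    if not recorded:
        return {name: NOT_RECORDED for name in MORBIDITIES.values()}
    code_set = set(codes)
    unknown = code_set - set(MORBIDITIES) - {1}
    if unknown:
        raise ValueError(f"undefined morbidity codes: {sorted(unknown)}")
    if 1 in code_set and len(code_set) > 1:
        raise ValueError("code 1 (none) cannot be combined with other morbidities")
    return {name: int(code in code_set) for code, name in MORBIDITIES.items()}
```

For a woman with eclampsia followed by a stroke, `morbidity_indicators([8, 2])` sets both `eclampsia` and `stroke` to 1, information that the single‐variable coding could not hold.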
4.3 Initiating and Maintaining Collaboration
Negotiating and maintaining collaborations with trial investigators and organisations from different countries, settings and disciplines can take considerable time and effort, and requires careful management,43,44 but is critical to the success of collaborative IPD meta‐analysis projects. In an era in which the value of clinical data sharing is more widely appreciated, persuading trial investigators to participate is becoming easier. However, it is worth remembering that not all trial investigators will be obliged by their funders to share their IPD, and it is perfectly reasonable that they may require persuading