Real World Health Care Data Analysis. Uwe Siebert
missed paid work to help your care in last 12 months
3.5.2 Simulated PCI Data
The primary objective in simulating a new PCI data set from the observational data was to produce a larger data set that more effectively illustrates the unsupervised, nonparametric Local Control alternative to conventional propensity score stratification (Chapter 7) and machine learning methods (Chapter 15). Starting from the observational data on 996 patients who received their initial PCI at Ohio Heart Health, Lindner Center, Christ Hospital, Cincinnati (Kereiakes et al., 2000), we generated this much larger data set via plasmode simulation. The simulated data set contains 11 variables on 15,487 patients with no missing values and is referred to as the PCI15K simulated data set. The key variables in the data set are described in Table 3.6. In later analyses, the treatment cohort is represented by the variable THIN and the outcomes by SURV6MO (binary) and CARDCOST (continuous). Because the process for generating simulated data was described in detail for the REFLECTIONS example, only a brief summary and a listing of the final simulated data set variables are provided for the PCI15K data set.
Table 3.6: PCI Simulated Data Set Variables
Variable Name | Variable Label |
patid | Patient ID number: 1 to 15487 |
surv6mo | Binary PCI Survival variable: 1 => survival for at least six months following PCI, 0 => survival for less than six months |
cardcost | Cardiac related costs incurred within six months of patient’s initial PCI; numerical values in 1998 dollars; costs were truncated by death for the 404 patients with surv6mo = 0 |
thin | Numeric treatment selection indicator: thin = 0 implies usual PCI care alone; thin = 1 implies usual PCI care augmented by either planned or rescue treatment with the new blood thinning agent |
stent | Coronary stent deployment; numeric, with 1 meaning YES and 0 meaning NO |
height | Height in centimeters; numeric integer from 133 to 198 |
female | Female gender; numeric, with 1 meaning YES and 0 meaning NO |
diabetic | Diabetes mellitus diagnosis; numeric, with 1 meaning YES and 0 meaning NO |
acutemi | Acute myocardial infarction within the previous 7 days; numeric, with 1 meaning YES and 0 meaning NO |
ejfract | Left ejection fraction; numeric value from 17 percent to 77 percent |
ves1proc | Number of vessels involved in the patient’s initial PCI procedure; numeric integer from 0 to 5 |
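The variable definitions in Table 3.6 translate directly into range checks that can be run after the data are loaded. The following is a minimal sketch in Python (the book's own code is SAS). The `PCI15K_SPEC` dictionary, the `validate_row` helper, and the example record are hypothetical names and values introduced here for illustration; the bounds are simply those listed in the table.

```python
# Allowed ranges copied from Table 3.6; None means the table states no bound.
PCI15K_SPEC = {
    "patid":    (1, 15487),
    "surv6mo":  (0, 1),
    "cardcost": (None, None),   # 1998 dollars; no numeric range given in Table 3.6
    "thin":     (0, 1),
    "stent":    (0, 1),
    "height":   (133, 198),     # centimeters
    "female":   (0, 1),
    "diabetic": (0, 1),
    "acutemi":  (0, 1),
    "ejfract":  (17, 77),       # percent
    "ves1proc": (0, 5),
}


def validate_row(row):
    """Return a list of problems found in one patient record (a dict)."""
    problems = []
    for name, (low, high) in PCI15K_SPEC.items():
        if name not in row or row[name] is None:
            problems.append(f"{name}: missing")
            continue
        value = row[name]
        if low is not None and value < low:
            problems.append(f"{name}: {value} is below {low}")
        if high is not None and value > high:
            problems.append(f"{name}: {value} is above {high}")
    return problems


# Made-up record for illustration only; it is not a row from PCI15K.
example = {
    "patid": 1, "surv6mo": 1, "cardcost": 12000.0, "thin": 1, "stent": 1,
    "height": 170, "female": 0, "diabetic": 0, "acutemi": 0,
    "ejfract": 55, "ves1proc": 1,
}
print(validate_row(example))  # an empty list means every check passed
```

Because the data set is described as having no missing values, any "missing" message from a check like this would point to a loading problem rather than to the data themselves.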
Tables 3.7 and 3.8 summarize the outcome data from the original Lindner study and from the simulated data, respectively. The two data sets are similar, with slightly narrower differences between treatment groups in the simulated data. The PCI simulated data set, named PCI15K, is used for analysis in Chapters 7, 14, and 15.
Table 3.7: Lindner Study (Kereiakes et al. 2000)
Treatment Group | Patients | Number Surviving Six Months | Percent Surviving Six Months | Average Cardiac Related Cost |
Trtm = 0 | 298 | 283 | 94.97% | $14,614 |
Trtm = 1 | 698 | 687 | 98.42% | $16,127 |
Table 3.8: PCI Blood Thinner Simulation
Treatment Group | Patients | Number Surviving Six Months | Percent Surviving Six Months | Average Cardiac Related Cost |
Thin = 0 | 8476 | 8158 | 96.25% | $15,343 |
Thin = 1 | 7011 | 6925 | 98.77% | $15,643 |
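The comparison between the two tables can be checked with a few lines of arithmetic. The sketch below, written in Python rather than the book's SAS, recomputes the between-group differences from the counts and mean costs printed in Tables 3.7 and 3.8; the dictionary layout and the function name `group_differences` are illustrative choices, not part of the original analysis.

```python
# Values copied from Tables 3.7 and 3.8:
# treatment indicator -> (patients, number surviving six months, mean cost in dollars)
lindner = {0: (298, 283, 14614), 1: (698, 687, 16127)}
pci15k = {0: (8476, 8158, 15343), 1: (7011, 6925, 15643)}


def group_differences(table):
    """Treated-minus-untreated differences in survival (percentage points) and mean cost."""
    n0, s0, c0 = table[0]
    n1, s1, c1 = table[1]
    survival_diff = 100.0 * (s1 / n1 - s0 / n0)
    cost_diff = c1 - c0
    return survival_diff, cost_diff


for label, table in (("Lindner", lindner), ("PCI15K", pci15k)):
    survival_diff, cost_diff = group_differences(table)
    print(f"{label}: survival difference {survival_diff:.2f} points, "
          f"cost difference ${cost_diff:,}")

# Deaths within six months in the simulated data (patients minus survivors).
deaths = sum(n - s for n, s, _ in pci15k.values())
print("PCI15K deaths:", deaths)
```

The survival advantage for the treated group is roughly 3.5 percentage points in the original data and roughly 2.5 points in the simulated data, and the cost difference shrinks from $1,513 to $300. The death count in the simulated data is 404, which agrees with the cardcost description in Table 3.6.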
3.6 Summary
In this chapter, two observational studies were introduced: the REFLECTIONS one-year study of patients with fibromyalgia and the Lindner study of patients undergoing PCI. The concept of plasmode simulation, in which one builds a simulated data set that retains the same variables and correlation structure as the original data, was introduced and applied to the REFLECTIONS and Lindner data sets. SAS IML code for the application to the REFLECTIONS data was provided, and the simulated data were shown to retain the key characteristics of the original data. These two data sets (simulated REFLECTIONS and PCI15K) are used throughout the remainder of the book to demonstrate the methods for real world data analysis presented in each chapter.
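To make the plasmode idea concrete, the Python sketch below resamples whole covariate rows with replacement, which keeps the observed variables and their joint distribution (including correlations), and then simulates a binary outcome from a logistic model with a known treatment effect. This follows the general approach described by Franklin et al. (2014) and is not the SAS IML procedure used for the REFLECTIONS and PCI15K data. The toy records, the coefficient values, and the function names are all assumptions made for illustration.

```python
import math
import random

# Toy "observed" records (hypothetical values, not taken from the Lindner data).
observed = [
    {"thin": 1, "diabetic": 0, "ejfract": 55},
    {"thin": 0, "diabetic": 1, "ejfract": 40},
    {"thin": 1, "diabetic": 1, "ejfract": 35},
    {"thin": 0, "diabetic": 0, "ejfract": 60},
]

# Assumed outcome-model coefficients on the log-odds scale. In practice these
# would be estimated from the observed data, with the treatment effect fixed
# by the analyst so that the "true" effect in the simulated data is known.
COEF = {"intercept": 1.0, "thin": 0.8, "diabetic": -0.4, "ejfract": 0.03}


def survival_probability(row):
    """Logistic model for six-month survival given treatment and covariates."""
    logit = COEF["intercept"] + sum(
        COEF[name] * row[name] for name in ("thin", "diabetic", "ejfract")
    )
    return 1.0 / (1.0 + math.exp(-logit))


def plasmode_sample(rows, n, seed=0):
    """Draw n simulated patients: resampled covariate rows plus a simulated outcome."""
    rng = random.Random(seed)
    simulated = []
    for patid in range(1, n + 1):
        row = dict(rng.choice(rows))  # copy one observed row, chosen with replacement
        row["patid"] = patid
        row["surv6mo"] = 1 if rng.random() < survival_probability(row) else 0
        simulated.append(row)
    return simulated


sim = plasmode_sample(observed, n=1000)
print(len(sim), "simulated patients;", sum(r["surv6mo"] for r in sim), "survivors")
```

Because each simulated patient inherits an entire observed row, relationships among treatment and covariates carry over unchanged, while the outcome is generated under a treatment effect that is known by construction.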
References
Austin P (2008). Goodness-of-fit Diagnostics for the Propensity Score Model When Estimating Treatment Effects Using Covariate Adjustment With the Propensity Score. Pharmacoepidemiology and Drug Safety 17: 1202-1217.
Conover WG and Iman RL (1976). Rank Transformations in Discriminant Analysis.
Franklin JM, Schneeweiss S, Polinski JM, Rassen J (2014). Plasmode simulation for the evaluation of pharmacoepidemiologic methods in complex healthcare databases. Comput Stat Data Anal 72: 219-226.
Gadbury GL, Xiang Q, Yang L, Barnes S, Page GP, Allison DB (2008). Evaluating Statistical Methods Using Plasmode Data Sets in the Age of Massive Public Databases: An Illustration Using False Discovery Rates. PLoS Genet 4(6): e1000098.
Kereiakes DJ, Obenchain RL, Barber BL, Smith A, McDonald M, Broderick TM, Runyon JP, Shimshak TM, Schneider JF, Hattemer CH, Roth EM, Whang DD, Cocks DL, Abbottsmith CW (2000). Abciximab provides cost-effective survival advantage in high-volume interventional practice. American Heart J 140: 603-610.
Peng X, Robinson RL, Mease P, Kroenke K, Williams DA, Chen Y, Faries D, Wohlreich M, McCarberg B, Hann D (2015). Long-Term Evaluation of Opioid Treatment in Fibromyalgia. Clin J Pain 31: 7-13.
Robinson RL, Kroenke K, Mease P, Williams DA, Chen Y, D’Souza D, Wohlreich M, McCarberg B (2012). Burden of Illness and Treatment Patterns for Patients with Fibromyalgia. Pain Medicine 13:1366-1376.
Wicklin R (2013). Simulating Data with SAS®. Cary, NC: SAS Institute Inc.
Chapter 4: The Propensity Score
4.2.2 Address Missing Covariate Values in Estimating Propensity Score
4.2.3 Selection of Propensity Score Estimation Model
A Priori Logistic Regression Model