that key baseline features will not influence the treatment effect, randomization can also be stratified, a common approach in large multicenter studies.
In addition, randomization helps to ensure that all other aspects of patient care, as well as the evaluation of patient outcome, are identical in both treatment groups. In this respect it is often important to make the trial double blind, whereby neither the patients nor those treating them and evaluating their response know which treatment each individual patient is receiving.
If a trial cannot be made double blind – a relevant issue in interventional cardiology trials unless a sham procedure is used – one can nevertheless require blinded evaluation of outcome by assessors who are unaware of which treatment each patient received.
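To make the stratification idea mentioned above concrete, the sketch below implements permuted‐block randomization within strata, one common way of keeping treatment groups balanced on key baseline features as enrolment proceeds. The strata, block size, and treatment labels are hypothetical illustrations, and this minimal Python sketch is not a substitute for a validated randomization system.

```python
import random

def stratified_block_randomization(strata, block_size=4, seed=1):
    """Permuted-block randomization within each stratum.

    Each block contains equal numbers of 'A' (experimental) and
    'B' (control) assignments in random order, so the two arms stay
    balanced within every stratum throughout enrolment.
    """
    rng = random.Random(seed)
    schedules = {}
    for stratum, n_patients in strata.items():
        sequence = []
        while len(sequence) < n_patients:
            block = ["A"] * (block_size // 2) + ["B"] * (block_size // 2)
            rng.shuffle(block)  # randomize order within the block
            sequence.extend(block)
        schedules[stratum] = sequence[:n_patients]
    return schedules

# Hypothetical strata for a two-center trial, split by diabetes status;
# the patient counts are illustrative only.
schedule = stratified_block_randomization(
    {"site1/diabetic": 10, "site1/non-diabetic": 10,
     "site2/diabetic": 10, "site2/non-diabetic": 10})
for stratum, assignments in schedule.items():
    print(stratum, "".join(assignments))
```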
Trial size and power calculations
For a trial to provide a reliably precise answer as to the relative merits of the randomized treatments one needs a sufficiently large number of patients. Power calculations are the most commonly used statistical method for determining the required trial size.
Each power calculation entails the following five steps, itemized in Table 6.3:
1 Choose a primary outcome for the trial.
2 Decide on a level of statistical significance required for declaring a “positive” trial. Five percent significance is usually chosen.
3 Declare what you expect the control group's results to be.
4 Declare the smallest true treatment difference that is important to detect. Large treatment effects, if present, can be detected in relatively small trials so it is relevant to focus on what reasonably modest effect one would not wish to miss.
5 Declare with what degree of certainty (statistical power) one wishes to detect such a difference as statistically significant. From this information, statistical formulae provide the required number of patients (a worked sketch follows Table 6.3).
Table 6.3 Key components of sample size/power calculations.
| Component | Comments |
| --- | --- |
| Outcome type | Proportion; time to event; mean |
| Type I error (alpha) | Level of significance required to declare a "significant" result. Typically 0.05 |
| Control group rate | Risk of events in the non‐experimental arm |
| Meaningful difference | Smallest true difference with clinical impact |
| Type II error (beta) | Probability of declaring no difference when in fact one exists. Typically 0.1 or 0.2. Power = 1 − beta |
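As an illustration of step 5, the sketch below applies the standard normal‐approximation formula for comparing two proportions, using assumptions similar to those of the FAME 2 trial discussed later (18% control event rate, 30% relative risk reduction, 84% power). This is a simplified calculation: published sample sizes often incorporate refinements such as anticipated dropout, so the result here (roughly 772 per group) is close to, but not identical with, FAME 2's published 816.

```python
import math
from statistics import NormalDist

def n_per_group(p_control, p_experimental, alpha=0.05, power=0.80):
    """Patients per group for comparing two proportions (normal approximation):

    n = (z_{1-alpha/2} + z_{power})^2 * [p1(1-p1) + p2(1-p2)] / (p1 - p2)^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided type I error
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - beta
    variance = (p_control * (1 - p_control)
                + p_experimental * (1 - p_experimental))
    return math.ceil((z_alpha + z_beta) ** 2 * variance
                     / (p_control - p_experimental) ** 2)

# 18% control event rate, 30% relative risk reduction, 84% power:
print(n_per_group(0.18, 0.18 * 0.7, power=0.84))  # ~772 patients per group
```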
It is important to note that sample size is estimated in the design phase of a study using a priori assumptions that may or may not turn out to be correct. The implications of incorrect assumptions are not trivial. Poor design can result in an underpowered study that is unable to demonstrate the benefit of a treatment that is in fact effective, thereby depriving patients of a therapeutic option. Alternatively, unrealistic enrolment or event rate assumptions can result in significant expenditure of both human and financial resources on the execution of a study that is ultimately futile. Appreciating the nuances of sample size calculations is therefore critical to the interpretation of clinical trial results, both positive and negative. Table 6.4 provides several examples of trials that were either under‐ or overpowered based on their initial assumptions.
Table 6.4 Impact of incorrect sample size assumptions on study power.
| Component of power calculation | Actual compared with assumption | Effect on power | Example |
| --- | --- | --- | --- |
| Sample size | Lower than assumed | Reduced | VA CARDS |
| Detectable difference | Higher than assumed | Increased | FAME 2 |
| Event rate | Lower than assumed | Reduced | GRAVITAS |
In the Veterans Affairs Coronary Artery Revascularization in Diabetes (VA CARDS) trial [5], investigators designed a multicenter randomized trial comparing CABG with PCI in patients with DM and CAD. The trial required 790 patients to yield 90% power to detect a 40% reduction in the primary endpoint. However, the trial was stopped early because of slow enrolment, after enrolling only 198 patients. The CI for the treatment effect was very wide (0.47–1.71), and although this included the detectable difference for which the study was powered (RR 0.6), the small sample size rendered the results imprecise and non‐significant.

In contrast, in Fractional Flow Reserve versus Angiography for Multivessel Evaluation 2 (FAME 2) [6], De Bruyne et al. compared revascularization versus medical therapy in patients with stable CAD and fractional flow reserve (FFR) values ≤0.8. The study assumed an event rate of 18.0% in the control arm and a relative risk reduction of 30%, requiring 816 patients per group to provide 84% power. Although the event rate assumption in the control arm was close to the actual rate (19.5%), the study was halted after only 54% of projected enrolment because the observed relative risk reduction, 61%, was much larger than expected.

Finally, Price et al. designed the Gauging Responsiveness with A VerifyNow assay‐Impact on Thrombosis And Safety (GRAVITAS) trial to examine the impact of standard versus high‐dose clopidogrel on 6‐month outcomes in patients with high on‐treatment platelet reactivity [7]. The investigators assumed a 6‐month event rate of 5.0% and a risk reduction of 50%, requiring a sample size of 2200 to provide 80% power. Although the trial enrolled the required sample size, event rates were only 2.3% in each group, yielding a non‐significant and imprecise treatment effect of 1.01 (0.58–1.76).

Often, a single clinical trial is neither large nor representative enough to settle a particular therapeutic issue. Meta‐analyses can then be of value in combining evidence from several related trials to reach an overall conclusion, provided that these trials share a similar design, population, endpoint definition, and follow‐up.
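The GRAVITAS example can be quantified with the same normal approximation used earlier. Assuming a 1:1 allocation of the 2200 patients (1100 per group) and carrying the designed 50% relative reduction over to the observed 2.3% control event rate (both illustrative assumptions, not figures stated in the trial report), the calculation shows how sharply power collapses when the event rate is roughly half of what was assumed. Note that this simple approximation yields about 87% power under the design assumptions rather than the stated 80%, since published calculations typically include additional adjustments.

```python
from statistics import NormalDist

def achieved_power(n_per_group, p_control, p_experimental, alpha=0.05):
    """Power of a two-proportion comparison under the normal approximation."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    variance = (p_control * (1 - p_control)
                + p_experimental * (1 - p_experimental))
    z = abs(p_control - p_experimental) * (n_per_group / variance) ** 0.5
    return NormalDist().cdf(z - z_alpha)

# Design assumptions per the text: 5.0% control rate, 50% relative reduction,
# ~1100 patients per group (assuming 1:1 allocation of 2200 total).
print(achieved_power(1100, 0.050, 0.025))   # ~0.87 by this approximation
# Same design, but with the observed ~2.3% control event rate:
print(achieved_power(1100, 0.023, 0.0115))  # ~0.55: badly underpowered
```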
Additional topics in clinical design and analysis
Superiority and non‐inferiority designs
This chapter has so far discussed the fundamentals of trial design and statistical analysis using the so‐called frequentist approach. Clearly there are many other important issues that need to be tackled in the design, conduct, analysis, and interpretation of clinical trials. All we can do here is briefly alert the reader to these topics and encourage their further pursuit through other courses, textbooks, publications, and so on.
In trial design we have concentrated on parallel group trials with just two treatments. In this context the most common trial types are superiority and non‐inferiority designs. The key difference between them lies in how the null and alternative hypotheses are expressed. In a classic superiority trial, the null hypothesis states that there is no difference between the experimental and control treatments, whereas in a non‐inferiority trial the null hypothesis is that the experimental treatment is worse than control by at least a pre‐specified margin. Similarly, the alternative hypothesis for a superiority trial assumes that the experimental and control treatments