
The impact of trial design on network meta-analysis and decision-making: a working example in ulcerative colitis

Introduction and methods

Evidence-based healthcare decision-making requires comparison of all relevant competing interventions in the same patient population. Network meta-analysis (NMA) is a method of indirect comparison that permits the simultaneous comparison of multiple treatments. It is commonly used in health technology assessment (HTA) when head-to-head trial data are not available for all comparators of interest. Despite its widespread use, NMA is sometimes unsuitable because of heterogeneity in trial design.

This article uses ulcerative colitis (UC) as a working example to discuss this problem and the implications it had for assessing drug efficacy in the appraisal of vedolizumab [Entyvio™, Takeda (NICE [TA342], 2015)]. UC is a chronic relapsing-remitting form of inflammatory bowel disease (IBD). Vedolizumab is a gut-selective targeted therapy indicated for the treatment of adult patients with moderately to severely active UC.

The manufacturer of vedolizumab performed an NMA using data from the randomised controlled trials (RCTs) detailed in Table 1. Relevant RCT data were extracted, and the structure of the resulting network of placebo-controlled trials is shown in Figure 1.

Table 1: Trials included in the manufacturer’s NMA

*Exact numbers of TNF inhibitor-naïve patients were not given.
Manufacturer’s trial in bold text
ITT, intention to treat; SC, subcutaneous; TNF, tumour necrosis factor.

Figure 1: Network diagram of RCTs used in the NMA

ADA, adalimumab; GOL, golimumab; INF, infliximab; NMA, network meta-analysis; PBO, placebo; RCT, randomised controlled trial; TNF, tumour necrosis factor; VED, vedolizumab.
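Because the trials are placebo-controlled, each active treatment can be compared with the others indirectly through placebo. The sketch below shows this anchored (Bucher) indirect comparison, which is what a fixed-effect NMA reduces to when treatments are connected only through placebo. It assumes relative effects on the log odds ratio scale. The function names and all input values are hypothetical and are not taken from the trials in Table 1.

```python
import math


# Hypothetical helper names; inputs are relative effects versus placebo on
# the log odds ratio scale, each with its standard error.
def bucher_indirect(d_ap, se_ap, d_bp, se_bp):
    """Anchored indirect comparison of treatment A versus treatment B via placebo P.

    Returns the log odds ratio of A versus B and its standard error.
    """
    d_ab = d_ap - d_bp                          # the common placebo arm cancels out
    se_ab = math.sqrt(se_ap ** 2 + se_bp ** 2)  # variances add (independent trials)
    return d_ab, se_ab


def odds_ratio_ci(log_or, se, z=1.96):
    """Convert a log odds ratio and its SE to an odds ratio with a 95% CI."""
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical inputs, not results from the trials in Table 1
d_ab, se_ab = bucher_indirect(d_ap=0.8, se_ap=0.25, d_bp=0.5, se_bp=0.20)
or_ab, lower, upper = odds_ratio_ci(d_ab, se_ab)
print(f"Indirect OR (A vs B): {or_ab:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

The indirect interval is wider than either placebo comparison because the two variances add. Any systematic difference between the trials, such as the design differences discussed below, is not captured by this calculation. It is simply assumed away.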

The manufacturer used a fixed-effect model in its NMA, despite identifying the following key differences in trial design:

1.     Protocol-defined induction duration.

2.     Protocol-defined maintenance duration.

3.     Whether re-randomisation was performed after induction.

4.     Inclusion or exclusion of patients with prior tumour necrosis factor (TNF) inhibitor use.

These differences are highlighted in bold in the GEMINI trial design shown in Figure 2. In addition, the evidence review group (ERG) commented on the choice of model used in the NMA:

5.     Fixed-effect versus random-effects model.

Figure 2: GEMINI trial design; bold text indicates differences in design from the other trials

*Clinical response defined as: a reduction in the Mayo score of at least 3 points and of at least 30% from baseline, with an accompanying decrease in the rectal bleeding subscore of at least 1 point or an absolute rectal bleeding subscore of 1 point or less.
**Mayo score: includes assessment of stool frequency, rectal bleeding, endoscopic findings and a global assessment by a clinician.
PBO, Placebo; TNF, tumour necrosis factor; VED, vedolizumab
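The clinical response definition in the footnote is a simple rule, and the sketch below expresses it directly. It assumes integer scores, with a total Mayo score of 0 to 12 and a rectal bleeding subscore of 0 to 3. The function name is hypothetical. The sketch encodes only the definition quoted above, not any trial's full scoring procedure.

```python
def is_clinical_response(mayo_baseline, mayo_followup, rb_baseline, rb_followup):
    """Apply the clinical response definition from the Figure 2 footnote.

    mayo_*: total Mayo score (0 to 12); rb_*: rectal bleeding subscore (0 to 3).
    """
    reduction = mayo_baseline - mayo_followup
    # At least 3 points AND at least 30% of baseline (integer arithmetic
    # avoids floating-point edge cases at exactly 30%).
    mayo_criterion = reduction >= 3 and 10 * reduction >= 3 * mayo_baseline
    # Rectal bleeding subscore falls by at least 1 point, OR is 1 or less at follow-up.
    bleeding_criterion = (rb_baseline - rb_followup) >= 1 or rb_followup <= 1
    return mayo_criterion and bleeding_criterion


# Hypothetical patients
print(is_clinical_response(9, 5, 2, 1))   # True: 4-point (44%) drop, bleeding improved
print(is_clinical_response(12, 9, 2, 1))  # False: 3-point drop is only 25% of baseline
```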


Sections 1 to 4 detail how the manufacturer adjusted for differences in trial design, the ERG's response to these approaches and the resulting NICE committee decision. Section 5 discusses the ERG's comments on fixed- and random-effects models and their appropriateness.

1.     Protocol-defined induction duration

In the GEMINI and PURSUIT trials, clinical response was measured at 6 weeks, compared with 8 weeks in the ULTRA 1 and ACT 1 trials. Because the dosing schedules differ, by week 6 patients had received:

·       two doses of vedolizumab or golimumab

·       four doses of adalimumab

·       three doses of infliximab

The evidence review group (ERG) stated that response at 10 weeks should be used to assess clinical effectiveness during induction, to maintain parity between treatments. NICE noted that assessing clinical response at 6 weeks may have underestimated the efficacy of vedolizumab, because patients would have received three or four doses of infliximab or adalimumab respectively, compared with two doses of vedolizumab.
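The dose counts above follow from simple arithmetic on each arm's dosing schedule. The sketch below counts the doses received in each arm by the week 6 assessment. The dosing weeks in the code are an assumption: they were reconstructed to reproduce the counts stated above rather than taken from the trial protocols. The dictionary and function names are hypothetical.

```python
# Induction dosing weeks per trial arm. ASSUMPTION: these week numbers are
# reconstructed to match the dose counts stated above, not copied from the
# trial protocols; check the protocols before reusing them.
INDUCTION_DOSE_WEEKS = {
    "vedolizumab (GEMINI)": [0, 2],
    "golimumab (PURSUIT-SC)": [0, 2],
    "adalimumab (ULTRA 1)": [0, 2, 4, 6],
    "infliximab (ACT 1)": [0, 2, 6],
}


def doses_by_week(dose_weeks, week):
    """Count doses given at or before `week` (week 0 is the first dose)."""
    return sum(1 for w in dose_weeks if w <= week)


for arm, dose_weeks in INDUCTION_DOSE_WEEKS.items():
    print(f"{arm}: {doses_by_week(dose_weeks, 6)} doses by week 6")
```

A single assessment week therefore corresponds to different cumulative drug exposure in each arm. This is the parity problem that the ERG raised.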

2.     Protocol-defined maintenance duration

Maintenance duration was 52 weeks in the GEMINI, ULTRA 2 and Suzuki (2014) trials, compared with 54 weeks in the ACT 1 and PURSUIT-SC/M trials. The NICE committee considered that this difference had no impact on the results.

3.     Whether re-randomisation was performed after induction

In the GEMINI and PURSUIT-SC/M trials (vedolizumab and golimumab), patients who responded to induction were re-randomised for the maintenance phase, whereas in the ACT 1/2, ULTRA 1/2 and Suzuki (2014) trials (infliximab and adalimumab), patients were randomised only at baseline.

The ERG stated that re-randomising only patients who had responded at 6 weeks excluded late responders, potentially underestimating the efficacy of vedolizumab. Conversely, selecting only responders may have produced more favourable maintenance results. The ERG concluded that it was unclear whether the results of GEMINI or PURSUIT-M over- or underestimated the treatment effect of vedolizumab relative to the comparators in the maintenance phase.
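The two designs report maintenance response on different denominators, which is one reason the direction of any bias is hard to predict. The sketch below is a simplified two-stage model. It ignores placebo arms, re-randomisation to placebo and dropout, and all probabilities and function names are hypothetical.

```python
def treat_through_rate(p_induction_response, p_maint_given_responder, p_late_response):
    """Maintenance response among ALL patients randomised at baseline.

    Treat-through design (as in ACT and ULTRA): induction responders stay in
    response with probability p_maint_given_responder, and induction
    non-responders become late responders with probability p_late_response.
    """
    return (p_induction_response * p_maint_given_responder
            + (1 - p_induction_response) * p_late_response)


def rerandomised_rate(p_maint_given_responder):
    """Maintenance response among induction responders only (re-randomised design).

    Late responders never enter the maintenance analysis.
    """
    return p_maint_given_responder


# Hypothetical values, chosen only to show the denominator difference
p_ind, p_maint, p_late = 0.45, 0.60, 0.15
print(f"treat-through: {treat_through_rate(p_ind, p_maint, p_late):.4f}")
print(f"re-randomised: {rerandomised_rate(p_maint):.4f}")
```

The responder-only figure is much higher because non-responders have been filtered out, while late responders contribute only to the treat-through figure. Placing both kinds of estimate in one maintenance network therefore mixes two different quantities.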

4.     Inclusion or exclusion of patients with prior TNF inhibitor use

In patients in whom a TNF inhibitor has failed, re-treatment is associated with a lower rate of response. To account for this, the manufacturer performed separate NMAs for the TNF inhibitor-naïve and prior TNF inhibitor failure subgroups, shown in Figure 3 (there were no prior TNF inhibitor failure data for infliximab or golimumab).

Figure 3: Clinical responders from GEMINI and subgroup analysis

ADA, adalimumab; GOL, golimumab; INF, infliximab; PBO, placebo; TNF, tumour necrosis factor; VED, vedolizumab.

The ERG noted that a disadvantage of analysing the subgroups separately was that any interaction between treatment and subgroup could not be explored, and suggested that this could have been done using meta-regression. The manufacturer responded that such analyses would be underpowered because too few trials were included in the networks. The ERG countered that the manufacturer should instead present the predictive distribution of the treatment effect, which incorporates the extra uncertainty caused by potential differences between studies.

Despite this, the NICE committee concluded that vedolizumab was clinically effective in the entire population, and in both subgroups, compared with conventional therapy.
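The meta-regression suggested by the ERG would model the treatment-by-subgroup interaction directly. With a single binary covariate (TNF inhibitor-naïve versus prior failure), a fixed-effect meta-regression reduces to comparing the two subgroup-pooled estimates. The sketch below makes that comparison for a single pairwise treatment-versus-placebo comparison on the log odds ratio scale. It is not the manufacturer's or the ERG's analysis, and the function names and inputs are hypothetical.

```python
import math


def fixed_effect_pool(effects, ses):
    """Inverse-variance fixed-effect pooled estimate and its standard error."""
    weights = [1 / se ** 2 for se in ses]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total
    return pooled, math.sqrt(1 / total)


def subgroup_interaction(effects_a, ses_a, effects_b, ses_b):
    """Treatment-by-subgroup interaction as a difference of pooled effects.

    Equivalent to a fixed-effect meta-regression with one binary covariate
    (subgroup membership). Returns (difference, its SE, two-sided p-value).
    """
    est_a, se_a = fixed_effect_pool(effects_a, ses_a)
    est_b, se_b = fixed_effect_pool(effects_b, ses_b)
    diff = est_a - est_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)  # subgroups contain different patients
    z = diff / se_diff
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return diff, se_diff, p_value


# Hypothetical log odds ratios (active vs placebo) and SEs in each subgroup
naive = ([0.9, 0.7], [0.30, 0.35])
prior_failure = ([0.4], [0.45])
diff, se_diff, p = subgroup_interaction(*naive, *prior_failure)
print(f"interaction (log OR): {diff:.2f}, SE {se_diff:.2f}, p = {p:.2f}")
```

With only a few trials per subgroup, the standard error of the interaction is large relative to any plausible difference, which illustrates the manufacturer's point about power.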

5.     Fixed-effect versus random-effects model

The manufacturer used a fixed-effect model in its NMA, but the ERG stated that a random-effects model would have been more appropriate. Fixed-effect models assume a single common treatment effect across trials, so they cannot capture the additional uncertainty arising from differences in trial design and risk underestimating uncertainty. However, a common finding in drug comparisons is that, when the data are heterogeneous and the trials few, random-effects models generate very wide confidence intervals, making it difficult to reach conclusions on comparative effectiveness.

Although the fixed-effect NMA was still assessed, NICE recommended vedolizumab largely on the basis of tolerability and patient quality-of-life (QoL) considerations related to corticosteroids and invasive surgery, rather than on the comparability of vedolizumab with TNF inhibitor alternatives for treating this patient group.
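The trade-off between fixed- and random-effects models can be seen in a simple pairwise meta-analysis. The sketch below uses the DerSimonian-Laird moment estimator, a common frequentist choice for pairwise meta-analysis; HTA submissions often fit NMA models in a Bayesian framework instead. It also computes an approximate version of the predictive interval that the ERG requested in Section 4. For simplicity, this interval uses a normal quantile; the usual formula uses a t distribution with k - 2 degrees of freedom, which gives an even wider interval when there are few trials. All inputs are hypothetical log odds ratios.

```python
import math
from statistics import NormalDist


def dersimonian_laird(effects, ses):
    """Fixed-effect and DerSimonian-Laird random-effects pooled estimates.

    Returns (fixed, se_fixed, random, se_random, tau2).
    """
    k = len(effects)
    w = [1 / s ** 2 for s in ses]
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sw
    # Cochran's Q and the moment estimate of between-trial variance tau^2
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Random-effects weights add tau^2 to each within-trial variance
    w_re = [1 / (s ** 2 + tau2) for s in ses]
    random_ = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    return fixed, math.sqrt(1 / sw), random_, math.sqrt(1 / sum(w_re)), tau2


def prediction_interval(mu, se_mu, tau2, level=0.95):
    """Approximate interval for the effect in a new trial (normal quantile)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * math.sqrt(tau2 + se_mu ** 2)
    return mu - half, mu + half


effects = [0.2, 0.9, 0.4, 1.2]  # hypothetical log odds ratios
ses = [0.20, 0.25, 0.20, 0.30]
fixed, se_f, rand, se_r, tau2 = dersimonian_laird(effects, ses)
print(f"fixed:  {fixed:.3f} (SE {se_f:.3f})")
print(f"random: {rand:.3f} (SE {se_r:.3f}), tau^2 = {tau2:.3f}")
print("approx. 95% prediction interval:", prediction_interval(rand, se_r, tau2))
```

Because each random-effects weight is at most the corresponding fixed-effect weight, the random-effects standard error can never be smaller than the fixed-effect one. The prediction interval is wider still, which is why these intervals so often span no effect when only a few heterogeneous trials are available.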


Although NMA is important for HTA purposes, it assumes that the included trials are sufficiently similar in design and patient population, and a fixed-effect NMA further assumes no between-trial heterogeneity. While HTA guidance has been published on which type of model should be used within an NMA, model-fit and heterogeneity assessments often do not decisively favour a single model, so the choice is made case by case. Where trial designs differ, random-effects models should ideally be used, but they need a reasonably large number of trials to give informative results. Because clinical trials are frequently conducted in very specific patient populations, there are rarely enough comparable comparator trials to obtain statistically significant results from such models. Where trial and patient population heterogeneity is high, fixed-effect models may produce a statistically significant result for clinical effectiveness, but because they do not adjust for differences in trial design they could under- or overestimate a drug's clinical effectiveness. By contrast, random-effects models allow for the between-trial variation that such differences create, but often produce intervals so wide that they provide limited certainty about clinical effectiveness. Given how common trial and patient population heterogeneity is, this issue is frequently discussed in HTAs.


Discussions regarding appropriate NMA methods continue among HTA organisations, and recommendations continue to evolve. Fixed-effect models are routinely used; however, when comparing efficacy across trials that differ considerably in design, a fixed-effect NMA may not be appropriate or helpful to HTA organisations in decision-making.


References

·       NICE [TA342] (2015)

·       Sandborn et al. (2014) Gastroenterology. 146: 96-109

·       Sandborn et al. (2014) Gastroenterology. 146: 85-95

·       Reinisch et al. (2011) Gut. 60: 780-787

·       Sandborn et al. (2012) Gastroenterology. 142: 257-265

·       Suzuki et al. (2014) J Gastroenterol. 49: 283-294

·       Rutgeerts et al. (2005) NEJM. 353: 2462-2476