1 Clinical Trial Protocol Critical statistical issues: a regulatory view
Documentation
4 Regulatory Guidances
–ICH E9: Statistical Principles for Clinical Trials
–CPMP/EWP/908/99: Points to Consider on Multiplicity Issues in Clinical Trials (Apr 2003)
–CPMP/EWP/2863/99: Points to Consider on Adjustment for Baseline Covariates (Nov 2003)
–CPMP/2330/99: Points to Consider on Application with 1.) Meta-analyses and 2.) One Pivotal Study (May 2001)
–CPMP/EWP/2158/99: Guideline on the Choice of a Non-Inferiority Margin (Jan 2006)
–CPMP/EWP/482/99: Points to Consider on Switching between Superiority and Non-inferiority (Feb 2001)
–CPMP/EWP/1776/99: Points to Consider on Missing Data (Jan 2002)
–CHMP/EWP/83561/05: Guideline on Clinical Trials in Small Populations (Feb 2007)
–CHMP/EWP/2459/02: Reflection Paper on Methodological Issues in Confirmatory Clinical Trials with Flexible Design and Analysis Plan
5 Today’s talk is on statistics
Statistics Considerations
The role of statistics “Thus statistical methods are no substitute for common sense and objectivity. They should never aim to confuse the reader, but instead should be a major contributor to the clarity of a scientific argument.” The role of statistics. Pocock SJ. Br J Psychiat 1980; 137:
9 Key statistical issues
Multiplicity
Subgroups: interaction & confounding
Superiority and non-inferiority (and Δ)
Adjustment for covariates
Missing data
Others
–Interim analyses
–Meta-analysis vs one pivotal study
–Flexible designs
MULTIPLICITY 10
11 Lancet 2005; 365: 1591–95 To say it colloquially, torture the data until they speak...
12 Torturing data… –Investigators examine additional endpoints, manipulate group comparisons, do many subgroup analyses, and undertake repeated interim analyses. –Investigators should report all analytical comparisons implemented. Unfortunately, they sometimes hide the complete analysis, handicapping the reader’s understanding of the results. Lancet 2005; 365: 1591–95
13 Design, Conduct, Results
14 Multiplicity. K independent hypotheses: $H_{01}, H_{02}, \ldots, H_{0K}$. S = number of significant results ($p < \alpha$). $\Pr(S \ge 1 \mid H_{01} \cap H_{02} \cap \cdots \cap H_{0K} = H_{0\cdot}) = 1 - \Pr(S = 0 \mid H_{0\cdot}) = 1 - (1 - \alpha)^K$
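To make the size of the problem concrete, the probability of at least one false positive among K independent tests, each at level α, can be tabulated. This is a minimal sketch; the function name `familywise_error_rate` is illustrative and not part of the original slides.

```python
def familywise_error_rate(alpha: float, k: int) -> float:
    """Probability of at least one significant result among k independent
    tests at level alpha when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** k


# Tabulate the inflation of the overall type I error as K grows.
for k in (1, 2, 5, 10, 20):
    print(f"K = {k:2d}: Pr(S >= 1) = {familywise_error_rate(0.05, k):.3f}")
```

With α = 0.05 the overall error already exceeds 20% at K = 5, which is why an uncorrected set of comparisons cannot be read at the nominal level.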
15 Some examples
16 Multiplicity: Bonferroni correction (simplified version)
–K tests with an overall significance level of α
–Each test can be tested at the α/K level
Example:
–5 independent tests
–Global level of significance = 5%
–Each test should be tested at the 1% level: 5%/5 = 1%
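The simplified Bonferroni rule can be sketched in a few lines. The helper names and the p-values below are hypothetical and serve only to illustrate the 5%/5 = 1% example.

```python
def bonferroni_level(alpha: float, k: int) -> float:
    """Per-test significance level under the Bonferroni correction."""
    return alpha / k


def bonferroni_reject(p_values, alpha=0.05):
    """Flag each test as significant if its p-value is at or below alpha / K."""
    level = bonferroni_level(alpha, len(p_values))
    return [p <= level for p in p_values]


# Five tests, global level 5%: each test is read at the 1% level.
p_values = [0.004, 0.020, 0.300, 0.011, 0.0005]  # hypothetical p-values
print(bonferroni_level(0.05, 5))
print(bonferroni_reject(p_values))
```

Only the p-values at or below 1% survive the correction; a p-value of 0.02, nominally significant, does not.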
17 But this is the simplified version for the general public
Cautionary Example: RCT to treat rheumatoid arthritis (Basic Clin Med 1981, 15: 445). Several endpoints were analysed repeatedly at various timepoints and in various subdivisions, giving 850 comparisons; 48 of these gave p-values < 0.05. But… by chance alone we expect 5% of 850 = 850/20 = 42.5, so finding 48 is not very impressive 18
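The arithmetic of this example can be checked, and extended with a rough tail probability. The sketch below assumes the 850 comparisons are independent, which repeated timepoints on the same patients are not, so the probability is only an order-of-magnitude guide; the function names are illustrative.

```python
from math import comb


def expected_false_positives(n_tests: int, alpha: float = 0.05) -> float:
    """Expected number of significant results when all null hypotheses are true."""
    return n_tests * alpha


def prob_at_least(n_tests: int, observed: int, alpha: float = 0.05) -> float:
    """Pr(X >= observed) for X ~ Binomial(n_tests, alpha).
    Assumes independent tests (an assumption, not stated in the slides)."""
    lower_tail = sum(
        comb(n_tests, k) * alpha**k * (1 - alpha) ** (n_tests - k)
        for k in range(observed)
    )
    return 1.0 - lower_tail


print(expected_false_positives(850))  # expected count under chance alone
print(prob_at_least(850, 48))         # chance of seeing 48 or more
```

Under these assumptions, 48 or more significant results is an unremarkable outcome of pure chance.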
19 Some strategies to cope with multiple comparisons
20 Handling Multiplicity in Variables. Scenario 1: One Primary Variable –Identify one primary variable; other variables are secondary –Trial is positive if and only if the primary variable shows significant (p < 0.05), positive results
22 Handling Multiplicity in Variables. Scenario 2: Divide Type I Error –Identify two (or more) co-primary variables –Divide the 0.05 experiment-wise Type I error over these co-primary variables, e.g., 0.04 for the 1st and 0.01 for the 2nd co-primary variable –Trial is positive if at least one of the co-primary variables shows significant, positive results
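The unequal split of the type I error can be sketched as follows. The function name and the p-values are hypothetical; the only rule taken from the slide is that the allocated levels must add up to no more than the experiment-wise 0.05.

```python
def split_alpha_positive(p_values, alphas, total_alpha=0.05):
    """Trial is positive if at least one co-primary variable is significant
    at its own pre-allocated share of the experiment-wise type I error."""
    if len(p_values) != len(alphas):
        raise ValueError("one alpha share is needed per co-primary variable")
    if sum(alphas) > total_alpha + 1e-12:
        raise ValueError("allocated levels exceed the experiment-wise level")
    return any(p <= a for p, a in zip(p_values, alphas))


# 0.04 for the 1st and 0.01 for the 2nd co-primary variable.
print(split_alpha_positive([0.045, 0.008], [0.04, 0.01]))  # 2nd variable wins
print(split_alpha_positive([0.045, 0.020], [0.04, 0.01]))  # neither is significant
```

Note that p = 0.045 on the first variable is not enough: once the error is split, neither variable is read at the full 0.05 level.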
23 Handling Multiplicity in Variables. Scenario 3: Sequentially Rejective Procedure
–Identify n co-primary variables, e.g., n = 3
–Order the obtained p-values
–Interpret the variable with the smallest p-value at the 0.05/3 level; if significant, interpret the variable with the 2nd smallest p-value at the 0.05/2 level; if significant, interpret the variable with the largest p-value at the 0.05 level
–The test procedure stops when a test is not significant
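A sequentially rejective procedure that stops at the first non-significant test is usually implemented as Holm's step-down method, which starts from the smallest p-value. The sketch below assumes that reading; the function name and p-values are illustrative.

```python
def holm(p_values, alpha=0.05):
    """Holm step-down procedure: test the smallest p-value at alpha/n,
    the next at alpha/(n-1), and so on; stop at the first non-significant test."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])  # indices, smallest p first
    reject = [False] * n
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (n - rank):
            reject[i] = True
        else:
            break  # all remaining (larger) p-values are not significant either
    return reject


# Three co-primary variables with hypothetical p-values.
print(holm([0.03, 0.01, 0.04]))  # only the smallest passes 0.05/3
print(holm([0.02, 0.01, 0.04]))  # all three pass their thresholds in turn
```

In the first call 0.03 fails the 0.05/2 threshold, so 0.04 is never tested even though it is below 0.05.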
24 Handling Multiplicity in Variables. Scenario 4: Hierarchy
–Prespecify a hierarchy among the n co-primary variables
–All are tested at the same level: interpret the 1st variable at the 0.05 level; if significant, interpret the 2nd variable at the 0.05 level; if significant, interpret the 3rd variable at the 0.05 level; …
–The test procedure stops when a test is not significant
–Trial is positive if the first co-primary variable shows a significant, positive result
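The hierarchical (fixed-sequence) strategy can be sketched as below. The order of the p-values is assumed to be the prespecified hierarchy; the function name and the numbers are hypothetical.

```python
def fixed_sequence(p_values_in_order, alpha=0.05):
    """Test variables in their prespecified order, each at the full alpha;
    once one test fails, no later variable can be claimed."""
    reject = []
    stopped = False
    for p in p_values_in_order:
        if not stopped and p <= alpha:
            reject.append(True)
        else:
            stopped = True
            reject.append(False)
    return reject


# The 3rd variable has a tiny p-value but cannot be claimed,
# because the 2nd variable in the hierarchy was not significant.
print(fixed_sequence([0.01, 0.20, 0.001]))
```

The price of testing every variable at 0.05 is that the order must be fixed in advance and a failure blocks everything after it.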
25 Secondary Variables. Secondary variables can be claimed if and only if –the primary variable shows significant results, and –the comparisons related to the secondary variables are also protected under the same Type I error rate as the primary variable. Procedures similar to those already discussed can be used to protect the Type I error
26 Handling Multiplicity in Treatments. Procedures similar to those used for multiplicity in variables apply. Additional procedures are available, mainly geared to very specific settings of the statistical hypotheses. –Dunnett, Scheffé, REGW, Williams …
SUBGROUPS 27
Subgroups Indiscriminate subgroup analyses pose serious multiplicity concerns. Problems reverberate throughout the medical literature. Even after many warnings, some investigators doggedly persist in undertaking excessive subgroup analyses. Lancet 2000; 355: 1033–34 Lancet 2005; 365: 1657–61 28
29 Interaction. Age < 45 years; Age ≥ 45 years; d = 5%; d = 0.7%; d = 11.5%
30 Confounding factors. Non-smokers; Smokers; d = 6%; d = 0%
31 Subgroups & Simpson’s Paradox
32 Subgroups & Simpson’s Paradox cont.
33 Subgroups. ISIS-2: vascular death by star sign (Lancet 1988; 2: 349–60)
Gemini/Libra, aspirin vs placebo (vascular deaths / total): placebo 10.2%; d = -0.9; p =
Other star signs, aspirin vs placebo (vascular deaths / total): placebo 12.1%; d = 3.1; p < 0.0001
Interaction p =
34 Changes from ISIS-2 results Lancet 2005; 365: 1657–61
35 “The answer to a randomized controlled trial that does not confirm one’s beliefs is not the conduct of several subanalyses until one can see what one believes. Rather, the answer is to re- examine one’s beliefs carefully.” –BMJ 1999; 318: 1008–09.
36 Lancet 2005; 365: 1657–61
The question is NOT: ‘Is the treatment effect in this subgroup statistically significantly different from zero?’ BUT: ‘Are there any differences in the treatment effect between the various subgroups?’ The correct statistical procedures are either a test of heterogeneity or a test for interaction
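For two subgroups, one simple form of the interaction test compares the two subgroup estimates directly. The sketch below assumes independent subgroups and approximately normal estimates with known standard errors; the function name and all numbers are hypothetical.

```python
from math import erfc, sqrt


def interaction_z_test(d1, se1, d2, se2):
    """Compare the treatment effect in two independent subgroups.
    Returns the z statistic and the two-sided p-value (normal approximation)."""
    z = (d1 - d2) / sqrt(se1**2 + se2**2)
    p = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Effect 5.0 (SE 2.0) in one subgroup, 1.0 (SE 2.0) in the other:
# the first is 'significant' on its own, the second is not,
# yet the difference between them is far from significant.
z, p = interaction_z_test(5.0, 2.0, 1.0, 2.0)
print(f"z = {z:.2f}, interaction p = {p:.3f}")
```

This is the point of the slide: one subgroup reaching significance while another does not is not evidence that the effects differ.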
38 Subgroups Recommendations: –1) Examine the global effect –2) Test for the interaction –3) Plan adjustments for confirmatory analyses –4) Some points which increase the credibility: Pre-specification Biologic plausibility
39 Lancet 2005; 365: 176–86
MULTIPLE INSPECTIONS 40
41 Interim Analyses in the CDP. [Figure: Coronary Drug Project mortality surveillance, Z value by month of follow-up (Month 0 = March 1966, Month 100 = July 1974).] Circulation. 1973;47:I-1. clinicaltrials.gov/ct/show/NCT
42 Lancet 2005; 365: 1657–61
43 CPMP/EWP/482/99: PtC on Switching between Superiority and Non-Inferiority & CPMP/EWP/2158/99: PtC on the Choice of Delta
44 NON-INFERIORITY TRIALS: THE NEED. Legal implications. Methodological implications. Ethical and practical limitations on the use of placebo. Practical limitations on showing superiority over an active control. Need for comparative information. Possible added value.
47 The power approach (classical test + power calculation)
51 Lancet 2001,356:
53 Added value. Dosing: once daily. Route: oral. Safety: adverse events. Special populations: elderly, paediatric. Interactions
54 Equivalence trials. Bioequivalence trials (generic vs marketed product). Our product is not worse and may offer other advantages (safety, dosing convenience …) –Non-inferiority
55 SUPERIORITY STUDY. [Figure: 95% CI for d on an axis where d < 0 is a negative effect, d = 0 is no difference and d > 0 is a positive effect; axis ends labelled ‘Test better’ and ‘Control better’.]
56 INTERVAL ESTIMATION (SUPERIORITY STUDY). The result is statistically significant. [Figure: 95% CI for d on an axis where d < 0 is a negative effect, d = 0 is no difference and d > 0 is a positive effect; axis ends labelled ‘Test better’ and ‘Control better’.]
57 INTERVAL ESTIMATION (SUPERIORITY STUDY). The result is statistically significant with P = 0.05 (just at the limit). [Figure: 95% CI for d on an axis where d < 0 is a negative effect, d = 0 is no difference and d > 0 is a positive effect; axis ends labelled ‘Test better’ and ‘Control better’.]
58 EQUIVALENCE STUDY. [Figure: axis where d < 0 is a negative effect, d = 0 is no difference and d > 0 is a positive effect; the region of clinical equivalence runs from −Δ to +Δ.] Delta (Δ): the largest difference without clinical relevance, or the smallest difference with clinical relevance
59 EQUIVALENCE. [Figure: confidence intervals plotted against −Δ, 0 and +Δ, labelled ‘Equivalence’ and ‘No equivalence’.]
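60 THERAPEUTIC NON-INFERIORITY. [Figure: confidence intervals plotted against −Δ and 0, labelled ‘Non-inferiority’ and ‘No non-inferiority’; axis ends labelled ‘Test better’ and ‘Control better’.]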
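The interval-based reading of superiority, equivalence and non-inferiority can be summarised in code. This sketch assumes d = test − control with larger values favouring the test treatment, a two-sided 95% CI (lower, upper) and a prespecified margin Δ > 0; the function name and example intervals are hypothetical.

```python
def interpret_ci(lower: float, upper: float, delta: float) -> dict:
    """Classify a confidence interval for d = test - control.
    Assumes larger d favours the test treatment and delta > 0."""
    return {
        "superiority": lower > 0,                          # whole CI above 0
        "non_inferiority": lower > -delta,                 # whole CI above -delta
        "equivalence": lower > -delta and upper < delta,   # whole CI inside (-delta, +delta)
    }


# Margin of 3 points; two hypothetical confidence intervals.
print(interpret_ci(-1.0, 2.0, 3.0))  # equivalent and non-inferior, not superior
print(interpret_ci(0.5, 4.0, 3.0))   # superior (hence non-inferior), not equivalent
```

The same interval answers all three questions, which is why the choice of Δ has to be fixed before the data are seen.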
[Figure: percentage scale showing B, A and P, with the questions ‘1/2?’ and ‘1/3?’]
RANDOMIZATION & COVARIATES 65
66 Adjustment. The objective should not be to compensate for imbalance (that is the role of randomisation) but to improve precision. Avoid adjusting for post-randomisation variables. In RCTs, never use this widespread strategy: “adjust for any baseline variable that differs significantly (5% or 10% level)”
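The claim that adjustment is about precision rather than bias can be illustrated with a small simulation. This is a sketch under stated assumptions, none of which come from the slides: a single prespecified baseline covariate correlated 0.7 with the outcome, 50 patients per arm, a true effect of 1.0, and a simple ANCOVA-style adjustment using the pooled within-arm slope.

```python
import random
from statistics import mean, stdev


def simulate_trial(rng, n_per_arm=50, effect=1.0, rho=0.7):
    """Return (unadjusted, baseline-adjusted) estimates of the treatment effect."""
    arms = []
    for shift in (0.0, effect):  # control arm, then active arm
        baseline = [rng.gauss(0, 1) for _ in range(n_per_arm)]
        noise_sd = (1 - rho**2) ** 0.5
        outcome = [shift + rho * b + rng.gauss(0, noise_sd) for b in baseline]
        arms.append((baseline, outcome))
    (bc, yc), (ba, ya) = arms
    unadjusted = mean(ya) - mean(yc)
    # Pooled within-arm regression slope of outcome on baseline.
    sxy = sxx = 0.0
    for b, y in arms:
        mb, my = mean(b), mean(y)
        sxy += sum((bi - mb) * (yi - my) for bi, yi in zip(b, y))
        sxx += sum((bi - mb) ** 2 for bi in b)
    slope = sxy / sxx
    adjusted = unadjusted - slope * (mean(ba) - mean(bc))
    return unadjusted, adjusted


rng = random.Random(2024)
results = [simulate_trial(rng) for _ in range(1000)]
unadj = [u for u, _ in results]
adj = [a for _, a in results]
print(f"unadjusted: mean {mean(unadj):.3f}, SD {stdev(unadj):.3f}")
print(f"adjusted:   mean {mean(adj):.3f}, SD {stdev(adj):.3f}")
```

Both estimators centre on the true effect; the adjusted one simply varies less from trial to trial, which is the gain in precision the slide refers to.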
Stratification. A priori. May desire to have treatment groups balanced with respect to prognostic or risk factors (covariates). For large studies, randomization “tends” to give balance. For smaller studies a better guarantee may be needed. Useful only to a limited extent (especially for small trials); avoid too many variables (i.e. many empty or partly filled strata) 67
Testing for “baseline homogeneity” All observed differences are known with certainty to be due to chance. We must not test for it: there is no alternative hypothesis whose truth can be supported by such a test. If significant, the estimator is still unbiased Balance: –Decreases the variance and increases the power. –It has no effect on type I error. 68
69 Observed imbalance… NEVER justifies post-hoc adjustment: –Randomization is more important –The treatment effect is unbiased without adjustment (randomization) –The type I error level already accounts for “chance error” –Post-hoc: data-driven analyses –Multiplicity issues: allowing a post-hoc adjustment increases the type I error
70 Adjusted Analyses. ‘When the potential value of an adjustment is in doubt, it is often advisable to nominate the unadjusted analysis as the one for primary attention, the adjusted analysis being supportive.’
71 Adjustment for covariates. A priori definition. The appearance of baseline imbalances does NOT justify adjustment per se: –Randomisation is given more importance –Danger of post-hoc analyses –Multiplicity. As a general strategy, adjusting for baseline variables that differ significantly (e.g., p<0.1 or p<0.05) is, a priori, NOT valid
MISSING DATA 72
73 Ex: LOCF & linear extrapolation. [Figure: ADAS-Cog (higher = worse, lower = better) against time (months); LOCF versus linear regression, with the resulting bias.]
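The two imputation rules contrasted on this slide can be sketched for a single patient. The visit times and ADAS-Cog scores below are hypothetical, missing visits are coded as `None`, and the extrapolation is an ordinary least-squares line through the observed visits.

```python
from statistics import mean


def locf(values):
    """Last observation carried forward: replace each missing value (None)
    with the most recent observed value."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def linear_extrapolation(times, values):
    """Fit a least-squares line to the observed visits and use it
    to fill in the missing ones."""
    obs = [(t, v) for t, v in zip(times, values) if v is not None]
    mt, mv = mean(t for t, _ in obs), mean(v for _, v in obs)
    slope = sum((t - mt) * (v - mv) for t, v in obs) / sum((t - mt) ** 2 for t, _ in obs)
    return [v if v is not None else mv + slope * (t - mt) for t, v in zip(times, values)]


months = [0, 3, 6, 9, 12]
adas_cog = [20, 22, 24, None, None]  # patient worsens, then drops out after month 6
print(locf(adas_cog))
print(linear_extrapolation(months, adas_cog))
```

In a progressive disease LOCF freezes the patient at the last observed score, while the extrapolation lets the decline continue; the gap between the two is the bias shown in the figure.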
74 Ex: Early drop-out due to AE. [Figure: ADAS-Cog (higher = worse, lower = better) against time (months), placebo vs active.] Bias: favours active
75 Ex: Early drop-out due to lack of efficacy. [Figure: ADAS-Cog (higher = worse, lower = better) against time (months), placebo vs active.] Bias: favours placebo
76 Drop-outs and missing data: different frequencies. [Figure: patients randomised (RND) to A or B and followed from baseline through Visit 1 and Visit 2 to the last visit.]
77 Drop-outs and missing data: different timing. [Figure: patients randomised (RND) to A or B and followed from baseline through Visit 1 and Visit 2 to the last visit.]
78 Handling of MD (missing data)
Methods for imputation:
–Many techniques
–No gold standard for every situation
–In principle, all methods may be valid, from simple to more complex: LOCF, worst case, “mean methods”, multiple imputation
–But their appropriateness has to be justified
Statistical approaches less sensitive to MD:
–Mixed models
–Survival models
They assume no relationship between treatment and the missing outcome, and generally this cannot be assumed.
CONCLUSION 79
Thank you for your attention! 83