
Avoiding data pitfalls in clinical research

Genomic sequencing and sensors will present new challenges


The European Medicines Agency released draft guidance last year to offer new thinking on how the industry can incorporate genetic factors into understanding disease and therapeutic response. The agency’s draft position paper signals the importance genomics data will play in clinical development, so it’s important to understand how it can be integrated into traditional clinical research data sources.

EMA guidance in recent years has touched on the role of genetics in pharmacovigilance and the importance of being able to identify patient populations with increased sensitivity to medicines. Pharmacogenomics provides great potential for improved understanding of therapeutic response, but the EMA cautions against the poor quality of findings seen in some published studies, highlighting “the importance of correct measurement, determination, interpretation and translation of pharmacogenomic data into clinical treatment”.

But the only way genomic data will impact healthcare is if the industry learns how to manage the sheer volume of data created in genomic sequencing. Appropriate interpretation of genomics findings starts with appropriate genomic methodologies throughout the course of clinical development programmes.

Just as the industry is learning about mobile health data, these tools are only as valuable as the insight that can be drawn from them. Clinical research sponsors need a comprehensive data management process to carry genomic data from the patient to analysis to the EMA’s eyes. The industry is in the earliest stages of understanding the hurdles of integrating genomic data with clinical data.

The arrival of genomics data – not to mention new imaging and mobile health data sources – requires a new approach to data management that builds an architecture for actionable data.

Step one is linking genomic data to the data platform in use for a clinical trial. As with other data sources, once the genomic data is ingested into the data platform, it’s vital to perform data quality checks: were the correct number of samples uploaded? Are the trial sites successfully capturing the data? And so on.
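As a rough illustration of what such ingestion checks can look like, the sketch below uses pandas to reconcile uploaded sample counts against an expected per-site manifest and to flag files failing validation. The column names, site codes and expected counts are hypothetical, not drawn from any particular platform.

```python
import pandas as pd

# Hypothetical manifest of ingested genomic samples: one row per sample,
# with the site that uploaded it and the subject it belongs to.
samples = pd.DataFrame({
    "site_id":    ["S01", "S01", "S02", "S03"],
    "subject_id": ["P001", "P002", "P003", "P004"],
    "file_ok":    [True, True, False, True],   # did the upload pass format validation?
})

# Expected number of samples per site, e.g. taken from the enrolment log.
expected_per_site = {"S01": 2, "S02": 2, "S03": 1}

# Check 1: were the correct number of samples uploaded per site?
uploaded = samples.groupby("site_id").size()
for site, expected in expected_per_site.items():
    actual = int(uploaded.get(site, 0))
    if actual != expected:
        print(f"Sample count mismatch at {site}: expected {expected}, got {actual}")

# Check 2: are all ingested files passing validation?
failed = samples[~samples["file_ok"]]
if not failed.empty:
    print("Files failing validation:")
    print(failed[["site_id", "subject_id"]])
```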

Researchers need visualisation dashboards to interpret genomic data during trials, and clustering algorithms can be applied to identify anomalies much like other data monitoring tools. When these algorithms are layered on traditional clinical data, sponsors can quickly identify changes in adverse events or therapeutic response.
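The following is a minimal sketch of the kind of clustering-based anomaly screen described above, using scikit-learn's DBSCAN on a synthetic per-patient feature matrix. The features, parameters and data are illustrative assumptions, not any vendor's actual monitoring algorithm.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import DBSCAN

# Illustrative per-patient feature matrix combining a genomic summary score,
# an adverse-event count and a therapeutic-response measure (synthetic values).
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 3))
features[5] = [6.0, 8.0, -5.0]          # plant one obvious outlier

X = StandardScaler().fit_transform(features)

# DBSCAN labels points that fit no dense cluster as -1 (noise), which is one
# simple way to surface anomalous patients for reviewer attention.
labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)
anomalies = np.where(labels == -1)[0]
print("Patients flagged for review:", anomalies)
```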

Researchers need genomic data to be standardised for clinical data packages, especially if it is to support global clinical trial programmes. Until recently, integrating genomic data into trial data repositories was a clumsy process.

Companies like Medidata are responding to this need. Medidata recently showcased Clinical Trial Genomics (CTG), which integrates genomics data into a single environment, and demonstrated its applicability for clinical trials at the Annual Meeting of the American Society of Clinical Oncology (ASCO).

CTG automates integration between genomics data and central trial data pools, and then applies machine-learning algorithms to generate the analytical results needed in a trial.

CTG identifies subject IDs and matches them with existing patient data in the trial database. This is a deceptively difficult task, but once patient IDs are matched within the clinical trial database, researchers have access to an analytical environment that ties together the entire genome, adverse events and clinical endpoints throughout a trial.
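To show why this matching is harder than it sounds, the sketch below normalises hypothetical lab-style sample IDs and EDC-style subject IDs to a common key before merging genomic results with clinical records. The field names and normalisation rule are assumptions for illustration, and real-world reconciliation is considerably messier.

```python
import re
import pandas as pd

def normalise_id(raw: str) -> str:
    """Collapse common formatting differences: prefixes, separators, leading zeros."""
    digits = re.sub(r"\D", "", raw)        # keep only the numeric part
    return digits.lstrip("0") or "0"

# Hypothetical extracts: genomic results keyed by lab-style IDs,
# and the clinical trial database keyed by EDC-style subject IDs.
genomic = pd.DataFrame({"sample_id": ["subj-0042", "SUBJ_007"], "variant_count": [13, 5]})
clinical = pd.DataFrame({"subject_id": ["042", "007"], "adverse_events": [1, 0]})

genomic["key"] = genomic["sample_id"].map(normalise_id)
clinical["key"] = clinical["subject_id"].map(normalise_id)

# Once IDs are reconciled, genomic findings sit alongside clinical endpoints.
merged = genomic.merge(clinical, on="key", how="inner")
print(merged[["sample_id", "subject_id", "variant_count", "adverse_events"]])
```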

It’s vital to maintain this analytical rigour throughout the course of a trial – for the scientific integrity of the study and also to help with the important and expensive ‘go/no-go’ decisions made in the early phases of a clinical study.

From granular data to historical data

Beyond new types of data sourced from genetic sequencing, there are also new ways to incorporate old data sets to inform clinical development decision-making. Early-phase clinical trials are often single-arm studies, whether for ethical or practical reasons. For difficult diseases with high unmet need, the go/no-go decisions are often made on meagre evidence.

Sponsors supplement single-arm studies with historical data from published papers. If the advent of genomic sequencing allows researchers to look at trials at the cellular level, historical data gives researchers a look at trials on a much grander scale.

But this approach faces several problems. Historical data rarely provides comprehensive metrics of patient characteristics and therapeutic response, and such data may also have unknown statistical bias – forcing drug developers to make critical decisions without the right information.

Organisations like the non-profit TransCelerate and companies like Medidata are building platforms to bring together disparate data sets from trials of similar indications to improve understanding.

In a recent pilot study with Roche for a rare cancer indication, Medidata’s Synthetic Control Arm (SCA) pulled actual data from past studies to create well-matched control arms. The pilot was successful enough to be accepted for presentation at ASCO’s Annual Meeting, the largest academic cancer conference in the world.

Tapping into patient data from more than 3,000 trials in Medidata’s database, SCA creates large control groups, precisely matched at the subject level to a trial’s inclusion/exclusion criteria – minimising time, site and study-specific bias. SCA offers the possibility of exploratory subgroup analysis, improves the selection of compounds for the next phase and the design of subsequent trials, and can reduce failure rates in phase II and III studies.

In the pilot with Roche, SCA filtered Medidata’s repository for patient data that met the Roche trial’s inclusion/exclusion criteria, and performed statistical matching so that researchers could apply the data to analysis of the treatment group. The results of SCA’s analysis confirmed Roche’s findings and gave Roche further validation to proceed to a phase III trial.
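A hedged sketch of the general idea – not of Medidata’s actual methodology – is shown below: filter a hypothetical historical pool by the trial’s eligibility criteria, then match each treated patient to the nearest eligible historical patient on baseline covariates. All data, covariates and criteria here are invented for illustration.

```python
import pandas as pd
from sklearn.neighbors import NearestNeighbors

# Hypothetical pools: current single-arm treatment patients and historical
# control candidates, with baseline covariates used for matching.
treated = pd.DataFrame({"age": [54, 61, 47], "baseline_score": [2.1, 3.4, 1.8]})
historical = pd.DataFrame({
    "age": [52, 70, 60, 45, 66, 49],
    "baseline_score": [2.0, 4.1, 3.3, 1.9, 2.8, 2.2],
    "ecog": [1, 3, 1, 0, 2, 1],
})

# Step 1: apply the trial's inclusion/exclusion criteria to the historical pool
# (here a single invented criterion: ECOG performance status <= 2).
eligible = historical[historical["ecog"] <= 2].reset_index(drop=True)

# Step 2: for each treated patient, find the closest eligible historical patient
# on the baseline covariates to form a matched external control arm.
covariates = ["age", "baseline_score"]
nn = NearestNeighbors(n_neighbors=1).fit(eligible[covariates])
_, idx = nn.kneighbors(treated[covariates])
control_arm = eligible.iloc[idx.ravel()]
print(control_arm)
```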

Data quality above all

With the proliferation of data sources, whether at the single-cell level or historical, it’s vital to be able to efficiently assess the data that comes in.

Almost every trial has material data issues that are left undetected by current methods, according to an assessment of clinical trial data by Medidata’s Centralised Statistical Analysis (CSA) tool, which provides a comprehensive scan of a clinical trial database for inconsistencies across data domains, sites and patients. Data quality issues will continue to rise with the increase in genomics and sensor data.
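One very simple example of such a cross-site scan, assuming a synthetic dataset and an arbitrary threshold rather than CSA’s actual statistics: flag sites whose within-site variability is implausibly low, a classic signature of problem data.

```python
import pandas as pd

# Synthetic lab measurements by site; one site reports suspiciously uniform values.
data = pd.DataFrame({
    "site": ["A"] * 5 + ["B"] * 5 + ["C"] * 5,
    "lab_value": [5.1, 6.3, 4.8, 5.9, 6.1,
                  5.5, 4.9, 6.0, 5.2, 6.4,
                  5.0, 5.0, 5.0, 5.0, 5.0],   # site C: no variability at all
})

# Flag sites whose within-site variability is far below the variability seen
# across the whole study (threshold here is arbitrary, for illustration only).
overall_std = data["lab_value"].std()
by_site = data.groupby("site")["lab_value"].std()
flagged = by_site[by_site < 0.25 * overall_std]
print("Sites with implausibly low variability:")
print(flagged)
```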

As the volume and variety of data increases, clinical research will require new approaches to automate monitoring.

Traditional data monitoring methods and tools, which lean towards comprehensiveness rather than efficiency, need to be rethought to meet the new onslaught of data. Traditional source document verification, for example, is already giving way to risk-based monitoring approaches.

With the terabytes and petabytes of data coming from genome sequencing and sensors, the industry will need to lean on machine learning tools to ensure the quality of this data meets regulatory demands, at the EMA and beyond.

Christian Hebenstreit

is general manager and senior vice president EMEA at Medidata Solutions

16th October 2017