Clinical Trials Have Too Much Data…That’s the Problem.

Here is a finding worth sitting with: a 2025 Tufts Center for the Study of Drug Development study found each Phase III protocol now gathers an average of 5.9 million datapoints, up 67% since 2020 and nearly triple the volume recorded a decade ago, with individual patients subjected to an average of 263 procedures supporting approximately 20 endpoints. This explosion in data is driven by wearables, electronic clinical outcome assessment (eCOA) platforms, electronic health records (EHRs), glucometers and passive sensor arrays filling a pipeline of data that never closes.

Smartwatches now record cardiac rhythms around the clock. Connected glucometers transmit readings the moment they’re captured. eCOA platforms can push a precisely calibrated survey to a patient’s phone within minutes of a symptomatic event, capturing their experience at the exact moment they’re living it.

By almost every measure, the historic ambition of capturing rich, diverse and continuous patient data has been achieved. The hard problem — getting enough of the right data from enough of the right patients — is no longer the hard problem.

And yet, sponsors are still making critical decisions with incomplete pictures of what’s actually happening to their patients. How can this be the case?

The simple truth is that the industry overlooked the infrastructure problem underneath data collection: none of these systems were designed to talk to each other. The data doesn’t converge. Insight doesn’t automatically emerge. What sponsors have built, in many cases, is an extraordinarily sophisticated set of silos, each one capturing something real but none of them capturing the whole picture.

That problem is now impossible to ignore, and 2026 is shaping up to be the year the industry finally stops treating integration as an IT concern and starts treating it as a scientific one.

The silo tax

Think about what disconnected data actually costs in practice. A diabetic patient enrolled in a decentralized trial is wearing a continuous glucose monitor, logging symptoms in an eCOA app and having their medication adherence tracked passively through a connected pill bottle. Each of those streams is capturing something real and valuable. Operating in isolation, they don’t deliver what the clinician really needs to know.

But connect them — so that an out-of-range glucose reading automatically triggers a targeted eCOA survey within minutes — and suddenly those isolated data points become a clinical signal. The patient’s subjective experience and objective physiology, time-stamped and co-located within the same dataset, become something a biostatistician can use to understand disease progression, treatment response and quality of life in ways that neither stream could support alone.

The difference between those two scenarios isn’t more data. It’s integration. And most trials today are still paying the silo tax: redundant data entry across disconnected platforms, endpoints that can’t be cross-referenced and insights that arrive months after the moment they could have changed a decision. The cost isn’t just operational inefficiency; it’s potentially adverse patient outcomes as well as lost scientific opportunities. Signals that should have accelerated a go/no-go decision get buried in disconnected spreadsheets instead.

Fixing this requires more than selecting better software. It requires a deliberate data architecture strategy that starts with a scientific question — What are we really trying to learn about this patient? — and works backward to determine which data streams need to be connected, how and at what frequency. Technology should serve that strategy, not dictate it.

Standards are the foundation nobody wants to talk about

Before any integration strategy can succeed, there’s a foundational issue that the industry has been reluctant to confront directly: data standards fragmentation. EHR systems, wearable device vendors, eCOA platforms and laboratory information systems often speak entirely different data languages. Getting them to communicate requires either expensive custom middleware or — the better answer — adoption of common standards such as HL7 FHIR (Fast Healthcare Interoperability Resources) and CDISC across the vendor ecosystem.

Progress is being made, but it’s uneven. Some vendors have committed to FHIR-compliant APIs. Others treat proprietary data formats as a competitive moat. Sponsors who want true data convergence must make interoperability a procurement requirement, not an afterthought, and they need to ask vendors hard questions about their standards roadmaps before signing contracts. The trials that will achieve genuine integration in the next two to three years are the ones whose sponsors are having those conversations today.
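As a concrete illustration of what "speaking the same language" means, here is a minimal sketch mapping a proprietary wearable heart-rate sample onto an HL7 FHIR Observation resource. The function name and the wearable payload are hypothetical; the LOINC code (8867-4, heart rate) and the Observation fields shown are part of the published standard:

```python
def wearable_hr_to_fhir(patient_id: str, bpm: int, when: str) -> dict:
    """Map a proprietary wearable heart-rate sample onto an HL7 FHIR
    Observation resource so downstream systems share one vocabulary."""
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {  # LOINC 8867-4 is the standard code for heart rate
            "coding": [{"system": "http://loinc.org",
                        "code": "8867-4",
                        "display": "Heart rate"}]
        },
        "subject": {"reference": f"Patient/{patient_id}"},
        "effectiveDateTime": when,
        "valueQuantity": {"value": bpm,
                          "unit": "beats/minute",
                          "system": "http://unitsofmeasure.org",
                          "code": "/min"},
    }

obs = wearable_hr_to_fhir("P-001", 72, "2026-01-15T08:30:00Z")
print(obs["code"]["coding"][0]["code"])  # 8867-4
```

Once every vendor emits resources shaped like this, "integration" stops meaning bespoke middleware and starts meaning a shared data model.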

AI isn’t the story — infrastructure is

A great deal of attention has been paid to artificial intelligence (AI) in clinical research. Most of it focuses on the wrong applications. The transformative use of AI in trials right now isn’t novel drug target discovery or predictive patient stratification, though those applications have real potential. It’s the elimination of manual processes that have always been expensive, slow and error prone.

Consider how clinical outcome assessment scoring used to work. COA data was entered into spreadsheets by trial technicians, routed to biostatisticians and scored weeks or months later using sponsor-developed algorithms that may or may not have been applied correctly. The failure modes were numerous: wrong algorithms, transcription mistakes, inconsistent scoring rules applied across sites, and analysts unfamiliar with specific instrument requirements. Errors from this process often weren’t caught until database lock, when fixing them was a serious regulatory and operational problem.

Automated scoring built directly into COA software eliminates that entire failure chain. Regulatory-grade algorithms execute at the point of data capture. Discrepancies that would have taken months to surface are caught immediately or never introduced at all. Sponsors don’t need to develop and validate their own scoring algorithms; the work is done within the platform, consistently, across every site and every patient. That’s not a technology novelty. That’s a fundamental shift in where scoring errors come from, which is to say, they largely stop coming from anywhere.
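A minimal sketch of what point-of-capture scoring looks like, assuming a hypothetical ten-item instrument scored 0–4 per item with a simple sum as the total; real instruments carry their own validated scoring rules, which is exactly why they belong in the platform rather than a spreadsheet:

```python
# Illustrative instrument parameters, not a real clinical scale.
ITEM_COUNT = 10
VALID_RANGE = range(0, 5)  # each item scored 0-4

def score_coa(responses: list[int]) -> int:
    """Validate and score at capture time, so a bad entry is rejected
    immediately instead of surfacing at database lock."""
    if len(responses) != ITEM_COUNT:
        raise ValueError(f"expected {ITEM_COUNT} items, got {len(responses)}")
    for i, r in enumerate(responses):
        if r not in VALID_RANGE:
            raise ValueError(f"item {i + 1} out of range: {r}")
    return sum(responses)  # same rule, every site, every patient

print(score_coa([2, 3, 1, 0, 4, 2, 2, 3, 1, 2]))  # 20
```

The validation step is the point: a transcription error or a skipped item raises an exception the moment it happens, which is the failure chain described above collapsing to nothing.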

The same logic applies to longitudinal analytics and natural language processing tools applied to clinical data review. These technologies aren’t valuable primarily because they’re sophisticated; they’re valuable because they collapse the time between data collection and actionable insight, and because they free clinical staff to focus on the interpretive judgments that require human expertise rather than the mechanical work of data processing that doesn’t.

Closed-loop automation and adaptive trial management tools push this further. Real-time visibility into accumulating endpoint data allows sponsors to make protocol adjustments — dose modifications, site resourcing decisions, patient selection refinements — while the trial is still running, rather than discovering problems at an interim analysis. The trial becomes a dynamic system rather than a static one, and the science benefits accordingly.
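As one small illustration of that real-time visibility, here is a sketch of an automated check that flags sites whose accumulating dropout rate crosses a protocol-defined threshold, the kind of signal that would drive a mid-trial resourcing decision. The threshold and minimum denominator are illustrative assumptions:

```python
# Illustrative operating parameters; a real protocol sets its own.
DROPOUT_ALERT_RATE = 0.20
MIN_ENROLLED = 10  # avoid alerting on tiny denominators

def sites_needing_attention(site_stats: dict[str, tuple[int, int]]) -> list[str]:
    """site_stats maps site id -> (enrolled, dropped).
    Returns sites whose dropout rate exceeds the alert threshold."""
    flagged = []
    for site, (enrolled, dropped) in site_stats.items():
        if enrolled >= MIN_ENROLLED and dropped / enrolled > DROPOUT_ALERT_RATE:
            flagged.append(site)
    return flagged

stats = {"SITE-01": (40, 4), "SITE-02": (25, 9), "SITE-03": (6, 3)}
print(sites_needing_attention(stats))  # ['SITE-02']
```

Run continuously against live data rather than at an interim analysis, a check like this is what turns the trial from a static system into a dynamic one.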

Patients didn’t sign up to operate technology

There’s a useful test for any technology deployed to trial participants: Does the patient notice it?

If they do, something has probably gone wrong. The best patient-facing technology in clinical research is functionally invisible — it integrates with the smartphone already in the patient’s pocket, operates passively in the background, surfaces only when it has something specific and relevant to ask and never creates unnecessary friction in the participant’s daily life. Anything more burdensome than that is a retention risk, and retention risk is one of the most expensive problems a trial can have.

This may sound simple, but it isn’t. Building genuinely unobtrusive technology requires serious investment in user experience design across widely varying demographic groups, not just a polished app that works well for a 35-year-old who is comfortable with technology. Pediatric populations respond to gamification elements that make participation feel rewarding instead of clinical. Elderly participants need simplified interfaces that don’t require a learning curve. Patients from underserved communities may have different device access, different connectivity constraints and different relationships with medical research institutions that need to be understood and respected in the study design.

When sponsors get this right, the benefits compound. Patients who understand how their participation contributes to something larger — who can see, through engagement portals and transparent communication, that their data is being used meaningfully — stay enrolled longer. They complete assessments more consistently. They report symptoms more openly. Patient-centric design isn’t a compliance checkbox. It’s a data quality strategy.

The regulatory gap is real, and sponsors need to close it themselves

Current FDA guidance on the use of digital evidence in pivotal trials was not written with continuous physiological monitoring, passive behavioral data streams or multimodal sensor fusion in mind. That gap creates genuine uncertainty for sponsors who want to use these technologies in support of primary endpoints, and the gap won’t close on its own or on a predictable schedule.

Sponsors who want regulatory acceptance of novel digital endpoints need to generate the scientific validation data themselves, and they need to start that work well before it becomes urgent. Waiting until a regulator asks a question about a specific endpoint is too late. The evidence base — demonstrating that the digital measure captures what it claims to capture, that it does so reliably across patient populations, and that it correlates appropriately with established clinical outcome measures — needs to exist before the pivotal conversation begins.

Organizations like the Critical Path Institute are doing the sustained, collaborative work of building that evidence base by convening sponsors, technology developers and regulatory scientists around real data rather than theoretical frameworks. Sponsors who participate in those forums and contribute their own validation data to the shared pool are both accelerating the process and positioning themselves to benefit from it first when guidance is updated. This is pre-competitive collaboration in the most literal sense: the science that benefits one sponsor’s program eventually benefits the entire field.

The strategy is the hard part

Technology can surface a clinical signal. It cannot decide what the signal means, whether the trial design is capturing the right endpoints or whether the scientific question being asked is the most important one to ask. Those remain human problems, and they require a different kind of rigor than the technology selection process that has occupied so much of the industry’s attention over the past decade.

The question that matters most in 2026 isn’t which platform integrates the most data streams or which AI vendor has the most impressive demo. It’s whether the scientific strategy is sophisticated enough to use what integration makes possible. What outcomes matter most to patients living with this disease, not just to regulators approving a label? Which digital endpoints can be validated against established clinical measures and on what timeline? How does the data architecture serve the hypothesis, rather than the other way around?

Sponsors who approach these questions seriously — who build their integration strategy from the scientific question outward, who invest in the patient experience as a data quality issue, who engage with regulators early and contribute to the shared evidence base — will run better trials. Not just more efficient trials, but scientifically stronger ones, with cleaner data, more complete patient pictures and insights that arrive in time to matter.

Clinical trials in 2026 don’t have a data problem. They have a strategy problem. The data is already there. The tools to integrate it exist and are proven. What’s missing, in many programs, is the clarity about what to do once the complete picture finally comes into focus — and the organizational commitment to build a strategy worthy of the data that’s already being collected.

The post Clinical Trials Have Too Much Data…That’s the Problem. appeared first on MedTech Intelligence.

 
