Unstructured EHR data more useful for predictive analytics, study shows
A new report in the Journal of the American Medical Informatics Association has shown that real-world data contained in unstructured narratives has big predictive value when it comes to clinical research.
WHY IT MATTERS
While structured data in the electronic health record has obvious value, the research in JAMIA suggests that real-world data captured in unstructured notes offers more accuracy when trained algorithms are used to mine it.
The challenges of making good use of unstructured data have been well documented. Indeed, researchers in this case depended on artificial intelligence technology from Verantos (whose founder, Stanford professor Dr. Dan Riskin, was an investigator on the study) to mine it for insights. But the details contained in these EHR narratives, with their real-world insights into patient history, conditions, procedures and more, proved more useful than structured data in predicting coronary artery disease.
"With growing availability of digital health data and technology, health-related studies are increasingly augmented or implemented using real world data," wrote the researchers, led by Tina Hernandez-Boussard, associate professor of biomedical informatics, data science and surgery at Stanford University School of Medicine.
"Recent federal initiatives promote the use of RWD to make clinical assertions that influence regulatory decision-making," the researchers said. "Our objective was to determine whether traditional real world evidence techniques in cardiovascular medicine achieve accuracy sufficient for credible clinical assertions, also known as 'regulatory-grade' RWE."
For the retrospective observational study, which used six years' worth of deidentified EHR data, a specified set of clinical concepts was mined from both structured (using standard query techniques) and unstructured EHR data (using AI).
"The dataset included 10,840 clinical notes," researchers explained. "Individual concept occurrence ranged from 194 for coronary artery bypass graft to 4502 for diabetes mellitus."
Granular detail of that kind helped the real-world evidence drawn from narrative notes support more accurate predictive modeling, they found.
With structured EHR data, or EHR-S, "average recall and precision were 51.7% and 98.3%, respectively," according to the report. For unstructured data (EHR-U) those numbers were 95.5% and 95.3%.
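For readers less familiar with the two metrics: recall is the share of true cases that a method finds, and precision is the share of flagged cases that are correct. The short Python sketch below restates those standard definitions and, purely as an illustration, combines the reported averages into an F1 score (the harmonic mean of the two). The F1 figures are not from the JAMIA report, and the function names here are hypothetical.

```python
# Illustrative sketch only: standard metric definitions applied to the
# averages quoted in the article. F1 is NOT reported in the study; it is
# computed here as one common way to combine precision and recall.

def precision(true_pos: int, false_pos: int) -> float:
    """Share of flagged cases that are actually correct."""
    return true_pos / (true_pos + false_pos)


def recall(true_pos: int, false_neg: int) -> float:
    """Share of actual cases that were found."""
    return true_pos / (true_pos + false_neg)


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Average recall and precision as quoted from the report.
REPORTED = {
    "EHR-S": {"recall": 0.517, "precision": 0.983},
    "EHR-U": {"recall": 0.955, "precision": 0.953},
}


def f1_by_source(reported: dict) -> dict:
    """Compute an F1 score for each data source from its reported averages."""
    return {
        name: f1(m["precision"], m["recall"]) for name, m in reported.items()
    }


if __name__ == "__main__":
    for name, score in f1_by_source(REPORTED).items():
        print(f"{name}: F1 = {score:.1%}")
```

On these inputs the structured data's near-perfect precision is offset by recall of roughly one half, meaning that about half of the relevant concepts were missed. That gap is what the researchers highlight when they call for recall to be measured routinely.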
Researchers concluded that, "overall, EHR-S did not meet regulatory grade criteria, while EHR-U did. These results suggest that recall should be routinely measured in EHR-based studies intended for regulatory use. Furthermore, advanced data and technologies may be required to achieve regulatory grade results."
THE LARGER TREND
Unstructured data has long posed hurdles for health system analytics initiatives. But the insights contained in those clinical notes have big value for population health and value-based care efforts.
Now that AI tools are sufficiently mature and widespread to extract some of that value, more and more researchers will be making use of real-world evidence, and both federal agencies and technology developers are honing their efforts to ensure providers and life sciences organizations can capitalize on those insights.
ON THE RECORD
"The goal of this study was to perform a rigorous quality assessment of RWD to understand the potential and limitations of RWE in regulatory decision-making," said researchers in JAMIA.
"In summary, we document differences in obtained accuracy between EHR structured and unstructured data for clinical phenotyping in cardiovascular medicine," they said in the report's conclusion. "The clear learning from this study is that accuracy is heavily influenced by data and technology choices."
They added that "pharma, academia and vendors must not shy away from the hard work required to ensure data accuracy. As payers and regulatory agencies move forward with real world evidence to overcome cost and generalizability issues, understanding the benefits and limitations of different data and technologies is essential."