A new study from researchers at the Regenstrief Institute and Indiana University found that machine learning models trained using statewide health information exchange data can predict a patient’s likelihood of being hospitalized with COVID-19.
The paper, published in the Journal of Medical Internet Research, demonstrates the potential for HIE information to help shape public health decision making.
“It has been quite challenging to bring the bread-and-butter data generated by healthcare systems together with public health decision-making – entities which have long been separate and distinct,” said study senior author Dr. Shaun Grannis, Regenstrief Institute vice president for data and analytics and professor of family medicine at Indiana University School of Medicine, in a statement.
“Our work shows how you can build and employ AI (artificial intelligence) models to securely utilize the clinical information in a health information exchange to support public health needs such as predicting hospital utilization within one week and within six weeks of onset of COVID infection,” Grannis added.
WHY IT MATTERS
As the researchers noted in their study, the COVID-19 pandemic has highlighted the importance of data visibility when it comes to shaping policy decisions – which can, in turn, affect the resources available to health systems.
In addition, broad-scale public health responses should be shaped by population-wide data rather than by organizational analytics.
To address both those needs, the study team used the COVID-19 Research Data Commons, which integrates data from multiple clinical sources – including the Indiana Network for Patient Care, a statewide HIE comprising data from 23 health systems and 93 hospitals.
After excluding certain patients whose only interaction with affiliated health systems was their COVID-19 test result – meaning researchers had no clinical data beyond COVID-19 status – the team included 92,026 individuals representing all of the state’s ZIP codes in their model development efforts.
Of these patients, 18,694 were hospitalized within the first week of being diagnosed with COVID-19, and 22,678 were hospitalized within the first six weeks of diagnosis.
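To put those counts in proportion, the short Python sketch below works out the share of the cohort hospitalized in each window. The three counts come from the figures reported above; the variable names are illustrative, and the last calculation assumes that the one-week hospitalizations are also counted in the six-week total, which the article does not state explicitly.

```python
# Counts as reported in the article.
cohort = 92_026    # individuals included in model development
hosp_1wk = 18_694  # hospitalized within one week of diagnosis
hosp_6wk = 22_678  # hospitalized within six weeks of diagnosis

# Share of the cohort hospitalized in each window.
rate_1wk = hosp_1wk / cohort
rate_6wk = hosp_6wk / cohort

# Assumption: the one-week group is a subset of the six-week group,
# so the difference is the number hospitalized in weeks two to six.
later = hosp_6wk - hosp_1wk
share_later = later / hosp_6wk

print(f"Hospitalized within one week:  {rate_1wk:.1%}")   # 20.3%
print(f"Hospitalized within six weeks: {rate_6wk:.1%}")   # 24.6%
print(f"Hospitalized in weeks 2-6:     {later} ({share_later:.1%} of six-week total)")
```

Under that assumption, roughly one in five patients in the cohort was hospitalized within a week, and most six-week hospitalizations had already occurred by then.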
“Our results demonstrate the ability to train decision models capable of predicting the need of COVID-19-related hospitalization across a broad, statewide patient population with considerable performance accuracy,” said the researchers in the study.
They noted that the model was particularly accurate for predicting one-week hospitalization and for identifying the patients who were not in need of care.
Patient age, chronic obstructive pulmonary disease status, smoking, diabetes, indication of neurological diseases, mental disorders, residence type (urban versus rural) and income level all influenced the prediction.
“Such utilization prediction models may be used for population health management programs in health systems, to identify high-risk populations to monitor or screen, as well as predicting resource needs in crisis situations, such as future spikes in pandemic activity or outbreaks,” read the study.
The team also noted some biases in the model whose root causes require further investigation. Namely, being male or living in an urban area was associated with stronger predictive performance.
“These differences may be influenced by variations in access to healthcare services or healthcare delivery prevalent in the datasets, and the models could learn them during the training process,” they noted. “We cannot make further assumptions on the causes of varying model predictions without a proper assessment of underlying causes of this behavior.”
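A subgroup comparison of this kind can be pictured with a minimal Python sketch. It is not the study's method: the records below are made-up toy values, the `recall_by_group` helper is hypothetical, and recall (the share of truly hospitalized patients the model flags) is only one of several metrics that could be compared across groups.

```python
from collections import defaultdict

def recall_by_group(records):
    """Return {group: recall} from (group, y_true, y_pred) triples.

    Recall = true positives / (true positives + false negatives).
    Groups with no positive cases are left out of the result.
    """
    tp = defaultdict(int)
    fn = defaultdict(int)
    for group, y_true, y_pred in records:
        if y_true == 1:
            if y_pred == 1:
                tp[group] += 1
            else:
                fn[group] += 1
    groups = set(tp) | set(fn)
    return {g: tp[g] / (tp[g] + fn[g]) for g in groups}

# Toy, made-up records: (residence type, was hospitalized, predicted hospitalized).
toy_records = [
    ("urban", 1, 1), ("urban", 1, 1), ("urban", 1, 0), ("urban", 0, 0),
    ("rural", 1, 1), ("rural", 1, 0), ("rural", 0, 0), ("rural", 0, 1),
]

recalls = recall_by_group(toy_records)
for group in sorted(recalls):
    print(f"{group}: recall = {recalls[group]:.2f}")
```

A gap between groups in a check like this shows only that performance differs. As the authors caution, it says nothing by itself about why.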
THE LARGER TREND
Given the strain the pandemic has placed on hospital resources, many informaticists have focused on predicting which patients and populations will need care.
For instance, a group of Israeli scientists in early 2021 used an ML model to predict the illness trajectory of COVID-19 patients by using individual characteristics, and researchers in July of that year used the largest data repository of COVID-19 patients in the United States to develop a model predicting clinical severity based on first-day admission data.
And through a more geographically focused lens, Kaiser Permanente researchers in July 2021 used electronic health record information to propose a method for predicting COVID-19 surges up to six weeks in advance.
ON THE RECORD
“Since the onset of COVID-19, researchers, healthcare systems, public health departments and others have leveraged existing data repositories and health information infrastructure for rapid analytics,” said Suranga Kasturi, a Regenstrief Institute research scientist and an assistant professor of pediatrics at IU School of Medicine, in a statement. “Machine learning has been invaluable in these efforts.”
“But any model is only as good as the data that goes into it,” continued Kasturi, the first author on the study. “The broad, robust data from the Indiana Network for Patient Care is representative of the U.S. population. What we have done could be characterized as a precursor of how AI tools can be deployed across the entire country with the important caveat that whatever models are used should be evaluated for fairness across all subpopulations.”