For artificial intelligence (AI) to realize its full potential to benefit cancer patients, researchers will have to prove that their machine-learning successes can be consistently reproduced across settings and patient populations.
That’s why Case Western Reserve biomedical engineering researchers are increasingly focused on applying their novel algorithms to patient scans from multiple locations.
Earlier this spring, for example, they published promising findings involving lung cancer diagnosis among 400 patients from three health care systems. And a 2020 study showed that their approach could predict recurrence in 610 early-stage lung cancer patients across four sites.
“This is no small thing. This is an important next step in making AI useable for clinicians someday, and it’s one of the things we have to address head on,” said Anant Madabhushi, director of the university’s Center for Computational Imaging and Personalized Diagnostics (CCIPD). “For instance, we know that even within a single hospital, one could have patients scanned on different CT scanners, resulting in images with differing appearance, so the AI has to be able to account for these differences.”
So if AI is ever going to be trusted, and then routinely used, by physicians and clinicians, Madabhushi said, those end users must be convinced not only that computer diagnosis is possible, but also that it can be reproduced and, specifically, that it works for their own patients.
Next steps: re-proving reproducible results
Researchers call this reproducibility or, often, “generalizability”: the idea that a successful method, treatment or tool can work no matter when, where or on whom, or in the face of virtually any other variable.
It has proven an elusive goal and has even been called a “myth” by other researchers, who have identified several daunting hurdles. Those difficulties include differences in how CT machines produce images, variations in hardware and software, and differences in patient demographics.
To that end, Madabhushi and his group are planning prospective clinical trials using the generalized AI signatures for lung cancer on CT scans that they have already identified.
The researchers have been working with hospitals in Northeast Ohio to assess the real-world generalizability of these AI tools for problems relating to diagnosis and prognosis of lung cancers.
Now, newly published research builds on previous and ongoing CCIPD work over the last few years on developing generalizable AI models.
What’s new is the creation of a more formal framework for identifying stable and accurate features, while also validating the approach on much larger numbers of studies and institutions.
The research by Madabhushi, biomedical engineering Ph.D. student Mohammadhadi Khorrami and collaborators appeared in April 2020 in the journal Lung Cancer and in March 2021 in the European Journal of Cancer.
The difference: ‘stable’ features
To this point, Madabhushi and his group at CCIPD have successfully applied their AI to determine which lung cancer patients would respond well to chemotherapy or immunotherapy or, in some cases, whether cancer would return or how long a patient might live.
But in each case, those outcomes have only come from analysis of existing data and/or images after the fact and only for a single group of cancer patients.
Now, instead of just teaching their computers to focus on features in the scans that differentiate between malignant and benign tumors, for example, they programmed the AI to also remember lesser features which are consistent from one scan to another—even if those features were unrelated to the cancer itself.
Key to this work was Khorrami’s evaluation of hundreds of image features, Madabhushi said.
Khorrami considered not only how texture and shape of lung nodules could lead to a diagnosis of lung cancer and predict outcomes, but also how consistent, or stable, these features were across CT scanners and sites.
“To do this, we identified a set of features that were most accurate—but at the same time stable across sites,” Khorrami said. “So, when we evaluated the machine learning models with these accurate and stable features on external sites, these models did better when compared to ones created with only the most accurate features, the ones where the stability of features was not considered.”
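The idea Khorrami describes, keeping only features that are both accurate and stable across sites, can be illustrated with a minimal sketch. This is not the group's published method: the article does not say how accuracy or stability was measured, so the scoring choices here are assumptions. Accuracy is approximated by a rank-based AUC per feature, and stability by the share of a feature's variance that is not explained by which site the scan came from. The function names, the 0.9 stability threshold and the synthetic data are all hypothetical.

```python
import numpy as np

def auc_score(values, labels):
    """Rank-based AUC of one feature for binary labels (assumes no tied values)."""
    order = np.argsort(values)
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    u = ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2
    auc = u / (n_pos * n_neg)
    return max(auc, 1 - auc)  # ignore the direction of the association

def site_stability(values, sites):
    """1 minus the fraction of variance explained by site (assumed stability proxy)."""
    grand_mean = values.mean()
    total = ((values - grand_mean) ** 2).sum()
    if total == 0:
        return 1.0  # a constant feature cannot vary by site
    between = sum(
        (sites == s).sum() * (values[sites == s].mean() - grand_mean) ** 2
        for s in np.unique(sites)
    )
    return 1.0 - between / total

def select_features(X, y, sites, min_stability=0.9, top_k=2):
    """Drop features that shift across sites, then rank the rest by AUC."""
    n_features = X.shape[1]
    aucs = np.array([auc_score(X[:, j], y) for j in range(n_features)])
    stab = np.array([site_stability(X[:, j], sites) for j in range(n_features)])
    stable = np.flatnonzero(stab >= min_stability)
    ranked = stable[np.argsort(-aucs[stable])]
    return ranked[:top_k].tolist(), aucs, stab

# Synthetic data, purely illustrative: 300 nodules scanned at 3 sites.
rng = np.random.default_rng(0)
n = 300
y = rng.integers(0, 2, n)       # 1 = malignant, 0 = benign
sites = rng.integers(0, 3, n)   # which site or scanner produced the scan

X = np.column_stack([
    2.0 * y + rng.normal(0, 1, n),                  # feature 0: informative, stable
    3.0 * y + 1.5 * sites + rng.normal(0, 0.5, n),  # feature 1: informative, shifts by site
    rng.normal(0, 1, n),                            # feature 2: noise, stable
])

selected, aucs, stab = select_features(X, y, sites)
print("selected feature indices:", selected)
print("AUC per feature:", np.round(aucs, 2))
print("stability per feature:", np.round(stab, 2))
```

In this toy setup, feature 1 separates the classes but its values drift with the site, so the stability filter removes it before ranking. A model built on the remaining features gives up some apparent accuracy in exchange for behaving more consistently on scans from a site it has not seen.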
Case Western Reserve University