IBM Watson accurately matches oncologists' advice, study finds
IBM Watson for Oncology was mostly in agreement with recommendations from a panel of oncologists in a double-blinded validation study, the results of which were presented at the 2016 San Antonio Breast Cancer Symposium this week.
Big Blue developed the artificial intelligence computer Watson for Oncology in collaboration with Memorial Sloan Kettering Cancer Center to extract and assess large amounts of structured and unstructured data from medical records using natural language processing and machine learning.
“Watson provides oncologists with treatment recommendations for breast, lung, and colorectal cancers at our institution,” said S.P. Somashekhar, MD, an oncologist, the study’s author, and chairman of the Manipal Comprehensive Cancer Center, Manipal Hospitals, in Bengaluru, India.
“We wanted to know more about how it would impact oncologists’ day-to-day practice and to assess how Watson’s recommendations compared to the decisions of our team of experts,” he added.
To assess agreement between Watson for Oncology and Manipal’s multidisciplinary tumor board – a group of 12 to 15 oncologists who meet weekly to review cases at the hospital – Somashekhar and colleagues studied the cases of 638 breast cancer patients who had been treated at Manipal Hospitals.
Watson’s recommendations came in three categories: recommended standard treatment; for consideration; and not recommended.
Ninety percent of Watson for Oncology’s “recommended standard treatment” and “for consideration” recommendations were in accord with those of the tumor board.
In a separate, retrospective analysis, the degree of concordance was 73 percent overall, but varied depending on the type of breast cancer.
Somashekhar said Watson for Oncology recommendations were in agreement nearly 80 percent of the time in non-metastatic disease, but only 45 percent of the time in metastatic cases. In cases of triple-negative breast cancer, Watson for Oncology agreed with the physicians 68 percent of the time, but in hormone receptor-positive, HER2/neu-negative cases, Watson recommendations matched the physicians’ recommendations only 35 percent of the time.
Somashekhar said the difference in accord was not surprising given that triple-negative breast cancer has fewer treatment options than hormone receptor-positive, HER2/neu-negative breast cancer.
“Including HER2/neu cases opens up many more treatments and variables for consideration,” he explained. “This increases the demands on human thinking capacity. More complicated cases lead to more divergent opinions on the recommended treatment.”
The study also compared how long it took to capture and analyze data to generate recommendations. Done manually, the process took an average of 20 minutes, falling to about 12 minutes as the reviewers gained familiarity with the cases. Watson took a median time of 40 seconds to capture and analyze data and give a treatment recommendation.
Somashekhar cautioned that while artificial intelligence is a step toward personalized medicine, it should not be viewed as a replacement for a physician, but rather as a complement.