Cancer data "computationally intensive"

Health IT not keeping pace with big data

By Diana Manos
09:37 AM

In the realm of healthcare big data, one major barrier remains to making sense of the mountains of information that already exist: information technology just isn't keeping pace.

Call it super-mega big data. Just one example, cancer research, highlights how far the healthcare industry has yet to go to turn this information into value.

The opportunity in this case is precision management of human cancers. Indeed, the ability to study cancerous tumors at the genetic level and to identify trends in how to treat and cure the disease is within grasp, said Joe Gray, associate director for translational research at the Knight Cancer Institute at the Oregon Health & Science University.

Genomic cancer study today is exciting because it offers the ability to optimize treatment for an individual, Gray said at the Health Care Innovation Summit in Washington last week. Once the genetic features of a cancer are known, doctors could scan a cancer patient's whole body for the genetic material of that cancer, finding where it has spread or is in the process of spreading. And in theory, researchers and doctors could then find which treatment is best suited to the patient and how treatments could best be combined to get durable responses.

The fact that IT hasn’t kept pace with the data leads to other arduous complications, such as power, security and standards, Gray said. The ability to genetically track cancer produces “raw data that is huge in scope.”

How huge? Try more than 100 GB for a single tumor. Applied to almost 2 million U.S. cancer patients per year and 14 million cancer survivors, that totals tens to hundreds of petabytes of data per year, Gray said, adding, "this data is computationally intensive."
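To make the scale concrete, the figures Gray cited can be multiplied out. The sketch below is only a back-of-the-envelope check, not Gray's own calculation: it assumes one sequenced tumor per new patient, decimal units (1 PB = 10^15 bytes), and a hypothetical `fraction_sequenced` parameter to show how the total moves between "tens" and "hundreds" of petabytes.

```python
# Back-of-the-envelope estimate of annual genomic data volume.
# Inputs come from the figures quoted in the article; the
# one-tumor-per-patient rule and fraction_sequenced are assumptions.

GB = 10**9   # decimal gigabyte, in bytes
PB = 10**15  # decimal petabyte, in bytes

BYTES_PER_TUMOR = 100 * GB         # "more than 100 GB for a single tumor"
NEW_PATIENTS_PER_YEAR = 2_000_000  # "almost 2 million U.S. cancer patients per year"


def annual_petabytes(patients, fraction_sequenced=1.0,
                     bytes_per_tumor=BYTES_PER_TUMOR):
    """Raw data per year in petabytes, assuming one tumor per sequenced patient."""
    return patients * fraction_sequenced * bytes_per_tumor / PB


if __name__ == "__main__":
    for fraction in (0.1, 0.5, 1.0):
        total = annual_petabytes(NEW_PATIENTS_PER_YEAR, fraction)
        print(f"{fraction:.0%} of new patients sequenced: {total:,.0f} PB per year")
```

Under these assumptions, sequencing every new patient yields about 200 PB per year, and sequencing a tenth of them yields about 20 PB, which is consistent with the range Gray described.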

Gray, who has been working on curing and treating cancer for 40 years, is also principal investigator for the National Cancer Institute Integrative Cancer Biology Program Center for Cancer Systems Biology.

Cancer research has come a long way, he said. Forty years ago, researchers and doctors often knew nothing about many types of cancers. Today, they have the ability to break individual cancers down on a genomic basis. Every tumor has its own unique genetic signature. No tumor is like any other.

All of which makes for enormous complexity and heterogeneity in human cancers, sometimes with tens of thousands of abnormalities per tumor. In addition, tumors are heterogeneous within each patient at the molecular level: one part of a tumor might differ from another part in the molecular domain, Gray explained. As a result, it can literally take computers a week to analyze one tumor. Moreover, there is no consensus yet among researchers on what constitutes a key feature.
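A rough sense of the computing load follows from the week-per-tumor figure. The sketch below is illustrative only: it assumes one machine works on one tumor at a time with no parallel speedup, a 52-week year, and the roughly 2 million new U.S. patients per year quoted earlier. The function name `machines_needed` is hypothetical.

```python
import math

# Illustrative estimate: how many machines would have to run nonstop
# to keep up with a year's tumors at roughly one week per tumor?
# Assumes one tumor per machine at a time and no parallel speedup.

WEEKS_PER_YEAR = 52


def machines_needed(tumors_per_year, weeks_per_tumor=1.0):
    """Smallest number of always-busy machines that clears a year's tumors."""
    machine_weeks = tumors_per_year * weeks_per_tumor
    return math.ceil(machine_weeks / WEEKS_PER_YEAR)


if __name__ == "__main__":
    print(machines_needed(2_000_000), "machines running year-round")
```

Under these assumptions the answer is in the tens of thousands of machines, before any reanalysis of tumors that have already been processed.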

"In order to figure this out, one thing we’re going to have to be able to do is establish associations between features and outcomes that we care about," Gray said. "This is going to require doing association studies on millions of tumors to have the statistical signatures to drive the treatment of cancer."

Another key problem is comparing the research data with that gathered by physicians who are treating the cancers; the data sets are too big to move. It would take years over the fastest Internet service to transport this data. "We are going to have to bring the analytics to the data instead of other way around," Gray said.
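The "too big to move" claim can be checked with simple arithmetic. The sketch below is a rough illustration, not a figure from Gray: it takes 200 PB (100 GB per tumor times about 2 million patients per year) and assumes a dedicated link that sustains its full rated speed with no protocol overhead. The link speeds shown are assumed values chosen for illustration.

```python
# Rough transfer-time estimate for moving a year's tumor data.
# The data volume is derived from figures in the article; the link
# speeds are assumptions, and real throughput would be lower.

SECONDS_PER_YEAR = 365 * 24 * 3600
PB = 10**15  # decimal petabyte, in bytes

DATA_BYTES = 200 * PB  # 100 GB per tumor x ~2 million patients per year


def transfer_years(data_bytes, link_bits_per_second):
    """Years needed to move data_bytes over a link at full sustained speed."""
    seconds = data_bytes * 8 / link_bits_per_second
    return seconds / SECONDS_PER_YEAR


if __name__ == "__main__":
    for gbit in (1, 10, 100):  # assumed link speeds in Gbit/s
        years = transfer_years(DATA_BYTES, gbit * 10**9)
        print(f"{gbit:>3} Gbit/s link: {years:.1f} years")
```

At an assumed 10 Gbit/s, one year's worth of data would take about five years to move, so new data would accumulate faster than it could be shipped. That is the case for running the analysis where the data already sits.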

Even after all that is solved, there will still be an energy problem, Gray said. Analyzing the data will demand megawatts of power.

And access to such information is going to require a high level of security. Genomic data is rich enough that the identities of patients and their family members could be discovered within it if someone were bent on doing so, which Gray said has given him pause about putting that information into a public cloud.

"We don’t have the answers to this right now," Gray said, and "many patients don't have time to wait."

Now is the time to act, however. Gray envisions clinical and genomic data sets large enough to be recursively analyzed for associations that have at least the potential to guide precision medicine and cancer care going forward.

"The opportunity is there. It is going to require standardization in ways that we are not now doing," Gray said. "But the promise is sufficiently large that we can get there."