IBM offers tips for doing big data right
ARMONK, NY – A quick look at the headlines lately shows that "big data" is a big deal. Healthcare is just starting to realize the potential of gathering, drilling down, mining and analyzing those massive troves of information – and more and more signs point to big data analytics making a big difference.
Researchers see the potential for advancements that could lead to major improvements in public health. Vendors see the potential for new income from a new market.
But as with any term that gets tossed about this much, the question has to be asked: Could big data be just another buzzword? IBM doesn't think so.
The computing giant made news on the big data front more than once recently, first with its April 25 acquisition of Pittsburgh-based Vivisimo, and the next day with an announcement from SUNY Buffalo about multiple sclerosis research.
Vivisimo develops federated discovery and navigation software meant to help organizations access and analyze big data. With some 2.5 quintillion bytes of data created every day, IBM says the deal will help accelerate its big data analytics initiatives, helping organizations such as healthcare providers, government agencies and telecommunications companies navigate and analyze structured and unstructured data.
"Navigating big data to uncover the right information is a key challenge for all industries," says Arvind Krishna, general manager, information management, IBM Software Group. "The winners in the era of big data will be those who unlock their information assets to drive innovation, make real-time decisions, and gain actionable insights to be more competitive."
So, is big data as transformative as so many say it is?
It's a fair question, says Shawn Dolley, vice president and general manager of global healthcare and life science at IBM Big Data. (He worked for Marlborough, Mass.-based Netezza before its acquisition by IBM in 2010.) "From a healthcare perspective, the typical question I always ask is, 'Is this just a rebranding of business intelligence, now that data sizes have grown?'"
Doing it right depends on how it's deployed, he says. "We talk to a lot of health systems, and the ones that we think are utilizing it well have a few things they typically do."
Most crucially, "they are [making] strides toward trying to create a biologically oriented, centralized, linked repository," says Dolley. "Some have a very clinical, focused approach: 'Let's do some gene sequencing near the bedside and optimize our cancer drug cocktail.' That's sort of proactive, tactical, specific, high-ROI to a single patient."
Dolley offered some advice on making optimal use of big datasets.
"I would say, first, accommodate architectures and toolsets that – at the low level in the technology stack, at the platform level – may be different than what you've used before. Someone doing financial analysis doesn't need to look at a fundamentally different piece of technology. But you are going to need to do that if you're going to put together one of these multidimensional biological repositories.
"Second, become more open to open source technologies. The researchers you're trying to attract to your institution today are not going to want to use SaaS. They're going to be more receptive to open source because that's just what they're growing up with.
"Third, be ready to store the data," says Dolley. "SUNY has one approach. They pull in 18 terabytes, and they do their MS study, and then they dump it. I understand that. It's easier to get a new sample of blood, so why store things over long periods of time? But I saw this 15 years ago in the BI realm, where people would say, 'Well, I only need to store this data for a year.' Sure enough, a case study or some other reason comes up for them wanting to store it for a longer period of time."
Dolley refers to State University of New York (SUNY) at Buffalo, which announced on April 26 some results from researchers' use of IBM analytics technology to study more than 2,000 genetic and environmental factors that may contribute to multiple sclerosis (MS) symptoms.
With the initiative, scientists use analytics technology to develop algorithms for big data containing genomic datasets to uncover critical factors that speed up disease progression in MS patients. Insights gained from that research will eventually be shared with physicians to help them tailor individual treatments to slow brain injury, physical disability and cognitive impairments caused by MS. Affecting approximately 400,000 people in the United States and some 2.1 million people worldwide, MS is a chronic neurological disease with no cure; it is believed to be caused by a combination of genetic, environmental, infectious and autoimmune factors, which makes treatment difficult.
SUNY Buffalo researchers will explore clinical and patient data to find hidden trends among MS patients by looking at factors such as gender, geography, ethnicity, diet, exercise, sun exposure and living and working conditions. All those data points – including medical records, lab results, MRI scans and patient surveys – arrive in various formats and sizes, requiring researchers to spend days making the data manageable before they can analyze it.
"They weren't going into this saying, 'We have a very specific study, where we have a deep belief that urban allergens have to do with multiple sclerosis, so we think a certain string of genes can be compared to something else,'" says Dolley. "Their approach was, 'Let's do something that has combinatorial explosion. Let's put in every phenotype variable we can, let's put the subject genes in there, and let's run every possible combination.'"
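The approach Dolley describes amounts to an exhaustive association scan: instead of testing one pre-specified hypothesis, score every phenotype-variant pairing and let the strongest signals surface. A minimal sketch of that idea follows, using a hand-rolled chi-square statistic on toy binary data; every variable name here is illustrative and nothing is drawn from the actual SUNY Buffalo study.

```python
from itertools import product

# Hypothetical toy cohort: each subject has binary phenotype flags and
# binary gene-variant flags. All field names are made up for illustration.
subjects = [
    {"low_sun_exposure": 1, "urban": 1, "variant_a": 1, "variant_b": 0},
    {"low_sun_exposure": 0, "urban": 1, "variant_a": 0, "variant_b": 1},
    {"low_sun_exposure": 1, "urban": 0, "variant_a": 1, "variant_b": 1},
    {"low_sun_exposure": 0, "urban": 0, "variant_a": 0, "variant_b": 0},
]

phenotypes = ["low_sun_exposure", "urban"]
variants = ["variant_a", "variant_b"]

def chi_square_2x2(pairs):
    """Chi-square statistic for two binary variables (no continuity correction)."""
    counts = [[0, 0], [0, 0]]
    for x, y in pairs:
        counts[x][y] += 1
    n = len(pairs)
    row = [sum(counts[0]), sum(counts[1])]
    col = [counts[0][0] + counts[1][0], counts[0][1] + counts[1][1]]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            if expected:
                stat += (counts[i][j] - expected) ** 2 / expected
    return stat

# "Run every possible combination": score every phenotype-variant pair.
results = {}
for p, v in product(phenotypes, variants):
    pairs = [(s[p], s[v]) for s in subjects]
    results[(p, v)] = chi_square_2x2(pairs)

# Rank pairings by association strength, strongest first.
for (p, v), stat in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{p} x {v}: chi2={stat:.2f}")
```

At real scale – more than 2,000 genetic and environmental factors – this brute-force loop is exactly the combinatorial explosion Dolley mentions, which is why such scans historically took whole weekends of compute rather than minutes.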
Dolley remembers when Harvard Medical School researchers told him not long ago that, when it came to solving problems with big data analytics, "We have four questions we can ask per month." Why four? "Well, there's four weekends per month," he says. "That's how long it takes us to run our jobs."
Now, using an IBM Netezza analytics appliance, in conjunction with software from Revolution Analytics, researchers can analyze disparate data in a matter of minutes instead of days, regardless of what type or size it is, say IBM officials. The technology automatically consumes and analyzes the data, and makes the results available for further analysis, leaving researchers time to analyze trends instead of managing data.
Other institutions have been availing themselves of big data analytics from IBM recently, as well. At the University of Ontario Institute of Technology, neonatal intensive care specialists monitor a constant stream of biomedical data, such as heart rate and respiration, in efforts to spot potentially fatal infections in premature babies. At Harvard Medical School/Brigham and Women's Hospital, the technology is key to new research scouring patient records and insurance claims data to examine the effectiveness of prescription drugs and spot potential safety issues. Marshfield Clinic uses IBM's platform for real-time analysis of some 97 million patient diagnoses and lab results dating back to 1960, for clinical decision support, with analysis time cut from hours to seconds.
At SUNY Buffalo, the hope is that insights gleaned from this big data analysis can be applied to diseases such as MS and significantly change the way patients receive treatment.
"Multiple sclerosis is a debilitating and complex disease whose cause is unknown," says Murali Ramanathan, lead researcher at SUNY Buffalo. "No two people share the exact same symptoms, and individual symptoms can worsen unexpectedly. Identifying common trends across massive amounts of MS data is a monumental task that is much like trying to shoot a speeding bullet out of the sky with another bullet.
"IBM analytics helps our researchers fine-tune their aim and match the speed of analysis with the rate of data coming into our systems," he adds. "Our goal is to demystify why the disease progresses more rapidly in some patients and get those insights back to other researchers, so they can find new treatments."