i2b2 open source software boosts HIE, biomedical research

By Anthony Brino
01:37 PM

The health informatics software i2b2 — Informatics for Integrating Biology and the Bedside — was started in 2006, and has become something of a building block for several health information networks and research projects in genomics, pharmaceuticals and population health.

Developed at the Partners HealthCare System as a federally funded biomedical computing center, the open source software is letting biomedical researchers combine genomic and molecular research with data and observations from electronic health records, and its code set is also being used to link with claims databases and health information exchanges.

More than 80 academic health centers in the U.S. have adopted i2b2, and it’s also being used in an international pharmaceutical research consortium called tranSMART.

“We did not anticipate that the software would go viral,” said i2b2 co-creator Isaac Kohane, MD, a Harvard Medical School professor. While Kohane thinks not enough health systems using i2b2 have used it to measure quality (Cincinnati Children’s Hospital is doing that), he thinks it’s fostered a blend of research “both for science and quality improvements.”

Initially built using Java and natural language processing, i2b2 partly developed as a way to share information between Harvard doctors and affiliated centers across five Massachusetts hospitals. Funded with $20 million from the National Institutes of Health, along with other research centers that ended up using the code sets, i2b2 led to the development of plug-ins and spin-off platforms like the web-based query and data sharing network called SHRINE, which the five University of California medical centers used to create a federated data warehouse.

“I have always been troubled by the lack of available data,” said Kohane, who also has a background in computer engineering and is co-leading a comparative effectiveness study at Partners with the insurer Aetna. Looking at the legacy health IT systems available and the new ones emerging back in the early 2000s, Kohane said he designed i2b2 with some of the same software query ideas used in the Partners’ and Boston Children’s IT systems first developed in the 1980s and 1990s.


The i2b2 software has been compatible with many vendors’ EHR systems, Kohane said, and that’s helped meet a long-time goal of biomedical researchers to link “the genotype to the phenotype,” using genetics and molecular biology to find biomarkers for certain diseases, or understand varying responses to treatments. Kohane said doctors’ notes, analyzed by natural language processing, have turned out to yield fairly detailed patient information, somewhat more than claims data and billing codes provide. “We’re able to achieve a very high specificity of phenotyping,” he said.

Kohane is conducting ongoing research on autism, and with i2b2 linking data from about a dozen other institutions, he’s been able to compile samples of around 10,000 subjects, compared to previous studies involving only hundreds. “We’re seeing effects that were previously thought to be anecdotal,” he said. Until recently, “it wasn’t known that up to five percent of kids with muscular dystrophy have autism,” and that finding is starting to show how proteins associated with muscular dystrophy are expressed in the brain and as symptoms of autism.

One large research area that’s starting to blossom, Kohane said, is pharmaco-epigenetics, studying the molecular basis of variations in drug responses. i2b2 also lets healthcare systems track the effects of pharmaceutical therapies and find unexpected adverse (or beneficial) effects, “before things even rise up to the level of notice by the Food and Drug Administration.”

It also helps bring the costs of large-scale genomic studies way down, he said. The potential insights gleaned from whole genome sequencing, for researchers and also patients, are currently offset by their cost, and by the potential that much of the data will not be meaningful. For pharmaceutical research, “You need thousands or even hundreds of thousands of subjects to be able to detect rare events reliably, and to be able to measure the weak effects of genetics and their common variants across populations.”


As the global pharmaceutical industry faces changes and challenges, i2b2 has been built into tranSMART, an open source platform available under the GPLv3 license, offering a data repository of demographics, clinical observations, clinical trial outcomes, adverse events, and biomarker data such as gene expression and pharmacodynamic markers. tranSMART is being used by a fairly large international public-private consortium founded by Johnson & Johnson and Recombinant Data, and software engineers at smaller organizations have also been able to use i2b2 for their specific research needs.

With HITECH Act funding, the HIE HealthShare Montana is using i2b2 as a basis for a statewide clinical repository, in partnership with University of Washington translational health and informatics researchers, Recombinant Data and Covisint.

HealthShare Montana software engineer Jeff Green and project manager Cleary Waldren used i2b2 to design analytic software that William Reiter, MD, an internist and chief medical information officer at HealthShare Montana, hopes to use for comparative effectiveness treatment.

With patient data starting to come in this month, HealthShare Montana has gradually been building operations and linking to providers, while testing use cases with dummy data. As health reform brings more integration and quality performance critiques to healthcare, for instance with accountable care organizations, Reiter thinks the analytic potential of health IT is getting physicians much more interested in adopting EHRs and joining HIEs.

“When we have presentations around the state,” Reiter said, “the first part is always about the HIE, and everyone in the audience is half-asleep. They’re fast asleep when you drone on about Meaningful Use and the criteria that they have to meet, because they consider that more of an administrative rather than a physician thing. When we start talking about analytics, and when we pull up i2b2 and the docs see what they can do with it and how they can interrogate their own data, it’s almost like literally the audience goes wild.”