Google's window into the healthcare IT market

By Chris Thorman
11:57 AM

One of the major goals of the federal government's push for nationwide electronic medical record adoption is to create an information network where "health data can flow freely, privately, and securely to the places where they are needed." So far, this is proving to be a challenge for the nation's hospitals and doctors.

Software Advice thinks that this problem presents an opportunity for Google to take a big step into the healthcare IT market in 2010, following other major companies like Microsoft, I.B.M. and insurance giant Aetna. Through their Google Books project, Google has shown that they can scan, interpret and index a large volume of books in a relatively short amount of time. Unstructured medical records – those not neatly organized within an interoperable EMR system – could be managed in the same fashion. Google possesses many of the requisite skills and technologies to solve this problem.

However, to be successful, Google will have to figure out these issues:

    * How to gather structured and unstructured medical data on a large scale;
    * How to share and make that data accessible (searchable) to people; and,
    * How to comply with privacy regulations.

With Google Health rumored to be on the back burner, working with hospitals and medical providers to aggregate and organize medical data could be Google's window into the growing market that is healthcare IT. Here's how they can do it.

The Benefits of Digital, Private & Secure Health Data

The driving force behind the government's $19 billion EMR incentive program is that medical record software truly can transform the United States' healthcare system for the better. EMR advocates have long touted the software's ability to reduce medical errors, improve clinical decision making, empower patients, and reduce the costs of a bloated system.

When medical data is in digital form, it can be sorted, searched and analyzed more efficiently than paper charts. When implemented correctly, EMR software beats paper charts in efficiency, accuracy and cost savings. The problem Google could help fix is that the majority of health data in the U.S., both historical and current, is still on paper.

Structured & Unstructured Data

Medical data comes in essentially two forms: structured and unstructured. Structured data is information organized into a fixed format, such as numbers arranged in tables and rows. It's disciplined and predictable. In the medical world, examples of structured data include insurance codes, diagnosis codes and messages formatted according to the HL7 standard. Relative to unstructured data, structured data is easier to aggregate and analyze.
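
To make "structured" concrete, here is a minimal Python sketch that splits one pipe-delimited HL7 version 2 segment into fields and components. It assumes the default `|` and `^` delimiters and uses a made-up patient record; `parse_segment` is a hypothetical helper, not part of any HL7 library, and it ignores the special numbering of the MSH segment, field repetitions and escape sequences.

```python
# Hypothetical helper: split one HL7 v2 segment into numbered fields.
# Assumes the default delimiters; real interfaces read them from the MSH segment.
FIELD_SEP = "|"      # default HL7 v2 field separator
COMPONENT_SEP = "^"  # default HL7 v2 component separator


def parse_segment(segment: str) -> dict:
    """Return the segment ID plus a 1-based mapping of field number -> components."""
    parts = segment.split(FIELD_SEP)
    seg_id, fields = parts[0], parts[1:]
    return {
        "id": seg_id,
        "fields": {i: f.split(COMPONENT_SEP) for i, f in enumerate(fields, start=1)},
    }


# Made-up PID (patient identification) segment.
pid = parse_segment("PID|1||12345^^^HOSP^MR||DOE^JOHN||19600101|M")
print(pid["fields"][3][0])  # 12345 (patient identifier)
print(pid["fields"][5])     # ['DOE', 'JOHN'] (family name, given name)
print(pid["fields"][7][0])  # 19600101 (date of birth)
```

Because every value sits at a known position, a program can pull out the patient ID or birth date without "reading" anything, which is exactly what makes structured data easy to aggregate.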

For example, if a user needs to connect two systems that use different structured data formats, a "middleware" application is an option. Middleware sits "in the middle" of two different software systems, allowing them to share information. A number of companies in the health IT marketplace today connect disparate data systems via middleware.
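
At its simplest, middleware is a translation layer. The Python sketch below assumes two invented systems: "system A" exports flat records with its own field names and US-style dates, and "system B" expects nested JSON with ISO 8601 dates. Both schemas and the `a_to_b` function are hypothetical; commercial health IT middleware also handles transport, queuing and error reporting.

```python
# Toy middleware between two hypothetical record formats.
import json
from datetime import datetime

# Field-name mapping from system A to system B (both schemas are invented).
FIELD_MAP = {"pt_id": "patientId", "lname": "familyName", "fname": "givenName"}


def a_to_b(record_a: dict) -> str:
    """Rename fields, convert the date format and emit JSON for system B."""
    name = {FIELD_MAP[k]: record_a[k] for k in ("lname", "fname")}
    # System A stores dates as MM/DD/YYYY; system B wants YYYY-MM-DD.
    dob = datetime.strptime(record_a["dob"], "%m/%d/%Y").date().isoformat()
    record_b = {
        FIELD_MAP["pt_id"]: record_a["pt_id"],
        "name": name,
        "birthDate": dob,
    }
    return json.dumps(record_b, sort_keys=True)


print(a_to_b({"pt_id": "12345", "lname": "Doe", "fname": "John", "dob": "01/01/1960"}))
```

Neither system has to change; only the mapping in the middle knows about both formats.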

Gathering unstructured data and turning it into a structured format, however, is not so easy. Unstructured medical data includes handwritten notes and charts, and medical images such as x-rays and CT scans. This data can be further categorized as textual unstructured data and non-textual unstructured data, respectively.

Currently, medical transcriptionists and document scanning services use a combination of human review and optical character recognition (OCR) to turn paper records into structured data. This method is expensive and time-consuming, to say the least.
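
The hybrid workflow can be sketched as a triage step: words the OCR engine is confident about are accepted, and the rest go to a human. This Python sketch assumes an OCR engine has already produced `(word, confidence)` pairs; the 0.90 cutoff and the `split_for_review` name are illustrative assumptions, not an industry standard.

```python
# Hypothetical triage of OCR output: accept confident words, queue the rest.
from typing import List, Tuple

REVIEW_THRESHOLD = 0.90  # assumed cutoff, chosen only for illustration


def split_for_review(ocr_words: List[Tuple[str, float]]):
    """Return (accepted, needs_review) lists of (position, word) pairs."""
    accepted, needs_review = [], []
    for position, (word, confidence) in enumerate(ocr_words):
        if confidence >= REVIEW_THRESHOLD:
            accepted.append((position, word))
        else:
            needs_review.append((position, word))  # a transcriptionist checks these
    return accepted, needs_review


ocr = [("Patient", 0.98), ("c/o", 0.55), ("chest", 0.95), ("pian", 0.40)]
accepted, queue = split_for_review(ocr)
print(queue)  # [(1, 'c/o'), (3, 'pian')]
```

The human review queue is where the expense comes from: with messy handwriting, a large share of words can end up below the cutoff.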

How To Gather & Store This Data

So, how can Google go about turning unstructured data into structured data on a large scale?

In the case of textual unstructured data, Google's reCAPTCHA program could be the answer to converting it into a structured format. CAPTCHA programs, boxes that ask a user to identify distorted words in order to proceed past a certain point, are becoming ubiquitous on the web as a way to fight spam. Google uses reCAPTCHA to transcribe books, old radio shows and newspaper articles by asking users to identify two words: one already known to Google and one previously unknown. The unknown words come from a list of words that OCR programs were unable to recognize. For example, if a CAPTCHA pairs the known word "overlooks" with the unknown word "inquiry", and a user types "overlooks" correctly, reCAPTCHA assumes that what the user typed for "inquiry" is correct as well. The unknown word continues to be shown to other users to increase reCAPTCHA's confidence that the transcription is correct.
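
The known-word/unknown-word logic can be sketched as a vote counter. In this Python sketch, an answer for the unknown word only counts if the same user got the control word right, and a reading is accepted once enough users agree. The threshold of 3 matching votes and the `submit` function are assumptions for illustration; this is not reCAPTCHA's actual scoring rule.

```python
# Sketch of reCAPTCHA-style transcription by agreement (hypothetical rules).
from collections import Counter, defaultdict

AGREEMENT_NEEDED = 3  # assumed number of matching votes needed to accept a reading

votes = defaultdict(Counter)  # unknown-word image ID -> tally of readings
accepted = {}                 # unknown-word image ID -> accepted reading


def submit(control_answer, control_truth, unknown_id, unknown_answer):
    """Record one user's answers; return the accepted reading, if any."""
    if control_answer.strip().lower() != control_truth.lower():
        return None  # control word failed, so this user's other answer is ignored
    tally = votes[unknown_id]
    tally[unknown_answer.strip().lower()] += 1
    reading, count = tally.most_common(1)[0]
    if count >= AGREEMENT_NEEDED:
        accepted[unknown_id] = reading
    return accepted.get(unknown_id)


submit("overlooks", "overlooks", "img-42", "inquiry")
submit("overIooks", "overlooks", "img-42", "enquiry")  # control mistyped: ignored
submit("overlooks", "overlooks", "img-42", "inquiry")
print(submit("overlooks", "overlooks", "img-42", "inquiry"))  # inquiry
```

Requiring several independent, matching answers is what lets the system trust strangers: a single careless user cannot push a wrong reading through on their own.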

If Google is doing this with books and newspapers, why not with handwritten medical charts and notes? The same logic applies: scan handwritten medical records, feed individual words into a CAPTCHA program, let people transcribe them over the web, and over time textual unstructured data becomes structured data. Google could theoretically put the 200 million CAPTCHAs filled out each day on the web to work transcribing medical records.
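
Feeding chart words into a CAPTCHA program mostly means building challenges. The Python sketch below assumes the scanned pages have already been cut into word images identified by string IDs (all IDs, the `Challenge` class and `make_challenges` are hypothetical). Each unknown word is issued several times, each time beside a random control word and in random order, so users cannot tell which word is being checked.

```python
# Hypothetical challenge builder for word images cropped from scanned charts.
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Challenge:
    images: Tuple[str, ...]  # image IDs in the order shown to the user
    control_id: str          # word whose answer is already known
    unknown_id: str          # chart word still awaiting transcription


def make_challenges(unknown_ids: List[str], control_ids: List[str],
                    rng: random.Random, copies: int = 3) -> List[Challenge]:
    """Issue each unknown word `copies` times, each with a random control word."""
    challenges = []
    for unknown in unknown_ids:
        for _ in range(copies):
            control = rng.choice(control_ids)
            pair = [control, unknown]
            rng.shuffle(pair)  # hide which of the two words is the control
            challenges.append(Challenge(tuple(pair), control, unknown))
    rng.shuffle(challenges)  # spread one chart's words across many users
    return challenges


rng = random.Random(7)
batch = make_challenges(["chart7-w1", "chart7-w2"], ["ctrl-overlooks", "ctrl-harbor"], rng)
print(len(batch))  # 6
```

Issuing each word several times is what feeds the agreement vote; shuffling the batch means no single user sees a whole chart.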

Perhaps the most impressive fact about reCAPTCHA is that its accuracy rate is 99.5%, which is on par with human transcription. It's not a stretch of the imagination to envision a system where medical providers can upload their paper documents and have them transcribed by Internet users.

Finally, Google is well suited for this project because of the huge amount of digital storage space in their 30+ data centers around the world. Hosting this data in the cloud on highly efficient servers means doctors could access a patient's EMR more quickly than if the data were stored locally. We'll address the privacy issues of storing medical data in the "cloud" in a moment.

Making Medical Data Usable

Let's assume that Google can use reCAPTCHA to gradually transcribe unstructured medical records, in addition to collecting structured data through specifications such as the Continuity of Care Record (CCR) and the Continuity of Care Document (CCD). How do they make that information easily accessible to people?
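
Collecting structured data from a CCR is largely a matter of reading XML. The Python sketch below uses the standard library's `xml.etree.ElementTree` on a simplified, hand-written fragment loosely modeled on the CCR; it omits required elements and has not been validated against the official ASTM schema, so the element path should be treated as an assumption.

```python
# Read problem-list labels from a simplified, CCR-style XML fragment.
import xml.etree.ElementTree as ET

NS = {"ccr": "urn:astm-org:CCR"}  # CCR namespace, bound to the "ccr" prefix

SAMPLE = """<ContinuityOfCareRecord xmlns="urn:astm-org:CCR">
  <Body>
    <Problems>
      <Problem><Description><Text>Hypertension</Text></Description></Problem>
      <Problem><Description><Text>Type 2 diabetes</Text></Description></Problem>
    </Problems>
  </Body>
</ContinuityOfCareRecord>"""


def problem_list(ccr_xml: str) -> list:
    """Return the human-readable label of every Problem in the document."""
    root = ET.fromstring(ccr_xml)
    path = "ccr:Body/ccr:Problems/ccr:Problem/ccr:Description/ccr:Text"
    return [t.text for t in root.findall(path, NS)]


print(problem_list(SAMPLE))  # ['Hypertension', 'Type 2 diabetes']
```

Unlike a handwritten chart, nothing here needs to be transcribed; the harder question is presenting the result usefully.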

Part of the answer lies in Boston, MA. A team of researchers at Massachusetts General Hospital has created a system that pulls medical data from different sources within the hospital's electronic medical record software and presents it in a logical, user-friendly format. It's called the Queriable Patient Inference Dossier (QPID). Here's how it works:

    While Google's PageRank system works by giving more weight to pages that are linked to more often, EMRs don't have links and therefore cannot employ that approach. Instead, the dossier system has the ability to "learn" certain types of searches from its users, understanding that a search for "squamous cell carcinoma" and another search for "lung cancer" are actually seeking the same information.
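
The idea of matching a search for "lung cancer" to a note about "squamous cell carcinoma" can be illustrated with concept-based search. QPID's actual method is not described here, and QPID learns its associations, whereas this Python sketch simply hard-codes a hypothetical phrase-to-concept map so that a query and a note match whenever they mention the same concept.

```python
# Toy concept-based search (not QPID's implementation).
from typing import Dict, List, Set

# Hypothetical phrase -> concept map mirroring the example above; a real
# system would learn or license such a vocabulary rather than hard-code it.
CONCEPTS: Dict[str, str] = {
    "squamous cell carcinoma": "LUNG_CANCER",
    "lung cancer": "LUNG_CANCER",
    "nsclc": "LUNG_CANCER",
}


def concepts_in(text: str) -> Set[str]:
    """Concept labels whose surface phrases appear in the text."""
    lowered = text.lower()
    return {concept for phrase, concept in CONCEPTS.items() if phrase in lowered}


def search(query: str, notes: Dict[str, str]) -> List[str]:
    """Return IDs of notes that share at least one concept with the query."""
    wanted = concepts_in(query)
    return sorted(nid for nid, text in notes.items() if wanted & concepts_in(text))


notes = {
    "n1": "Biopsy confirms squamous cell carcinoma, right upper lobe.",
    "n2": "Follow-up for hypertension; BP 128/82.",
}
print(search("lung cancer", notes))  # ['n1']
```

Note n1 never contains the words "lung cancer"; it is found because both phrases map to the same concept.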

The QPID system uses natural language processing (NLP) to "learn" the relationships between words. Sophisticated NLP tools, often associated with artificial intelligence, allow a computer to read and interpret text much as a human would. In short, they use complex statistical models to predict the correct spelling and order of words in a sentence.
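
A small example of "predicting the correct spelling" statistically: generate every string one edit away from a misspelled word and pick the candidate that occurs most often in a body of text. This Python sketch uses a tiny made-up corpus in place of the large collections real systems rely on, and says nothing about how QPID or Google actually implement spelling correction.

```python
# Frequency-ranked spelling correction over a tiny, made-up corpus.
import re
from collections import Counter

CORPUS = """patient reports chest pain and shortness of breath
chest pain radiating to left arm patient denies fever
pain improved after nitroglycerin"""

WORD_COUNTS = Counter(re.findall(r"[a-z]+", CORPUS.lower()))
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def one_edit_away(word: str) -> set:
    """All strings one deletion, adjacent swap, replacement or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    swaps = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + c + b[1:] for a, b in splits if b for c in LETTERS}
    inserts = {a + c + b for a, b in splits for c in LETTERS}
    return deletes | swaps | replaces | inserts


def correct(word: str) -> str:
    """Return the word if known, else its most frequent known neighbor."""
    word = word.lower()
    if word in WORD_COUNTS:
        return word
    candidates = [w for w in one_edit_away(word) if w in WORD_COUNTS]
    # Fall back to the input unchanged when no known word is one edit away.
    return max(candidates, key=WORD_COUNTS.__getitem__, default=word)


print(correct("pian"))   # pain
print(correct("chset"))  # chest
```

The more text the word counts come from, the better the guesses, which is why a model built from very large amounts of real-world text has an advantage.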

Google just recently announced they were ceasing development of their Google Wave project, which uses NLP tools as part of its spell check system. Google's NLP tools are particularly effective because they are developed using data from billions of Google web searches. This makes Google's language and statistical models particularly powerful across a number of languages. Also, in a bit of an odd twist, two Google researchers are set to release a white paper about using Google Wave's protocol to aggregate medical data.

So, if Google works with hospitals and other medical providers to transcribe handwritten medical documents, combines those with structured medical data, and applies their powerful NLP tools, they could end up with a much more robust system than the one the Massachusetts General team created.
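
Combining transcribed documents with structured data could start with a single search index over both. In this Python sketch, each hypothetical record holds a patient ID, a few structured fields (such as a diagnosis code) and free text transcribed from paper; both are tokenized into one inverted index that supports simple AND queries. The record layout and function names are assumptions for illustration.

```python
# One inverted index over structured fields and transcribed notes (hypothetical layout).
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokens(text: str) -> List[str]:
    """Lowercase word tokens; dotted codes such as 'E11.9' stay whole."""
    return re.findall(r"[a-z0-9]+(?:\.[a-z0-9]+)*", text.lower())


def build_index(records: List[dict]) -> Dict[str, Set[str]]:
    """Map each token to the set of patient IDs whose records contain it."""
    index = defaultdict(set)
    for rec in records:
        parts = list(rec["structured"].values()) + [rec["transcribed"]]
        for part in parts:
            for tok in tokens(part):
                index[tok].add(rec["patient_id"])
    return index


def lookup(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Patients whose records contain every query token (AND search)."""
    sets = [index.get(t, set()) for t in tokens(query)]
    return set.intersection(*sets) if sets else set()


records = [
    {"patient_id": "p1", "structured": {"dx_code": "E11.9"},
     "transcribed": "Pt counseled on diet, metformin continued."},
    {"patient_id": "p2", "structured": {"dx_code": "I10"},
     "transcribed": "BP elevated, started lisinopril."},
]
idx = build_index(records)
print(lookup(idx, "E11.9 metformin"))  # {'p1'}
```

A query can now mix a code from the structured data with a drug name that only appears in a transcribed note.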

Complying With Privacy Regulations

The Health Insurance Portability and Accountability Act (HIPAA) is the United States' guiding document when it comes to safeguarding personal health information (PHI). The 1996 piece of legislation requires any "covered entity" that manages personal health information to have administrative, technical and physical safeguards against a breach of data. A covered entity is defined as:

    * A health care provider that conducts certain transactions in electronic form;
    * A health care clearinghouse; or,
    * A health plan.

Google Health, the company's personal health record project, allows consumers to add their health information to a digital record online, import prescription information from pharmacies and share that record with their doctor. Currently, Google argues that they're not covered by HIPAA because they're essentially acting as a free online repository, and not transmitting health information electronically themselves.

If Google were to start organizing medical records in the fashion we've described, they would have to conform to HIPAA standards. With dozens of Web-based EMR vendors that store medical records online already complying with HIPAA, we don't feel that compliance would present a major issue for Google.

Fulfilling Google's Mission Statement

Gathering the United States' medical data and making it digitally accessible would be perhaps the greatest fulfillment of Google's mission statement – "To organize the world's information and make it universally accessible and useful."

The tools are in place to make it happen. Google has shown they have the will to take on a project of this size. The Google brass will have to decide whether the benefits of building such a system outweigh the costs.

This post originally appeared at Software Advice.