I saw another exciting news story on a mobile health intervention the other day. I honestly don’t remember the company or product, but what stuck with me was the declaration of success based on 10 patients using the product for three months. Success was touted in terms of reduced cost and reduced resource utilization in a before/after analysis. This inspired me to collect some thoughts on the challenges of evaluating success in mHealth.
mHealth represents the collision of two interesting worlds: mobile, which seems to change daily, and health care, which changes infrequently, only after significant deliberation and usually much empirical analysis. In the tech (mobile) world, companies talk about creating a minimum viable product (MVP), getting it to market, assessing adoption through metrics such as downloads and customer feedback, and iterating accordingly. This seems to make sense in the consumer world, where the goal is to sell a game, an information app or a productivity app. If people use it and are willing to pay, that proves its utility, right?
There is something to this line of thinking. Empirical market success is in some ways the ultimate success, at least for those who want to make a big difference in how humanity benefits from technology.
But does this work in healthcare? I’m not so sure. As clinicians, we’re trained to turn our noses up at this sort of measure of success. But maybe we’re the ones who are wrong. Let me use the 10-patients-for-three-months example to illustrate some issues.
- Selection bias. Virtually all pilots and trials of any sort suffer from this to some extent. These days, it seems that patient/consumer engagement is the holy grail and we all must realize that people who show up to enroll in any sort of study are already engaged to an extent. What about the people who are great candidates for an intervention (conventional wisdom says the disengaged are sicker and more costly) but are too unmotivated even to show up to enroll? Does anyone know how to handle this one?
- Regression to the mean. This is a pesky and annoying one, and a favorite of folks trained in public health, but unfortunately it is a real phenomenon. It is the stake in the heart of virtually all before/after studies. If you follow a group of people, particularly sick ones, a certain percentage of them will get better over time no matter what you do. The sicker the starting sample, the more dramatic the effect. This is why some sort of comparison group is so helpful and why before/after studies are weak.
- Small sample size bias. This one can go either way, meaning you can exaggerate an effect or miss one. If you want to run a proper study, find someone who has training in clinical trial design to estimate the size of the effect of your intervention, and thus the size of the sample you need, to show its efficacy. Lots of technical jargon here (power calculations, type I error, type II error, etc.), enough to make your head spin. But bottom line, you can’t really say much about the generalizability of data based on 10 patients.
- Novelty effect. I made that up, and there is probably a more acceptable scientific term for it. What I’m referring to is this: when you take that same group of people who were motivated enough to enroll in a study and apply an intervention to them, the newness will drive adoption for a while. We see this all the time in our studies at the Center for Connected Health. The novelty always wears off over time. In fact, I’d say the state of the art in understanding the impact of connected health is one of cautious optimism, because we haven’t yet done long-term studies to show whether our interventions have lasting effects. There is room for argument here, I guess, but three months is awfully short.
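To make the regression-to-the-mean point concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration: the population size, the cost distribution, the noise level and the function name `simulate_before_after` are all made up. It enrolls the 10 costliest patients from a "before" period and then measures them again with no intervention at all.

```python
import random
import statistics


def simulate_before_after(n_population=10_000, n_enrolled=10, seed=0):
    """Return (mean_before, mean_after) for the costliest patients.

    No intervention is applied, so any drop is regression to the mean.
    All numbers are hypothetical and in arbitrary cost units.
    """
    rng = random.Random(seed)
    # Each hypothetical patient has a stable underlying cost level...
    underlying = [rng.gauss(1000, 200) for _ in range(n_population)]
    # ...plus independent noise in each observation period.
    before = [u + rng.gauss(0, 300) for u in underlying]
    after = [u + rng.gauss(0, 300) for u in underlying]
    # Enroll the patients who looked costliest in the "before" period.
    ranked = sorted(range(n_population), key=lambda i: before[i], reverse=True)
    enrolled = ranked[:n_enrolled]
    mean_before = statistics.mean(before[i] for i in enrolled)
    mean_after = statistics.mean(after[i] for i in enrolled)
    return mean_before, mean_after


mean_before, mean_after = simulate_before_after()
print(f"before: {mean_before:.0f}, after: {mean_after:.0f}")
```

Under these assumptions the "after" average comes out lower than the "before" average even though nothing was done to anyone, which is exactly the pattern a before/after analysis would report as savings.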
Why is healthcare tech different from finding the MVP in the rapidly changing, market-responsive world of mobile tech? One reason may be that we’re dealing with health and sickness, which are qualitatively different from sending a friend the latest snapshot from vacation. It is cliché to say it, but lives are at stake. So we’re more careful and more demanding of evidence. Is this holding us up from the changes that need to occur in our broken health care non-system? Possibly.
It is true that a well-designed trial with proper sample size is expensive and takes time. Technologies change faster than we can evaluate them.
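For a rough feel of what "proper sample size" means, here is a sketch of the textbook normal-approximation formula for comparing the means of two equal-sized groups: n per group = 2 × (z for alpha + z for power)² × (sd / effect)². The function name `n_per_group` and the defaults (two-sided alpha of 0.05, 80% power) are assumptions chosen for illustration; a real trial design would involve a statistician and likely a more careful method.

```python
import math
from statistics import NormalDist


def n_per_group(effect, sd, alpha=0.05, power=0.80):
    """Approximate patients needed per group to detect a difference in means.

    effect: smallest difference worth detecting (same units as sd)
    sd:     assumed standard deviation of the outcome in each group
    Uses the normal approximation for a two-sided, two-sample comparison.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # controls type I error
    z_beta = NormalDist().inv_cdf(power)           # controls type II error
    return math.ceil(2 * ((z_alpha + z_beta) * sd / effect) ** 2)


# A moderate effect of half a standard deviation:
print(n_per_group(effect=0.5, sd=1.0))  # 63
# A small effect of one fifth of a standard deviation:
print(n_per_group(effect=0.2, sd=1.0))  # 393
```

Even a moderate effect calls for dozens of patients in each arm under these assumptions, and a small one calls for hundreds, which puts a 10-patient pilot in perspective.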
One thing we’ve done at CCH is design studies that use a large matched data set from our electronic record as a comparator. This speeds things up a bit, eliminating the need to enroll, randomize and follow a control group. Results are acceptable to all but the most extreme purists.
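As a generic illustration of what building a matched comparator can look like, here is a sketch of greedy one-to-one nearest-neighbor matching. This is not a description of the method used at CCH; the record fields (`age`, `baseline_cost`), the scaling constants and the function name `match_controls` are hypothetical.

```python
def match_controls(treated, pool, keys=("age", "baseline_cost"),
                   scales=(10.0, 1000.0)):
    """Pair each treated patient with the most similar unused pool record.

    Similarity is a scaled squared distance over the given keys. Matching is
    greedy and without replacement, so results depend on the order of treated.
    """
    if len(pool) < len(treated):
        raise ValueError("comparison pool is smaller than the treated group")
    available = list(pool)
    pairs = []
    for patient in treated:
        def distance(candidate):
            return sum(((patient[k] - candidate[k]) / s) ** 2
                       for k, s in zip(keys, scales))
        best = min(available, key=distance)
        available.remove(best)  # each comparison record is used at most once
        pairs.append((patient, best))
    return pairs


# Hypothetical records, standing in for rows pulled from an electronic record.
treated = [
    {"id": "t1", "age": 60, "baseline_cost": 5000},
    {"id": "t2", "age": 45, "baseline_cost": 2000},
]
pool = [
    {"id": "c1", "age": 44, "baseline_cost": 2100},
    {"id": "c2", "age": 61, "baseline_cost": 4800},
    {"id": "c3", "age": 30, "baseline_cost": 500},
]
for patient, control in match_controls(treated, pool):
    print(patient["id"], "->", control["id"])
```

Outcomes for the matched records can then be compared with outcomes for the enrolled patients over the same period. Matching only balances the characteristics you measure, so it does not remove the selection bias discussed above.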
What ideas do you have on this dilemma?