Artificial Intelligence

DoD to develop scalable genAI testing datasets

Drawing on a recently completed, multipronged red-teaming effort, the agency said it will develop repeatable testing datasets that can be used to evaluate large language model tools and services in the future.
By Andrea Fox
January 03, 2025
10:44 AM

Photo: Roberto Westbrook/Getty Images

The U.S. Department of Defense's Chief Digital and Artificial Intelligence Office and technology nonprofit Humane Intelligence announced the conclusion of the agency's Crowdsourced Artificial Intelligence Red-Teaming (CAIRT) Assurance Program pilot, which focused on testing large language model chatbots used in military medical services.

The findings could ultimately improve military medical care while ensuring adherence to all required risk management practices for the use of AI, DoD officials said.

WHY IT MATTERS

In an announcement Thursday, DoD said the CAIRT program's most recent red-team test involved more than 200 agency clinical providers and healthcare analysts, who compared three LLMs across two prospective use cases: clinical note summarization and a medical advisory chatbot.

Participants identified more than 800 potential vulnerabilities and biases in the LLMs being tested to enhance military medical care.

CAIRT aimed to build a community of practice around algorithmic evaluations in collaboration with the Defense Health Agency and the Program Executive Office, Defense Healthcare Management Systems. In 2024, the program also offered a financial AI bias bounty focused on unknown risks in LLMs, beginning with open-source chatbots.

Crowdsourcing casts a wide net that can produce large volumes of data across multiple stakeholders. DoD said the findings from all CAIRT program red-teaming efforts will be crucial to shaping policies and best practices for the responsible use of generative AI.

DoD also said continued testing of LLMs and AI systems through the CAIRT Assurance Program is critical to accelerating AI capabilities and building justified confidence across DoD genAI use cases.

THE LARGER TREND

Trust is essential for clinicians to embrace AI. To use genAI in clinical care, LLMs must meet critical performance expectations to best assure providers that the tools are useful, transparent, explainable and secure, as Dr. Sonya Makhni, medical director of applied informatics at Mayo Clinic Platform, told Healthcare IT News recently.

Despite the enormous potential for the positive use of AI in healthcare delivery, "unlocking that is challenging," said Makhni at the HIMSS AI in Healthcare Forum this past September.

Bias can creep in because "assumptions and decisions are made during each step of the AI development life cycle, and if incorrect these assumptions can lead to systematic errors," Makhni explained when asked how to ensure the safe use of AI.

"Such errors can skew the end result of an algorithm against a subgroup of patients and ultimately pose risks to healthcare equity," she continued. "This phenomenon has been demonstrated in existing algorithms."

To test performance and eliminate algorithmic bias, clinicians and developers must collaborate "throughout the AI development life cycle and through solution deployment," Makhni advised.

"Active engagement from both parties is necessary in predicting potential areas of bias and/or suboptimal performance," she added. "This knowledge will help clarify contexts that are better suited to a given AI algorithm and those that perhaps require more monitoring and oversight."

ON THE RECORD

"Since applying GenAI for such purposes within the DoD is in earlier stages of piloting and experimentation, this program acts as an essential pathfinder for generating a mass of testing data, surfacing areas for consideration and validating mitigation options that will shape future research, development and assurance of GenAI systems that may be deployed in the future," said Dr. Matthew Johnson, CAIRT program lead, in a Jan. 2 statement about the initiative.

Andrea Fox is senior editor of Healthcare IT News.
Email: afox@himss.org
Healthcare IT News is a HIMSS Media publication.

Topics: 
Analytics, Artificial Intelligence, Clinical, Compliance & Legal, Government & Policy, Quality and Safety
