Researchers crack code on training medical AI with minimal examples

Published 4 Aug 2025

Image: GenSeg medical AI diagnostic tool

A breakthrough artificial intelligence (AI) system can train medical diagnostic software using just 40 expert-labeled images instead of the thousands typically required. This development could make advanced healthcare technology available in underserved regions worldwide.

GenSeg, developed by researchers at UC San Diego, reduces data requirements for medical image analysis by up to 20 times while boosting accuracy by 10-20%. The research was published in Nature Communications on July 14, 2025.

“This project was born from the need to break this bottleneck and make powerful segmentation tools more practical and accessible, especially for scenarios where data are scarce,” said Li Zhang, the study’s lead author and a PhD student at UC San Diego.

Traditional medical AI requires thousands of pixel-by-pixel annotated images created by highly trained specialists. This process costs significant time and money, and many medical conditions and clinical settings simply lack sufficient data.

GenSeg solves this problem by creating synthetic medical images. The system generates artificial images from color-coded disease maps (segmentation masks), then combines them with real patient scans to train diagnostic models.

The technology works in stages. First, it learns to generate realistic images from expert-labeled maps. Then it creates new artificial image pairs to add to small real-world datasets.

A continuous feedback loop refines the synthetic images based on how effectively they aid diagnosis.

“The segmentation performance itself guides the data generation process,” Zhang explained. “This ensures synthetic data are not just realistic, but tailored to improve the model’s capabilities.”
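
To make the workflow concrete, here is a heavily simplified sketch in PyTorch of how a mask-to-image generator and a segmentation model could be trained together with this kind of feedback. Every name, shape, and the toy feedback rule below is an illustrative assumption, not the authors’ actual GenSeg implementation, which uses a more sophisticated end-to-end optimization.

```python
# Minimal sketch, NOT the published GenSeg code: a toy generator turns
# segmentation masks ("disease maps") into synthetic images, a toy segmenter
# trains on real + synthetic pairs, and the segmenter's loss is fed back
# to the generator so synthetic images are tailored to help segmentation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskToImageGenerator(nn.Module):
    """Toy mask-to-image generator (stand-in for a real generative model)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, mask):
        return self.net(mask)

class TinySegmenter(nn.Module):
    """Toy segmentation network (stand-in for UNet/DeepLab-style backbones)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1),
        )

    def forward(self, image):
        return self.net(image)  # logits

generator, segmenter = MaskToImageGenerator(), TinySegmenter()
gen_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
seg_opt = torch.optim.Adam(segmenter.parameters(), lr=1e-3)

# Pretend we only have a handful of expert-labeled (image, mask) pairs.
real_images = torch.rand(8, 1, 64, 64)
real_masks = (torch.rand(8, 1, 64, 64) > 0.7).float()

for step in range(100):
    # Stage 1: synthesize extra images from the expert-labeled masks.
    synth_images = generator(real_masks).detach()

    # Stage 2: train the segmenter on real + synthetic image/mask pairs.
    images = torch.cat([real_images, synth_images])
    masks = torch.cat([real_masks, real_masks])
    seg_loss = F.binary_cross_entropy_with_logits(segmenter(images), masks)
    seg_opt.zero_grad()
    seg_loss.backward()
    seg_opt.step()

    # Feedback loop (simplified): update the generator using the segmenter's
    # loss on freshly generated images, so generation is steered toward
    # whatever helps segmentation. Only the generator is stepped here.
    feedback_loss = F.binary_cross_entropy_with_logits(
        segmenter(generator(real_masks)), real_masks)
    gen_opt.zero_grad()
    feedback_loss.backward()
    gen_opt.step()
```

Because the generator is only used during training, the trained segmentation model runs at its usual cost at diagnosis time, consistent with the article’s note that no extra computing power is needed during diagnosis.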

Researchers tested GenSeg across 19 medical datasets covering different conditions. The system successfully identified skin lesions, breast cancer, lung problems, foot ulcers, and blood vessel issues. It also worked with 3D brain and liver scans.

In lung analysis tasks, GenSeg matched the performance typically achieved with 175 expert-labeled examples using only nine images, making it roughly 19 times more data-efficient (175 ÷ 9 ≈ 19).

For skin cancer diagnosis, doctors might need to label just 40 skin images instead of thousands. The AI could then spot suspicious lesions from patient scans in real time. “It could help doctors make a faster, more accurate diagnosis,” Zhang said.

The technology outperformed existing tools, including nnUNet and other leading methods. GenSeg works with popular AI systems like UNet, DeepLab, and transformer-based models without requiring extra computing power during diagnosis.

The research team included collaborators from UC Berkeley, Stanford, and the Weizmann Institute. The National Science Foundation and National Institutes of Health provided funding.

Future development will focus on getting direct feedback from doctors and handling more complex medical cases. The researchers aim to make their tool more useful for real-world medical applications.

GenSeg addresses a major barrier preventing AI adoption in hospitals with limited resources. By dramatically cutting annotation costs, the technology could bring advanced diagnostic tools to underserved communities globally.