Agentic AI system may improve rare disease diagnosis 

April 07, 2026 By Matthew Solan

An agentic artificial intelligence system called DeepRare improved diagnostic accuracy for rare diseases, including endocrine conditions, compared with existing tools and physicians, according to a study published in Nature.

“The system’s ability to provide evidence-based reasoning chains with verifiable references could reduce significantly the time required for literature review and case research,” wrote lead researcher Weike Zhao of the School of Artificial Intelligence at Shanghai Jiao Tong University in Shanghai, China, and colleagues. “Its consistent performance across different medical specialties suggests its potential as a valuable decision support tool for non-specialist physicians who may encounter rare diseases infrequently.”

DeepRare is a large language model-driven, multi-agent system designed to support differential diagnosis using clinical, phenotypic, and genetic data. The system integrates more than 40 tools and multiple medical literature and case databases. It also includes a self-reflection loop that repeatedly reassesses its diagnostic suggestions to reduce errors.

Researchers evaluated DeepRare on 6,401 clinical cases collected from nine datasets across Asia, North America, and Europe, spanning 2,919 diseases and 14 medical specialties.

Recall@K metrics evaluated diagnostic performance across datasets by assessing whether the correct diagnosis was ranked first (Recall@1), within the top three (Recall@3), or within the top five (Recall@5).
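For readers unfamiliar with the metric, Recall@K can be sketched in a few lines of Python. This is a minimal illustration of the definition given above, not the study's evaluation code: the function name `recall_at_k`, the exact string matching of diagnosis names, and the four toy cases are all assumptions made for the example.

```python
from typing import Sequence


def recall_at_k(ranked: Sequence[Sequence[str]], truth: Sequence[str], k: int) -> float:
    """Fraction of cases whose correct diagnosis appears in the top k candidates.

    `ranked[i]` is the ranked candidate list for case i (best guess first);
    `truth[i]` is the confirmed diagnosis for that case.
    """
    if len(ranked) != len(truth):
        raise ValueError("ranked and truth must have the same number of cases")
    if k < 1:
        raise ValueError("k must be at least 1")
    if not truth:
        return 0.0
    # A case counts as a hit when the confirmed diagnosis is among the first k candidates.
    hits = sum(1 for candidates, answer in zip(ranked, truth) if answer in candidates[:k])
    return hits / len(truth)


# Hypothetical toy data, invented for illustration only (not from the study).
predictions = [
    ["Fabry disease", "Gaucher disease", "Pompe disease"],
    ["Marfan syndrome", "Ehlers-Danlos syndrome", "Loeys-Dietz syndrome"],
    ["Wilson disease", "Hemochromatosis", "Alpha-1 antitrypsin deficiency"],
    ["Cushing syndrome", "Acromegaly", "Addison disease"],
]
correct = [
    "Fabry disease",         # ranked first
    "Loeys-Dietz syndrome",  # ranked third
    "Hemochromatosis",       # ranked second
    "Pheochromocytoma",      # not in the list
]

for k in (1, 3, 5):
    print(f"Recall@{k} = {recall_at_k(predictions, correct, k):.2f}")
```

In this toy set, only the first case is correct at rank one, so Recall@1 is 0.25, while three of the four cases have the right answer within the top three, so Recall@3 and Recall@5 are both 0.75. A real evaluation would likely match diagnoses by disease identifier rather than by name.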

Across phenotype-based tasks, DeepRare achieved an average score of 57% at Recall@1 and 65% at Recall@3, exceeding the second-best method (Reasoning LLM) by 24 and 19 percentage points, respectively.

When comparing systems using both human phenotype ontology and genetic data, DeepRare outperformed Exomiser with Recall@1 of 69.1% vs 55.9% in 168 cases, and 63.6% vs 58% in 162 cases.

The system maintained performance across heterogeneous datasets, including real-world clinical cohorts. For example, in a comparison with existing models on the MIMIC-IV rare disease dataset, Recall@1 was 29% and Recall@3 was 37%, and in an in-house hospital dataset, Recall@1 reached 58% and Recall@3 reached 71%.

DeepRare also demonstrated consistent gains across specialties, with top-1 diagnostic accuracy of 66% in kidney and urinary system disorders and 60% in endocrine system diseases, compared with 32% in the second-best method. However, performance was lower in pulmonary conditions, with 31% accuracy.

In diseases with limited case representation (10 or fewer cases), DeepRare achieved a Recall@1 above 0.8 for 32% of diseases, compared with 27% and 24% for the comparator models.

In a direct comparison with five experienced physicians evaluating 163 cases using identical inputs, DeepRare achieved higher diagnostic accuracy. Recall@1 was 64% vs 55% for physicians, and Recall@5 was 79% vs 66%. Across 180 cases reviewed by 10 rare disease specialists, there was 95% agreement on the validity of the system’s reasoning outputs.

Failure analysis identified reasoning weighting errors (41%) and phenotypic mimic diagnosis (39%) as the most common causes of incorrect diagnoses, whereas fundamental errors in reasoning and in using retrieved information were uncommon (3%).

Researchers noted that study limitations included incomplete integration of available data sources and difficulty distinguishing conditions with similar clinical features. Patient interaction features were also not fully validated. The system is intended primarily for patients already suspected of having a rare disease rather than for screening.

The researchers reported no competing interests.

AACE Endocrine AI is published by Conexiant under a license arrangement with the American Association of Clinical Endocrinology, Inc. (AACE®). The ideas and opinions expressed in AACE Endocrine AI do not necessarily reflect those of Conexiant or AACE. For more information, see Policies.
