
Logistic regression model distinguishes advanced diabetic kidney disease

The interpretable prediction tool outperformed several more complex machine learning approaches.

May 25, 2026 By Matthew Solan

An interpretable logistic regression model using 10 routinely collected clinical and laboratory variables distinguished advanced diabetic kidney disease with high accuracy in both internal and external validation cohorts (area under the curve of 0.948 and 0.898, respectively), according to a retrospective study published in Frontiers in Endocrinology.

"The LR [logistic regression] model can assist clinical practice by effectively identifying individuals at higher risk of advanced DKD at an early stage, allowing patients to receive timely and personalized treatment, and thereby providing a reliable foundation for improving patient prognosis and optimizing medical resource utilization," researchers wrote.

Study Methods

The study was led by Kaiwen Zheng of the Department of Clinical Laboratory at Fuzhou University Affiliated Provincial Hospital in Fuzhou, China, and colleagues. The researchers developed and validated 8 machine learning models using electronic medical record (EMR) data from 2,359 patients with diabetic kidney disease treated at the hospital between 2013 and 2024. They externally validated the models using data from 1,559 patients with diabetic kidney disease in the National Health and Nutrition Examination Survey (NHANES) from 1988 to 2018.

In the internal cohort, patients were categorized as having early diabetic kidney disease (stage G1 to G2; n = 223) or advanced diabetic kidney disease (stage G3 to G5; n = 2,136). The external cohort included 979 patients with early disease and 580 patients with advanced disease.

Researchers identified 10 predictor variables: serum creatinine, age, hemoglobin, serum urea, serum alkaline phosphatase, serum uric acid, platelet count, serum osmolality, serum bicarbonate, and monocyte count.

The 8 machine learning models evaluated included logistic regression, random forest, gradient boosting machine, light gradient boosting machine, support vector machine, naive Bayes, least absolute shrinkage and selection operator, and extreme gradient boosting models.

The logistic regression model demonstrated the most consistent overall performance. In the training cohort, the model attained an area under the curve (AUC) of 0.941, accuracy of 93%, sensitivity of 98%, positive predictive value of 95%, negative predictive value of 71%, and F1 score of 0.962.

In internal validation testing, the logistic regression model reached an AUC of 0.948, accuracy of 94%, sensitivity of 98%, positive predictive value of 95%, negative predictive value of 70%, and F1 score of 0.966.

External validation using NHANES data showed an AUC of 0.898, accuracy of 81%, sensitivity of 77%, specificity of 84%, positive predictive value of 74%, negative predictive value of 86%, and F1 score of 0.754.
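
As a rough consistency check, the external validation figures can be approximately reproduced from the cohort sizes and the reported sensitivity and specificity. The sketch below is illustrative only: the function names are hypothetical, and the counts are reconstructed from rounded percentages, so they approximate rather than reproduce the study's actual confusion matrix.

```python
def confusion_counts(n_pos, n_neg, sensitivity, specificity):
    """Approximate confusion-matrix counts from cohort sizes and reported rates."""
    tp = round(n_pos * sensitivity)   # advanced cases correctly flagged
    fn = n_pos - tp                   # advanced cases missed
    tn = round(n_neg * specificity)   # early cases correctly cleared
    fp = n_neg - tn                   # early cases flagged as advanced
    return tp, fp, tn, fn


def summary_metrics(tp, fp, tn, fn):
    """Derive the usual summary metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {"ppv": ppv, "npv": npv, "accuracy": accuracy, "f1": f1}


# External cohort as reported: 580 advanced (positive), 979 early (negative),
# sensitivity 77%, specificity 84%. Counts are reconstructed, not study data.
tp, fp, tn, fn = confusion_counts(580, 979, 0.77, 0.84)
metrics = summary_metrics(tp, fp, tn, fn)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions, the derived positive predictive value, negative predictive value, and accuracy round to the reported 74%, 86%, and 81%, and the derived F1 score lands within rounding distance of the reported 0.754.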

Calibration curves and decision curve analyses showed good agreement between predicted and observed risk across threshold ranges. The logistic regression model demonstrated lower Brier scores than most competing models, supporting calibration accuracy.
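
For context, the Brier score is the mean squared difference between predicted probabilities and observed outcomes (coded 0 or 1), so lower values indicate better probabilistic predictions. A minimal sketch follows; the toy labels and probabilities are made up for illustration and are not study data.

```python
def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if not y_true or len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must be non-empty and equal in length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical toy example: 1 = advanced disease, 0 = early disease.
toy_labels = [1, 1, 0, 1, 0]
toy_probs = [0.9, 0.8, 0.2, 0.6, 0.4]
toy_brier = brier_score(toy_labels, toy_probs)

# An uninformative model that always predicts 0.5 scores 0.25 on any data.
coin_flip_brier = brier_score(toy_labels, [0.5] * len(toy_labels))
```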

The study also used SHapley Additive exPlanations (SHAP) analysis to quantify each variable’s contribution to model predictions at both the population and patient levels.

SHAP showed that higher serum creatinine, older age, elevated alkaline phosphatase, higher uric acid, higher serum osmolality, and higher monocyte counts contributed to positive predictions of advanced diabetic kidney disease.
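
To illustrate how a patient-level attribution of this kind can work for a linear model, the sketch below uses a common simplification: when features are treated as independent, each feature's contribution to a logistic regression's log-odds is its coefficient multiplied by the patient's deviation from the cohort mean. The intercept, coefficients, cohort means, units, and patient values are all hypothetical placeholders, and the study's actual SHAP configuration is not described here.

```python
def log_odds(intercept, coefs, x):
    """Linear predictor of a logistic regression model."""
    return intercept + sum(coefs[name] * x[name] for name in coefs)


def linear_shap(coefs, means, x):
    """Per-feature contribution in log-odds units: coef * (value - cohort mean)."""
    return {name: coefs[name] * (x[name] - means[name]) for name in coefs}


# HYPOTHETICAL values for illustration only, not taken from the study.
INTERCEPT = -8.0
COEFS = {"serum_creatinine": 0.03, "age": 0.05, "uric_acid": 0.004}
MEANS = {"serum_creatinine": 100.0, "age": 60.0, "uric_acid": 380.0}
patient = {"serum_creatinine": 160.0, "age": 70.0, "uric_acid": 430.0}

contributions = linear_shap(COEFS, MEANS, patient)
base_value = log_odds(INTERCEPT, COEFS, MEANS)  # prediction for an "average" patient

# Contributions are additive: base value plus contributions equals the
# patient's own log-odds.
reconstructed = base_value + sum(contributions.values())
```

In this toy setup every value sits above its cohort mean and every placeholder coefficient is positive, so each feature pushes the prediction toward advanced disease, mirroring the direction of effect reported for these three variables.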

The final model was also deployed as a web-based calculator for clinicians to estimate the risk of advanced diabetic kidney disease using the 10 selected variables.
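
Conceptually, a calculator built on a logistic regression model applies the logistic function to a weighted sum of the inputs. The sketch below shows that mechanism only. The published coefficients are not reproduced in this article, so the function name, intercept, and weights are hypothetical placeholders, and only two of the 10 variables are included to keep the example short.

```python
import math


def predict_risk(intercept, coefs, values):
    """Return a probability from a logistic regression linear predictor."""
    missing = [name for name in coefs if name not in values]
    if missing:
        raise ValueError(f"missing input variables: {missing}")
    z = intercept + sum(coefs[name] * values[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-z))  # logistic (sigmoid) function


# HYPOTHETICAL intercept, weights, and inputs, not the study's fitted model.
demo_intercept = -8.0
demo_coefs = {"serum_creatinine": 0.03, "age": 0.05}
demo_values = {"serum_creatinine": 150.0, "age": 70.0}

demo_risk = predict_risk(demo_intercept, demo_coefs, demo_values)
print(f"Estimated probability of advanced disease: {demo_risk:.2f}")
```

Rejecting incomplete input rather than silently defaulting a missing value is a design choice made here for safety; how the study's calculator handles missing data is not stated.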

Several limitations were noted, including the retrospective, cross-sectional design, potential selection bias in hospital-based EMR data, and limited generalizability despite external validation in a U.S. cohort. The model also excluded imaging, lifestyle, social, and family history variables that could affect predictive performance.

The researchers reported no conflicts of interest.

AACE Endocrine AI is published by Conexiant under a license arrangement with the American Association of Clinical Endocrinology, Inc. (AACE®). The ideas and opinions expressed in AACE Endocrine AI do not necessarily reflect those of Conexiant or AACE. For more information, see Policies.
