AI model identifies CKD risk in older adults
An extreme gradient boosting machine learning model accurately predicted future chronic kidney disease risk among older patients with hyperglycemia, including both prediabetes and diabetes, according to a prospective multicenter cohort study published in Renal Failure.
The researchers enrolled 15,578 adults aged 60 and older with hyperglycemia from four communities in Shanghai, China. Data from three communities were pooled and randomly split into a training cohort (n = 8,815) and an internal validation cohort (n = 2,202), while the fourth community served as an external validation cohort (n = 4,561). Follow-up ranged from 2 to 3 years.
Researchers used Boruta feature selection, least absolute shrinkage and selection operator regression, and collinearity testing to identify 10 predictors of chronic kidney disease (CKD) risk: serum creatinine, age, systolic blood pressure, hemoglobin A1c, hemoglobin, triglycerides, high-density lipoprotein cholesterol, uric acid, body mass index, and blood urea nitrogen.
During follow-up, CKD developed in 9% of patients in the training cohort, 9% in the internal validation cohort, and 11% in the external validation cohort, noted Lijuan Zhang, of Tongji University, Shanghai, and colleagues.
The researchers compared five machine learning approaches: logistic regression, k-nearest neighbors, random forest, light gradient boosting machine, and extreme gradient boosting (XGBoost). Although both random forest and XGBoost achieved area under the curve (AUC) values above 0.9 in the training cohort, XGBoost demonstrated greater stability across validation datasets, achieving AUC values of 0.809 in the internal validation cohort and 0.837 in the external validation cohort. In the training cohort, the model achieved 78% accuracy, 77% specificity, and 87% sensitivity.
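For readers less familiar with these metrics, the following is a minimal Python sketch of how AUC, accuracy, sensitivity, and specificity can be computed from predicted probabilities. It is not the study's code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions chosen for illustration.

```python
def auc_score(y_true, y_score):
    """AUC as the chance that a random positive case is scored above
    a random negative case (ties count as half a win)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def threshold_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, sensitivity, and specificity at one decision threshold.
    The 0.5 default is an illustrative assumption, not the study's cutoff.
    Assumes both classes are present in y_true."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, y_score):
        pred = 1 if s >= threshold else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif pred == 1:
            fp += 1
        else:
            tn += 1
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # share of true CKD cases flagged
        "specificity": tn / (tn + fp),  # share of non-cases correctly cleared
    }


# Toy data only (1 = developed CKD); these are not values from the study.
y_true = [0, 0, 0, 0, 1, 1, 0, 1]
y_score = [0.05, 0.10, 0.20, 0.60, 0.40, 0.70, 0.30, 0.90]

auc = auc_score(y_true, y_score)
metrics = threshold_metrics(y_true, y_score)
```

Because AUC depends only on how cases are ranked, it can stay high even when a particular threshold trades sensitivity against specificity, which is why the two kinds of metric are usually reported together.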
After identifying XGBoost as the best-performing model, the researchers evaluated how individual clinical variables contributed to its predictions using SHapley Additive exPlanations (SHAP). SHAP identified serum creatinine, age, and hemoglobin as the most influential predictors, accounting for 25%, 19%, and 15% of total model importance, respectively. Higher serum creatinine and older age were associated with increased CKD risk, whereas higher hemoglobin levels were associated with lower risk.
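Percentage shares of this kind are commonly derived by averaging the absolute SHAP value of each feature across patients and normalizing so the shares sum to 100%. The Python sketch below illustrates that convention under stated assumptions: the article does not say exactly how the study computed its shares, the function name is hypothetical, and the SHAP values are invented toy numbers, not study results.

```python
def importance_shares(shap_rows, feature_names):
    """Share of total importance per feature, using mean |SHAP| across rows.
    Each row holds one patient's SHAP values, in feature_names order."""
    if not shap_rows:
        raise ValueError("need at least one row of SHAP values")
    totals = [0.0] * len(feature_names)
    for row in shap_rows:
        if len(row) != len(feature_names):
            raise ValueError("row length must match number of features")
        for j, value in enumerate(row):
            totals[j] += abs(value)  # sign is ignored for importance
    mean_abs = [t / len(shap_rows) for t in totals]
    grand_total = sum(mean_abs)
    return {name: m / grand_total for name, m in zip(feature_names, mean_abs)}


# Toy SHAP values for three hypothetical patients (not from the study).
features = ["serum_creatinine", "age", "hemoglobin"]
toy_shap = [
    [0.5, 0.2, -0.3],
    [-0.5, 0.4, -0.1],
    [0.5, -0.3, 0.2],
]

shares = importance_shares(toy_shap, features)
```

The absolute value captures how strongly a feature moves predictions; the direction of an association, such as higher hemoglobin lowering risk, comes from pairing the sign of each SHAP value with the patient's feature value.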
Risk stratification analysis divided patients into low-risk (up to 5%), medium-risk (5% to 25%), and high-risk (25% to 100%) groups based on predicted probabilities. Observed CKD incidence rates were 1%, 10%, and 56%, respectively. Compared with the Chien equation, a previously validated CKD prediction model, XGBoost more effectively identified patients at highest risk for future CKD. Among individuals classified as high risk, observed CKD incidence was 55.5% with XGBoost versus 20.5% with the Chien equation.
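The stratification step can be sketched in a few lines of Python: assign each patient to a group by predicted probability, then compare the observed event rate within each group. The 5% and 25% cutoffs come from the article; how values falling exactly on a cutoff are handled, the function names, and the toy data are assumptions made for illustration.

```python
def risk_group(p):
    """Map a predicted probability to a risk group.
    Assumption: a value exactly on a cutoff goes to the lower group."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if p <= 0.05:
        return "low"
    if p <= 0.25:
        return "medium"
    return "high"


def observed_incidence(probs, outcomes):
    """Observed event rate per risk group (None if a group is empty)."""
    counts = {"low": [0, 0], "medium": [0, 0], "high": [0, 0]}  # [events, n]
    for p, y in zip(probs, outcomes):
        group = risk_group(p)
        counts[group][0] += y
        counts[group][1] += 1
    return {g: (e / n if n else None) for g, (e, n) in counts.items()}


# Toy predictions and outcomes (1 = developed CKD); not study data.
probs = [0.01, 0.03, 0.05, 0.10, 0.20, 0.25, 0.30, 0.60, 0.90]
outcomes = [0, 0, 0, 0, 1, 0, 1, 0, 1]

incidence = observed_incidence(probs, outcomes)
```

A model that stratifies well shows observed incidence rising steeply from the low-risk to the high-risk group, which is the pattern the study reports.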
Sensitivity analyses using stricter CKD exclusion criteria, Synthetic Minority Over-sampling Technique (SMOTE) balancing, and subgroup analyses all yielded AUC values greater than 0.9, supporting the model's robustness.
The study had several limitations. The absence of urine test data precluded calculation of the urine albumin-to-creatinine ratio, which could further enhance model performance. Model development was limited to five common machine learning algorithms, and all participants were recruited from Shanghai communities, limiting generalizability to other populations.
"Future research should incorporate more diverse, multi-dimensional datasets to strengthen the model’s representativeness and broader applicability," concluded Zhang and colleagues.
No conflicts of interest were reported.
AACE Endocrine AI is published by Conexiant under a license arrangement with the American Association of Clinical Endocrinology, Inc. (AACE®). The ideas and opinions expressed in AACE Endocrine AI do not necessarily reflect those of Conexiant or AACE.