Aim: Early diagnosis of diabetes mellitus (DM), one of the most important health problems worldwide, and timely intervention are essential, which makes models that predict the disease valuable. This study aims to build a clinical decision support model for DM prediction using Stochastic Gradient Boosting, a machine learning method.
Material and Methods: A Stochastic Gradient Boosting model was fitted to an open-access data set containing factors associated with DM. Model performance was evaluated with accuracy, balanced accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and F1-score, and 5-fold cross-validation was used in the modeling phase. Finally, variable importance values were obtained from the model.
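The workflow described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the study's actual pipeline: it assumes scikit-learn is available, substitutes a synthetic data set from `make_classification` for the open-access DM data, and uses hypothetical settings (`n_estimators=50`, `subsample=0.8`). In scikit-learn, setting `subsample` below 1.0 is what makes gradient boosting stochastic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic stand-in for the DM data set (assumption: 14 predictors,
# imbalanced classes); the real study data are not reproduced here.
X, y = make_classification(
    n_samples=400, n_features=14, weights=[0.85, 0.15], random_state=0
)

# subsample < 1.0 fits each tree on a random fraction of the rows,
# which is the "stochastic" part of Stochastic Gradient Boosting.
model = GradientBoostingClassifier(
    n_estimators=50, subsample=0.8, random_state=0
)

# 5-fold cross-validation with several of the metrics named above.
# "recall" is sensitivity and "precision" is positive predictive value.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scoring = ["accuracy", "balanced_accuracy", "recall", "precision", "f1"]
scores = cross_validate(model, X, y, cv=cv, scoring=scoring)

for name in scoring:
    print(f"{name}: {scores['test_' + name].mean():.3f}")

# Variable importance values from a model fitted on all rows.
importances = model.fit(X, y).feature_importances_
```

Specificity and negative predictive value have no built-in scorer name in this sketch; they could be derived from the confusion matrix of each fold.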
Results: The accuracy, balanced accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and F1-score obtained with Stochastic Gradient Boosting were 93.6%, 92.8%, 91.7%, 93.9%, 73.3%, 98.4%, and 81.5%, respectively. According to the variable importance values obtained for the input variables in the data set, the most important variables were glucose, age, systolic BP, cholesterol, chol/HDL, BMI, height, waist/hip, HDL, waist, weight, diastolic BP, hip, and gender: male.
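Two of the reported metrics follow directly from the others, which gives a quick consistency check: balanced accuracy is the mean of sensitivity and specificity, and the F1-score is the harmonic mean of positive predictive value and sensitivity. A minimal sketch using the reported values (the variable names are illustrative only):

```python
# Reported values from the Results paragraph, expressed as fractions.
sensitivity = 0.917
specificity = 0.939
ppv = 0.733  # positive predictive value (precision)

# Balanced accuracy: mean of sensitivity and specificity.
balanced_accuracy = (sensitivity + specificity) / 2

# F1-score: harmonic mean of PPV and sensitivity.
f1 = 2 * ppv * sensitivity / (ppv + sensitivity)

print(f"balanced accuracy: {balanced_accuracy:.1%}")  # 92.8%
print(f"F1-score: {f1:.1%}")                          # 81.5%
```

Both derived values match the reported 92.8% and 81.5%.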
Conclusion: The results of the current study show that the applied ML model can predict diabetes. In addition, the most important risk factors for DM were identified from the model and reported by degree of importance. With these results, the necessary precautions can be taken at an early stage of the disease.
Key words: Diabetes mellitus, classification, machine learning, Stochastic Gradient Boosting.