The aim of this study is to classify stroke using the gradient boosting tree method on an open-access dataset containing records of patients with and without stroke. A further aim is to compare the results obtained before and after balancing the data with the Synthetic Minority Over-sampling Technique (SMOTE), an oversampling method for data balancing. The dataset, which contains information about patients with and without stroke, was obtained from "https://www.kaggle.com/asaumya/healthcare-problem-prediction-stroke-patients". SMOTE was used as the data balancing method, and the gradient boosting tree method was used for modeling. Model performance was evaluated by specificity, sensitivity, accuracy, positive predictive value and negative predictive value. For the gradient boosting tree model fitted to the original version of the dataset, specificity, sensitivity, accuracy, positive predictive value and negative predictive value were 0.0887, 0.9772, 0.9339, 0.9544 and 0.1679, respectively. For the gradient boosting tree model fitted to the SMOTE-applied version of the dataset, specificity, sensitivity, accuracy, positive predictive value and negative predictive value were 0.0887, 0.9772, 0.9339, 0.9544 and 0.1679, respectively. When the results of the study were examined, the model built on the SMOTE-applied dataset produced more consistent and realistic results. It is therefore suggested that researchers apply dataset balancing methods to obtain more accurate results whenever they encounter an unbalanced dataset.
Key words: Stroke, classification, gradient boosting tree, unbalanced data, SMOTE
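As an illustration of the two technical ingredients named above, the following minimal Python sketch shows how SMOTE-style oversampling interpolates between minority-class neighbours and how the five reported performance measures follow from a binary confusion matrix. It is not the implementation used in the study: the function names `smote` and `binary_metrics`, the toy points, and the confusion-matrix counts are all hypothetical, and the sketch makes no assumption about which class the study treated as positive.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours.

    Hypothetical helper for illustration; `minority` is a list of feature vectors.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the base point (excluding itself)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # random position on the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def binary_metrics(tp, fp, tn, fn):
    """Compute the five measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),               # true positive rate
        "specificity": tn / (tn + fp),               # true negative rate
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "ppv": tp / (tp + fp),                       # positive predictive value
        "npv": tn / (tn + fn),                       # negative predictive value
    }


# Toy minority-class points and toy counts (assumed values, not study data)
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority, n_new=6, k=2)
metrics = binary_metrics(tp=90, fp=5, tn=15, fn=10)
```

Because each synthetic point lies on a line segment between two existing minority-class points, the oversampled data stay within the region already occupied by that class. In practice the synthetic points would be added to the training data only, so that the performance measures are computed on unmodified held-out records.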