Polycystic ovary syndrome (PCOS) is a complex condition characterized by elevated androgen (male hormone) levels, irregular menstrual cycles, anovulation, and, in some cases, small ovarian cysts. PCOS is often underdiagnosed and can lead to significant health problems, so timely and efficient identification is crucial. Machine learning (ML) has recently shown promise in medical diagnosis, but the perceived "black box" nature of ML models makes it necessary to explain which parameters drive their predictions. This study uses SHapley Additive exPlanations (SHAP) to provide global explanations of an ML model and thereby support its efficiency, effectiveness, and reliability. An open-access dataset of 300 PCOS patients was used to predict whether a patient's luteinizing hormone (LH) to follicle-stimulating hormone (FSH) ratio is at most 1 or greater than 1. The ML classifiers evaluated were AdaBoost, XGBoost, CatBoost, and Bagging, of which Bagging performed best. Modeling used 5-fold cross-validation, with the dataset split into 80% training and 20% testing sets. Model performance was evaluated using accuracy (ACC), balanced accuracy (b-ACC), specificity (SP), sensitivity (SE), negative predictive value (NPV), positive predictive value (PPV), and F1-score. The Bagging method achieved an ACC of 99.0%, b-ACC of 99.0%, SE of 98.8%, SP of 99.1%, PPV of 97.7%, NPV of 99.5%, and F1-score of 98.3%. SHAP analysis identified total testosterone (TT, ng/dL), body mass index (BMI), anti-Müllerian hormone (AMH), age, family history, and menstrual cycle regulation as the top predictors for distinguishing between the two LH:FSH ratio categories. This study demonstrates that incorporating SHAP explanations enhances the interpretability and reliability of ML models for diagnosing PCOS.
Key words: Polycystic ovary syndrome, explainable artificial intelligence, machine learning, SHapley Additive exPlanations
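The evaluation metrics named in the abstract follow the standard confusion-matrix definitions. The sketch below is a minimal illustration, not the authors' code. It assumes that the "LH:FSH ratio greater than 1" class is the positive class (label 1), which the abstract does not state. The function names `lh_fsh_label` and `binary_metrics` are hypothetical. A ratio of exactly 1 is placed in the "at most 1" class, matching the "at most 1 / greater than 1" split described above.

```python
from dataclasses import dataclass


def lh_fsh_label(lh: float, fsh: float) -> int:
    """Hypothetical labeling rule: 1 if LH/FSH > 1, else 0 (ratio at most 1).

    Assumption: the 'ratio > 1' class is treated as the positive class.
    """
    if fsh <= 0:
        raise ValueError("FSH must be positive to form an LH:FSH ratio")
    return 1 if lh / fsh > 1 else 0


@dataclass
class Metrics:
    acc: float    # accuracy
    b_acc: float  # balanced accuracy = (SE + SP) / 2
    se: float     # sensitivity (recall of the positive class)
    sp: float     # specificity (recall of the negative class)
    ppv: float    # positive predictive value (precision)
    npv: float    # negative predictive value
    f1: float     # harmonic mean of PPV and SE


def _safe_div(num: float, den: float) -> float:
    # Return 0.0 instead of raising when a class is absent from a fold.
    return num / den if den else 0.0


def binary_metrics(y_true, y_pred) -> Metrics:
    """Compute the abstract's metrics from 0/1 labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    se = _safe_div(tp, tp + fn)
    sp = _safe_div(tn, tn + fp)
    ppv = _safe_div(tp, tp + fp)
    npv = _safe_div(tn, tn + fn)
    return Metrics(
        acc=_safe_div(tp + tn, len(pairs)),
        b_acc=(se + sp) / 2,
        se=se,
        sp=sp,
        ppv=ppv,
        npv=npv,
        f1=_safe_div(2 * ppv * se, ppv + se),
    )


if __name__ == "__main__":
    # Toy labels for illustration only; these are not data from the study.
    m = binary_metrics([1, 1, 0, 0, 0], [1, 0, 0, 0, 1])
    print(m)
```

Because b-ACC is the mean of SE and SP, the reported SE (98.8%) and SP (99.1%) imply a b-ACC of about 99.0%, consistent with the value given in the abstract. In a 5-fold setup, these metrics would typically be computed on each held-out fold and then averaged.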