Objective: Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder characterized by impairments in social communication and by restricted, repetitive behaviors. With AI tools such as ChatGPT-4, Gemini, and Microsoft Copilot becoming increasingly popular information sources for healthcare professionals and families, this study aimed to evaluate and compare their accuracy and readability when responding to ASD-related questions.
Methods: In this cross-sectional study, we presented 88 questions (45 Frequently Asked Questions [FAQs] and 43 guideline-based questions) to the three AI models. Questions were sourced from social media, parent forums, and clinical guidelines. Two blinded child psychiatrists rated response accuracy on a four-point grading scale, while readability was assessed with four established indices: Flesch-Kincaid Grade Level, Gunning Fog Index, Coleman-Liau Index, and Flesch Reading Ease.
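For context, these indices are computed from surface features of the text (sentence length, word length, and syllable counts); their standard formulations are:

\[
\begin{aligned}
\text{Flesch Reading Ease} &= 206.835 - 1.015\,\frac{\text{words}}{\text{sentences}} - 84.6\,\frac{\text{syllables}}{\text{words}} \\
\text{Flesch-Kincaid Grade Level} &= 0.39\,\frac{\text{words}}{\text{sentences}} + 11.8\,\frac{\text{syllables}}{\text{words}} - 15.59 \\
\text{Gunning Fog Index} &= 0.4\left(\frac{\text{words}}{\text{sentences}} + 100\,\frac{\text{complex words}}{\text{words}}\right) \\
\text{Coleman-Liau Index} &= 0.0588L - 0.296S - 15.8
\end{aligned}
\]

where complex words are those with three or more syllables, L is the average number of letters per 100 words, and S is the average number of sentences per 100 words. Lower grade-level scores and higher Reading Ease scores indicate more accessible text.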
Results: For FAQs, accuracy differed significantly across models (p=0.001): Gemini (100%), ChatGPT-4 (95.6%), and Microsoft Copilot (71.1%). For guideline-based questions, accuracy also varied significantly (p=0.010): Gemini (86.0%), ChatGPT-4 (83.7%), and Microsoft Copilot (55.8%). Notably, Microsoft Copilot provided the most readable FAQ responses, while Gemini offered the most balanced readability for guideline-based questions.
Conclusion: Our findings show that Gemini and ChatGPT-4 provide highly accurate ASD information, particularly for complex scientific content, whereas Microsoft Copilot produces more accessible text despite lower accuracy. These results suggest that different models may better serve different audiences: healthcare professionals might benefit from the precision of Gemini or ChatGPT-4, while general users might prefer Copilot's readability. This highlights opportunities for improving both reliability and accessibility in healthcare communication.
Key words: Artificial intelligence, Autism spectrum disorder, ChatGPT-4, Gemini, Microsoft Copilot, Readability