European annals of dental sciences (Online), vol. 52, no. 1, pp. 10-16, 2025 (TRDizin)
Purpose: This study aimed to evaluate the accuracy and comprehensiveness of the responses generated by GPT-4o and Claude 3.5 Sonnet to the most frequently asked questions about endodontic emergencies.

Materials and Methods: The most frequently asked questions about nine different topics in endodontics (inferior alveolar nerve block, sodium hypochlorite accidents, aspiration of dental materials, separated instruments, perforation, transportation, Ca(OH)2 extrusion, root filling, and flare-up) were generated by GPT-3.5. Each question was asked to both GPT-4o and Claude 3.5 Sonnet. Two authors independently scored the responses. Accuracy and comprehensiveness were assessed for each question using Likert scales. The data were statistically analyzed using the Mann–Whitney U test and the Kruskal–Wallis test. The significance level was set at 0.05.

Results: Responses generated by both GPT-4o and Claude 3.5 Sonnet to a total of 81 open-ended questions were evaluated. The two models yielded similar results in terms of accuracy and comprehensiveness (p > 0.05). For GPT-4o, the topics of root filling, perforation, and flare-up had the lowest accuracy scores, and root filling and separated instruments had the lowest comprehensiveness scores (p < 0.05). The accuracy of Claude 3.5 Sonnet's responses did not differ significantly between topics (p > 0.05); however, separated instruments had the lowest comprehensiveness scores (p < 0.05).

Conclusions: The accuracy and comprehensiveness scores of GPT-4o and Claude 3.5 Sonnet are statistically similar. Despite the high levels of accuracy and comprehensiveness shown by GPT-4o and Claude 3.5 Sonnet, they cannot yet replace the operator in endodontic procedures.
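The statistical workflow named in the Methods can be illustrated with a minimal Python sketch. This is not the authors' code; all scores and group sizes below are hypothetical stand-ins for the study's Likert-scale ratings, shown only to make the two tests concrete.

```python
# Minimal sketch of the abstract's statistical analyses (hypothetical data).
from scipy.stats import mannwhitneyu, kruskal

# Hypothetical 5-point Likert accuracy scores per model (one value per question).
gpt4o_scores = [5, 4, 5, 3, 4, 5, 4, 2, 5]
claude_scores = [4, 5, 5, 4, 3, 5, 4, 3, 5]

# Mann-Whitney U test: compare the two models' score distributions.
u_stat, p_between = mannwhitneyu(gpt4o_scores, claude_scores,
                                 alternative="two-sided")

# Kruskal-Wallis test: compare scores across topics within one model.
# Three hypothetical topic groups are shown; the study covered nine topics.
root_filling = [2, 3, 3, 2]
perforation = [3, 2, 3, 3]
flare_up = [3, 3, 2, 2]
h_stat, p_topics = kruskal(root_filling, perforation, flare_up)

alpha = 0.05  # significance level used in the study
print(f"Mann-Whitney U: p = {p_between:.3f}, significant: {p_between < alpha}")
print(f"Kruskal-Wallis: p = {p_topics:.3f}, significant: {p_topics < alpha}")
```

Both tests are nonparametric, which matches the ordinal Likert-scale ratings described in the Methods: the Mann–Whitney U test handles the two-model comparison, while the Kruskal–Wallis test handles the comparison across more than two topic groups.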