Purpose: The purpose of this study was to evaluate the accuracy, completeness, comprehensibility and reliability of widely available AI chatbots in addressing clinically significant queries pertaining to implant dentistry. Materials and Methods: Twenty questions were devised based on those most frequently asked or encountered during patient consultations by three experienced prosthodontists. These questions were posed to the ChatGPT-3.5, Gemini, and Copilot AI chatbots. Each question was asked to each chatbot three times at twelve-day intervals, and the authors independently graded the accuracy of the responses using a three-point Likert scale (grade 0: incorrect; grade 1: incomplete or partially correct; grade 2: correct) and a two-point scale (true/false). Completeness and comprehensibility were also evaluated using a three-point Likert scale. The five most frequently asked questions were additionally analyzed for each chatbot. Total scores of the chatbots were compared with one-way analysis of variance (ANOVA). Two-point scale data were analyzed with the chi-square test. The reliability of each chatbot was assessed by evaluating the consistency of its repeated responses using Cronbach's alpha coefficients. Results: When the total scores of the chatbots were analyzed (ChatGPT-3.5 = 28.78 ± 4.06, Gemini = 30.89 ± 4.08, Copilot = 29.11 ± 3.22), one-way ANOVA revealed no statistically significant differences (P=.461). Chi-square analysis of the two-point scale data likewise revealed no statistically significant difference among the chatbots (P=.336). Gemini showed a higher completeness level than ChatGPT-3.5 (P=.011). There was no statistically significant difference among the AI chatbots in terms of comprehensibility. Copilot demonstrated the greatest overall consistency among the three chatbots (Cronbach's alpha = 0.863), followed by ChatGPT-3.5 (0.779) and Gemini (0.636). Conclusions: The accuracy of the three chatbots was similar, and all three demonstrated an acceptable level of consistency. However, given the limited accuracy of the chatbots' answers, they should not serve as the sole decision-maker; the clinician's opinion must be given priority.
Keywords: Artificial intelligence, Copilot, ChatGPT, dental implant, Gemini, large language models
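
The statistical workflow described in the abstract (one-way ANOVA on total scores, a chi-square test on the binary accuracy ratings, and Cronbach's alpha for the reliability of repeated responses) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the score matrices and the contingency table are hypothetical placeholders, and the cronbach_alpha helper is introduced here purely for illustration.

```python
import numpy as np
from scipy import stats

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for an (n_questions x k_repeats) score matrix.

    Rows are the 20 questions; columns are the 3 repeated
    administrations of each question to one chatbot.
    """
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)      # variance of each repeat
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of per-question totals
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical 3-point Likert scores (0/1/2), 20 questions x 3 repeats per chatbot.
rng = np.random.default_rng(0)
chatgpt = rng.integers(0, 3, size=(20, 3))
gemini = rng.integers(0, 3, size=(20, 3))
copilot = rng.integers(0, 3, size=(20, 3))

# Reliability (consistency of repeated responses) per chatbot.
for name, scores in [("ChatGPT-3.5", chatgpt), ("Gemini", gemini), ("Copilot", copilot)]:
    print(name, round(cronbach_alpha(scores), 3))

# One-way ANOVA comparing per-question total scores across the three chatbots.
f, p = stats.f_oneway(chatgpt.sum(axis=1), gemini.sum(axis=1), copilot.sum(axis=1))
print(f"ANOVA: F={f:.2f}, P={p:.3f}")

# Chi-square test on the two-point (true/false) accuracy counts.
# Hypothetical contingency table: rows = chatbots, columns = [true, false].
table = np.array([[40, 20], [45, 15], [41, 19]])
chi2, p, dof, _ = stats.chi2_contingency(table)
print(f"Chi-square: chi2={chi2:.2f}, P={p:.3f}")
```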