AI tools for dental anaesthesiology are promising but need work

At present, large language models are limited in how much accurate information they can provide about dental anaesthesiology. (Image: Tada Images/Shutterstock)

YOKOSUKA, Japan: Large language models (LLMs), a type of artificial intelligence (AI) tool, are gaining popularity across medical disciplines, including dental anaesthesiology, for their potential to enhance information gathering, patient care and educational practices. A new study from Japan has evaluated the usefulness of major LLMs in dental anaesthesiology. It concluded that, while LLMs like ChatGPT‑4 and Claude 3 Opus show promise, their effective integration into the field will require further advances in model training and prompt engineering, as well as the availability of high-quality, field-specific information.

The researchers also included Gemini 1.0 and tested all three models on the board certification examination of the Japanese Dental Society of Anesthesiology. The goal was to assess the utility of the models by comparing their accuracy in answering examination questions, offering insights into their application in dental anaesthesiology.

The study used 295 multiple-choice questions from the examination, spanning 2020 to 2022, excluding questions that required visual interpretation or were considered inappropriate. The questions were manually input into the LLMs without specialised prompt engineering. Each model answered the questions three times, and accuracy was determined by comparing responses to consensus answers from board-certified dental anaesthesiologists. Performance was analysed across six categories: basic physiology, local anaesthesia, sedation and general anaesthesia, systemic management, pain management, and shock and resuscitation.
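For readers who want to see how an evaluation of this kind can be scored, the following is a minimal sketch in Python. It is not the authors' code. It assumes that every attempt is scored separately against the consensus answer and that accuracy is the share of correct attempts, both overall and per category; the article does not say how the three attempts per question were combined. The function name, the data layout and the toy data are all hypothetical.

```python
from collections import defaultdict


def accuracy_by_category(responses, answer_key, categories):
    """Return the fraction of correct attempts per category and overall.

    responses:  question id -> list of answers, one per attempt
    answer_key: question id -> consensus answer
    categories: question id -> category name
    Assumption: each attempt counts as one scored item.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for qid, attempts in responses.items():
        for answer in attempts:
            # Count every attempt once for its category and once overall.
            for bucket in (categories[qid], "overall"):
                total[bucket] += 1
                if answer == answer_key[qid]:
                    correct[bucket] += 1
    return {bucket: correct[bucket] / total[bucket] for bucket in total}


# Toy data (hypothetical): two questions, three attempts each.
answer_key = {"q1": "A", "q2": "C"}
categories = {"q1": "local anaesthesia", "q2": "pain management"}
responses = {"q1": ["A", "A", "B"], "q2": ["C", "D", "D"]}

scores = accuracy_by_category(responses, answer_key, categories)
for bucket, value in scores.items():
    print(f"{bucket}: {value:.1%}")
```

In this toy example, three of the six attempts match the answer key, so the overall accuracy is 50%. Other aggregation rules, such as a majority vote across the three attempts, would be equally plausible and could give different figures.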

ChatGPT‑4 achieved the highest accuracy (51.2%), followed by Claude 3 Opus (47.4%), whereas Gemini 1.0 lagged significantly behind (30.3%). ChatGPT‑4 and Claude 3 Opus performed particularly well in areas like systemic management and pain management. However, all the models demonstrated overall accuracy rates below 60%, suggesting limitations in their current utility for clinical decision-making in dental anaesthesiology.

The study attributed the relatively low accuracy to several factors. First, the availability of dental anaesthesiology information online is limited, which may have affected the performance of the models. Additionally, the absence of specific prompt engineering and potential ambiguities in the original Japanese questions might have further contributed to the suboptimal results. The authors noted that optimising prompts could significantly improve the accuracy of the LLMs in answering medical questions.

Despite these limitations, the study highlighted the potential of LLMs to support dental anaesthesiology practice, particularly in administrative tasks such as documentation. However, it also cautioned against relying on them for complex clinical decisions owing to risks like “hallucinations”, where models generate plausible but incorrect responses.

The study, titled “Evaluating large language models in dental anesthesiology: A comparative analysis of ChatGPT‑4, Claude 3 Opus, and Gemini 1.0 on the Japanese Dental Society of Anesthesiology board certification exam”, was published online on 27 September 2024 in Cureus.
