March-April 2024 DRA Journal

In this exclusive Show Preview Issue, we present the IDEM Singapore 2024 Q&A Forum featuring key opinion leaders; their clinical insights covering orthodontics and dental implantology; plus a sneak peek at the products and technologies set to take center stage at the event. 


ChatGPT’s Accuracy in Citing Journal Articles Under Scrutiny

ChatGPT, a generative pre-trained transformer chatbot developed by OpenAI, has recently been making waves with its impressive capabilities. From answering questions to generating content, its potential applications in healthcare and education have been a topic of exploration and debate. However, amid the excitement, concerns have emerged regarding its accuracy in providing scientific references.

Certain journals, such as Science, have gone so far as to ban chatbot-generated text in their published reports. These concerns have prompted an investigation aimed at quantifying ChatGPT’s citation error rate.

Investigating ChatGPT’s Citation Accuracy

A team of researchers from the Learning Health Community in Palo Alto, California, embarked on a study to gauge the utility of ChatGPT as a research copilot. The research focused on assessing its ability to generate content for learning health systems (LHS). 

Read: ChatGPT and the Future of Dentistry

Engaging with the latest GPT-4 model from OpenAI between April 20 and May 6, 2023, the researchers explored a wide spectrum of LHS topics. These encompassed both broad subjects, such as LHS and data, and specific themes, such as building a stroke risk prediction model with the XGBoost library.



Importantly, the study followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guideline.

Fact-Checking Cited Journal Articles

To scrutinise ChatGPT’s accuracy in citing journal articles, the researchers subjected each cited article to a thorough verification process: they confirmed the article’s existence in the cited journal and cross-referenced its title on Google Scholar.

Details including the article’s title, authors, publication year, volume, issue, and page numbers were then compared. Any article that failed this verification was flagged as fake.
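The field-by-field check described above can be sketched as a small function. This is a minimal illustration, not the study's actual tooling; the record format and field names are assumptions:

```python
# Illustrative sketch of the citation verification described above.
# The dict-based record format and field names are assumptions,
# not the study's actual tooling.
from typing import Optional

FIELDS = ("title", "authors", "year", "volume", "issue", "pages")

def is_fake(cited: dict, matched_record: Optional[dict]) -> bool:
    """Flag a citation as fake when no matching article is found
    (e.g. its title returns nothing on Google Scholar) or when any
    bibliographic field disagrees with the located record."""
    if matched_record is None:
        return True
    return any(cited.get(f) != matched_record.get(f) for f in FIELDS)
```

A citation passes only if every bibliographic field matches the located record exactly; any missing or mismatched field marks it as fake.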

To establish a dependable error rate, the researchers examined over 300 article references related to LHS topics. They also conducted interactions with OpenAI’s default GPT-3.5 model using identical LHS topics. Error rates were reported with exact 95% confidence intervals, and statistical significance was assessed using the Fisher exact test, with a significance level set at P<.05.

Read: Using ChatGPT in your Dental Practice

Startling Findings Unveiled

The investigation yielded striking results. With the default GPT-3.5 model, a staggering 98.1% (95% CI, 94.7%-99.6%) of the 162 fact-checked reference journal articles were identified as fake. In contrast, with the GPT-4 model, 20.6% (95% CI, 15.8%-26.1%) of 257 fact-checked articles were found to be fake. While GPT-4’s citation error rate was significantly lower than GPT-3.5’s (P<.001), it remained substantial, particularly in narrower subject areas.
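For readers who want to check the arithmetic, the reported percentages imply roughly 159 of 162 fake references for GPT-3.5 and 53 of 257 for GPT-4 (counts back-derived from the published rates, so an assumption). A minimal SciPy sketch reproduces the exact confidence intervals and the Fisher exact test:

```python
# Reproducing the reported statistics with SciPy. The raw counts
# (159/162 and 53/257) are back-derived from the published percentages.
from scipy.stats import binomtest, fisher_exact

def error_rate_ci(fake: int, total: int):
    """Point estimate and exact (Clopper-Pearson) 95% CI for a rate."""
    ci = binomtest(fake, total).proportion_ci(confidence_level=0.95,
                                              method="exact")
    return fake / total, ci.low, ci.high

rate35, lo35, hi35 = error_rate_ci(159, 162)   # GPT-3.5: ~98.1% fake
rate4,  lo4,  hi4  = error_rate_ci(53, 257)    # GPT-4:  ~20.6% fake

# Fisher exact test on the 2x2 table of fake vs. verified references
_, p_value = fisher_exact([[159, 162 - 159], [53, 257 - 53]])
print(f"GPT-3.5: {rate35:.1%} (95% CI {lo35:.1%}-{hi35:.1%})")
print(f"GPT-4:   {rate4:.1%} (95% CI {lo4:.1%}-{hi4:.1%})")
print(f"Fisher exact P = {p_value:.3g}")
```

Running this recovers the intervals quoted in the article and a P value far below the .05 threshold, consistent with the reported P<.001.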

Implications and Discussion

The implications of these findings are significant. They suggest that GPT-4, while showing promise as a research copilot for LHS education and training, has limitations: it falls short in delivering the latest information and occasionally cites fake journal articles. Consequently, human verification remains essential, and references cited by GPT-3.5 should be treated with caution.

Moreover, ChatGPT itself has offered insights into why it may return fake references, citing factors such as unreliable training data and the model’s difficulty in distinguishing reliable from unreliable sources. As generative chatbots become integrated into healthcare education and training, it becomes vital to understand their capabilities and limitations, including their inability to fact-check their own responses. Additionally, ethical concerns, such as misinformation and data bias, should be carefully considered when deploying GPT technology in healthcare settings.

Read: Dental Professionals at Relatively Low Risk of AI-induced Job Displacement, Korean Study says

The information and viewpoints presented in the above news piece or article do not necessarily reflect the official stance or policy of Dental Resource Asia or the DRA Journal. While we strive to ensure the accuracy of our content, Dental Resource Asia (DRA) or DRA Journal cannot guarantee the constant correctness, comprehensiveness, or timeliness of all the information contained within this website or journal.

Please be aware that all product details, product specifications, and data on this website or journal may be modified without prior notice in order to enhance reliability, functionality, design, or for other reasons.

The content contributed by our bloggers or authors represents their personal opinions and is not intended to defame or discredit any religion, ethnic group, club, organisation, company, individual, or any entity or individual.
