Comparative Study: ChatGPT vs. Human Evaluators Assessing Medical Literature

UK: Clinicians face an ongoing challenge in keeping up with the ever-expanding body of medical research, and artificial intelligence (AI) tools have emerged as a potential solution. One such tool, ChatGPT3, a large language model, is being assessed for its ability to automate the appraisal of research quality, with the aim of saving time and reducing bias in the process.

Researchers from Swansea University, UK, have conducted a comparative study to determine how well ChatGPT performs against human evaluators in scoring research abstracts according to recognised reporting standards.

Methodology: Comparing ChatGPT and Human Evaluators

The researchers compared ChatGPT’s scoring of implant dentistry abstracts against that of human evaluators. Both sets of scores were based on the Consolidated Standards of Reporting Trials for Abstracts (CONSORT-A) checklist, which was used to establish an overall compliance score (OCS) for each abstract.
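As a rough illustration of how a checklist-based compliance score can be turned into a percentage, here is a minimal Python sketch. It assumes each checklist item is scored 1 (reported) or 0 (not reported) and that the OCS is the share of items reported; the study's exact scoring rules are not given in this article. The item names are borrowed from domains mentioned in the findings, and the scores themselves are invented for the example.

```python
def overall_compliance_score(item_scores: dict[str, int]) -> float:
    """Return the percentage of checklist items marked as reported.

    Assumption: every item is scored 1 (reported) or 0 (not reported).
    """
    if not item_scores:
        raise ValueError("no checklist items scored")
    return 100.0 * sum(item_scores.values()) / len(item_scores)


# Hypothetical scores for one abstract (illustrative only, not study data).
abstract_scores = {
    "objective": 1,
    "intervention": 1,
    "blinding": 0,
    "harms": 0,
    "trial registration": 1,
    "conclusion": 1,
}

ocs = overall_compliance_score(abstract_scores)
print(f"OCS: {ocs:.1f}%")
```

Under these assumptions, the same function could be applied to a human evaluator's scores and to ChatGPT's scores for the same abstract, giving two OCS percentages to compare.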


The study also used a Bland-Altman analysis to evaluate the agreement between the human evaluators’ and the AI-generated OCS percentages. Additional error analysis considered the mean difference of OCS subscores, Welch’s t-test, and Pearson’s correlation coefficient.
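For readers unfamiliar with these agreement statistics, the sketch below shows how a Bland-Altman mean difference (bias) with 95% limits of agreement and a Pearson correlation coefficient can be computed for paired scores. It uses only the Python standard library. The two lists of OCS percentages are invented for illustration and are not data from the study; the direction of the difference (human minus ChatGPT) is also an assumption.

```python
from statistics import mean, stdev


def bland_altman(a, b):
    """Return (bias, lower limit, upper limit) for paired measurements.

    Bias is the mean of the paired differences a - b; the 95% limits of
    agreement are bias +/- 1.96 sample standard deviations of the differences.
    """
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def pearson_r(a, b):
    """Return Pearson's correlation coefficient for paired measurements."""
    mean_a, mean_b = mean(a), mean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (ss_a * ss_b) ** 0.5


# Hypothetical OCS percentages for five abstracts (not study data).
human_ocs = [70.0, 55.0, 80.0, 65.0, 60.0]
chatgpt_ocs = [65.0, 50.0, 70.0, 65.0, 55.0]

bias, lower, upper = bland_altman(human_ocs, chatgpt_ocs)
r = pearson_r(human_ocs, chatgpt_ocs)
print(f"Mean difference: {bias:.2f} (limits of agreement {lower:.2f} to {upper:.2f})")
print(f"Pearson r: {r:.2f}")
```

A bias close to zero with narrow limits of agreement would suggest the two raters score abstracts similarly. Pearson's r only measures how closely the scores move together, not whether they agree in absolute terms, which may be why the two are reported together.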

Findings: ChatGPT’s Performance and Correlations

Bland-Altman analysis showed a mean difference of 4.92% (95% CI 0.62%, 0.37%) in OCS between human evaluation and ChatGPT.

In the error analysis, there were only slight mean differences across most domains, with the highest observed in the ‘conclusion’ domain (0.764; 95% CI 0.186, 0.280) and the lowest in ‘blinding’ (0.034; 95% CI 0.818, 0.895). The strongest correlations were observed in the ‘harms’ (r=0.32, p<0.001) and ‘trial registration’ (r=0.34, p=0.002) domains, while the weakest were found in the ‘intervention’ (r=0.02, p<0.001) and ‘objective’ (r=0.06, p<0.001) domains.

Implications and Future Prospects

The study’s conclusion highlights the potential of large language models like ChatGPT in automating the appraisal of medical literature, thereby contributing to the identification of accurately reported research. 

Potential applications of ChatGPT include its integration within medical databases for abstract evaluation. However, it is crucial to acknowledge the current limitations, particularly the token limit, which confines its use to abstracts. As AI technology continues to advance, future iterations like GPT4 could offer more reliable and comprehensive evaluations. 

This, in turn, could significantly enhance the identification of high-quality research, potentially leading to improved patient outcomes.
