Open Access
volume 14 issue 2 pages 572

Evaluating ChatGPT-4 for the Interpretation of Images from Several Diagnostic Techniques in Gastroenterology

Miguel Mascarenhas Saraiva 1, 2, 3
Tiago Ribeiro 1, 2, 3
Belén Agudo 4
João Afonso 1, 2, 3
Francisco Mendes 1, 2, 3
Miguel Martins 1, 2, 3
Pedro Cardoso 1, 2, 3
Joana Mota 1, 2, 3
Maria Joao Almeida 1, 2, 3
António Costa 4
Mariano Gonzalez Haba 4
Jessica Widmer 5
Eduardo Guimarães Hourneaux de Moura 6
Ahsan Javed 7
Thiago Da Silveira Manzione 8
Sidney Nadal 8
Luis F. Barroso 9
V De Parades 10
João Ferreira 11
Guilherme Macedo 1, 2, 3
Publication type: Journal Article
Publication date: 2025-01-17
Scimago: Q1
WoS: Q1
SJR: 0.919
CiteScore: 5.2
Impact factor: 2.9
ISSN: 2077-0383
Abstract

Background: Several artificial intelligence systems based on large language models (LLMs) have been commercially developed, with recent interest in integrating them for clinical questions. Recent versions now include image analysis capacity, but their performance in gastroenterology remains untested. This study assesses ChatGPT-4’s performance in interpreting gastroenterology images.

Methods: A total of 740 images from five procedures were included: capsule endoscopy (CE), device-assisted enteroscopy (DAE), endoscopic ultrasound (EUS), digital single-operator cholangioscopy (DSOC), and high-resolution anoscopy (HRA). The images were analyzed by ChatGPT-4 using a predefined prompt for each procedure. ChatGPT-4 predictions were compared to gold standard diagnoses. Statistical analyses included accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and area under the curve (AUC).

Results: For CE, ChatGPT-4 demonstrated accuracies ranging from 50.0% to 90.0%, with AUCs of 0.50–0.90. For DAE, the model demonstrated an accuracy of 67.0% (AUC 0.670). For EUS, the system showed AUCs of 0.488 and 0.550 for the differentiation between pancreatic cystic and solid lesions, respectively. The LLM differentiated benign from malignant biliary strictures with an AUC of 0.550. For HRA, ChatGPT-4 showed an overall accuracy between 47.5% and 67.5%.

Conclusions: ChatGPT-4 demonstrated suboptimal diagnostic accuracies for image interpretation across several gastroenterology techniques, highlighting the need for continuous improvement before clinical adoption.
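The statistical measures named in the Methods can all be derived from a 2x2 table of predictions against gold standard diagnoses. The following is a minimal Python sketch of that arithmetic, not the authors' analysis code: the class and function names are hypothetical, and the counts in the example are invented for illustration and are not data from the study. It also assumes the model returns a hard yes/no label per image, in which case the ROC curve has a single operating point and the trapezoidal AUC equals the mean of sensitivity and specificity; whether the study computed AUC this way is not stated in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing binary predictions with gold standard labels."""
    tp: int  # predicted positive, gold standard positive
    fp: int  # predicted positive, gold standard negative
    tn: int  # predicted negative, gold standard negative
    fn: int  # predicted negative, gold standard positive


def ratio(numerator: int, denominator: int) -> float:
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Compute the metrics listed in the abstract from a 2x2 table."""
    sensitivity = ratio(c.tp, c.tp + c.fn)
    specificity = ratio(c.tn, c.tn + c.fp)
    return {
        "accuracy": ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ratio(c.tp, c.tp + c.fp),
        "npv": ratio(c.tn, c.tn + c.fn),
        # Assumption: hard binary labels, so the ROC curve has one
        # operating point and its trapezoidal area is (sens + spec) / 2.
        "auc": (sensitivity + specificity) / 2,
    }


# Hypothetical counts, for illustration only (not taken from the study).
example = ConfusionCounts(tp=30, fp=10, tn=40, fn=20)
metrics = diagnostic_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With equal numbers of positive and negative images, as in this made-up example, accuracy and the single-point AUC coincide; with unbalanced classes they generally differ.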

Cite this
GOST
Mascarenhas Saraiva M. et al. Evaluating ChatGPT-4 for the Interpretation of Images from Several Diagnostic Techniques in Gastroenterology // Journal of Clinical Medicine. 2025. Vol. 14. No. 2. p. 572.
GOST, all authors (up to 50)
Mascarenhas Saraiva M., Ribeiro T., Agudo B., Afonso J., Mendes F., Martins M., Cardoso P., Mota J., Almeida M. J., Costa A., Gonzalez Haba M., Widmer J., Moura E. G. H. D., Javed A., Manzione T. D. S., Nadal S., Barroso L. F., De Parades V., Ferreira J., Macedo G. Evaluating ChatGPT-4 for the Interpretation of Images from Several Diagnostic Techniques in Gastroenterology // Journal of Clinical Medicine. 2025. Vol. 14. No. 2. p. 572.
RIS
TY  - JOUR
DO  - 10.3390/jcm14020572
UR  - https://www.mdpi.com/2077-0383/14/2/572
TI  - Evaluating ChatGPT-4 for the Interpretation of Images from Several Diagnostic Techniques in Gastroenterology
T2  - Journal of Clinical Medicine
AU  - Mascarenhas Saraiva, Miguel
AU  - Ribeiro, Tiago
AU  - Agudo, Belén
AU  - Afonso, João
AU  - Mendes, Francisco
AU  - Martins, Miguel
AU  - Cardoso, Pedro
AU  - Mota, Joana
AU  - Almeida, Maria Joao
AU  - Costa, António
AU  - Gonzalez Haba, Mariano
AU  - Widmer, Jessica
AU  - Moura, Eduardo Guimarães Hourneaux de
AU  - Javed, Ahsan
AU  - Manzione, Thiago Da Silveira
AU  - Nadal, Sidney
AU  - Barroso, Luis F.
AU  - De Parades, V
AU  - Ferreira, João
AU  - Macedo, Guilherme
PY  - 2025
DA  - 2025/01/17
PB  - MDPI
SP  - 572
IS  - 2
VL  - 14
SN  - 2077-0383
ER  - 
BibTeX (up to 50 authors)
@article{2025_Mascarenhas_Saraiva,
author = {Miguel Mascarenhas Saraiva and Tiago Ribeiro and Belén Agudo and João Afonso and Francisco Mendes and Miguel Martins and Pedro Cardoso and Joana Mota and Maria Joao Almeida and António Costa and Mariano Gonzalez Haba and Jessica Widmer and Eduardo Guimarães Hourneaux de Moura and Ahsan Javed and Thiago Da Silveira Manzione and Sidney Nadal and Luis F. Barroso and V De Parades and João Ferreira and Guilherme Macedo},
title = {Evaluating ChatGPT-4 for the Interpretation of Images from Several Diagnostic Techniques in Gastroenterology},
journal = {Journal of Clinical Medicine},
year = {2025},
volume = {14},
publisher = {MDPI},
month = {jan},
url = {https://www.mdpi.com/2077-0383/14/2/572},
number = {2},
pages = {572},
doi = {10.3390/jcm14020572}
}
MLA
Mascarenhas Saraiva, Miguel, et al. “Evaluating ChatGPT-4 for the Interpretation of Images from Several Diagnostic Techniques in Gastroenterology.” Journal of Clinical Medicine, vol. 14, no. 2, Jan. 2025, p. 572. https://www.mdpi.com/2077-0383/14/2/572.