Open Access
Volume 15, issue 1, publication number 11697

Identification of patients at risk for pancreatic cancer in a 3-year timeframe based on machine learning algorithms

Weicheng Zhu 1
Long Chen 1
Yindalon Aphinyanaphongs 2
Fay Kastrinos 3
Diane M. Simeone 4
Mark Pochapin 5
Cody Stender 6
Narges Razavian 2
Tamas A Gonda 5
Publication type: Journal Article
Publication date: 2025-04-05
Scimago: Q1
WoS: Q1
SJR: 0.874
CiteScore: 6.7
Impact factor: 3.9
ISSN: 2045-2322
Abstract

Early detection of pancreatic cancer (PC) remains challenging, largely because of its low population incidence and the few known risk factors. However, screening in at-risk populations and detection of early cancer have the potential to significantly alter survival. In this study, we aim to develop a predictive model to identify patients at risk of developing new-onset PC within a 2.5- to 3-year time frame. We used the electronic health records (EHR) of a large medical system from 2000 to 2021 (N = 537,410). The EHR data analyzed in this work consist of patients’ demographic information, diagnosis records, and lab values, which are used both to identify patients who were diagnosed with pancreatic cancer and to derive the risk factors used by the machine learning algorithm for prediction. We identified 73 risk factors for pancreatic cancer with a phenome-wide association study (PheWAS) on a matched case–control cohort. Using these risk factors, we built a large-scale EHR-based machine learning algorithm. We performed a temporally stratified validation on patients not included in any stage of model training. The model achieved an AUROC of 0.742 [0.727, 0.757], which was similar in the general population and in the subset of the population with prior cross-sectional imaging. The rate of pancreatic cancer diagnosis among patients in the top 1 percentile of the risk score was sixfold higher than in the general population. Our model leverages data extracted from a 6-month window of the electronic health record to identify patients at nearly sixfold higher than baseline risk of developing pancreatic cancer 2.5–3 years from evaluation. This approach offers an opportunity to define an enriched population, based entirely on static data, in which current screening may be recommended.
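
The two headline metrics in the abstract, AUROC and the enrichment of diagnoses in the top 1 percentile of the risk score, can be illustrated with a minimal sketch. This is not the authors' code: the function names `auroc` and `top_percentile_enrichment` are hypothetical, and the labels and scores are synthetic values generated only to make the example runnable.

```python
import random


def auroc(labels, scores):
    """Rank-based AUROC (Mann-Whitney U); tied scores get their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the group of tied scores starting at i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def top_percentile_enrichment(labels, scores, top_fraction=0.01):
    """Case rate in the highest-scoring fraction divided by the overall case rate."""
    n_top = max(1, int(len(scores) * top_fraction))
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n_top]
    top_rate = sum(labels[i] for i in top) / n_top
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate


# Synthetic cohort (assumption): rare outcome, cases score higher on average.
rng = random.Random(0)
demo_labels = [1 if rng.random() < 0.01 else 0 for _ in range(20000)]
demo_scores = [rng.gauss(1.0 if y else 0.0, 1.0) for y in demo_labels]

print("AUROC:", round(auroc(demo_labels, demo_scores), 3))
print("Top 1% enrichment:", round(top_percentile_enrichment(demo_labels, demo_scores), 2))
```

An enrichment value of 6 under this definition corresponds to the statement that diagnoses in the top 1 percentile occur at six times the rate seen in the whole population.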

Cite this
GOST
Zhu W., Chen L., Aphinyanaphongs Y., Kastrinos F., Simeone D., Pochapin M., Stender C., Razavian N., Gonda T. A. Identification of patients at risk for pancreatic cancer in a 3-year timeframe based on machine learning algorithms // Scientific Reports. 2025. Vol. 15. No. 1. 11697
RIS
TY - JOUR
DO - 10.1038/s41598-025-89607-8
UR - https://www.nature.com/articles/s41598-025-89607-8
TI - Identification of patients at risk for pancreatic cancer in a 3-year timeframe based on machine learning algorithms
T2 - Scientific Reports
AU - Zhu, Weicheng
AU - Chen, Long
AU - Aphinyanaphongs, Yindalon
AU - Kastrinos, Fay
AU - Simeone, Diane M.
AU - Pochapin, Mark
AU - Stender, Cody
AU - Razavian, Narges
AU - Gonda, Tamas A
PY - 2025
DA - 2025/04/05
PB - Springer Nature
IS - 1
VL - 15
SN - 2045-2322
ER -
BibTeX
@article{2025_Zhu,
author = {Weicheng Zhu and Long Chen and Yindalon Aphinyanaphongs and Fay Kastrinos and Diane M. Simeone and Mark Pochapin and Cody Stender and Narges Razavian and Tamas A Gonda},
title = {Identification of patients at risk for pancreatic cancer in a 3-year timeframe based on machine learning algorithms},
journal = {Scientific Reports},
year = {2025},
volume = {15},
publisher = {Springer Nature},
month = {apr},
url = {https://www.nature.com/articles/s41598-025-89607-8},
number = {1},
pages = {11697},
doi = {10.1038/s41598-025-89607-8}
}