Radiology AI Lab: Evaluation of Radiology Applications with Clinical End-Users
Despite the approval of over 200 artificial intelligence (AI) applications for radiology in the European Union, widespread adoption in clinical practice remains limited. Current assessments of AI applications often rely on post-hoc evaluations, which lack the granularity to capture real-time radiologist-AI interactions. This study aimed to realise the Radiology AI lab for real-time, objective measurement of the impact of AI applications on radiologists’ workflows.
We proposed the user-state sensing framework (USSF) to structure the sensing of radiologist-AI interactions in terms of personal, interactional, and contextual states. Guided by the USSF, a lab was established using three non-invasive biometric measurement techniques: eye-tracking, heart rate monitoring, and facial expression analysis. We conducted a pilot test with four radiologists of varying experience levels, who read ultra-low-dose (ULD) CT cases in two workflows: (1) standard PACS and (2) PACS with manual annotations that mimicked AI output (referred to below as AI-annotated PACS). Interpretation time, eye-tracking metrics, heart rate variability (HRV), and facial expressions were recorded and analysed.
The Radiology AI lab was successfully realised as an initial physical iteration of the USSF at a tertiary referral centre. Radiologists participating in the pilot test read 32 ULD CT cases (mean patient age, 52 years ± 23 (SD); 17 male; 16 cases with abnormalities). Mean interpretation time was 4.1 ± 2.2 min with standard PACS and 3.9 ± 1.9 min with AI-annotated PACS, a difference that was not significant (p = 0.48). Three out of four radiologists showed significant shifts (p < 0.02) in eye-tracking metrics, including saccade duration, saccade quantity, fixation duration, fixation quantity, and pupil diameter, when using the AI-annotated workflow. These changes align with prior findings linking such metrics to increased competency and reduced cognitive load, suggesting a more efficient visual search strategy in AI-assisted interpretation.
HRV metrics did not correlate with experience; combined with facial expression analysis, however, they helped identify key moments during the pilot test.
In conclusion, the Radiology AI lab was successfully realised: it implements the personal, interactional, and contextual states of the USSF, enables objective analysis of radiologists’ workflows, and captures relevant biometrics. Future work will focus on expanding sensing of the contextual state of the USSF, refining baseline determination, and continuing the investigation of AI-enabled tools in radiology workflows.
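The abstract does not state which HRV metrics were computed. As an illustration only, the sketch below calculates two widely used time-domain HRV measures, SDNN (standard deviation of beat-to-beat intervals) and RMSSD (root mean square of successive differences), from a list of RR intervals in milliseconds. The function names and the example values are hypothetical and are not taken from the study.

```python
import math
import statistics


def sdnn(rr_ms):
    """Sample standard deviation of RR intervals (ms)."""
    if len(rr_ms) < 2:
        raise ValueError("need at least two RR intervals")
    return statistics.stdev(rr_ms)


def rmssd(rr_ms):
    """Root mean square of successive RR-interval differences (ms)."""
    if len(rr_ms) < 2:
        raise ValueError("need at least two RR intervals")
    # Differences between each interval and the one before it.
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical RR intervals in milliseconds (illustrative values, not study data).
example_rr = [800, 810, 790, 805, 795]
print(f"SDNN  = {sdnn(example_rr):.2f} ms")
print(f"RMSSD = {rmssd(example_rr):.2f} ms")
```

In a setting like the one described, such values would typically be computed over a window (for example, per case or per reading session) so that they can be compared between the two workflows. The choice of window is an assumption here, not a detail given in the abstract.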