In a national teleradiology setting, radiologists reported a substantial false-positive burden, limited perceived time savings, and strongly conditional trust in an FDA-cleared AI tool for detecting intracranial hemorrhage (ICH).
Key Findings
Results
Only a small minority of radiologists found false-positive AI alerts to be infrequent enough to be acceptable.
18.5% (12/65; 95% CI 10.9%-29.6%) agreed that false-positive alerts were infrequent enough to be acceptable.
Free-text responses attributed false positives primarily to artifacts and calcifications.
65 total radiologists responded, including 23 (35.4%) neuroradiologists and 42 (64.6%) non-neuroradiologists.
The survey was conducted in a national teleradiology practice with access to an FDA-cleared ICH AI overlay during noncontrast head CT interpretation.
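The summary does not say how the 95% CIs were computed. As a rough check, the reported bounds appear consistent with a Wilson score interval, so the sketch below assumes that method. The function name `wilson_interval` and the default `z=1.96` are illustrative choices, not taken from the study.

```python
from math import sqrt

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%).

    Assumption: the study's CI method is not stated in this summary;
    Wilson is used here only because it matches the reported bounds.
    """
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# 12 of 65 radiologists agreed that false positives were acceptably infrequent.
low, high = wilson_interval(12, 65)
print(f"{12/65:.1%} (95% CI {low:.1%}-{high:.1%})")
```

Under this assumption, the call for 12/65 reproduces the reported 10.9%-29.6% to one decimal place. The same function can be applied to the other counts in this summary (for example 21/65 or 2/65).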
Results
Fewer than one-third of radiologists agreed that the AI correctly identified most intracranial hemorrhage cases.
Agreement that the AI correctly identified most ICH cases was 32.3% (21/65; 95% CI 22.2%-44.4%).
Agreement that the AI rarely missed clinically important hemorrhages was 43.1% (28/65; 95% CI 31.8%-55.2%).
These perceptions reflect real-world use rather than controlled benchmark testing conditions.
Results
Radiologist trust in AI output was highly conditional on whether the AI agreed with their own interpretation.
50.8% (33/65; 95% CI 38.9%-62.5%) reported trusting the AI when it agreed with their interpretation.
Only 3.1% (2/65; 95% CI 0.8%-10.5%) reported trusting the AI when it conflicted with their interpretation.
This asymmetry suggests that AI confirmation of a radiologist's own read is a primary driver of trust.
Results
The AI tool was not perceived to reduce overall interpretation time, and a substantial proportion of radiologists felt false-positive review time outweighed benefits.
Only 10.8% (7/65; 95% CI 5.3%-20.6%) reported reduced overall interpretation time.
33.8% (22/65; 95% CI 23.5%-46.0%) agreed that time spent reviewing false-positive alerts outweighed the benefits.
This item (false-positive burden outweighing benefits) was the prespecified primary endpoint of the study.
Free-text responses also noted delayed or inconsistent AI availability as a workflow concern.
Results
Self-reported reduced scrutiny following an AI-negative result was uncommon but present in a small minority of radiologists.
6.2% (4/65; 95% CI 2.4%-14.8%) reported reduced scrutiny after an AI-negative result.
Although this behavior was uncommon, the authors note that it was not entirely absent in routine practice.
This finding raises potential patient safety considerations regarding automation bias.
Results
Neuroradiologists more frequently agreed that false-positive alert review time outweighed benefits compared to non-neuroradiologists, though this difference did not survive multiple comparisons correction.
52.2% (12/23) of neuroradiologists vs. 23.8% (10/42) of non-neuroradiologists agreed with the primary endpoint item, that false-positive review time outweighed benefits (unadjusted P=.03).
No exploratory subgroup differences remained statistically significant after false discovery rate correction using the Benjamini-Hochberg procedure.
Subgroup comparisons other than the primary endpoint were treated as exploratory.
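The subgroup statistics above can be illustrated with a short sketch. This summary does not name the test behind the unadjusted P value, so a two-sided Fisher exact test is an assumption here. The Benjamini-Hochberg function is a generic implementation of the procedure, and the p-values passed to it are hypothetical placeholders, not study data.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Assumption: the study's actual test is not stated in this summary.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(col1, x) * comb(n - col1, row1 - x) / denom

    lo = max(0, row1 - (n - col1))
    hi = min(row1, col1)
    p_obs = prob(a)
    # Sum every table that is no more likely than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Neuroradiologists: 12 agree, 11 do not; non-neuroradiologists: 10 agree, 32 do not.
p_unadjusted = fisher_exact_two_sided(12, 11, 10, 32)
print(f"unadjusted P = {p_unadjusted:.3f}")

# Hypothetical p-values for five exploratory comparisons (NOT from the study).
hypothetical = [0.20, 0.012, 0.81, 0.04, 0.47]
print([round(q, 3) for q in benjamini_hochberg(hypothetical)])
```

With the counts reported above, the Fisher exact p-value comes out near .03. An uncorrected chi-square test would give a slightly smaller value. The placeholder list shows how unadjusted p-values below .05 can rise above .05 after adjustment, which is the pattern the summary describes for the exploratory comparisons.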
Results
Radiologists expressed medicolegal concerns and consultation burden as additional practical challenges related to AI use.
These themes emerged from free-text survey responses.
Concerns included medicolegal uncertainty about how to act when AI output conflicted with radiologist interpretation.
Consultation burden was also identified, suggesting AI disagreements prompted additional colleague consultations.
What This Means
This research suggests that when radiologists in a large national teleradiology network use an AI tool designed to detect bleeding in the brain (intracranial hemorrhage) during their daily work, they frequently find it frustrating and of limited practical benefit. Despite the tool having FDA clearance, fewer than one in five radiologists found its false alarms (flagging scans that don't actually show bleeding) to be at an acceptable level, and only about one-third believed it reliably caught most true cases of bleeding. Critically, radiologists' trust in the AI depended heavily on whether it agreed with their own read: about half trusted it when it confirmed their view, but almost nobody (about 3%) trusted it when it contradicted them.
The tool also did not appear to save time in practice: only about 1 in 10 radiologists reported faster interpretation, while roughly 1 in 3 felt that the time spent reviewing false alarms outweighed the tool's benefits. A small but notable minority (about 6%) reported being less thorough in reviewing scans when the AI flagged no hemorrhage, which raises questions about whether overreliance on AI could occasionally lead to missed diagnoses. Neuroradiologists (specialists in brain imaging) were more likely than non-neuroradiologists to feel the false-alarm burden outweighed the tool's benefits, though this difference was not statistically robust after accounting for multiple comparisons.
More broadly, the findings suggest that regulatory clearance alone does not ensure an AI radiology tool will work well in real clinical environments. How often the AI produces false alarms, whether it is consistently available, how fast it delivers results, and how well it fits into existing workflows all affect whether radiologists find it useful and trustworthy. The findings highlight the need for AI developers and hospital systems to pay close attention to specificity (minimizing false alarms), system reliability, and thoughtful integration into clinical practice when deploying these tools.
Del Gaizo A, Del Gaizo J, Shahoumian T. (2026). Radiologist Perceptions of an AI Tool for Intracranial Hemorrhage Detection in Teleradiology: Cross-Sectional Survey Study. JMIR Human Factors. https://doi.org/10.2196/92145