Published: 2025

From Hallucinations to Hazards: Benchmarking LLMs for Hazard Analysis in Safety-Critical Systems

CATEGORIES

RISK-BASED PROCESS SAFETY ELEMENTS

Research Summary

This paper maps the current landscape of LLM benchmarks through a scoping review, evaluating their applicability to safety-critical hazard analysis. A pilot study across nine safety scenarios revealed significant inconsistency in LLM analytical quality: hazard identification scores varied between evaluation runs, and causal reasoning performance was consistently poor. This is essential reading for MOC practitioners considering AI tools for change-related hazard reviews. The findings highlight that while LLMs show promise for supporting MOC hazard screening, their inconsistent analytical quality poses a significant challenge for safety assurance; organizations must implement robust validation frameworks before deploying LLMs in safety-critical MOC decision workflows.

AUTHORS

Not specified

CITATIONS

"From hallucinations to hazards: Benchmarking LLMs for hazard analysis in safety-critical systems," Safety Sci., vol. 187, Nov. 2025.