City Research Online

A Hierarchical Imprecise Probability Approach to Reliability Assessment of Large Language Models

Aghazadeh-Chakherlou, R., Guo, Q., Khastgir, S., Popov, P. ORCID: 0000-0002-3434-5272, Zhang, X. & Zhao, X. (2026). A Hierarchical Imprecise Probability Approach to Reliability Assessment of Large Language Models. Reliability Engineering & System Safety, 272, article number 112615. doi: 10.1016/j.ress.2026.112615

Abstract

Large Language Models (LLMs) are increasingly deployed across diverse domains, raising the need for rigorous reliability assessment methods. Existing benchmark-based evaluations primarily offer descriptive statistics of model accuracy over datasets, providing limited insight into the probabilistic behavior of LLMs under real operational conditions. This paper introduces HIP-LLM, a Hierarchical Imprecise Probability framework for modeling and inferring LLM reliability. Building upon the foundations of software reliability engineering, HIP-LLM defines LLM reliability as the probability of failure-free operation over a specified number of future tasks under a given Operational Profile (OP). HIP-LLM represents dependencies across (sub-)domains hierarchically, enabling multi-level inference from subdomain to system-level reliability. HIP-LLM embeds imprecise priors to capture epistemic uncertainty and incorporates OPs to reflect usage contexts. It derives posterior reliability envelopes that quantify uncertainty across priors and data. Experiments on multiple benchmark datasets demonstrate that HIP-LLM offers a more nuanced and standardized reliability characterization than existing benchmark and state-of-the-art approaches. A publicly accessible repository of HIP-LLM is provided.
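The abstract defines reliability as the probability of failure-free operation over a specified number of future tasks under an OP, with imprecise priors producing posterior reliability envelopes. It does not give HIP-LLM's equations, so the Python sketch below is only a simplified, non-hierarchical illustration of that definition and is not the paper's method. It assumes that (1) each subdomain i has an unknown per-task failure probability p_i with an independent Beta prior, updated by k_i observed failures in m_i benchmark tasks; (2) each future task falls in subdomain i with OP probability q_i, independently of other tasks; and (3) the imprecise prior is a hypothetical set of Beta priors whose mean ranges over an interval at a fixed prior strength. The function names `reliability`, `reliability_envelope` and `beta_moment_one_minus`, and all example numbers, are hypothetical.

```python
import math
from typing import List, Tuple

# Simplified illustration, NOT HIP-LLM: independent Beta posteriors per subdomain
# (assumption) and no hierarchical coupling between subdomains.


def beta_moment_one_minus(alpha: float, beta: float, m: int) -> float:
    """E[(1 - p)^m] for p ~ Beta(alpha, beta), via the product formula."""
    result = 1.0
    for j in range(m):
        result *= (beta + j) / (alpha + beta + j)
    return result


def reliability(op: List[float], posteriors: List[Tuple[float, float]], n: int) -> float:
    """P(no failure in n future tasks), each task's subdomain drawn i.i.d. from the OP.

    Given the p_i, one task succeeds with probability S = sum_i q_i * (1 - p_i),
    so n tasks all succeed with probability S**n. E[S**n] is computed exactly,
    assuming the p_i are independent a posteriori.
    """
    # moments[t] = E[(sum over subdomains processed so far of q_i * (1 - p_i))**t]
    moments = [1.0] + [0.0] * n
    for q, (alpha, beta) in zip(op, posteriors):
        term = [q ** m * beta_moment_one_minus(alpha, beta, m) for m in range(n + 1)]
        # Binomial expansion: E[(A + B)^t] = sum_m C(t, m) E[A^(t-m)] E[B^m] for independent A, B
        moments = [
            sum(math.comb(t, m) * moments[t - m] * term[m] for m in range(t + 1))
            for t in range(n + 1)
        ]
    return moments[n]


def reliability_envelope(
    op: List[float],
    failures: List[int],
    trials: List[int],
    mu_range: Tuple[float, float],
    strength: float,
    n: int,
) -> Tuple[float, float]:
    """(lower, upper) reliability over priors Beta(strength*mu, strength*(1-mu)), mu in mu_range.

    The same hypothetical mean interval is used for every subdomain's prior.
    """

    def posteriors(mu: float) -> List[Tuple[float, float]]:
        return [
            (strength * mu + k, strength * (1.0 - mu) + m - k)
            for k, m in zip(failures, trials)
        ]

    mu_lo, mu_hi = mu_range
    # Reliability falls as the prior failure mean rises, so the bounds sit at the endpoints.
    return reliability(op, posteriors(mu_hi), n), reliability(op, posteriors(mu_lo), n)


if __name__ == "__main__":
    # Hypothetical OP over three subdomains and hypothetical benchmark counts.
    lower, upper = reliability_envelope(
        op=[0.5, 0.3, 0.2],
        failures=[2, 0, 5],
        trials=[100, 50, 80],
        mu_range=(0.01, 0.2),
        strength=2.0,
        n=10,
    )
    print(f"P(10 failure-free tasks) in [{lower:.4f}, {upper:.4f}]")
```

Because the posterior strength (strength + m_i) does not depend on the prior mean, each moment E[(1 - p_i)^t] decreases as that mean rises, so the lower and upper bounds are reached at the interval endpoints and no grid search is needed. A hierarchical treatment of the kind HIP-LLM describes would instead link the subdomain priors, which this sketch does not attempt.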

Publication Type: Article
Additional Information: To be published by Elsevier. © 2026. This manuscript version is made available under the CC-BY-NC-ND 4.0 license https://creativecommons.org/licenses/by-nc-nd/4.0/
Publisher Keywords: Large Language Model; Software Reliability; Hierarchical Bayesian Inference; Operational Profile; Epistemic Uncertainty; Imprecise Probability
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Departments: School of Science & Technology
School of Science & Technology > Department of Computer Science
Text: Accepted Version (LLM_Reliability-25.pdf)
This document is not freely accessible due to copyright restrictions.
Available under License Creative Commons Attribution Non-commercial No Derivatives.