Semantic Fidelity in Specialized Domains: Advancing Language Models through Adaptive Learning, Collective Reasoning, and Consensus Evaluation
Saadaoui, S. (2026). Semantic Fidelity in Specialized Domains: Advancing Language Models through Adaptive Learning, Collective Reasoning, and Consensus Evaluation. (Unpublished Doctoral thesis, City St George's, University of London)
Abstract
The effective deployment of language models in specialized domains such as finance and medicine requires addressing three coupled challenges: learning domain-specific representations, generating semantically faithful and comprehensive outputs, and evaluating quality without extensive human annotation. This dissertation tackles these challenges through the unifying concept of semantic fidelity, defined as the preservation of intended meaning and relations among domain concepts across (i) representation, (ii) generation, and (iii) evaluation. The work develops three complementary contributions spanning representation learning, multi-agent reasoning, and consensus-based evaluation.
First, Adaptive Masked Language Modeling (AMLM; Chapter 3) introduces a domain adaptation approach that dynamically prioritizes domain-specific terminology during pre-training through adaptive importance weighting. By incorporating multiple signals of token importance together with stabilization mechanisms, AMLM ensures robust training under highly skewed importance distributions. Evaluated on financial-domain tasks, AMLM demonstrates improvements in semantic textual similarity while producing more compact and semantically coherent representations. These results suggest that adaptive importance weighting provides an effective, architecture-agnostic path to domain specialization.
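The core mechanism described above can be sketched as an importance-weighted masked-LM loss. The sketch below is illustrative only, under assumptions: the specific signals combined, the temperature, and the clipping ratio are hypothetical stand-ins for the multiple importance signals and stabilization mechanisms the chapter develops.

```python
import numpy as np

def adaptive_token_weights(importance_signals, temperature=1.0, clip_ratio=5.0):
    """Combine per-token importance signals into loss weights.

    importance_signals: array (num_signals, num_tokens); hypothetical signals
    might include domain-term frequency scores or attention statistics.
    Stabilization: mean-normalization plus clipping, so a highly skewed
    importance distribution cannot dominate training.
    """
    combined = np.mean(importance_signals, axis=0)   # aggregate the signals
    w = np.exp(combined / temperature)               # emphasize important tokens
    w = w / w.mean()                                 # normalize to mean 1
    return np.clip(w, 1.0 / clip_ratio, clip_ratio)  # cap extreme weights

def weighted_mlm_loss(per_token_loss, weights, mask):
    """Importance-weighted masked-LM loss over masked positions only."""
    masked = mask.astype(float)
    return float((per_token_loss * weights * masked).sum() / masked.sum())
```

Because the weighting acts only on the training objective, any encoder trained with a masked-LM loss can adopt it unchanged, which is what makes the approach architecture-agnostic.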
Second, Collective Intentional Reading through Reflection and Refinement (CIR3; Chapter 4) introduces a multi-agent framework for generating question–answer pairs that are both comprehensive and faithful to technical context. CIR3 applies collective intelligence principles through structured coordination that balances perspectival diversity with semantic alignment to the source material. Agents iteratively refine outputs through interaction protocols that prevent premature consensus. Experiments across financial and medical datasets demonstrate substantial improvements in both comprehensiveness and faithfulness over strong baselines, while reducing duplication and over-specificity.
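The coordination pattern described above can be outlined as a draft-reflect-refine loop with a diversity gate. This is a minimal sketch, not the thesis's protocol: the `critique` and `similarity` callables and the gate threshold are hypothetical placeholders for CIR3's reflection and alignment components.

```python
def cir3_loop(agents, context, critique, similarity, max_rounds=3, gate=0.9):
    """Agents draft QA pairs, receive peer critiques, and refine.

    A diversity gate flags rounds where all drafts have collapsed into
    near-identical outputs, forcing divergent revision -- guarding against
    premature consensus while critiques keep drafts aligned to the source.
    """
    drafts = [agent(context, feedback=None, diversify=False) for agent in agents]
    for _ in range(max_rounds):
        pairs = [(a, b) for i, a in enumerate(drafts) for b in drafts[i + 1:]]
        collapsed = bool(pairs) and all(similarity(a, b) > gate for a, b in pairs)
        feedback = [critique(d, drafts) for d in drafts]  # None means "faithful"
        if not collapsed and all(f is None for f in feedback):
            break  # diverse and faithful: accept the current drafts
        drafts = [agent(context, feedback=f, diversify=collapsed)
                  for agent, f in zip(agents, feedback)]
    return drafts
```

The separation of the collapse check from the faithfulness check mirrors the stated balance between perspectival diversity and semantic alignment: either failure alone is enough to trigger another refinement round.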
Third, the consensus-based evaluation framework (Chapter 5) enables rigorous assessment without reliance on human-annotated gold standards. The framework establishes semantic consensus among multiple models and quantifies inter-model reliability through hierarchical clustering across multiple semantic granularities. Target systems are evaluated using agreement metrics that balance fine-grained and holistic semantic alignment. Cross-domain validation in finance and medicine demonstrates strong consensus reliability, robust system alignment, and stability across both general-purpose and domain-informed embeddings, suggesting that multi-model consensus offers a practical alternative to annotation-intensive evaluation.
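The evaluation idea above can be sketched with embeddings alone: reliability is measured as agreement among reference models, and a target is scored against their consensus. The two-weight blend below is a simplified stand-in for the framework's fine-grained versus holistic granularities; the weights and the centroid-based consensus are assumptions of this sketch.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def consensus_evaluate(reference_embs, target_emb, weights=(0.5, 0.5)):
    """Reference-free evaluation sketch.

    reliability: mean pairwise cosine among reference-model embeddings
    (quantifies whether the consensus is trustworthy at all).
    alignment: blend of fine-grained agreement (mean similarity to each
    reference) and holistic agreement (similarity to the consensus centroid).
    """
    refs = np.asarray(reference_embs, dtype=float)
    tgt = np.asarray(target_emb, dtype=float)
    pair_sims = [cosine(refs[i], refs[j])
                 for i in range(len(refs)) for j in range(i + 1, len(refs))]
    reliability = float(np.mean(pair_sims))
    fine = float(np.mean([cosine(tgt, r) for r in refs]))
    holistic = cosine(tgt, refs.mean(axis=0))
    w_fine, w_hol = weights
    return reliability, w_fine * fine + w_hol * holistic
```

Reporting reliability alongside the alignment score matters: a high target score is only meaningful when the reference models agree with one another in the first place.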
Together, these contributions enhance semantic fidelity in specialized domains. AMLM learns domain-aware representations through weighted loss functions; CIR3 structures collective reasoning to produce faithful, comprehensive outputs; and the consensus framework provides a principled means of evaluation without gold standards. The components are modular and interoperable, supporting independent use or integration into a unified pipeline for high-stakes NLP applications where semantic precision is critical.
The findings demonstrate that (i) training-objective design can enable efficient domain specialization without architectural changes, (ii) collective intelligence mechanisms can balance diversity and convergence for reliable reasoning, and (iii) multi-model consensus with quantified reliability offers a practical alternative to annotation-intensive evaluation. Collectively, these results outline a coherent methodological framework for improving language model fidelity in specialized domains.