Semantic Fidelity in Specialized Domains: Advancing Language Models through Adaptive Learning, Collective Reasoning, and Consensus Evaluation
Saadaoui, S. (2026). Semantic Fidelity in Specialized Domains: Advancing Language Models through Adaptive Learning, Collective Reasoning, and Consensus Evaluation. (Unpublished Doctoral thesis, City St George's, University of London)
Abstract
The effective deployment of language models in specialized domains such as finance and medicine requires addressing three coupled challenges: learning domain-specific representations, generating semantically faithful and comprehensive outputs, and evaluating quality without extensive human annotation. This dissertation tackles these challenges through the unifying concept of semantic fidelity, defined as the preservation of intended meaning and relations among domain concepts across (i) representation, (ii) generation, and (iii) evaluation. The work develops three complementary contributions spanning representation learning, multi-agent reasoning, and consensus-based evaluation.
First, Adaptive Masked Language Modeling (AMLM; Chapter 3) introduces a domain adaptation approach that dynamically prioritizes domain-specific terminology during pre-training through adaptive importance weighting. By incorporating multiple signals of token importance together with stabilization mechanisms, AMLM ensures robust training under highly skewed importance distributions. Evaluated on financial-domain tasks, AMLM demonstrates improvements in semantic textual similarity while producing more compact and semantically coherent representations. These results suggest that adaptive importance weighting provides an effective, architecture-agnostic path to domain specialization.
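The core mechanism described above can be sketched as an importance-weighted masked-LM loss. The sketch below is illustrative only, under assumptions: the specific signals combined, the temperature, and the clipping ratio are hypothetical stand-ins for the multiple importance signals and stabilization mechanisms the chapter develops.

```python
import numpy as np

def adaptive_token_weights(importance_signals, temperature=1.0, clip_ratio=5.0):
    """Combine per-token importance signals into loss weights.

    importance_signals: array (num_signals, num_tokens); hypothetical signals
    might include domain-term frequency scores or attention statistics.
    Stabilization: mean-normalization plus clipping, so a highly skewed
    importance distribution cannot dominate training.
    """
    combined = np.mean(importance_signals, axis=0)   # aggregate the signals
    w = np.exp(combined / temperature)               # emphasize important tokens
    w = w / w.mean()                                 # normalize to mean 1
    return np.clip(w, 1.0 / clip_ratio, clip_ratio)  # cap extreme weights

def weighted_mlm_loss(per_token_loss, weights, mask):
    """Importance-weighted masked-LM loss over masked positions only."""
    masked = mask.astype(float)
    return float((per_token_loss * weights * masked).sum() / masked.sum())
```

Because the weighting acts only on the training objective, any encoder trained with a masked-LM loss can adopt it unchanged, which is what makes the approach architecture-agnostic.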
Second, Collective Intentional Reading through Reflection and Refinement (CIR3; Chapter 4) introduces a multi-agent framework for generating question–answer pairs that are both comprehensive and faithful to technical context. CIR3 applies collective intelligence principles through structured coordination that balances perspectival diversity with semantic alignment to the source material. Agents iteratively refine outputs through interaction protocols that prevent premature consensus. Experiments across financial and medical datasets demonstrate substantial improvements in both comprehensiveness and faithfulness over strong baselines, while reducing duplication and over-specificity.
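The coordination pattern described above can be outlined as a draft-reflect-refine loop with a diversity gate. This is a minimal sketch, not the thesis's protocol: the `critique` and `similarity` callables and the gate threshold are hypothetical placeholders for CIR3's reflection and alignment components.

```python
def cir3_loop(agents, context, critique, similarity, max_rounds=3, gate=0.9):
    """Agents draft QA pairs, receive peer critiques, and refine.

    A diversity gate flags rounds where all drafts have collapsed into
    near-identical outputs, forcing divergent revision -- guarding against
    premature consensus while critiques keep drafts aligned to the source.
    """
    drafts = [agent(context, feedback=None, diversify=False) for agent in agents]
    for _ in range(max_rounds):
        pairs = [(a, b) for i, a in enumerate(drafts) for b in drafts[i + 1:]]
        collapsed = bool(pairs) and all(similarity(a, b) > gate for a, b in pairs)
        feedback = [critique(d, drafts) for d in drafts]  # None means "faithful"
        if not collapsed and all(f is None for f in feedback):
            break  # diverse and faithful: accept the current drafts
        drafts = [agent(context, feedback=f, diversify=collapsed)
                  for agent, f in zip(agents, feedback)]
    return drafts
```

The separation of the collapse check from the faithfulness check mirrors the stated balance between perspectival diversity and semantic alignment: either failure alone is enough to trigger another refinement round.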
Third, the consensus-based evaluation framework (Chapter 5) enables rigorous assessment without reliance on human-annotated gold standards. The framework establishes semantic consensus among multiple models and quantifies inter-model reliability through hierarchical clustering across multiple semantic granularities. Target systems are evaluated using agreement metrics that balance fine-grained and holistic semantic alignment. Cross-domain validation in finance and medicine demonstrates strong consensus reliability, robust system alignment, and stability across both general-purpose and domain-informed embeddings, suggesting that multi-model consensus offers a practical alternative to annotation-intensive evaluation.
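The evaluation idea above can be sketched with embeddings alone: reliability is measured as agreement among reference models, and a target is scored against their consensus. The two-weight blend below is a simplified stand-in for the framework's fine-grained versus holistic granularities; the weights and the centroid-based consensus are assumptions of this sketch.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def consensus_evaluate(reference_embs, target_emb, weights=(0.5, 0.5)):
    """Reference-free evaluation sketch.

    reliability: mean pairwise cosine among reference-model embeddings
    (quantifies whether the consensus is trustworthy at all).
    alignment: blend of fine-grained agreement (mean similarity to each
    reference) and holistic agreement (similarity to the consensus centroid).
    """
    refs = np.asarray(reference_embs, dtype=float)
    tgt = np.asarray(target_emb, dtype=float)
    pair_sims = [cosine(refs[i], refs[j])
                 for i in range(len(refs)) for j in range(i + 1, len(refs))]
    reliability = float(np.mean(pair_sims))
    fine = float(np.mean([cosine(tgt, r) for r in refs]))
    holistic = cosine(tgt, refs.mean(axis=0))
    w_fine, w_hol = weights
    return reliability, w_fine * fine + w_hol * holistic
```

Reporting reliability alongside the alignment score matters: a high target score is only meaningful when the reference models agree with one another in the first place.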
Together, these contributions enhance semantic fidelity in specialized domains. AMLM learns domain-aware representations through weighted loss functions; CIR3 structures collective reasoning to produce faithful, comprehensive outputs; and the consensus framework provides a principled means of evaluation without gold standards. The components are modular and interoperable, supporting independent use or integration into a unified pipeline for high-stakes NLP applications where semantic precision is critical.
The findings demonstrate that (i) training-objective design can enable efficient domain specialization without architectural changes, (ii) collective intelligence mechanisms can balance diversity and convergence for reliable reasoning, and (iii) multi-model consensus with quantified reliability offers a practical alternative to annotation-intensive evaluation. Collectively, these results outline a coherent methodological framework for improving language model fidelity in specialized domains.