When methods matter: how implementation choices shape topic discovery in financial text
Gad, M., Park, G. (ORCID: 0000-0002-1009-7462), Rawsthorne, S. & Young, S. (2026). When methods matter: how implementation choices shape topic discovery in financial text. Accounting and Business Research, doi: 10.1080/00014788.2026.2625716
Abstract
This paper examines the application of LDA topic modelling to risk disclosures in FTSE350 firms’ annual reports. We show that LDA implementation choices significantly impact topic representations and subsequent inferences. Using a corpus of FTSE350 annual reports, we show that preprocessing decisions, multiword expressions and labelling strategies materially affect topic interpretability and granularity. Our analysis reveals that while risk reporting addresses key business risks at an aggregate level, the degree of firm-specific commentary is sensitive to topic granularity. Hierarchical linear modelling suggests that 27% of topic variation is within firms for broad topics, increasing to 75% for granular topics. We leverage GPT to enhance topic labelling, showcasing the potential of LLMs in financial text analysis. We also compare LDA to modern embedding-based topic models, finding that while they often generate more coherent topics, they introduce a new set of critical implementation choices and do not eliminate the need for researcher discretion. These findings challenge the claims of LDA objectivity and highlight the importance of domain expertise. We propose a practical checklist for LDA implementation in accounting and finance research emphasising transparency and robustness checks.
| Publication Type: | Article |
|---|---|
| Additional Information: | © 2026 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The terms on which this article has been published allow the posting of the Accepted Manuscript in a repository by the author(s) or with their consent. |
| Publisher Keywords: | textual analysis, topic modelling, risk disclosure, annual reports, Latent Dirichlet Allocation, GPT |
| Subjects: | H Social Sciences > HG Finance |
| Departments: | Bayes Business School; Bayes Business School > Faculty of Finance |
Available under License Creative Commons Attribution.