City Research Online

Computational text analysis on unstructured police data: a scoping review

Lukmanjaya, W., Halmich, C., Butler, T., Cook, D. ORCID: 0000-0002-6810-0281 & Karystianis, G. (2026). Computational text analysis on unstructured police data: a scoping review. Crime Science, 15(1), article number 6. doi: 10.1186/s40163-026-00272-2

Abstract

Introduction: Police reports made following attendance at various events (e.g., crashes, domestic violence, theft) often contain rich contextual details, including indicators of mental health issues or abuse types, and the persons/entities involved and their relationships, which are not typically captured in structured administrative data, interviews or official statistics. However, the sheer volume of information and strict data access protocols render manual analysis impractical. Computational text analysis methods offer a feasible and effective approach to automatically process this underutilized data source.

Aim: This article provides an overview of studies using computational text analysis (e.g., text mining, natural language processing (NLP)) on unstructured police data, serving as a guide for researchers interested in employing similar methodologies.

Methods: This scoping review was conducted in accordance with the PRISMA-ScR guidelines, following a pre-defined protocol and two screening stages (title/abstract screening and full-text screening). A search was conducted across seven electronic databases (ProQuest, IEEE Xplore, Scopus, PubMed, Web of Science, Criminal Justice Abstracts, Google Scholar) covering the past 20 years.

Results: A total of 5426 records were identified. After removing duplicate entries and screening titles/abstracts and full-text publications, 61 studies met the inclusion criteria. Included studies were published between 2004 and 2024, with most from the United States, Australia and the Netherlands. Most studies used open-source tools: Bidirectional Encoder Representations from Transformers (BERT), Natural Language Toolkit (NLTK), scikit-learn, or General Architecture for Text Engineering (GATE) to analyze unstructured police data. Our review indicates that applications of computational text analysis to unstructured police data achieve moderate to high performance. Common limitations included variable data quality, with reliability depending on the level of detail provided by the police report’s author, and failure to report ethical implications or methodological limitations.

Conclusions: Computational text analysis can extract key information from unstructured police data. However, future research should clearly report ethics approvals and implications, and methodological limitations. Establishing a structured data-sharing framework between law enforcement and researchers is also crucial to facilitate access and support high quality, impactful research in this field.

Publication Type: Article
Additional Information: © The Author(s) 2026. This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
Publisher Keywords: Automated text analysis, Text mining, Natural language processing, Machine learning, Police data, Unstructured data
Subjects: H Social Sciences > H Social Sciences (General)
H Social Sciences > HA Statistics
Departments: School of Policy & Global Affairs
School of Policy & Global Affairs > Violence and Society Centre
Text - Published Version
Available under License Creative Commons Attribution Non-commercial No Derivatives.
