City Research Online

Improving police recorded crime data for domestic violence and abuse through natural language processing

Cook, D. ORCID: 0000-0002-6810-0281, Weir, R. ORCID: 0000-0002-5554-801X & Humphreys, L. (2025). Improving police recorded crime data for domestic violence and abuse through natural language processing. Frontiers in Sociology, 10, article number 1686632. doi: 10.3389/fsoc.2025.1686632

Abstract

Introduction
Domestic Violence and Abuse (DVA) is a growing public health and safeguarding concern in the UK, compounded by long-standing data quality issues in police records. Incomplete or inaccurate recording of key variables undermines the ability of police, health services, and partner agencies to assess risk, allocate resources, and design effective interventions.

Methods
We evaluated two machine learning models (Random Forest and DistilBERT) for classifying the type of victim/offender relationship (ex-partner, current partner, and family) from approximately 19,000 DVA incidents recorded by a UK police force. Models were benchmarked against a static rule-based classifier and assessed using precision, recall, and F1-score. To reduce false positives in the most challenging relationship categories, we implemented a selective classification strategy that abstained from low-confidence predictions.
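As a rough illustration of the selective classification idea described above, the sketch below abstains whenever the highest predicted class probability falls below a threshold, then compares precision and coverage with and without abstention. It is not the authors' implementation. The abstract does not state the confidence measure or threshold used, so the threshold of 0.8, the function names, and the probabilities are all hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# Relationship categories named in the abstract.
LABELS = ("ex-partner", "current partner", "family")


def selective_predict(probs: Sequence[float], threshold: float = 0.8) -> Optional[str]:
    """Return the most probable label, or None (abstain) if confidence is below threshold."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return LABELS[best] if probs[best] >= threshold else None


def precision_and_coverage(
    preds: List[Optional[str]], truths: List[str], label: str
) -> Tuple[float, float]:
    """Precision for `label` over answered cases, and the share of cases answered."""
    answered = [(p, t) for p, t in zip(preds, truths) if p is not None]
    coverage = len(answered) / len(preds) if preds else 0.0
    truths_for_label = [t for p, t in answered if p == label]
    if not truths_for_label:
        return 0.0, coverage
    precision = sum(t == label for t in truths_for_label) / len(truths_for_label)
    return precision, coverage


# Made-up class probabilities for four incidents (illustrative only, not study data).
toy_probs = [
    (0.92, 0.05, 0.03),
    (0.40, 0.45, 0.15),
    (0.10, 0.85, 0.05),
    (0.30, 0.55, 0.15),
]
toy_truth = ["ex-partner", "ex-partner", "current partner", "ex-partner"]

# A threshold of 0.0 never abstains, which mimics an ordinary classifier.
preds_all = [selective_predict(p, threshold=0.0) for p in toy_probs]
preds_sel = [selective_predict(p, threshold=0.8) for p in toy_probs]

print(precision_and_coverage(preds_all, toy_truth, "current partner"))
print(precision_and_coverage(preds_sel, toy_truth, "current partner"))
```

In this toy example, raising the threshold removes the two uncertain "current partner" predictions, so precision for that category rises while only half of the incidents receive a label. This is the same precision-for-coverage trade-off the abstract reports.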

Results
Both machine learning models outperformed the baseline across all metrics, with average absolute gains of 11% in precision and 16% in recall. Ex-partner cases were classified most accurately, while current partner cases were classified with the least accuracy. Selective classification substantially improved precision for underperforming categories, albeit at the expense of reduced coverage.

Discussion
These findings demonstrate that computational tools can enhance the completeness and reliability of police DVA data, provided their use balances predictive accuracy, interpretability, and safeguarding risks.

Publication Type: Article
Additional Information: Copyright © 2025 Cook, Weir and Humphreys. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Publisher Keywords: natural language processing, police recorded crime, domestic violence (DV), text classification, supervised machine learning, DistilBERT, free text
Subjects: H Social Sciences > H Social Sciences (General)
Departments: School of Policy & Global Affairs; School of Policy & Global Affairs > Violence and Society Centre
Text - Published Version
Available under License Creative Commons Attribution.