City Research Online

Look-alike modelling in violence-related research: A missing data approach

Barbosa, E. C. ORCID: 0000-0001-8282-131X, Blom, N. ORCID: 0000-0003-0742-4554 & Bunce, A. ORCID: 0000-0002-7244-0561 (2025). Look-alike modelling in violence-related research: A missing data approach. PLoS ONE, 20(1), doi: 10.1371/journal.pone.0301155

Abstract

Violence has been analysed in silos due to difficulties in accessing data and concerns for the safety of those exposed. While there is some literature on violence and its associations using individual datasets, analyses using combined sources of data are very limited. Ideally, data from the same individuals would enable linkage and a longitudinal understanding of experiences of violence and their (health) impacts and consequences. This paper aims to provide proof of concept for creating a synthetic dataset by combining data from the Crime Survey for England and Wales (CSEW) and administrative data from Rape Crisis England and Wales (RCEW), pertaining to victim-survivors of sexual violence in adulthood. Intuitively, the idea was to impute missing information from one dataset by borrowing the distribution from the other. In our analyses, we borrowed information from CSEW to impute missing data in the RCEW administrative dataset, creating a combined synthetic RCEW-CSEW dataset. Using look-alike modelling principles, we provide an innovative and cost-effective approach to exploring patterns and associations in violence-related research in a multi-sectorial setting. Methodologically, we approached data integration as a missing data problem to create a synthetic combined dataset. Multiple imputation with chained equations was employed to collate/impute data from the two different sources. To test whether this procedure was effective, we compared regression analyses for the individual and combined synthetic datasets on binary, continuous and categorical variables. We extended our testing to an outcome measure and, finally, applied the technique to a variable fully missing in one data source. Our results show that the effect sizes for the combined dataset reflect those from the dataset used for imputation. The variance is higher, resulting in fewer statistically significant estimates. Our approach reinforces the possibility of combining administrative with survey datasets using look-alike methods to overcome existing barriers to data linkage.
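
The idea of treating data integration as a missing data problem can be illustrated with a small sketch. It is not the authors' implementation and uses no CSEW or RCEW data: the toy data, the variable names (`x` for a characteristic observed in both sources, `z` for a variable observed only in the survey) and the helper functions are all hypothetical. With a single variable to impute, the chained-equations cycle reduces to one regression step, so the sketch shows only that step, repeated over several imputations and pooled with Rubin's rules.

```python
import random
import statistics


def make_toy_data(seed=1):
    """Hypothetical stand-ins: a 'survey' observing (x, z) and 'admin' rows observing x only."""
    rng = random.Random(seed)
    survey = []
    for _ in range(200):
        x = rng.gauss(0, 1)
        survey.append((x, 2 + 0.5 * x + rng.gauss(0, 1)))
    admin_x = [rng.gauss(1, 1) for _ in range(300)]
    return survey, admin_x


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, residual standard deviation)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, (rss / (len(xs) - 2)) ** 0.5


def impute_and_pool(survey, admin_x, m=20, seed=0):
    """Impute the fully missing z for the admin rows m times, then pool the mean of z."""
    rng = random.Random(seed)
    estimates, variances = [], []
    for _ in range(m):
        # Resample the donor rows so each imputation carries parameter uncertainty.
        boot = [rng.choice(survey) for _ in survey]
        a, b, sd = fit_line([x for x, _ in boot], [z for _, z in boot])
        # Draw imputed values from the fitted predictive distribution.
        imputed = [a + b * x + rng.gauss(0, sd) for x in admin_x]
        z_all = [z for _, z in survey] + imputed
        estimates.append(statistics.fmean(z_all))
        variances.append(statistics.variance(z_all) / len(z_all))
    # Rubin's rules: total variance = within + (1 + 1/m) * between.
    within = statistics.fmean(variances)
    between = statistics.variance(estimates)
    total = within + (1 + 1 / m) * between
    return statistics.fmean(estimates), total, within


if __name__ == "__main__":
    survey, admin_x = make_toy_data()
    pooled_mean, total_var, within_var = impute_and_pool(survey, admin_x)
    print(pooled_mean, total_var, within_var)
```

In this sketch the pooled variance is never smaller than the within-imputation variance, because the between-imputation component is added on top. That is consistent with the abstract's observation that variance is higher in the combined dataset, although the paper's models and data are considerably richer than this toy setting.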

Publication Type: Article
Additional Information: Copyright: © 2025 Barbosa et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Subjects: H Social Sciences > HM Sociology
R Medicine > RA Public aspects of medicine > RA0421 Public health. Hygiene. Preventive Medicine
Departments: School of Policy & Global Affairs
School of Policy & Global Affairs > Violence and Society Centre
Text - Published Version
Available under License Creative Commons Attribution.
