City Research Online

Randomized Response and Balanced Bloom Filters for Privacy Preserving Record Linkage

Schnell, R. (2017). Randomized Response and Balanced Bloom Filters for Privacy Preserving Record Linkage. Paper presented at the 16th IEEE International Conference on Data Mining Workshop, ICDMW 2016, 12-15 Dec 2016, Barcelona, Spain.


In most European settings, record linkage across different institutions is based on encrypted personal identifiers – such as names, birthdays, or places of birth – to protect privacy. In practice, however, up to 20% of the records may contain errors in identifiers, so exact record linkage on encrypted identifiers usually results in the loss of large subsets of the data. Such losses usually imply biased statistical estimates, since in many applications the causes of errors might be correlated with the variables of interest. Over the past 10 years, the field of Privacy Preserving Record Linkage (PPRL) has developed different techniques to link data without revealing the identity of the described entity. However, only a few techniques are suitable for applied research with large databases of millions of records, a size typical of administrative or medical databases. Bloom filters have been found to be one successful technique for large-scale PPRL applications. Yet Bloom filters have been subject to cryptographic attacks, and previous research has shown that the direct application of Bloom filters carries a non-zero re-identification risk. We present new results on recently developed techniques that withstand all known attacks on PPRL Bloom filters. The computationally inexpensive algorithms modify personal identifiers by combining different cryptographic techniques. The paper presents these new algorithms and reports their performance in terms of precision, recall, and re-identification risk on large databases.
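To illustrate why Bloom filters tolerate identifier errors where exact matching on encrypted identifiers does not, the following is a minimal sketch of a basic bigram Bloom filter encoding compared with the Dice coefficient. It is not the hardened algorithm presented in the paper. The function names, the filter length `m`, the number of hash functions `k`, the double-hashing scheme, and the example key are all assumptions chosen for illustration.

```python
import hashlib
import hmac


def bigrams(name: str) -> set[str]:
    """Split a padded, lower-cased identifier into its set of 2-grams."""
    padded = f"_{name.lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(name: str, key: bytes, m: int = 1000, k: int = 10) -> set[int]:
    """Return the set bit positions of a length-m Bloom filter for `name`.

    Each bigram is mapped to k positions by double hashing with two keyed
    HMACs. m, k, and the hash choice are illustrative assumptions.
    """
    positions: set[int] = set()
    for gram in bigrams(name):
        data = gram.encode("utf-8")
        h1 = int.from_bytes(hmac.new(key, data, hashlib.sha256).digest(), "big")
        h2 = int.from_bytes(hmac.new(key, data, hashlib.sha512).digest(), "big")
        for i in range(k):
            positions.add((h1 + i * h2) % m)
    return positions


def dice(a: set[int], b: set[int]) -> float:
    """Dice coefficient of two Bloom filters given as sets of set bits."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical shared secret agreed on by the data owners.
KEY = b"example-shared-secret"

similar = dice(bloom_encode("Schmidt", KEY), bloom_encode("Schmitt", KEY))
dissimilar = dice(bloom_encode("Schmidt", KEY), bloom_encode("Meier", KEY))
print(f"Schmidt vs Schmitt: {similar:.2f}")
print(f"Schmidt vs Meier:   {dissimilar:.2f}")
```

Because a spelling variant shares most of its bigrams with the original, the two filters overlap heavily and their similarity stays high, whereas a keyed hash of the whole name would differ completely. A filter built this plainly is the kind of direct application that carries the re-identification risk mentioned above.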

Publication Type: Conference or Workshop Item (Paper)
Additional Information: © 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Subjects: H Social Sciences; Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Departments: School of Policy & Global Affairs > Sociology & Criminology
Text - Accepted Version