City Research Online

Detecting Malicious Web Scraping Activity: a Study with Diverse Detectors

Marques, P., Dabbabi, Z., Mironescu, M-M., Thonnard, O., Bessani, A., Buontempo, F. and Gashi, I. ORCID: 0000-0002-8017-3184 (2018). Detecting Malicious Web Scraping Activity: a Study with Diverse Detectors. Paper presented at the 23rd IEEE Pacific Rim International Symposium on Dependable Computing (PRDC 2018), 4-7 Dec 2018, Taipei, Taiwan.

Abstract

We present results on the use of diverse monitoring tools for the detection of malicious web scraping activity. We have carried out an analysis of a real dataset of Apache HTTP access logs for an e-commerce application provided by a large multinational IT provider for the global travel and tourism industry. Two tools have been used to detect scraping activities based on the HTTP requests: a commercial tool, and an in-house tool called Arcane. We show the benefits that can be achieved through the use of both systems, in terms of overall sensitivity and specificity, and we discuss the potential sources of diversity between the tools' alert patterns.

Publication Type: Conference or Workshop Item (Paper)
Additional Information: © 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Publisher Keywords: design diversity, malicious web scraping, botnet detection, security assessment
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Departments: School of Mathematics, Computer Science & Engineering > Computer Science
URI: http://openaccess.city.ac.uk/id/eprint/20597
Text - Accepted Version