City Research Online

VIGIL-AI: Real-Time Cyber Threat Entity Recognition from OSINT Using Hybrid Regex–Transformer Models

Mohapatra, R., Saedi, M. ORCID: 0000-0001-6436-1057, Mirtaheri, S. L. & Fehmi, J. (2025). VIGIL-AI: Real-Time Cyber Threat Entity Recognition from OSINT Using Hybrid Regex–Transformer Models. Paper presented at the International Conference on Emerging Trends in Cybersecurity (ICETCS 2025), 27-28 Oct 2025, Wolverhampton, UK.

Abstract

Cybersecurity analysts increasingly rely on unstructured threat intelligence from social media, blogs, and open-source feeds, yet existing tools struggle to extract reliable indicators from such noisy sources. Rule-based methods achieve high precision on structured identifiers but miss contextual and obfuscated entities, while transformer-based language models provide contextual understanding but face adaptation challenges in specialized cybersecurity text. This paper introduces VIGIL-AI, a hybrid entity recognition pipeline for real-time cyber threat intelligence extraction from open-source intelligence (OSINT) sources. Real-time extraction matters in cybersecurity because timely identification of threats can significantly reduce risk. A manually annotated dataset of 500 samples from Reddit, Twitter, and News-API, combined with the publicly available bnsapa/cybersecurity-ner corpus, provides realistic training and evaluation data under noisy conditions. The system focuses on five key entity types relevant to threat intelligence: malware, vulnerabilities, indicators, systems, and organizations. Experimental results show that the hybrid approach improves coverage across both structured and informal mentions. DeBERTa-v3-large achieved the best balance of precision (0.678), recall (0.650), and F1 score (0.663), while the rule-based component ensured high accuracy on structured indicators. A proof-of-concept deployment of VIGIL-AI demonstrates real-time extraction from live feeds, validating the system’s operational relevance. These findings indicate that hybrid methods, supported by realistic datasets and practical deployments, can strengthen automated threat intelligence workflows and help bridge the gap between academic evaluation and real-world cybersecurity applications.
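
The abstract does not give implementation details, so the following is only a minimal sketch of how a hybrid regex–transformer merge might look. It assumes the rule-based stage takes precedence over model predictions on overlapping spans. The patterns, the `refang` helper, the label names, and the `model_ents` input (standing in for the output of a transformer NER model) are all hypothetical and are not taken from the paper.

```python
import re

# Hypothetical patterns for structured indicators (assumption, not the paper's rules).
PATTERNS = {
    "CVE": re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE),
    "IPV4": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
    ),
    "SHA256": re.compile(r"\b[a-fA-F0-9]{64}\b"),
    "MD5": re.compile(r"\b[a-fA-F0-9]{32}\b"),
}


def refang(text):
    """Undo two common defanging conventions, e.g. 1.2.3[.]4 and hxxp://."""
    return text.replace("[.]", ".").replace("hxxp", "http")


def extract_structured(text):
    """Return the refanged text and the regex matches found in it.

    Character offsets refer to the refanged text, not the raw input.
    """
    clean = refang(text)
    found = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(clean):
            found.append(
                {"label": label, "text": m.group(), "start": m.start(), "end": m.end()}
            )
    return clean, sorted(found, key=lambda e: e["start"])


def merge_entities(regex_ents, model_ents):
    """Keep every regex entity; add model entities that overlap none of them."""
    merged = list(regex_ents)
    for ent in model_ents:
        overlaps = any(
            ent["start"] < r["end"] and r["start"] < ent["end"] for r in regex_ents
        )
        if not overlaps:
            merged.append(ent)
    return sorted(merged, key=lambda e: e["start"])
```

In use, `extract_structured` would run on each incoming post, a separately loaded NER model would supply `model_ents` with offsets into the same refanged text, and `merge_entities` would combine the two. Letting the regex stage win on overlap is one possible design choice that reflects the high precision on structured identifiers noted above; other merge policies are equally plausible.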

Publication Type: Conference or Workshop Item (Paper)
Additional Information: This version of the article has been accepted for publication, after peer review (when applicable) and is subject to Springer Nature’s AM terms of use, but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record will be available online at: https://link.springer.com/series/7818
Publisher Keywords: VIGIL-AI, Cyber threat intelligence, Named entity recognition, Open-source intelligence (OSINT), Hybrid regex–transformer models, Deep learning for cybersecurity, Real-time threat detection
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Departments: School of Science & Technology; School of Science & Technology > Department of Computer Science
This document is not freely accessible due to copyright restrictions.
