44 research outputs found
Intelligence gathering by capturing the social processes within prisons
We present a prototype system that can be used to capture longitudinal
socialising processes by recording people's encounters in space. We argue that
such a system can usefully be deployed in prisons and other detention
facilities in order to help intelligence analysts assess the behaviour of
terrorist and organised crime groups, and their potential relationships. Here
we present the results of a longitudinal study, carried out with civilians,
which demonstrates the capabilities of our system.
Comment: 21 pages, 7 figures, 1 table
HuntGPT: Integrating Machine Learning-Based Anomaly Detection and Explainable AI with Large Language Models (LLMs)
Machine learning (ML) is crucial in network anomaly detection for proactive
threat hunting, reducing detection and response times significantly. However,
challenges in model training, maintenance, and frequent false positives impact
its acceptance and reliability. Explainable AI (XAI) attempts to mitigate these
issues, allowing cybersecurity teams to assess AI-generated alerts with
confidence, but has seen limited acceptance from incident responders. Large
Language Models (LLMs) present a solution through discerning patterns in
extensive information and adapting to different functional requirements. We
present HuntGPT, a specialized intrusion detection dashboard that applies a
Random Forest classifier trained on the KDD99 dataset, integrates XAI
frameworks such as SHAP and LIME for user-friendly, intuitive model
interaction, and couples them with GPT-3.5 Turbo to deliver threat
explanations in an understandable format. The paper
delves into the system's architecture, components, and technical accuracy,
assessed through Certified Information Security Manager (CISM) Practice Exams,
evaluating response quality across six metrics. The results demonstrate that
conversational agents, supported by LLM and integrated with XAI, provide
robust, explainable, and actionable AI solutions in intrusion detection,
enhancing user understanding and the interactive experience.
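The abstract describes feeding XAI output into an LLM so alerts arrive in plain language. A minimal sketch of that hand-off is below; the feature names, alert fields, and prompt wording are illustrative assumptions, not the paper's actual implementation:

```python
# Sketch: turn an IDS alert plus its top SHAP attributions into a prompt
# for a conversational model (e.g. GPT-3.5 Turbo). All names are invented.

def build_explanation_prompt(alert: dict, shap_top: list[tuple[str, float]]) -> str:
    """Compose a prompt asking the LLM to explain an IDS alert in plain language."""
    lines = [
        f"An intrusion detector flagged a connection as '{alert['label']}' "
        f"with confidence {alert['score']:.2f}.",
        "The most influential features (SHAP values) were:",
    ]
    for name, value in shap_top:
        direction = "raised" if value > 0 else "lowered"
        lines.append(f"- {name}: {direction} the alert score by {abs(value):.3f}")
    lines.append("Explain this alert to a SOC analyst and suggest a next step.")
    return "\n".join(lines)

prompt = build_explanation_prompt(
    {"label": "smurf", "score": 0.97},
    [("src_bytes", 0.41), ("count", 0.22), ("serror_rate", -0.05)],
)
```

The resulting text would then be sent to the chat model in place of a raw alert.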
CYGENT: A cybersecurity conversational agent with log summarization powered by GPT-3
In response to the escalating cyber-attacks in the modern IT and IoT
landscape, we developed CYGENT, a conversational agent framework powered by
GPT-3.5 turbo model, designed to aid system administrators in ensuring optimal
performance and uninterrupted resource availability. This study focuses on
fine-tuning GPT-3 models for cybersecurity tasks, including conversational AI
and generative AI tailored specifically for cybersecurity operations. CYGENT
assists users by providing cybersecurity information, analyzing and summarizing
uploaded log files, detecting specific events, and delivering essential
instructions. The conversational agent was developed based on the GPT-3.5 turbo
model. We fine-tuned and validated summarizer models (GPT-3) using manually
generated data points. Using this approach, we achieved a BERTScore of over
97%, indicating GPT-3's enhanced capability in summarizing log files into
human-readable formats and providing necessary information to users.
Furthermore, we conducted a comparative analysis of GPT-3 models with other
Large Language Models (LLMs), including CodeT5-small, CodeT5-base, and
CodeT5-base-multi-sum, with the objective of analyzing log analysis techniques.
Our analysis consistently demonstrated that the Davinci (GPT-3) model
outperformed all other LLMs. These findings are crucial for
improving human comprehension of logs, particularly in light of the increasing
numbers of IoT devices. Additionally, our research suggests that the
CodeT5-base-multi-sum model exhibits comparable performance to Davinci to some
extent in summarizing logs, indicating its potential as an offline model for
this task.
Comment: 7 pages, 9 figures
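The fine-tuning step relies on manually generated log/summary pairs. A sketch of what one such data point might look like, in the prompt/completion JSONL shape used by GPT-3-era fine-tuning, is below; the log line, summary, and separator conventions are invented examples, not CYGENT's actual data:

```python
import json

# Sketch: one manually generated fine-tuning record for a log summarizer,
# serialized as a JSONL line. Separator ("###") and stop token ("END")
# follow common GPT-3 fine-tuning conventions, assumed here.

def make_finetune_record(log_text: str, summary: str) -> str:
    record = {
        "prompt": f"Summarize the following log:\n{log_text}\n\n###\n\n",
        "completion": " " + summary + " END",  # leading space and stop token
    }
    return json.dumps(record)

line = make_finetune_record(
    "sshd[812]: Failed password for root from 203.0.113.7 port 50211",
    "A failed SSH root login attempt from 203.0.113.7.",
)
```

Many such lines, one per data point, would form the training file submitted for fine-tuning.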
Cyber Sentinel: Exploring Conversational Agents in Streamlining Security Tasks with GPT-4
In an era where cyberspace is both a battleground and a backbone of modern
society, the urgency of safeguarding digital assets against ever-evolving
threats is paramount. This paper introduces Cyber Sentinel, an innovative
task-oriented cybersecurity dialogue system that is effectively capable of
managing two core functions: explaining potential cyber threats within an
organization to the user, and taking proactive/reactive security actions when
instructed by the user. Cyber Sentinel embodies the fusion of artificial
intelligence, cybersecurity domain expertise, and real-time data analysis to
combat the multifaceted challenges posed by cyber adversaries. This article
delves into the process of creating such a system and how it can interact with
other components typically found in cybersecurity organizations. Our work is a
novel approach to task-oriented dialogue systems, leveraging the power of
chaining GPT-4 models combined with prompt engineering across all sub-tasks. We
also highlight its pivotal role in enhancing cybersecurity communication and
interaction, concluding that this framework not only enhances the system's
transparency (Explainable AI) but also streamlines decision-making and the
response to threats (Actionable AI), therefore marking a significant
advancement in the realm of cybersecurity communication.
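The core mechanism is chaining model calls, one per sub-task, each with its own prompt. A toy sketch of that control flow follows; `call_llm` is a deterministic stub standing in for a real GPT-4 API call, and the intents and prompts are illustrative, not the paper's:

```python
# Sketch: chain two model calls (intent classification, then the task
# itself), each with a task-specific prompt, as in prompt-chained
# dialogue systems. The stub responses below are hard-coded for the demo.

def call_llm(prompt: str) -> str:
    # Stub standing in for a GPT-4 completion; replies are canned.
    if prompt.startswith("Classify the intent"):
        return "explain_threat" if "what is" in prompt.lower() else "take_action"
    if prompt.startswith("Explain"):
        return "A smurf attack floods a target with spoofed ICMP replies."
    return "Blocked IP 198.51.100.4 at the firewall."

def handle(user_msg: str) -> str:
    intent = call_llm(
        "Classify the intent of this request as "
        f"'explain_threat' or 'take_action': {user_msg}"
    )
    if intent == "explain_threat":
        return call_llm(f"Explain the threat mentioned here: {user_msg}")
    return call_llm(f"Decide and report a security action for: {user_msg}")
```

With a real model behind `call_llm`, the same two-step chain separates understanding the request from acting on it.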
TSTEM: A Cognitive Platform for Collecting Cyber Threat Intelligence in the Wild
The extraction of cyber threat intelligence (CTI) from open sources is a
rapidly expanding defensive strategy that enhances the resilience of both
Information Technology (IT) and Operational Technology (OT) environments
against large-scale cyber-attacks. While previous research has focused on
improving individual components of the extraction process, the community lacks
open-source platforms for deploying streaming CTI data pipelines in the wild.
To address this gap, the study describes the implementation of an efficient and
well-performing platform capable of processing compute-intensive data pipelines
based on the cloud computing paradigm for real-time detection, collection, and
sharing of CTI from different online sources. We developed a prototype platform
(TSTEM), a containerized microservice architecture that uses Tweepy, Scrapy,
Terraform, ELK, Kafka, and MLOps to autonomously search, extract, and index
IOCs in the wild. Moreover, the provisioning, monitoring, and management of the
TSTEM platform are achieved through infrastructure as code (IaC). Custom
focused crawlers collect web content, which is then processed by a first-level
classifier to identify potential indicators of compromise (IOCs). If deemed
relevant, the content advances to a second level of extraction for further
examination. Throughout this process, state-of-the-art NLP models are utilized
for classification and entity extraction, enhancing the overall IOC extraction
methodology. Our experimental results indicate that these models exhibit high
accuracy (exceeding 98%) in the classification and extraction tasks, achieving
this performance within a time frame of less than a minute. The effectiveness
of our system can be attributed to a finely-tuned IOC extraction method that
operates at multiple stages, ensuring precise identification of relevant
information with low false positives.
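The multi-stage idea, a cheap relevance check before the heavier extraction step, can be sketched as follows. TSTEM uses trained NLP models at both stages; here keyword matching and regexes stand in for them, and the relevance terms and patterns are simplified assumptions:

```python
import re

# Sketch of a two-stage IOC pipeline: stage 1 filters for CTI-relevant
# text, stage 2 extracts candidate indicators. Regexes below cover only
# IPv4 addresses and MD5 hashes as a simplification.

RELEVANCE_TERMS = {"malware", "c2", "phishing", "ransomware", "ioc"}
IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
MD5_RE = re.compile(r"\b[a-fA-F0-9]{32}\b")

def first_level(text: str) -> bool:
    """Stage 1: keep only documents that look CTI-relevant."""
    return any(term in text.lower() for term in RELEVANCE_TERMS)

def second_level(text: str) -> dict:
    """Stage 2: extract candidate IOCs from relevant documents."""
    return {"ips": IP_RE.findall(text), "md5s": MD5_RE.findall(text)}

def extract_iocs(text: str) -> dict:
    return second_level(text) if first_level(text) else {}

iocs = extract_iocs(
    "New ransomware C2 server at 203.0.113.9, "
    "payload md5 d41d8cd98f00b204e9800998ecf8427e"
)
```

The staging keeps the expensive extractor off irrelevant pages, which is where the low false-positive rate comes from.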
Autonomy and Intelligence in the Computing Continuum: Challenges, Enablers, and Future Directions for Orchestration
Future AI applications require performance, reliability and privacy that the
existing, cloud-dependent system architectures cannot provide. In this article,
we study orchestration in the device-edge-cloud continuum, and focus on AI for
edge, that is, the AI methods used in resource orchestration. We claim that to
support the constantly growing requirements of intelligent applications in the
device-edge-cloud computing continuum, resource orchestration needs to embrace
edge AI and emphasize local autonomy and intelligence. To justify the claim, we
provide a general definition for continuum orchestration, and look at how
current and emerging orchestration paradigms are suitable for the computing
continuum. We describe certain major emerging research themes that may affect
future orchestration, and provide an early vision of an orchestration paradigm
that embraces those research themes. Finally, we survey current key edge AI
methods and look at how they may contribute to fulfilling the vision of
future continuum orchestration.
Comment: 50 pages, 8 figures (revised content in all sections, added figures
and a new section)
Strings and things: a semantic search engine for news quotes using named entity recognition
Abstract
Emerging methods for content delivery, such as quote-searching and entity-searching, enable users to quickly identify novel and relevant information from unstructured texts, news articles, and media sources. These methods have widespread applications in web surveillance and crime informatics, and can help improve intention disambiguation, character evaluation, threat analysis, and bias detection. Furthermore, quote-based and entity-based searching is also an empowering information retrieval tool that can enable non-technical users to gauge the quality of public discourse, allowing for more fine-grained analysis of core sociological questions. The paper presents a prototype search engine that allows users to search a news database containing quotes using a combination of strings and things. The ingestion pipeline, which forms the backend of the service, comprises the following modules: i) a crawler that ingests data from the GDELT Global Quotation Graph; ii) a named entity recognition (NER) filter that labels data on the fly; iii) an indexing mechanism that serves the data to an Elasticsearch cluster; and iv) a user interface that allows users to formulate queries. The paper presents the high-level configuration of the pipeline and reports basic metrics and aggregations.
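The "strings and things" combination amounts to indexing each quote with its recognised entities and then mixing full-text search with an entity filter at query time. A minimal sketch follows; the field names, the tiny entity lexicon, and the query shape are assumptions, the paper uses a real NER model over GDELT data:

```python
# Sketch: quotes are indexed alongside their entities, so a query can
# combine a free-text "string" with an entity "thing". The lexicon below
# is a stand-in for on-the-fly NER labelling.

ENTITY_LEXICON = {"NATO": "ORG", "Brussels": "GPE"}

def label_entities(quote: str) -> list[dict]:
    """Stand-in for the NER filter: label known entities in a quote."""
    return [{"text": t, "label": ENTITY_LEXICON[t]}
            for t in ENTITY_LEXICON if t in quote]

def index_doc(quote: str) -> dict:
    """Document shape that would be sent to the Elasticsearch cluster."""
    return {"quote": quote, "entities": label_entities(quote)}

def strings_and_things_query(string: str, thing: str) -> dict:
    """Bool query: full-text match on the quote, filtered by entity."""
    return {"query": {"bool": {
        "must": [{"match": {"quote": string}}],
        "filter": [{"term": {"entities.text": thing}}],
    }}}

doc = index_doc("NATO ministers met in Brussels.")
query = strings_and_things_query("organised crime", "NATO")
```

Keeping the entity filter in the `filter` clause rather than `must` lets Elasticsearch cache it, which suits repeated entity-scoped searches.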
Public perceptions on organised crime, mafia, and terrorism: a big data analysis based on Twitter and Google trends
Abstract
Public perceptions of crime motivate government policy on law and order; however, there has been limited empirical research on perceptions of serious crime in social media. Recently, open source data—and ‘big data'—have enabled researchers from different fields to develop cost-effective methods for opinion mining and sentiment analysis. Against this backdrop, the aim of this paper is to apply state-of-the-art tools and techniques for the assembly and analysis of open source data. We set out to explore how non-discursive behavioural data can be used as a proxy for studying public perceptions of serious crime. The data collection focused on the following three conversational topics: organised crime, the mafia, and terrorism. Specifically, time series data of users' online search habits (over a ten-year period) were gathered from Google Trends, and cross-sectional network data (N=178,513) were collected from Twitter. The collected data contained a significant amount of structure. Marked similarities and differences in people's habits and perceptions were observable, and these were recorded. The results indicated that ‘big data' is a cost-effective method for exploring theoretical and empirical issues vis-à-vis public perceptions of serious crime.
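Comparing search-interest series across topics is one concrete way such similarities show up. A toy sketch using Pearson correlation is below; the monthly values are invented for illustration, the paper works with ten years of real Google Trends data:

```python
from math import sqrt

# Sketch: measure how closely two search-interest time series move
# together. The two short series below are invented sample values.

def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

mafia = [40, 42, 45, 60, 58, 44]
terrorism = [40, 41, 46, 62, 57, 43]  # spikes co-occur in this toy data
r = pearson(mafia, terrorism)
```

A high coefficient would indicate that public attention to the two topics tends to spike together.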
A Novel Anomaly-Based Intrusion Detection Model Using PSOGWO-Optimized BP Neural Network and GA-Based Feature Selection
Intrusion detection systems (IDS) are crucial for network security because they enable detection of and response to malicious traffic. However, as next-generation communications networks become increasingly diversified and interconnected, intrusion detection systems are confronted with dimensionality difficulties. Prior works have shown that high-dimensional datasets that simulate real-world network data increase the complexity and processing time of IDS training and testing, while irrelevant features waste resources and reduce the detection rate. In this paper, a new intrusion detection model is presented which uses a genetic algorithm (GA) for feature selection and optimization algorithms for gradient descent. First, the GA-based method is used to select a set of highly correlated features from the NSL-KDD dataset that can significantly improve the detection ability of the proposed model. A Back-Propagation Neural Network (BPNN) is then trained using the HPSOGWO method, a hybrid combination of the Particle Swarm Optimization (PSO) and Grey Wolf Optimization (GWO) algorithms. Finally, the hybrid HPSOGWO-BPNN algorithm is used to solve binary and multi-class classification problems on the NSL-KDD dataset. The experimental outcomes demonstrate that the proposed model achieves better performance than other techniques in terms of accuracy, with a lower error rate and a better ability to detect different types of attacks.
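GA-based feature selection of the kind described encodes each candidate feature subset as a bit string and evolves the population toward subsets that score well. A toy sketch is below; the synthetic fitness function stands in for training and evaluating the BPNN on NSL-KDD features, which is far too heavy for a sketch, and all constants are invented:

```python
import random

# Toy GA for feature selection: bit strings encode feature subsets,
# fitness rewards hitting the (pretend) useful features while penalizing
# subset size. Seeded for reproducibility.

random.seed(0)
N_FEATURES = 10
USEFUL = {0, 3, 7}  # pretend only these features carry signal

def fitness(mask: list) -> float:
    hits = sum(mask[i] for i in USEFUL)
    size_penalty = 0.05 * sum(mask)   # prefer smaller subsets
    return hits - size_penalty

def evolve(pop_size: int = 20, generations: int = 30) -> list:
    pop = [[random.randint(0, 1) for _ in range(N_FEATURES)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # elitist selection
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, N_FEATURES)
            child = a[:cut] + b[cut:]           # one-point crossover
            child[random.randrange(N_FEATURES)] ^= 1  # bit-flip mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
```

In the paper's setting, the fitness call would instead train the HPSOGWO-tuned BPNN on the masked features and return its validation accuracy.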