Automation for network security configuration: state of the art and research trends
The size and complexity of modern computer networks are progressively increasing, as a consequence of novel architectural paradigms such as the Internet of Things and network virtualization. Consequently, manual orchestration and configuration of network security functions is no longer feasible, in an environment where cyber attacks can exploit even the smallest configuration error. A new frontier is then the introduction of automation in network security configuration, i.e., automatically designing the architecture of security services and the configurations of network security functions, such as firewalls, VPN gateways, etc. This opportunity has been enabled by modern computer network technologies, such as virtualization. In view of these considerations, the motivations for the introduction of automation in network security configuration are first introduced, alongside the key automation enablers. Then, the current state of the art in this context is surveyed, focusing on both the achieved improvements and the current limitations. Finally, possible future trends in the field are illustrated.
UMSL Bulletin 2023-2024
The 2023-2024 Bulletin and Course Catalog for the University of Missouri St. Louis.
A Firewall Optimization for Threat-Resilient Micro-Segmentation in Power System Networks
Electric power delivery relies on a communications backbone that must be
secure. SCADA systems are essential to critical grid functions and include
industrial control systems (ICS) protocols such as the Distributed Network
Protocol-3 (DNP3). These protocols are vulnerable to cyber threats that power
systems, as cyber-physical critical infrastructure, must be protected against.
For this reason, the NERC Critical Infrastructure Protection standard CIP-005-5
specifies that an electronic system perimeter is needed, accomplished with
firewalls. This paper presents how these electronic system perimeters can be
optimally found and generated using a proposed meta-heuristic approach for
optimal security zone formation for large-scale power systems. Then, to
implement the optimal firewall rules in a large scale power system model, this
work presents a prototype software tool that takes the optimization results and
auto-configures the firewall nodes for different utilities in a cyber-physical
testbed. Using this tool, firewall policies are configured for all the
utilities and their substations within a synthetic 2000-bus model, assuming two
different network topologies. Results generate the optimal electronic security
perimeters to protect a power system's data flows and compare the number of
firewalls, monetary cost, and risk alerts from path analysis.
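The auto-configuration step this abstract describes could, in spirit, look like the following hedged Python sketch: it turns a zone assignment produced by an optimizer into DNP3 allow-rules. The zone names, data flows, and rule syntax here are illustrative assumptions, not the paper's actual tool or its 2000-bus model.

```python
# Hypothetical sketch: deriving perimeter firewall allow-rules for DNP3
# traffic (TCP port 20000) from an optimized security-zone assignment.
# Zone labels, node names, and rule syntax are illustrative.

DNP3_PORT = 20000

def generate_rules(zones, flows):
    """For each (src, dst) data flow, emit an allow-rule only when the
    flow crosses a zone boundary; intra-zone traffic needs no
    perimeter rule."""
    rules = []
    for src, dst in flows:
        if zones[src] != zones[dst]:
            rules.append(f"allow tcp {src} -> {dst} port {DNP3_PORT}")
    return rules

zones = {"sub1": "A", "sub2": "A", "control": "B"}
flows = [("sub1", "sub2"), ("sub1", "control")]
rules = generate_rules(zones, flows)
# Only the cross-zone flow sub1 -> control produces a rule.
```

Fewer zone crossings mean fewer rules and firewalls, which is exactly the trade-off the optimization in the paper balances against risk.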
Using machine learning to predict pathogenicity of genomic variants throughout the human genome
More than 6,000 diseases are estimated to be caused by genomic variants. This can happen in many possible ways: a variant may stop the translation of a protein, interfere with gene regulation, or alter splicing of the transcribed mRNA into an unwanted isoform. It is necessary to investigate all of these processes in order to evaluate which variant may be causal for the deleterious phenotype. A great help in this regard are variant effect scores. Implemented as machine learning classifiers, they integrate annotations from different resources to rank genomic variants in terms of pathogenicity.
Developing a variant effect score requires multiple steps: annotation of the training data, feature selection, model training, benchmarking, and finally deployment for the model's application. Here, I present a generalized workflow of this process. It makes it simple to configure how information is converted into model features, enabling the rapid exploration of different annotations. The workflow further implements hyperparameter optimization, model validation and ultimately deployment of a selected model via genome-wide scoring of genomic variants.
The workflow is applied to train Combined Annotation Dependent Depletion (CADD), a variant effect model that scores SNVs and InDels genome-wide. I show that the workflow can be quickly adapted to novel annotations by porting CADD to the genome reference GRCh38. Further, I demonstrate the integration of deep-neural-network scores as features into a new CADD model, improving the annotation of RNA splicing events. Finally, I apply the workflow to train multiple variant effect models from training data that is based on variants selected by allele frequency.
In conclusion, the developed workflow presents a flexible and scalable method to train variant effect scores. All software and developed scores are freely available from cadd.gs.washington.edu and cadd.bihealth.org.
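As a hedged illustration of the annotate-train-deploy loop the abstract describes, the sketch below trains a toy classifier on synthetic annotation-derived features and then scores a batch of new variants. The features, data, and model are illustrative stand-ins, not CADD's real feature set or pipeline.

```python
# Minimal sketch of the annotate -> train -> score-genome-wide loop,
# on synthetic data; illustrative only, not CADD's actual model.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic training set: two annotation-derived features per variant,
# label 1 = proxy-pathogenic, 0 = proxy-benign.
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

model = LogisticRegression().fit(X, y)

# "Deployment" step: score a fresh batch of variants with the
# probability of the pathogenic class.
scores = model.predict_proba(rng.normal(size=(5, 2)))[:, 1]
```

Swapping the feature matrix for a different annotation source, as the workflow is designed to allow, only changes how `X` is built.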
Machine learning and mixed reality for smart aviation: applications and challenges
The aviation industry is a dynamic and ever-evolving sector. As technology advances and becomes more sophisticated, the aviation industry must keep up with the changing trends. While some airlines have made investments in machine learning and mixed reality technologies, the vast majority of regional airlines continue to rely on inefficient strategies and lack digital applications. This paper investigates the state-of-the-art applications that integrate machine learning and mixed reality into the aviation industry. Smart aerospace engineering design, manufacturing, testing, and services are being explored to increase operator productivity. Autonomous systems, self-service systems, and data visualization systems are being researched to enhance passenger experience. This paper investigates safety, environmental, technological, cost, security, capacity, and regulatory challenges of smart aviation, as well as potential solutions to ensure future quality, reliability, and efficiency.
A Machine Learning based Empirical Evaluation of Cyber Threat Actors' High-Level Attack Patterns over Low-Level Attack Patterns in Attributing Attacks
Cyber threat attribution is the process of identifying the actor of an attack
incident in cyberspace. An accurate and timely threat attribution plays an
important role in deterring future attacks by applying appropriate and timely
defense mechanisms. Manual analysis of attack patterns gathered by honeypot
deployments, intrusion detection systems, firewalls, and via trace-back
procedures is still the preferred method of security analysts for cyber threat
attribution. Such attack patterns are low-level Indicators of Compromise (IOC).
They represent Tactics, Techniques, Procedures (TTP), and software tools used
by the adversaries in their campaigns. The adversaries rarely re-use them. They
can also be manipulated, resulting in false and unfair attribution. To
empirically evaluate and compare the effectiveness of both kinds of IOC, there
are two problems that need to be addressed. The first problem is that in recent
research works, the ineffectiveness of low-level IOC for cyber threat
attribution has been discussed intuitively. An empirical evaluation for the
measure of the effectiveness of low-level IOC based on a real-world dataset is
missing. The second problem is that the available dataset for high-level IOC
has a single instance for each predictive class label that cannot be used
directly for training machine learning models. To address these problems in
this research work, we empirically evaluate the effectiveness of low-level IOC
based on a real-world dataset that is specifically built for comparative
analysis with high-level IOC. The experimental results show that the high-level
IOC trained models effectively attribute cyberattacks with an accuracy of 95%
as compared to the low-level IOC trained models where accuracy is 40%.
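As a rough, non-authoritative illustration of attribution from high-level IOC, the sketch below matches an incident's observed TTP set against known actor profiles using Jaccard similarity. The actor names and technique IDs are invented, and the paper's actual approach trains machine learning classifiers rather than this simple similarity rule.

```python
# Illustrative sketch (not the paper's method): attribute an incident
# to the threat actor whose known TTP profile best overlaps the
# incident's observed techniques. IDs and actors are made up.

def jaccard(a, b):
    """Set overlap in [0, 1]: |intersection| / |union|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

profiles = {
    "ActorA": {"T1566", "T1059", "T1027"},
    "ActorB": {"T1190", "T1505", "T1071"},
}

def attribute(observed_ttps):
    """Return the actor with the highest-similarity TTP profile."""
    return max(profiles, key=lambda actor: jaccard(profiles[actor], observed_ttps))

incident = {"T1566", "T1059"}  # ActorA shares 2 of 3 techniques
```

Because TTPs change far less often than IPs or file hashes, even this crude overlap measure captures why high-level IOC are the more stable attribution signal the paper argues for.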
AlerTiger: Deep Learning for AI Model Health Monitoring at LinkedIn
Data-driven companies use AI models extensively to develop products and
intelligent business solutions, making the health of these models crucial for
business success. Model monitoring and alerting in industries pose unique
challenges, including a lack of clear model health metrics definition, label
sparsity, and fast model iterations that result in short-lived models and
features. As a product, there are also requirements for scalability,
generalizability, and explainability. To tackle these challenges, we propose
AlerTiger, a deep-learning-based MLOps model monitoring system that helps AI
teams across the company monitor their AI models' health by detecting anomalies
in models' input features and output score over time. The system consists of
four major steps: model statistics generation, deep-learning-based anomaly
detection, anomaly post-processing, and user alerting. Our solution generates
three categories of statistics to indicate AI model health, offers a two-stage
deep anomaly detection solution to address label sparsity and attain the
generalizability of monitoring new models, and provides holistic reports for
actionable alerts. This approach has been deployed to most of LinkedIn's
production AI models for over a year and has identified several model issues
that, once fixed, led to significant business metric gains.
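The four-step flow described above (statistics generation, anomaly detection, post-processing, alerting) can be sketched as follows; the z-score detector and thresholds are illustrative stand-ins for AlerTiger's deep model, not its actual implementation.

```python
# Hedged sketch of the statistics -> detection -> post-processing ->
# alerting flow; a simple z-score rule stands in for the deep model.
import statistics

def detect(series, z_thresh=2.0, min_run=2):
    """Stage 1: flag time steps whose value deviates strongly from the
    series mean. Stage 2 (post-processing): alert only on sustained
    runs of at least min_run consecutive anomalies, reducing noise."""
    mu = statistics.mean(series)
    sd = statistics.stdev(series) or 1.0  # guard all-equal series
    flags = [abs(x - mu) / sd > z_thresh for x in series]
    alerts, run = [], 0
    for day, flagged in enumerate(flags):
        run = run + 1 if flagged else 0
        if run == min_run:
            alerts.append(day)  # alert once per sustained anomaly run
    return alerts
```

The post-processing stage is what keeps a single noisy day in a model's feature statistics from paging an on-call engineer.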
A Novel Approach To User Agent String Parsing For Vulnerability Analysis Using Multi-Headed Attention
The increasing reliance on the internet has led to the proliferation of a
diverse set of web-browsers and operating systems (OSs) capable of browsing the
web. User agent strings (UASs) are a component of web browsing that are
transmitted with every Hypertext Transfer Protocol (HTTP) request. They contain
information about the client device and software, which is used by web servers
for various purposes such as content negotiation and security. However, the
proliferation of browsers and devices and the lack of standardization of UAS
formats make parsing UASs a non-trivial task. Current
rules-based approaches are often brittle and can fail when encountering such
non-standard formats. In this work, a novel methodology for parsing UASs using
Multi-Headed Attention Based transformers is proposed. The proposed methodology
exhibits strong performance in parsing a variety of UASs with differing
formats. Furthermore, a framework to utilize parsed UASs to estimate the
vulnerability scores for large sections of publicly visible IT networks or
regions is also discussed. The methodology presented here can also be easily
extended or deployed for real-time parsing of logs in enterprise settings.
Comment: Accepted to the International Conference on Machine Learning and
Cybernetics (ICMLC) 202
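The multi-headed attention operation the proposed parser builds on can be sketched in a few lines of numpy; the shapes, head count, and identity projections below are simplifications of the generic mechanism, not the paper's full transformer model.

```python
# Minimal numpy sketch of multi-headed self-attention over a token
# sequence (e.g. the tokens of a user agent string); projections are
# identity slices for brevity, unlike a real transformer's learned ones.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, n_heads):
    """Self-attention over x of shape (seq, d_model): each head attends
    over its own d_model // n_heads slice, outputs are concatenated."""
    seq, d_model = x.shape
    assert d_model % n_heads == 0
    d_head = d_model // n_heads
    heads = []
    for h in range(n_heads):
        q = k = v = x[:, h * d_head:(h + 1) * d_head]  # per-head slice
        attn = softmax(q @ k.T / np.sqrt(d_head))      # (seq, seq) weights
        heads.append(attn @ v)                         # weighted values
    return np.concatenate(heads, axis=-1)              # (seq, d_model)

tokens = np.random.default_rng(0).normal(size=(6, 8))  # 6 tokens, d=8
out = multi_head_attention(tokens, n_heads=2)
```

Because attention weights every token against every other, the model can relate, say, a browser name token to a version token regardless of where a non-standard UAS format places them, which is the robustness the abstract claims over brittle rules.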
Estado del arte de los métodos de seguridad de datos aplicados en internet de las cosas
The purpose of this research is to create the state of the art of techniques and processes used in the field of engineering, through a systematic mapping focused on data security methods applied in the Internet of Things. The work collects documents from 2017 onward using systematic literature mapping and a systematic review applying the PRISMA method, complying with methodological rigor and quality. One of the most used methods is encryption, used by 52.8% of companies, along with anomaly monitoring and intrusion detection, role-based access control, key management, and device and user authentication. In addition, protocols are used to establish communication and transfer information securely between different devices, such as MQTT, used by 38.3% for real-time data communication, as well as JWT, AMQP, DDS, and HTTP, which is used by 70% of IoT developers. Accompanying these mechanisms with good practices, and being aware of the negative consequences of bad practice, guarantees data security in IoT devices.
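Among the protocols the survey lists, JWT is the one that carries a signed device identity; the hedged sketch below shows HS256-style signing with only the standard library. The secret, claims, and device name are made up for illustration, and a production deployment would use a vetted JWT library instead.

```python
# Illustrative HS256-style JWT for IoT device authentication, built
# with the standard library; secret and claims are invented examples.
import base64, hashlib, hmac, json

def b64url(data: bytes) -> str:
    """Base64url without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def make_jwt(claims: dict, secret: bytes) -> str:
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    signing_input = f"{header}.{payload}".encode()
    sig = b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"

def verify_jwt(token: str, secret: bytes) -> bool:
    """Recompute the signature and compare in constant time."""
    header, payload, sig = token.split(".")
    expected = b64url(hmac.new(secret, f"{header}.{payload}".encode(),
                               hashlib.sha256).digest())
    return hmac.compare_digest(sig, expected)

token = make_jwt({"device": "sensor-42"}, b"shared-secret")
```

The signature binds the device claims to the shared secret, so a broker can reject tokens forged or tampered with in transit, which is the authentication role the survey attributes to JWT.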
Responsible Design Patterns for Machine Learning Pipelines
Integrating ethical practices into the development process for artificial
intelligence (AI) is essential to ensure safe, fair, and responsible operation.
AI ethics involves applying ethical principles to the entire life cycle of AI
systems. This is essential to mitigate potential risks and harms associated
with AI, such as algorithm biases. To achieve this goal, responsible design
patterns (RDPs) are critical for Machine Learning (ML) pipelines to guarantee
ethical and fair outcomes. In this paper, we propose a comprehensive framework
incorporating RDPs into ML pipelines to mitigate risks and ensure the ethical
development of AI systems. Our framework comprises new responsible AI design
patterns for ML pipelines identified through a survey of AI ethics and data
management experts and validated through real-world scenarios with expert
feedback. The framework guides AI developers, data scientists, and
policy-makers to implement ethical practices in AI development and deploy
responsible AI systems in production.
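One way a responsible design pattern for an ML pipeline might look in code is a fairness gate between training and deployment; the metric and threshold below are illustrative assumptions, not drawn from the paper's pattern catalog.

```python
# Hedged sketch of a possible responsible-design pattern: a pipeline
# stage that blocks deployment when group-wise prediction rates
# diverge too much. Metric choice and threshold are illustrative.

def demographic_parity_gap(preds, groups):
    """Largest absolute difference in positive-prediction rate
    between any two groups."""
    rates = {}
    for g in set(groups):
        members = [p for p, gg in zip(preds, groups) if gg == g]
        rates[g] = sum(members) / len(members)
    return max(rates.values()) - min(rates.values())

def fairness_gate(preds, groups, max_gap=0.1):
    """Gate stage: True means the model may proceed to deployment."""
    return demographic_parity_gap(preds, groups) <= max_gap
```

Expressed as a reusable stage, the check can be dropped into any pipeline before its deployment step, which is the spirit of the pattern-based approach the paper proposes.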