456 research outputs found
Active Sampling-based Binary Verification of Dynamical Systems
Nonlinear, adaptive, or otherwise complex control techniques are increasingly
relied upon to ensure the safety of systems operating in uncertain
environments. However, the nonlinearity of the resulting closed-loop system
complicates verification that the system does in fact satisfy those
requirements at all possible operating conditions. While analytical proof-based
techniques and finite abstractions can be used to provably verify the
closed-loop system's response at different operating conditions, they often
produce conservative approximations due to restrictive assumptions and are
difficult to construct in many applications. In contrast, popular statistical
verification techniques relax the restrictions and instead rely upon
simulations to construct statistical or probabilistic guarantees. This work
presents a data-driven statistical verification procedure that instead
constructs statistical learning models from simulated training data to separate
the set of possible perturbations into "safe" and "unsafe" subsets. Binary
evaluations of closed-loop system requirement satisfaction at various
realizations of the uncertainties are obtained through temporal logic
robustness metrics, which are then used to construct predictive models of
requirement satisfaction over the full set of possible uncertainties. As the
accuracy of these predictive statistical models is inherently coupled to the
quality of the training data, an active learning algorithm selects additional
sample points in order to maximize the expected change in the data-driven model
and thus, indirectly, minimize the prediction error. Various case studies
demonstrate the closed-loop verification procedure and highlight improvements
in prediction error over both existing analytical and statistical verification
techniques.Comment: 23 page
Deciding and verifying network properties locally with few output bits
International audienceGiven a boolean predicate on labeled networks (e.g., the network is acyclic, or the network is properly colored, etc.), deciding in a distributed manner whether a given labeled network satisfies that predicate typically consists, in the standard setting, of every node inspecting its close neighborhood, and outputting a boolean verdict, such that the network satisfies the predicate if and only if all nodes output true. In this paper, we investigate a more general notion of distributed decision in which every node is allowed to output a constant number of bits, which are gathered by a central authority emitting a global boolean verdict based on these outputs, such that the network satisfies the predicate if and only if this global verdict equals true. We analyze the power and limitations of this extended notion of distributed decision
A Comprehensive Survey on Trustworthy Graph Neural Networks: Privacy, Robustness, Fairness, and Explainability
Graph Neural Networks (GNNs) have made rapid developments in the recent
years. Due to their great ability in modeling graph-structured data, GNNs are
vastly used in various applications, including high-stakes scenarios such as
financial analysis, traffic predictions, and drug discovery. Despite their
great potential in benefiting humans in the real world, recent study shows that
GNNs can leak private information, are vulnerable to adversarial attacks, can
inherit and magnify societal bias from training data and lack interpretability,
which have risk of causing unintentional harm to the users and society. For
example, existing works demonstrate that attackers can fool the GNNs to give
the outcome they desire with unnoticeable perturbation on training graph. GNNs
trained on social networks may embed the discrimination in their decision
process, strengthening the undesirable societal bias. Consequently, trustworthy
GNNs in various aspects are emerging to prevent the harm from GNN models and
increase the users' trust in GNNs. In this paper, we give a comprehensive
survey of GNNs in the computational aspects of privacy, robustness, fairness,
and explainability. For each aspect, we give the taxonomy of the related
methods and formulate the general frameworks for the multiple categories of
trustworthy GNNs. We also discuss the future research directions of each aspect
and connections between these aspects to help achieve trustworthiness
Automated Machine Learning implementation framework in the banking sector
Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Business AnalyticsAutomated Machine Learning is a subject in the Machine Learning field, designed to give the possibility
of Machine Learning use to non-expert users, it aroused from the lack of subject matter experts, trying
to remove humans from these topic implementations. The advantages behind automated machine
learning are leaning towards the removal of human implementation, fastening the machine learning
deployment speed. The organizations will benefit from effective solutions benchmarking and
validations. The use of an automated machine learning implementation framework can deeply
transform an organization adding value to the business by freeing the subject matter experts of the
low-level machine learning projects, letting them focus on high level projects. This will also help the
organization reach new competence, customization, and decision-making levels in a higher analytical
maturity level.
This work pretends, firstly to investigate the impact and benefits automated machine learning
implementation in the banking sector, and afterwards develop an implementation framework that
could be used by banking institutions as a guideline for the automated machine learning
implementation through their departments. The autoML advantages and benefits are evaluated
regarding business value and competitive advantage and it is presented the implementation in a
fictitious institution, considering all the need steps and the possible setbacks that could arise.
Banking institutions, in their business have different business processes, and since most of them are
old institutions, the main concerns are related with the automating their business process, improving
their analytical maturity and sensibilizing their workforce to the benefits of the implementation of new
forms of work. To proceed to a successful implementation plan should be known the institution
particularities, adapt to them and ensured the sensibilization of the workforce and management to
the investments that need to be made and the changes in all levels of their organizational work that
will come from that, that will lead to a lot of facilities in everyone’s daily work
Local Certification of Graph Decompositions and Applications to Minor-Free Classes
Local certification consists in assigning labels to the nodes of a network to certify that some given property is satisfied, in such a way that the labels can be checked locally. In the last few years, certification of graph classes received a considerable attention. The goal is to certify that a graph G belongs to a given graph class ?. Such certifications with labels of size O(log n) (where n is the size of the network) exist for trees, planar graphs and graphs embedded on surfaces. Feuilloley et al. ask if this can be extended to any class of graphs defined by a finite set of forbidden minors.
In this work, we develop new decomposition tools for graph certification, and apply them to show that for every small enough minor H, H-minor-free graphs can indeed be certified with labels of size O(log n). We also show matching lower bounds using a new proof technique
Multi-Objective Optimisation for Tuning Building Heating and Cooling Loads Forecasting Models
Machine learning (ML) has been recognised as a powerful method for modelling building energy consumption. The capability of ML to provide a fast and accurate prediction of energy loads makes it an ideal tool for decision-making tasks related to sustainable design and retrofit planning. However, the accuracy of these ML models is much dependant on the selection of the right hyper-parameters for specific building dataset. This paper proposes a method for optimising ML model for forecasting both heating and cooling loads. The technique employs multi-objective optimisation with evolutionary algorithms to search the space of possible parameters. The proposed approach not only tune one model to precisely predict building energy loads but also accelerates the process of model optimisation. The study utilises a simulated building energy data generated in EnergyPlus to demonstrate the efficiency of the proposed method, and compares the outcomes with the regular ML tuning procedure (i.e. grid search). The optimised model provides a reliable tool for building designers and engineers to explore a large space of the available building materials and technologies
Deep Learning-Based Intrusion Detection Methods for Computer Networks and Privacy-Preserving Authentication Method for Vehicular Ad Hoc Networks
The incidence of computer network intrusions has significantly increased over the last decade, partially attributed to a thriving underground cyber-crime economy and the widespread availability of advanced tools for launching such attacks. To counter these attacks, researchers in both academia and industry have turned to machine learning (ML) techniques to develop Intrusion Detection Systems (IDSes) for computer networks. However, many of the datasets use to train ML classifiers for detecting intrusions are not balanced, with some classes having fewer samples than others. This can result in ML classifiers producing suboptimal results. In this dissertation, we address this issue and present better ML based solutions for intrusion detection. Our contributions in this direction can be summarized as follows:
Balancing Data Using Synthetic Data to detect intrusions in Computer Networks: In the past, researchers addressed the issue of imbalanced data in datasets by using over-sampling and under-sampling techniques. In this study, we go beyond such traditional methods and utilize a synthetic data generation method called Con- ditional Generative Adversarial Network (CTGAN) to balance the datasets and in- vestigate its impact on the performance of widely used ML classifiers. To the best of our knowledge, no one else has used CTGAN to generate synthetic samples for balancing intrusion detection datasets. We use two widely used publicly available datasets and conduct extensive experiments and show that ML classifiers trained on these datasets balanced with synthetic samples generated by CTGAN have higher prediction accuracy and Matthew Correlation Coefficient (MCC) scores than those trained on imbalanced datasets by 8% and 13%, respectively.
Deep Learning approach for intrusion detection using focal loss function: To overcome the data imbalance problem for intrusion detection, we leverage the specialized loss function, called focal loss, that automatically down-weighs easy ex- amples and focuses on the hard negatives by facilitating dynamically scaled-gradient updates for training ML models effectively. We implement our approach using two well-known Deep Learning (DL) neural network architectures. Compared to training DL models using cross-entropy loss function, our approach (training DL models using focal loss function) improved accuracy, precision, F1 score, and MCC score by 24%, 39%, 39%, and 60% respectively.
Efficient Deep Learning approach to detect Intrusions using Few-shot Learning: To address the issue of imbalance the datasets and develop a highly effective IDS, we utilize the concept of few-shot learning. We present a Few-Shot and Self-Supervised learning framework, called FS3, for detecting intrusions in IoT networks. FS3 works in three phases. Our approach involves first pretraining an encoder on a large-scale external dataset in a selfsupervised manner. We then employ few-shot learning (FSL), which seeks to replicate the encoder’s ability to learn new patterns from only a few training examples. During the encoder training us- ing a small number of samples, we train them contrastively, utilizing the triplet loss function. The third phase introduces a novel K-Nearest neighbor algorithm that sub- samples the majority class instances to further reduce imbalance and improve overall performance. Our proposed framework FS3, utilizing only 20% of labeled data, out- performs fully supervised state-of-the-art models by up to 42.39% and 43.95% with respect to the metrics precision and F1 score, respectively.
The rapid evolution of the automotive industry and advancements in wireless com- munication technologies will result in the widespread deployment of Vehicular ad hoc networks (VANETs). However, despite the network’s potential to enable intelligent and autonomous driving, it also introduces various attack vectors that can jeopardize its security. In this dissertation, we present efficient privacy-preserving authenticated message dissemination scheme in VANETs.
Conditional Privacy-preserving Authentication and Message Dissemination Scheme using Timestamp based Pseudonyms: To authenticate a message sent by a vehicle using its pseudonym, a certificate of the pseudonym signed by the central authority is generally utilized. If a vehicle is found to be malicious, certificates associated with all the pseudonyms assigned to it must be revoked. Certificate revocation lists (CRLs) should be shared with all entities that will be corresponding with the vehicle. As each vehicle has a large pool of pseudonyms allocated to it, the CRL can quickly grow in size as the number of revoked vehicles increases. This results in high storage overheads for storing the CRL, and significant authentication overheads as the receivers must check their CRL for each message received to verify its pseudonym. To address this issue, we present a timestamp-based pseudonym allocation scheme that reduces the storage overhead and authentication overhead by streamlining the CRL management process
- …