    Opportunities and risks of stochastic deep learning

    This thesis studies opportunities and risks associated with stochasticity in deep learning that specifically manifest in the context of adversarial robustness and neural architecture search (NAS). On the one hand, opportunities arise because stochastic methods have a strong impact on robustness and generalisation, both from a theoretical and an empirical standpoint. In addition, they provide a framework for navigating non-differentiable search spaces, and for expressing data and model uncertainty. On the other hand, the trade-offs (i.e., risks) that are coupled with these benefits need to be carefully considered. The three novel contributions that comprise the main body of this thesis are, by these standards, instances of opportunities and risks. In the context of adversarial robustness, our first contribution proves that the impact of an adversarial input perturbation on the output of a stochastic neural network (SNN) is theoretically bounded. Specifically, we demonstrate that SNNs are maximally robust when they achieve weight-covariance alignment, i.e., when the weight vectors of their classifier layer are aligned with the eigenvectors of that layer's covariance matrix. Based on our theoretical insights, we develop a novel SNN architecture with excellent empirical adversarial robustness and show that our theoretical guarantees also hold experimentally. Furthermore, we discover that SNNs partially owe their robustness to having a noisy loss landscape. Gradient-based adversaries find this landscape difficult to ascend during adversarial perturbation search, and therefore fail to create strong adversarial examples. We show that inducing a noisy loss landscape is not an effective defence mechanism, as it is easy to circumvent. To demonstrate this point, we develop a stochastic loss-smoothing extension to state-of-the-art gradient-based adversaries that allows them to attack successfully. Interestingly, our loss-smoothing extension can also (i) be successful against non-stochastic neural networks that defend by altering their loss landscape in different ways, and (ii) strengthen gradient-free adversaries. Our third and final contribution lies in the field of few-shot learning, where we develop a stochastic NAS method for adapting pre-trained neural networks to previously unseen classes by observing only a few training examples of each new class. We determine that adapting a pre-trained backbone is not as simple as adapting all of its parameters. In fact, adapting or fine-tuning the entire architecture is sub-optimal, as many layers already encode knowledge optimally. Our NAS algorithm searches for the optimal subset of pre-trained parameters to be adapted or fine-tuned, which yields a significant improvement over the existing paradigm for few-shot adaptation.
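
    The loss-smoothing idea can be illustrated with a short sketch. This is a minimal illustration, not the thesis' exact method: it assumes a PyTorch model whose forward pass redraws its internal noise on every call, and it smooths the stochastic loss landscape by averaging the input gradient over several stochastic forward passes before taking a gradient-ascent (attack) step.

```python
# Minimal sketch (assumed interface, not the thesis' implementation): estimate
# the gradient of the *expected* loss of a stochastic network by Monte-Carlo
# averaging over forward passes, each of which redraws the model's noise.
import torch

def smoothed_input_grad(model, x, y, loss_fn, n_samples=8):
    x = x.clone().detach().requires_grad_(True)
    total = 0.0
    for _ in range(n_samples):            # each call re-samples the noise
        total = total + loss_fn(model(x), y)
    (total / n_samples).backward()        # gradient of the averaged loss
    return x.grad.detach()

def fgsm_step(model, x, y, loss_fn, eps=8 / 255):
    # One smoothed gradient-ascent step, FGSM-style.
    return x + eps * smoothed_input_grad(model, x, y, loss_fn).sign()
```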

    Literature review on the smart city resources analysis with big data methodologies

    This article provides a systematic literature review on applying different algorithms to municipal data processing, aiming to understand how the data were collected, stored, pre-processed, and analyzed, to compare various methods, and to select feasible solutions for further research. Several algorithms and data types are considered, finding that clustering, classification, correlation, anomaly detection, and prediction algorithms are frequently used. As expected, the data are of several types, ranging from sensor data to images. Processing such varied data is a considerable challenge, although several algorithms work very well, such as Long Short-Term Memory (LSTM) networks for time-series prediction and classification. Open access funding provided by FCT|FCCN (b-on).
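
    As a concrete illustration of the kind of model the review highlights, a minimal one-step-ahead LSTM forecaster can be written in a few lines. This is a generic sketch (architecture and sizes are assumptions, not taken from any reviewed study):

```python
# Illustrative LSTM forecaster for univariate municipal time series (PyTorch).
import torch
import torch.nn as nn

class LSTMForecaster(nn.Module):
    def __init__(self, hidden_size=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size,
                            batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):                 # x: (batch, seq_len, 1)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])      # predict the next value
```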

    Dynamic Connectivity in Disk Graphs

    Let S ⊆ ℝ² be a set of n sites in the plane, so that every site s ∈ S has an associated radius rₛ > 0. Let D(S) be the disk intersection graph defined by S, i.e., the graph with vertex set S and an edge between two distinct sites s, t ∈ S if and only if the disks with centers s, t and radii rₛ, rₜ intersect. Our goal is to design data structures that maintain the connectivity structure of D(S) as sites are inserted and/or deleted in S. First, we consider unit disk graphs, i.e., we fix rₛ = 1 for all sites s ∈ S. For this case, we describe a data structure that has O(log² n) amortized update time and O(log n / log log n) query time. Second, we look at disk graphs with bounded radius ratio Ψ, i.e., for all s ∈ S, we have 1 ≤ rₛ ≤ Ψ, for a parameter Ψ that is known in advance. Here, we not only investigate the fully dynamic case, but also the incremental and the decremental scenario, where only insertions or only deletions of sites are allowed. In the fully dynamic case, we achieve amortized expected update time O(Ψ log⁴ n) and query time O(log n / log log n). This improves the currently best update time by a factor of Ψ. In the incremental case, we achieve logarithmic dependency on Ψ, with a data structure that has O(α(n)) amortized query time and O(log Ψ log⁴ n) amortized expected update time, where α(n) denotes the inverse Ackermann function. For the decremental setting, we first develop an efficient decremental disk revealing data structure: given two sets R and B of disks in the plane, we can delete disks from B, and upon each deletion, we receive a list of all disks in R that no longer intersect the union of B. Using this data structure, we get decremental data structures with a query time of O(log n / log log n) that support deletions in O(n log Ψ log⁴ n) overall expected time for disk graphs with bounded radius ratio Ψ and in O(n log⁵ n) overall expected time for disk graphs with arbitrary radii, assuming that the deletion sequence is oblivious of the internal random choices of the data structures.
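
    For intuition only, the incremental (insertion-only) unit disk setting can be served by a much simpler structure than those in the paper: a union-find over sites, with a uniform grid of cell side 2 so that the disks intersecting a newly inserted unit disk are found among O(1) neighbouring cells. The sketch below illustrates the connectivity queries being maintained, not the paper's data structures or their update-time guarantees.

```python
# Naive incremental connectivity for unit disk graphs (radius 1 everywhere):
# two unit disks intersect iff their centers are at distance <= 2.
import math
from collections import defaultdict

class IncrementalUnitDiskConnectivity:
    def __init__(self):
        self.parent = {}
        self.grid = defaultdict(list)          # grid cell -> sites in it

    def find(self, s):
        while self.parent[s] != s:
            self.parent[s] = self.parent[self.parent[s]]  # path halving
            s = self.parent[s]
        return s

    def insert(self, s):                       # s = (x, y)
        self.parent[s] = s
        cx, cy = int(s[0] // 2), int(s[1] // 2)
        for dx in (-1, 0, 1):                  # 3x3 block of side-2 cells
            for dy in (-1, 0, 1):
                for t in self.grid[(cx + dx, cy + dy)]:
                    if math.dist(s, t) <= 2:   # disks intersect: union
                        self.parent[self.find(s)] = self.find(t)
        self.grid[(cx, cy)].append(s)

    def connected(self, s, t):
        return self.find(s) == self.find(t)
```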

    Meta-learning algorithms and applications

    Meta-learning in the broader context concerns how an agent learns about its own learning, allowing it to improve its learning process. Learning how to learn is not only beneficial for humans; it has also shown vast benefits for improving how machines learn. In the context of machine learning, meta-learning enables models to improve their learning process by selecting suitable meta-parameters that influence the learning. For deep learning specifically, the meta-parameters typically describe details of the training of the model but can also include a description of the model itself - the architecture. Meta-learning is usually done with specific goals in mind, for example trying to improve the ability to generalize or to learn new concepts from only a few examples. Meta-learning can be powerful, but it comes with a key downside: it is often computationally costly. If these costs were alleviated, meta-learning could be more accessible to developers of new artificial intelligence models, allowing them to achieve greater goals or save resources. As a result, one key focus of our research is on significantly improving the efficiency of meta-learning. We develop two approaches: EvoGrad and PASHA, both of which significantly improve meta-learning efficiency in two common scenarios. EvoGrad allows us to efficiently optimize the value of a large number of differentiable meta-parameters, while PASHA enables us to efficiently optimize any type of meta-parameters, but fewer in number. Meta-learning is a tool that can be applied to solve various problems. Most commonly it is applied to learning new concepts from only a small number of examples (few-shot learning), but other applications exist too. To showcase the practical impact that meta-learning can make in the context of neural networks, we use meta-learning as a novel solution for two selected problems: more accurate uncertainty quantification (calibration) and general-purpose few-shot learning. Both are practically important problems, and using meta-learning approaches we can obtain better solutions than those obtained using existing approaches. Calibration is important for safety-critical applications of neural networks, while general-purpose few-shot learning tests a model's ability to generalize few-shot learning abilities across diverse tasks such as recognition, segmentation, and keypoint estimation. More efficient algorithms as well as novel applications enable the field of meta-learning to make a more significant impact on the broader area of deep learning and potentially solve problems that were too challenging before. Ultimately, both allow us to better utilize the opportunities that artificial intelligence presents.
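
    PASHA itself is not specified in the abstract, but it builds on successive-halving-style resource allocation; the generic idea can be sketched as follows (an assumed simplification, not the PASHA algorithm): evaluate many meta-parameter configurations cheaply, keep the best fraction, and re-evaluate the survivors with a larger budget.

```python
# Generic successive halving over meta-parameter configurations.
# `evaluate(config, budget)` is an assumed callback returning a validation
# score (higher is better) after training with the given resource budget.
def successive_halving(configs, evaluate, budget=1, eta=2, rounds=3):
    for _ in range(rounds):
        scored = sorted(((evaluate(c, budget), c) for c in configs),
                        key=lambda sc: sc[0], reverse=True)
        configs = [c for _, c in scored[: max(1, len(configs) // eta)]]
        budget *= eta                      # survivors get more resources
    return configs[0]                      # best remaining configuration
```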

    An active learning approach to model solid-electrolyte interphase formation in Li-ion batteries

    Li-ion batteries store electrical energy by electrochemically reducing Li ions from a liquid electrolyte in a graphitic electrode. During these reactions, electrolytic species in contact with the electrode particles form a solid-electrolyte interphase (SEI), a layer between the electrode and electrolyte. This interphase allows the exchange of Li ions between the electrode and electrolyte while blocking electron transfer, affecting the performance and life of the battery. A network of reactions in a small region determines the final structure of this interphase. This complex problem has been studied using different multi-scale computational approaches. However, it is challenging to obtain a comprehensive characterization of these models in connection with the effects of model parameters on the output, due to the computational costs. In this work, we propose an active learning workflow coupled with a kinetic Monte Carlo (kMC) model for the formation of an SEI as a function of reaction barriers, including electrochemical, diffusion, and aggregation reactions. The workflow begins by receiving an initial database from a design-of-experiments approach to train an initial Gaussian process classification model. By iteratively training this model in the proposed workflow, we gradually extend the model's validity over a larger subset of reaction barriers. In this workflow, we take advantage of statistical tools to reduce the uncertainty of the model. The trained model is used to study the features of the reaction barriers in the formation of an SEI, which allows us to obtain a new and unique perspective on the reactions that control SEI formation.
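
    A minimal sketch of such a loop, with uncertainty sampling and a Gaussian process classifier standing in for the full workflow (the simulator callback `run_kmc` and the candidate pool are assumptions):

```python
# Active-learning loop: repeatedly label the candidate barrier set about which
# the Gaussian process classifier is least certain (probability closest to 0.5).
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier

def active_learning(X_init, y_init, candidates, run_kmc, n_iter=20):
    X, y = list(X_init), list(y_init)      # y_init must contain both classes
    gp = GaussianProcessClassifier()
    for _ in range(n_iter):
        gp.fit(np.array(X), np.array(y))
        proba = gp.predict_proba(candidates)[:, 1]
        i = int(np.argmin(np.abs(proba - 0.5)))   # most uncertain candidate
        X.append(candidates[i])
        y.append(run_kmc(candidates[i]))   # label via the kMC simulation
        candidates = np.delete(candidates, i, axis=0)
    return gp
```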

    Robustness, Heterogeneity and Structure Capturing for Graph Representation Learning and its Application

    Graph neural networks (GNNs) are potent methods for graph representation learning (GRL), which extract knowledge from complicated (graph) structured data in various real-world scenarios. However, GRL still faces many challenges. Firstly, GNN-based node classification may deteriorate substantially by overlooking the possibility of noisy data in graph structures, as models wrongly process the relations among nodes in the input graphs as the ground truth. Secondly, nodes and edges have different types in the real world, and it is essential to capture this heterogeneity in graph representation learning. Next, relations among nodes are not restricted to pairwise relations, and it is necessary to capture these complex relations accordingly. Finally, the absence of structural encodings, such as positional information, deteriorates the performance of GNNs. This thesis proposes novel methods to address the aforementioned problems:

    1. Bayesian Graph Attention Network (BGAT): Developed for situations with scarce data, this method addresses the influence of spurious edges. Incorporating Bayesian principles into the graph attention mechanism enhances robustness, leading to competitive performance against benchmarks (Chapter 3).

    2. Neighbour Contrastive Heterogeneous Graph Attention Network (NC-HGAT): By enhancing a cutting-edge self-supervised heterogeneous graph neural network model (HGAT) with neighbour contrastive learning, this method addresses heterogeneity and uncertainty simultaneously. Extra attention to edge relations in heterogeneous graphs also aids in subsequent classification tasks (Chapter 4).

    3. A novel ensemble learning framework is introduced for predicting stock price movements. It adeptly captures both group-level and pairwise relations, leading to notable advancements over the existing state-of-the-art. The integration of hypergraph and graph models, coupled with the utilisation of auxiliary data via GNNs before a recurrent neural network (RNN), provides a deeper understanding of long-term dependencies between similar entities in multivariate time series analysis (Chapter 5).

    4. A novel framework for graph structure learning is introduced, segmenting graphs into distinct patches. By harnessing the capabilities of transformers and integrating other position-encoding techniques, this approach robustly captures intricate structural information within a graph. This results in a more comprehensive understanding of its underlying patterns (Chapter 6).
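
    As background for contribution 1, the (non-Bayesian) graph attention mechanism that BGAT builds on can be sketched in a few lines; names and shapes below are assumptions for illustration, following the standard GAT formulation.

```python
# Standard single-head graph attention coefficients (GAT-style), in NumPy.
# H: (n, f) node features; W: (f, f') projection; a: (2f',) attention vector;
# adj: (n, n) 0/1 adjacency, assumed to include self-loops.
import numpy as np

def gat_layer(H, W, a, adj):
    Z = H @ W
    n = Z.shape[0]
    e = np.array([[np.concatenate([Z[i], Z[j]]) @ a for j in range(n)]
                  for i in range(n)])
    e = np.where(e > 0, e, 0.2 * e)                   # LeakyReLU
    e = np.where(adj > 0, e, -np.inf)                 # mask non-edges
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True)         # softmax per node
    return alpha @ Z                                  # weighted neighbour sum
```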

    Classical and quantum algorithms for scaling problems

    This thesis is concerned with scaling problems, which have a plethora of connections to different areas of mathematics, physics, and computer science. Although many structural aspects of these problems are understood by now, we only know how to solve them efficiently in special cases. We give new algorithms for non-commutative scaling problems with complexity guarantees that match the prior state of the art. To this end, we extend the well-known (self-concordance based) interior-point method (IPM) framework to Riemannian manifolds, motivated by its success in the commutative setting. Moreover, the IPM framework does not obviously suffer from the same obstructions to efficiency as previous methods. It also yields the first high-precision algorithms for other natural geometric problems in non-positive curvature. For the (commutative) problems of matrix scaling and balancing, we show that quantum algorithms can outperform the (already very efficient) state-of-the-art classical algorithms. Their time complexity can be sublinear in the input size; in certain parameter regimes they are also optimal, whereas in others we show that no quantum speedup over the classical methods is possible. Along the way, we provide improvements over the long-standing state of the art for searching for all marked elements in a list and for computing the sum of a list of numbers. We identify a new application in the context of tensor networks for quantum many-body physics. We define a computable canonical form for uniform projected entangled pair states (as the solution to a scaling problem), circumventing previously known undecidability results. We also show, by characterizing the invariant polynomials, that the canonical form is determined by evaluating the tensor network contractions on networks of bounded size.
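
    For intuition, the simplest commutative instance, matrix scaling, is solved by the classical Sinkhorn iteration: alternately normalise row and column sums of a positive matrix until it is nearly doubly stochastic. The thesis concerns far more general non-commutative versions (and quantum speedups for this commutative case); the sketch below is only the textbook baseline.

```python
# Sinkhorn iteration: find diagonals x, y with diag(x) @ A @ diag(y) roughly
# doubly stochastic, for a matrix A with positive entries.
import numpy as np

def sinkhorn(A, iters=500):
    x = np.ones(A.shape[0])
    y = np.ones(A.shape[1])
    for _ in range(iters):
        x = 1.0 / (A @ y)              # make row sums equal to 1
        y = 1.0 / (A.T @ x)            # make column sums equal to 1
    return np.diag(x) @ A @ np.diag(y)
```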

    Fuzzy Norm-Explicit Product Quantization for Recommender Systems

    As data resources grow, providing recommendations that best meet user demands has become a vital requirement in business and everyday life, in order to overcome the information overload problem. However, building a system that suggests relevant recommendations has always been a point of debate. One of the most cost-efficient techniques for producing relevant recommendations at low complexity is Product Quantization (PQ), and PQ approaches have continued to develop in recent years. The crucial challenge for such systems is improving product quantization performance in terms of recall without compromising complexity. This makes the algorithm suitable for problems that require retrieving a greater number of potentially relevant items without disregarding others, at high speed and low cost, to keep up with traffic. This is the case for online shops, where targeted recommendations are important even though customers may browse other products as well. A recent approach has been to exploit the notion of norm sub-vectors encoded in product quantizers. This research proposes a fuzzy approach to perform norm-based product quantization. Type-2 fuzzy sets (T2FSs) define the codebook, allowing sub-vectors to be associated with more than one element of the codebook, and the norm calculation is then resolved by means of integration. Our method improves the recall measure, making the algorithm suitable for problems that require querying as many potentially relevant items as possible without disregarding others. The proposed approach is tested with three public recommender benchmark datasets and compared against seven PQ approaches for Maximum Inner-Product Search (MIPS). The proposed method outperforms PQ approaches such as NEQ, PQ, and RQ by up to +6%, +5%, and +8%, achieving recalls of 94%, 69%, and 59% on the Netflix, Audio, and Cifar60k datasets, respectively. Moreover, its computing time and complexity nearly equal those of the most computationally efficient existing PQ method in the state of the art.
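
    For reference, the vanilla PQ baseline that the fuzzy method extends works as follows: split each vector into sub-vectors, learn a k-means codebook per block, and encode a vector as the tuple of its nearest codeword indices. A minimal sketch (parameters are illustrative; the fuzzy, norm-explicit machinery of the paper is not shown):

```python
# Vanilla product quantization: per-block k-means codebooks + index encoding.
# Assumes the dimensionality of X is divisible by n_blocks and that X has
# enough rows to fit n_codes clusters per block.
import numpy as np
from sklearn.cluster import KMeans

def pq_train(X, n_blocks=4, n_codes=256):
    blocks = np.split(X, n_blocks, axis=1)
    return [KMeans(n_clusters=n_codes, n_init=4).fit(b) for b in blocks]

def pq_encode(X, codebooks):
    blocks = np.split(X, len(codebooks), axis=1)
    return np.stack([cb.predict(b) for cb, b in zip(codebooks, blocks)],
                    axis=1)                 # (n, n_blocks) codeword indices
```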

    Hybrid Cloud-Based Privacy Preserving Clustering as Service for Enterprise Big Data

    Clustering as a service is offered by many cloud service providers. It helps enterprises discover hidden patterns and extract knowledge from the large volumes of data they generate. Though it brings a lot of value to enterprises, it also exposes the data to various security and privacy threats. Privacy-preserving clustering has been proposed as a solution to this problem. However, existing privacy-preserving clustering-as-an-outsourced-service models involve too much overhead for the querying user, lack adaptivity to incremental data, and require frequent interaction between the service provider and the querying user. There is also a lack of personalization of the clustering by the querying user. This work, "Locality Sensitive Hashing for Transformed Dataset (LSHTD)", proposes a hybrid cloud-based clustering-as-a-service model for streaming data that addresses the problems in existing models such as privacy-preserving k-means clustering outsourcing under multiple keys (PPCOM) and secure nearest neighbor clustering (SNNC). The solution combines a hybrid cloud with the LSHTD clustering algorithm as an outsourced service model. In experiments, the proposed solution is found to reduce computation cost by 23% and communication cost by 6%, and to provide better clustering accuracy, with an ARI improvement greater than 4.59% compared to existing works.
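
    The locality-sensitive-hashing building block can be illustrated generically (this does not capture LSHTD's dataset transformation or the hybrid-cloud protocol): random hyperplane signatures map nearby vectors to equal bits with high probability, so clustering can operate on compact codes rather than raw records.

```python
# Random-hyperplane LSH signatures (for cosine similarity), in NumPy.
import numpy as np

def lsh_signature(X, n_bits=16, seed=0):
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((X.shape[1], n_bits))  # random hyperplanes
    return (X @ planes > 0).astype(np.uint8)   # one sign bit per hyperplane
```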

    Dynamic data structures for k-nearest neighbor queries

    Our aim is to develop dynamic data structures that support k-nearest neighbors (k-NN) queries for a set of n point sites in the plane in O(f(n) + k) time, where f(n) is some polylogarithmic function of n. The key component is a general query algorithm that allows us to find the k-NN spread over t substructures simultaneously, thus reducing an O(tk) term in the query time to O(k). Combining this technique with the logarithmic method allows us to turn any static k-NN data structure into a data structure supporting both efficient insertions and queries. For the fully dynamic case, this technique allows us to recover the deterministic, worst-case, O(log² n / log log n + k) query time for the Euclidean distance claimed before, while preserving the polylogarithmic update times. We adapt this data structure to also support fully dynamic geodesic k-NN queries among a set of sites in a simple polygon. For this purpose, we design a shallow-cutting-based, deletion-only k-NN data structure. More generally, we obtain a dynamic planar k-NN data structure for any type of distance function for which we can build vertical shallow cuttings. We apply all of our methods in the plane for the Euclidean distance, the geodesic distance, and general, constant-complexity, algebraic distance functions.
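
    The key O(tk)-to-O(k) idea can be illustrated with a simple heap-based merge: each substructure exposes its candidates in increasing order of distance, and a single heap lazily pulls the globally next-nearest site, so only about k candidates are extracted in total rather than k from each of the t substructures. (A sketch of the principle only; the paper's query algorithm works inside the actual k-NN data structures.)

```python
# Merge t distance-sorted candidate streams and report the k nearest overall.
import heapq

def knn_merge(streams, k):
    # streams: iterables yielding (distance, site) pairs in sorted order.
    heap = []
    for i, it in enumerate(map(iter, streams)):
        first = next(it, None)
        if first is not None:
            heap.append((first[0], i, first[1], it))  # i breaks distance ties
    heapq.heapify(heap)
    result = []
    while heap and len(result) < k:
        d, i, site, it = heapq.heappop(heap)
        result.append((d, site))
        nxt = next(it, None)
        if nxt is not None:
            heapq.heappush(heap, (nxt[0], i, nxt[1], it))
    return result
```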