9 research outputs found

    Offline and Online Models for Learning Pairwise Relations in Data

    Pairwise relations between data points are essential for numerous machine learning algorithms. Many representation learning methods consider pairwise relations to identify the latent features and patterns in the data. This thesis investigates the learning of pairwise relations from two perspectives: offline learning and online learning. The first part of the thesis focuses on offline learning. It begins by modeling the performance of a synchronization method in concurrent programming using a Markov chain whose state transition matrix models pairwise relations between the cores involved in a computer process. The thesis then focuses on a particular pairwise distance measure, the minimax distance, and explores memory-efficient approaches to computing it. It proposes a hierarchical representation of the data whose memory requirement is linear in the number of data points and from which the exact pairwise minimax distances can be derived. Next, a memory-efficient sampling method is proposed that follows this hierarchical representation and samples the data points so that the minimax distances between all data points are maximally preserved. Finally, the thesis proposes a practical non-parametric clustering of vehicle motion trajectories to annotate traffic scenarios based on transitive relations between trajectories in an embedded space. The second part of the thesis takes an online learning perspective. It starts by presenting an online learning method for identifying bottlenecks in a road network by extracting the minimax path, where bottlenecks are the road segments with the highest cost, e.g., in terms of travel time. Inspired by real-world road networks, the thesis assumes a stochastic traffic environment in which the road-specific probability distribution of travel time is unknown. The parameters of this distribution must therefore be learned from observations, which the thesis does by modeling the bottleneck identification task as a combinatorial semi-bandit problem. The proposed approach takes prior knowledge into account and follows a Bayesian approach to update the parameters. Moreover, it develops a combinatorial variant of Thompson Sampling and derives an upper bound for the corresponding Bayesian regret. Furthermore, the thesis proposes an approximate algorithm to address the computational intractability of the problem. Finally, the thesis considers contextual information about road network segments by extending the proposed model to a contextual combinatorial semi-bandit framework, and it investigates and develops various algorithms for this contextual combinatorial setting.

    Inference of Effective Pairwise Relations for Data Processing

    In various data science and artificial intelligence areas, representation learning is a performance-critical step. While different representation learning methods can detect different descriptive and latent features, many of them rely on pairwise relations. The thesis consists of two parts, studying pairwise relations from two points of view: i) pairwise relations between the states of a Markov chain; ii) pairwise relations between objects in a dataset based on a desired (dis)similarity measure. In the first part of the thesis, we consider Markov chains, noting that pairwise relations between their states are naturally modeled by the state-transition matrix. We propose a method for modeling the performance of a synchronization method for a multi-processor architecture. Our model introduces and builds upon a cache line bouncing process that models the interaction of threads accessing the shared cache lines. In the second part of the thesis, we consider representation learning using the transitive-aware Minimax distance, which enables the extraction of elongated manifolds and structures in the data. While recent work has made Minimax distances computationally feasible, little attention has been paid to their memory footprint, which is naturally O(N^2), the cost of storing all pairwise distances. We therefore propose a novel hierarchical representation of the data that requires O(N) memory and from which pairwise Minimax distances can be inferred, still within O(N) memory overall, at the price of additional computation. An alternative sampling-based approach is also derived, which computes approximate Minimax distances, also in O(N) memory but with a significantly reduced computational cost, while still yielding a good approximation, as verified by impressive results on clustering benchmarks. Finally, we develop an unsupervised learning framework for clustering vehicle trajectories based on Minimax distances. The framework is validated on real-world datasets collected from real driving scenarios, on which it demonstrates satisfactory performance.

    Memory-Efficient Minimax Distance Measures

    The Minimax distance measure is a transitive-aware measure that allows us to extract elongated manifolds and structures in the data in an unsupervised manner. Existing methods require memory that is quadratic in the number of data points to compute the pairwise Minimax distances. In this paper, we investigate two memory-efficient approaches that reduce the memory requirement and achieve linear space complexity. The first approach proposes a novel hierarchical representation of the data that requires only O(N) memory and from which the pairwise Minimax distances can be derived in a memory-efficient manner. The second approach is an efficient sampling method that adapts well to the proposed hierarchical representation of the data. This approach accurately recovers the majority of Minimax distances, especially the most important ones. It still works in O(N) memory, but with a substantially lower computational cost, and yields impressive results on clustering benchmarks as a downstream task. We evaluate our methods on synthetic and real-world datasets from a variety of domains.
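
To make the definition concrete: the Minimax distance between two points is the smallest achievable value of the largest hop over all paths connecting them, and it is a standard fact that this equals the largest edge on the path between them in a minimum spanning tree. The sketch below uses that fact, not the hierarchical representation proposed in the paper, whose details are not given in the abstract. The function names `prim_mst` and `minimax_distance` are hypothetical, and Euclidean base distances are an assumption. Distances are recomputed on the fly, so only O(N) values are stored, at the cost of O(N^2) time to build the tree.

```python
import math


def prim_mst(points):
    """Build a minimum spanning tree with Prim's algorithm using O(N) memory.

    Returns (parent, weight): parent[v] is the tree parent of v (-1 for the
    root) and weight[v] is the length of the edge between v and parent[v].
    """
    n = len(points)
    in_tree = [False] * n
    weight = [math.inf] * n
    parent = [-1] * n
    weight[0] = 0.0  # node 0 is the root
    for _ in range(n):
        # Pick the cheapest node that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: weight[i])
        in_tree[u] = True
        # Pairwise distances are computed here instead of being stored.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < weight[v]:
                    weight[v] = d
                    parent[v] = u
    return parent, weight


def minimax_distance(parent, weight, i, j):
    """Largest edge on the tree path between i and j."""
    # Walk from i to the root, recording the largest edge seen so far.
    largest_from_i = {}
    largest = 0.0
    node = i
    while node != -1:
        largest_from_i[node] = largest
        if parent[node] != -1:
            largest = max(largest, weight[node])
        node = parent[node]
    # Walk from j upward until the two walks meet.
    largest = 0.0
    node = j
    while node not in largest_from_i:
        largest = max(largest, weight[node])
        node = parent[node]
    return max(largest, largest_from_i[node])


points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
parent, weight = prim_mst(points)
print(minimax_distance(parent, weight, 0, 2))  # 1.0: a chain of unit hops
print(minimax_distance(parent, weight, 0, 3))  # 8.0: the gap between 2 and 10
```

Points 0 and 2 are a Euclidean distance of 2 apart, but their Minimax distance is 1 because they are linked through point 1. This is the transitive behaviour that lets the measure follow elongated structures.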

    A Contextual Combinatorial Semi-Bandit Approach to Network Bottleneck Identification

    Bottleneck identification is a challenging task in network analysis, especially when the network is not fully specified. To address this task, we develop a unified online learning framework based on combinatorial semi-bandits that performs bottleneck identification in parallel with learning the specifications of the underlying network. Within this framework, we adapt and study various combinatorial semi-bandit methods such as epsilon-greedy, LinUCB, BayesUCB, NeuralUCB, and Thompson Sampling. In addition, our framework can use contextual information in the form of contextual bandits. Finally, we evaluate our framework on the real-world application of road networks and demonstrate its effectiveness in different settings.
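
As a rough illustration of the simplest method named above, the following sketch runs epsilon-greedy in a combinatorial semi-bandit setting. It is not the paper's framework: it leaves out the contextual part, it chooses among a small hand-written list of candidate paths instead of searching a network, and the function name `epsilon_greedy_bottleneck`, the edge labels, and the bounded uniform noise model are all assumptions made for the example.

```python
import random


def epsilon_greedy_bottleneck(paths, true_means, rounds, epsilon=0.1,
                              noise=0.1, seed=0):
    """Learn which candidate path has the smallest bottleneck (largest edge cost).

    paths: list of paths, each a list of edge labels.
    true_means: edge label -> mean cost, hidden from the learner and used
        only to simulate noisy observations.
    Returns (best_path, bottleneck_edge) under the learned estimates.
    """
    rng = random.Random(seed)
    total = {edge: 0.0 for path in paths for edge in path}
    count = {edge: 0 for edge in total}

    def estimate(edge):
        # Unobserved edges get an optimistic estimate of 0 so they are tried early.
        return total[edge] / count[edge] if count[edge] else 0.0

    def bottleneck(path):
        return max(estimate(edge) for edge in path)

    for _ in range(rounds):
        if rng.random() < epsilon:
            path = rng.choice(paths)           # explore
        else:
            path = min(paths, key=bottleneck)  # exploit current estimates
        # Semi-bandit feedback: a noisy cost is observed for every edge played.
        for edge in path:
            total[edge] += true_means[edge] + rng.uniform(-noise, noise)
            count[edge] += 1

    best = min(paths, key=bottleneck)
    return best, max(best, key=estimate)


paths = [["a1", "a2"], ["b1", "b2"]]
true_means = {"a1": 5.0, "a2": 1.0, "b1": 3.0, "b2": 2.0}
print(epsilon_greedy_bottleneck(paths, true_means, rounds=200))
```

The semi-bandit aspect is that the learner sees one observation per edge on the chosen path, not just a single total for the path, so each round updates several estimates at once.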

    Modeling the performance of atomic primitives on modern architectures

    Utilizing the atomic primitives of a processor to access a memory location atomically is key to the correctness and feasibility of parallel software systems. The performance of atomics plays a significant role in the scalability and overall performance of parallel software systems. In this work, we study the performance (in terms of latency, throughput, fairness, and energy consumption) of atomic primitives in the context of the two common software execution settings that result in high-contention and low-contention access to shared memory. We perform and present an exhaustive study of the performance of atomics in these two application contexts and propose a performance model that captures their behavior. We consider two state-of-the-art architectures: Intel Xeon E5 and Intel Xeon Phi (KNL). The model is centered around the bouncing of cache lines between threads that execute atomic primitives on these shared cache lines. It is simple enough to be used in practice, captures the behavior of atomics accurately under these execution scenarios, and facilitates algorithmic design decisions in multi-threaded programming.

    Vehicle Motion Trajectories Clustering via Embedding Transitive Relations

    To assure safety in self-driving cars, the Autonomous Drive functionality needs to pass safety tests based not only on real scenarios collected from field driving tests, but also on similar perturbed trajectories that might not have been captured during data collection. To achieve this goal, we need to build a scenario database containing both real-world collected data and synthesized scenarios that are consistent with real-world driving behaviour. This requires accurate and efficient annotation methods for the extraction and analysis of driving scenario trajectories. In this study, we propose an effective non-parametric trajectory clustering framework to annotate scenarios based on transitive relations of trajectories in an embedded space. We investigate the proposed framework's performance on real-world trajectory data sets and demonstrate its promising results, despite the complexity caused by having trajectories of varying lengths. Furthermore, we extend the framework to validate the augmentation of the real data sets with synthetic trajectories generated by Generative Adversarial Networks (Recurrent AE-GAN), and we conclude that the generated and the real scenarios are consistent.

    Online Learning of Network Bottlenecks via Minimax Paths

    In this paper, we study bottleneck identification in networks by extracting minimax paths. Many real-world networks have stochastic weights for which full knowledge is not available in advance. Therefore, we model this task as a combinatorial semi-bandit problem to which we apply a combinatorial version of Thompson Sampling and establish an upper bound on the corresponding Bayesian regret. Due to the computational intractability of the problem, we then devise an alternative problem formulation which approximates the original objective. Finally, we experimentally evaluate the performance of Thompson Sampling with the approximate formulation on real-world directed and undirected networks.
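
The loop described in this abstract can be sketched as follows, under assumptions that are not taken from the paper: edge costs are Gaussian with a known noise level, each edge mean has an independent Gaussian prior, and the path oracle is a Dijkstra-style search that minimizes the largest edge on the path. The names `minimax_path` and `thompson_sampling` and the toy graph are hypothetical, and the sketch does not reproduce the paper's approximate formulation or its regret analysis.

```python
import heapq
import math
import random


def minimax_path(graph, source, target):
    """Return (bottleneck, path) minimizing the largest edge weight on the path.

    graph: dict mapping node -> {neighbor: weight} (directed edges).
    """
    best = {source: -math.inf}
    prev = {}
    heap = [(-math.inf, source)]
    while heap:
        cost, u = heapq.heappop(heap)
        if u == target:
            break
        if cost > best.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            new_cost = max(cost, w)  # path cost is its largest edge
            if new_cost < best.get(v, math.inf):
                best[v] = new_cost
                prev[v] = u
                heapq.heappush(heap, (new_cost, v))
    if target not in best:
        raise ValueError("target is unreachable")
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return best[target], path


def thompson_sampling(true_means, source, target, rounds, noise_std=0.1,
                      prior_mean=0.0, prior_var=1.0, seed=0):
    """Play `rounds` rounds; true_means is used only to simulate observations."""
    rng = random.Random(seed)
    edges = [(u, v) for u in true_means for v in true_means[u]]
    mean = {e: prior_mean for e in edges}
    var = {e: prior_var for e in edges}
    plays = {}
    for _ in range(rounds):
        # 1. Sample a plausible weight for every edge from its posterior.
        sampled = {}
        for u, v in edges:
            sampled.setdefault(u, {})[v] = rng.gauss(mean[(u, v)],
                                                     math.sqrt(var[(u, v)]))
        # 2. Play the minimax path of the sampled network.
        _, path = minimax_path(sampled, source, target)
        plays[tuple(path)] = plays.get(tuple(path), 0) + 1
        # 3. Semi-bandit feedback: observe each traversed edge and apply the
        #    conjugate Gaussian update (known noise variance).
        for e in zip(path, path[1:]):
            obs = rng.gauss(true_means[e[0]][e[1]], noise_std)
            precision = 1.0 / var[e] + 1.0 / noise_std ** 2
            mean[e] = (mean[e] / var[e] + obs / noise_std ** 2) / precision
            var[e] = 1.0 / precision
    return plays, mean


network = {"s": {"a": 1.0, "b": 10.0}, "a": {"t": 2.0}, "b": {"t": 1.0}}
plays, posterior_mean = thompson_sampling(network, "s", "t", rounds=50)
print(plays)
```

Because the path cost is a maximum and not a sum, extending a path can never lower its cost, so the Dijkstra-style oracle stays valid even when a posterior sample happens to be negative.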