92 research outputs found

    Machine learning applications in search algorithms for gravitational waves from compact binary mergers

    Gravitational waves from compact binary mergers are now routinely observed by Earth-bound detectors. These observations enable exciting new science, as they have opened a new window to the Universe. However, extracting gravitational-wave signals from the noisy detector data is a challenging problem. The most sensitive search algorithms for compact binary mergers use matched filtering, an algorithm that compares the data with a set of expected template signals. As detectors are upgraded and more sophisticated signal models become available, the number of required templates will increase, which can make some sources computationally prohibitive to search for. The computational cost is of particular concern when low-latency alerts should be issued to maximize the time for electromagnetic follow-up observations. One potential solution to reduce computational requirements that has started to be explored in the last decade is machine learning. However, different proposed deep learning searches target varying parameter spaces and use metrics that are not always comparable to existing literature. Consequently, a clear picture of the capabilities of machine learning searches has been sorely missing. In this thesis, we closely examine the sensitivity of various deep learning gravitational-wave search algorithms and introduce new methods to detect signals from binary black hole and binary neutron star mergers at previously untested statistical confidence levels. By using the sensitive distance as our core metric, we allow for a direct comparison of our algorithms to state-of-the-art search pipelines. As part of this thesis, we organized a global mock data challenge to create a benchmark for machine learning search algorithms targeting compact binaries. This way, the tools developed in this thesis are made available to the greater community by publishing them as open source software. 
Our studies show that, depending on the parameter space, deep learning gravitational-wave search algorithms are already competitive with current production search pipelines. We also find that strategies developed for traditional searches can be effectively adapted to their machine learning counterparts. In regions where matched filtering becomes computationally expensive, however, available deep learning algorithms are also limited in their capability: we find reduced sensitivity to long-duration signals, in contrast to the excellent results for short-duration binary black hole signals.
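A toy sketch of the matched-filtering idea the abstract refers to: slide a normalized template across noisy data and read off the correlation peak. The chirp-like waveform and all parameters below are illustrative stand-ins; a real pipeline would first whiten the data by the detector's noise power spectral density.

```python
import numpy as np

def matched_filter_snr(data, template):
    """Correlate a known template against noisy data and return the
    peak correlation magnitude and its offset. Toy sketch: white noise
    assumed, no PSD weighting as a real pipeline would use."""
    # Normalize the template so outputs are comparable across templates.
    template = template / np.linalg.norm(template)
    # Sliding correlation of the template against the data stream.
    corr = np.correlate(data, template, mode="valid")
    peak = int(np.argmax(np.abs(corr)))
    return float(np.abs(corr[peak])), peak

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 512)
# A hypothetical "chirp": a sinusoid of increasing frequency, standing in
# for a compact-binary waveform.
signal = np.sin(2.0 * np.pi * (20.0 + 30.0 * t) * t) * np.hanning(t.size)

data = rng.normal(0.0, 1.0, 4096)
data[1000:1000 + signal.size] += 3.0 * signal  # inject the signal

snr, offset = matched_filter_snr(data, signal)
print(offset)  # peak lands at (or very near) the injection point
```

The correlation peak recovers the injection time because the template matches the buried waveform; against pure noise, the peak statistic stays near the noise floor.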

    Geometric Learning on Graph Structured Data

    Graphs provide a ubiquitous and universal data structure that can be applied in many domains, such as social networks, biology, chemistry, physics, and computer science. In this thesis we focus on two fundamental paradigms in graph learning: representation learning and similarity learning over graph-structured data. Graph representation learning aims to learn embeddings for nodes by integrating the topological and feature information of a graph. Graph similarity learning centers on similarity functions that compute the similarity between pairs of graphs in a vector space. We address several challenging issues in these two paradigms, designing powerful yet efficient machine learning models with theoretical guarantees that can leverage the rich topological structural properties of real-world graphs. This thesis is structured into two parts. In the first part, we present how to develop powerful Graph Neural Networks (GNNs) for graph representation learning from three different perspectives: (1) spatial GNNs, (2) spectral GNNs, and (3) diffusion GNNs. We discuss the model architecture, representational power, and convergence properties of these GNN models. Specifically, we first study how to develop expressive yet efficient and simple message-passing aggregation schemes that can go beyond the Weisfeiler-Leman test (1-WL). We propose a generalized message-passing framework that incorporates graph structural properties into the aggregation scheme. Then, we introduce a new local isomorphism hierarchy on neighborhood subgraphs. We further develop a novel neural model, namely GraphSNN, and theoretically prove that this model is more expressive than the 1-WL test. After that, we study how to build an effective and efficient graph convolution model with spectral graph filters. In this study, we propose a spectral GNN model, called DFNets, which incorporates a novel spectral graph filter, namely feedback-looped filters.
As a result, this model provides better localization on neighborhoods while achieving fast convergence and linear memory requirements. Finally, we study how to capture the rich topological information of a graph using graph diffusion. We propose a novel GNN architecture with dynamic PageRank, based on a learnable transition matrix. We explore two variants of this GNN architecture, a forward-Euler solution and an invariable feature solution, and theoretically prove that our forward-Euler GNN architecture is guaranteed to converge to a stationary distribution. In the second part of this thesis, we introduce a new optimal transport distance metric on graphs in a regularized learning framework for graph kernels. This optimal transport distance metric can preserve both local and global structures between graphs during the transport, in addition to preserving features and their local variations. Furthermore, we propose two strongly convex regularization terms to theoretically guarantee convergence and numerical stability in finding an optimal assignment between graphs. One regularization term is used to regularize a Wasserstein distance between graphs in the same ground space. This helps to preserve the local clustering structure of graphs by relaxing the optimal transport problem to a cluster-to-cluster assignment between locally connected vertices. The other regularization term is used to regularize a Gromov-Wasserstein distance between graphs across different ground spaces based on a degree-entropy KL divergence. This helps to improve the matching robustness of an optimal alignment and to preserve the global connectivity structure of graphs. We have evaluated our optimal transport-based graph kernel on different benchmark tasks. The experimental results show that our models considerably outperform the state-of-the-art methods on all benchmark tasks.
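As an illustration of how structural information can enter a message-passing aggregation (the abstract's first theme), the sketch below weights each neighbor by a count of common neighbors before averaging. This is a minimal stand-in, not the actual GraphSNN scheme; the coefficient choice and the toy graph are assumptions for demonstration only.

```python
import numpy as np

def structural_coefficients(adj):
    """For each edge (u, v), count common neighbors -- a simple structural
    weight; richer overlap-subgraph statistics are used in models such as
    GraphSNN."""
    common = adj @ adj           # common[u, v] = number of shared neighbors
    w = adj * (1.0 + common)     # weight only actual edges
    # Row-normalize so each node averages over its neighborhood.
    row = w.sum(axis=1, keepdims=True)
    row[row == 0.0] = 1.0
    return w / row

def message_passing_layer(adj, x, weight):
    """One aggregation step: structure-weighted neighbor average,
    residual combination, linear transform, ReLU."""
    agg = structural_coefficients(adj) @ x
    return np.maximum(0.0, (x + agg) @ weight)

# Toy graph: a 4-cycle with one chord.
adj = np.zeros((4, 4))
for u, v in [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]:
    adj[u, v] = adj[v, u] = 1.0

rng = np.random.default_rng(1)
x = rng.normal(size=(4, 3))   # node features
w = rng.normal(size=(3, 2))   # learnable weights
h = message_passing_layer(adj, x, w)
print(h.shape)  # (4, 2)
```

Because the edge weights depend on neighborhood overlap rather than degree alone, two nodes with identical degrees but different local subgraphs can receive different aggregated messages, which is the intuition behind going beyond 1-WL.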

    Effect of Adapting to Human Preferences on Trust in Human-Robot Teaming

    We present the effect of adapting to human preferences on trust in a human-robot teaming task. The team performs a task in which the robot acts as an action recommender to the human. It is assumed that the behavior of the human and the robot is based on some reward function they try to optimize. We use a new human trust-behavior model that enables the robot to learn and adapt to the human's preferences in real time during the interaction using Bayesian Inverse Reinforcement Learning. We present three strategies for the robot to interact with a human: a non-learner strategy, in which the robot assumes that the human's reward function is the same as its own; a non-adaptive-learner strategy, which learns the human's reward function for performance estimation but still optimizes the robot's own reward function; and an adaptive-learner strategy, which learns the human's reward function for performance estimation and also optimizes this learned reward function. Results show that adapting to the human's reward function results in the highest trust in the robot. Comment: 6 pages, 6 figures, AAAI Fall Symposium on Agent Teaming in Mixed-Motive Situations
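A minimal sketch of the Bayesian Inverse Reinforcement Learning update described above, assuming a discrete set of candidate reward functions and a Boltzmann-rational human. The `beta` rationality parameter, the feature vectors, and the candidate set are all illustrative assumptions, not the paper's actual model.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def update_posterior(prior, reward_candidates, features, action, beta=5.0):
    """One Bayesian IRL step: each candidate reward scores every action's
    features; a Boltzmann-rational human picks actions with probability
    proportional to exp(beta * value). Observing one human action
    reweights the posterior over reward candidates."""
    likelihoods = np.array([
        softmax(beta * features @ r)[action] for r in reward_candidates
    ])
    post = prior * likelihoods
    return post / post.sum()

# Two candidate reward functions over 2 features; 3 available actions.
candidates = np.array([[1.0, 0.0],   # hypothesis: human values feature 0
                       [0.0, 1.0]])  # hypothesis: human values feature 1
features = np.array([[1.0, 0.0],    # action 0: high feature 0
                     [0.0, 1.0],    # action 1: high feature 1
                     [0.5, 0.5]])   # action 2: mixed

posterior = np.array([0.5, 0.5])
for _ in range(5):                   # the human repeatedly picks action 1
    posterior = update_posterior(posterior, candidates, features, action=1)
print(posterior)  # posterior mass shifts toward the feature-1 reward
```

An adaptive-learner strategy in the paper's sense would then recommend actions that score highly under the posterior-weighted human reward rather than under the robot's own reward.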

    Efficient and Explainable Neural Ranking

    The recent availability of increasingly powerful hardware has caused a shift from traditional information retrieval (IR) approaches based on term matching, which remained the state of the art for several decades, to large pre-trained neural language models. These neural rankers achieve substantial improvements in performance, as their complexity and extensive pre-training give them a degree of natural language understanding. As a result, neural rankers go beyond term matching by performing relevance estimation based on the semantics of queries and documents. However, these improvements in performance do not come without sacrifice. In this thesis, we focus on two fundamental challenges of neural ranking models, specifically ones based on large language models. On the one hand, due to their complexity, the models are inefficient; they require considerable amounts of computational power, which often comes in the form of specialized hardware, such as GPUs or TPUs. Consequently, the carbon footprint of systems using neural IR is an increasingly important concern. This effect is amplified when low latency is required, as in, for example, web search. On the other hand, neural models are known for being inherently unexplainable; in other words, it is often not comprehensible to humans why a neural model produced a specific output. In general, explainability is deemed important in order to identify undesired behavior, such as bias. We tackle the efficiency challenge of neural rankers by proposing Fast-Forward indexes, which are simple vector forward indexes that heavily utilize pre-computation techniques. Our approach substantially reduces the computational load during query processing, enabling efficient ranking solely on CPUs without requiring hardware acceleration. Furthermore, we introduce BERT-DMN to show that the training efficiency of neural rankers can be improved by training only parts of the model.
In order to improve the explainability of neural ranking, we propose the Select-and-Rank paradigm, which makes ranking models explainable by design: first, a query-dependent subset of the input document is extracted to serve as an explanation; second, the ranking model makes its decision based only on the extracted subset rather than the complete document. We show that our models exhibit performance similar to models that are not explainable by design, and we conduct a user study to determine the faithfulness of the explanations. Finally, we introduce BoilerNet, a web content extraction technique that removes boilerplate from web pages, leaving only the main content in plain text. Our method requires no feature engineering and can aid in creating new document corpora from the web.
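The pre-computation idea behind Fast-Forward indexes can be sketched roughly as follows: document embeddings are computed offline and stored in a forward index, so query-time re-ranking reduces to one query encoding plus cheap dot products interpolated with the first-stage lexical score. The dictionary layout, the `alpha` interpolation weight, and all vectors below are illustrative assumptions, not the published implementation.

```python
import numpy as np

# Hypothetical pre-computed forward index: doc id -> dense vector,
# filled offline by a document encoder.
doc_index = {
    "d1": np.array([0.9, 0.1, 0.0]),
    "d2": np.array([0.2, 0.8, 0.1]),
    "d3": np.array([0.4, 0.4, 0.4]),
}

def rerank(query_vec, lexical_scores, alpha=0.5):
    """Interpolate a first-stage lexical score with a dense dot product
    read from the pre-computed index -- no model inference per document,
    so the whole step runs comfortably on a CPU."""
    scored = []
    for doc_id, lex in lexical_scores.items():
        dense = float(query_vec @ doc_index[doc_id])
        scored.append((alpha * lex + (1 - alpha) * dense, doc_id))
    return [d for _, d in sorted(scored, reverse=True)]

query_vec = np.array([1.0, 0.0, 0.0])        # one encoder call per query
lexical = {"d1": 0.2, "d2": 0.8, "d3": 0.5}  # e.g. normalized BM25 scores
print(rerank(query_vec, lexical))  # → ['d1', 'd2', 'd3']
```

Note how d1, ranked last lexically, is promoted by the dense evidence: the only per-document cost at query time is a vector lookup and a dot product.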

    Optimization and coarse-grid selection for algebraic multigrid

    Multigrid methods are often the most efficient approaches for solving the very large linear systems that arise from discretized PDEs and other problems. Algebraic multigrid (AMG) methods are used when the discretization lacks the structure needed to enable more efficient geometric multigrid techniques. AMG methods rely in part on heuristic graph algorithms to achieve their performance. Reduction-based AMG (AMGr) algorithms attempt to formalize these heuristics. The main focus of this thesis is to develop effective algebraic multigrid methods. A key step in all AMG approaches is the choice of the coarse/fine partitioning, which aims to balance the convergence of the iteration with its cost. In past work (MacLachlan and Saad, A greedy strategy for coarse-grid selection, SISC 2007), a constrained combinatorial optimization problem was used to define the “best” coarse grid within the setting of two-level reduction-based AMG and was shown to be NP-complete. In the first part of the thesis, a new coarsening algorithm based on simulated annealing is developed to solve this problem. The new coarsening algorithm gives better results than the previously developed greedy algorithm. The goal of the second part of the thesis is to improve the classical AMGr method. Convergence factor bounds do not hold when AMGr algorithms are applied to matrices that are not diagonally dominant. In this part of our research, we present modifications to the classical AMGr algorithm that improve its performance on such matrices. For non-diagonally dominant matrices, we find that strength of connection plays a vital role in the performance of AMGr. To generalize the diagonal approximations of the fine-fine block A_FF used in classical AMGr, we use a sparse approximate inverse (SPAI) method, with nonzero pattern determined by strong connections, to define the AMGr-style interpolation operator, coupled with rescaling based on relaxed vectors.
We present numerical results demonstrating the robustness of this approach for non-diagonally dominant systems. In the third part of this research, we develop an improved deterministic coarsening algorithm that generalizes an existing technique known as Lloyd’s algorithm. The improved algorithm provides better control of the number of clusters than classical approaches and attempts to produce more “compact” groupings.
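A rough sketch of a graph version of Lloyd's algorithm, the technique the third part generalizes: nodes are assigned to the nearest center by hop distance, and each center then moves to the most central node of its cluster. The data structures and the fixed iteration count below are illustrative simplifications, not the thesis's improved algorithm.

```python
from collections import deque

def bfs_hops(adj, src):
    """Hop distance from src to every reachable node (adjacency-list graph)."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def lloyd_cluster(adj, centers, iters=5):
    """Toy graph Lloyd iteration: assign each node to the nearest center,
    then move each center to the member node minimizing the in-cluster
    distance sum (a rough 'most central' node)."""
    for _ in range(iters):
        dists = {c: bfs_hops(adj, c) for c in centers}
        assign = {u: min(centers, key=lambda c: dists[c].get(u, len(adj)))
                  for u in adj}
        new_centers = []
        for c in centers:
            members = [u for u, a in assign.items() if a == c]
            d = {m: bfs_hops(adj, m) for m in members}
            new_centers.append(min(members,
                key=lambda m: sum(d[m].get(v, len(adj)) for v in members)))
        centers = new_centers
    return assign, centers

# A path graph 0-1-2-3-4-5 with two initial centers at the endpoints.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
assign, centers = lloyd_cluster(adj, [0, 5])
print(sorted(centers))  # → [1, 4]: centers drift to the cluster interiors
```

In an AMG setting, each resulting aggregate would typically become one coarse-grid degree of freedom, so controlling cluster count and compactness directly shapes the coarse problem.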

    Advances in Unsupervised Learning and Application Areas: Subspace Clustering with Background Knowledge, Semantic Password Guessing, and Learned Index Structures

    Over the past few years, advances in data science, machine learning and, in particular, unsupervised learning have enabled significant progress in many scientific fields and even in everyday life. Unsupervised learning methods are usually successful whenever they can be tailored to specific applications using appropriate requirements based on domain expertise. This dissertation shows how purely theoretical research can lead to circumstances that favor overly optimistic results, and it demonstrates the advantages of application-oriented research based on specific background knowledge. These observations apply to traditional unsupervised learning problems such as clustering, anomaly detection, and dimensionality reduction. This thesis therefore presents extensions of these classical problems, such as subspace clustering and principal component analysis, as well as several specific applications with relevant interfaces to machine learning. Examples include password guessing using semantic word embeddings and learning spatial index structures using statistical models. In essence, this thesis shows that application-oriented research has many advantages for current and future research.
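The learned-index idea mentioned above (statistical models as index structures) can be illustrated in one dimension: fit a model from key to sorted position, then correct the prediction within the model's maximum observed error. The `LearnedIndex` class below is a hypothetical minimal version with a linear model, not the thesis's spatial variant.

```python
import numpy as np

class LearnedIndex:
    """Minimal 1-D learned index: a linear model predicts a key's
    position in a sorted array; a bounded local scan fixes the
    residual prediction error."""
    def __init__(self, keys):
        self.keys = np.sort(np.asarray(keys, dtype=float))
        pos = np.arange(self.keys.size)
        # Least-squares fit: position ≈ slope * key + intercept.
        self.slope, self.intercept = np.polyfit(self.keys, pos, deg=1)
        pred = self.slope * self.keys + self.intercept
        # The maximum prediction error bounds the correction window,
        # so every lookup window is guaranteed to contain the true slot.
        self.eps = int(np.ceil(np.abs(pred - pos).max()))

    def lookup(self, key):
        guess = int(round(self.slope * key + self.intercept))
        lo = max(0, guess - self.eps)
        hi = min(self.keys.size, guess + self.eps + 1)
        # Exact search only inside the error window.
        for i in range(lo, hi):
            if self.keys[i] == key:
                return i
        return -1

rng = np.random.default_rng(2)
keys = np.unique(rng.integers(0, 10_000, 500))
idx = LearnedIndex(keys)
print(idx.lookup(float(idx.keys[123])))  # → 123
```

On near-uniform key distributions the error window stays small, so a lookup costs one model evaluation plus a short scan instead of a full binary-search or tree traversal; skewed distributions need piecewise or hierarchical models.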

    On Expressiveness, Inference, and Parameter Estimation of Discrete Sequence Models

    Huge neural autoregressive sequence models have achieved impressive performance across different applications, such as NLP, reinforcement learning, and bioinformatics. However, some lingering problems (e.g., consistency and coherency of generated texts) continue to exist, regardless of the parameter count. In the first part of this thesis, we chart a taxonomy of the expressiveness of various sequence model families (Ch 3). In particular, we put forth complexity-theoretic proofs that string latent-variable sequence models are strictly more expressive than energy-based sequence models, which in turn are more expressive than autoregressive sequence models. Based on these findings, we introduce residual energy-based sequence models, a family of energy-based sequence models (Ch 4) whose sequence weights can be evaluated efficiently and which also perform competitively against autoregressive models. However, we show how unrestricted energy-based sequence models can suffer from uncomputability, and how such a problem is generally unfixable without knowledge of the true sequence distribution (Ch 5). In the second part of the thesis, we study practical sequence model families and algorithms based on the theoretical findings of the first part. We introduce neural particle smoothing (Ch 6), a family of approximate sampling methods that work with conditional latent-variable models. We also introduce neural finite-state transducers (Ch 7), which extend weighted finite-state transducers with mark strings, allowing transduction paths in a finite-state transducer to be scored with a neural network. Finally, we propose neural regular expressions (Ch 8), a family of neural sequence models that are easy to engineer, allowing a user to design flexible weighted relations using Marked FSTs and to combine these weighted relations with various operations.
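A tiny worked example of the residual energy-based idea, under the usual formulation p(x) ∝ p_AR(x)·exp(−E(x)): a bigram autoregressive model is reweighted by a residual energy that penalizes repetitions, a global property that is awkward to capture left-to-right. The vocabulary, probabilities, and energy function are toy assumptions; real models only approximate the partition function, which here is summed exactly because the space is tiny.

```python
import itertools
import math

# Toy autoregressive bigram model over vocabulary {a, b}, sequence length 3.
VOCAB = "ab"
P_FIRST = {"a": 0.5, "b": 0.5}
P_NEXT = {"a": {"a": 0.7, "b": 0.3}, "b": {"a": 0.4, "b": 0.6}}

def p_autoregressive(seq):
    """Locally normalized left-to-right probability of a sequence."""
    p = P_FIRST[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        p *= P_NEXT[prev][cur]
    return p

def energy(seq):
    """Hypothetical residual energy: penalize immediate repetitions,
    a global property the bigram model encodes only weakly."""
    return sum(1.0 for x, y in zip(seq, seq[1:]) if x == y)

def p_residual(seq, seqs):
    """p(x) ∝ p_AR(x) * exp(-E(x)), globally renormalized over seqs."""
    z = sum(p_autoregressive(s) * math.exp(-energy(s)) for s in seqs)
    return p_autoregressive(seq) * math.exp(-energy(seq)) / z

seqs = ["".join(s) for s in itertools.product(VOCAB, repeat=3)]
# Repetition-free sequences gain probability mass under the residual model.
print(p_residual("aba", seqs) > p_autoregressive("aba"))
```

The renormalization is what makes such models globally normalized, and also why inference is hard in general: for unbounded sequence spaces the partition function must be estimated, e.g. by importance sampling from the autoregressive base model.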

    A Design Concept for a Tourism Recommender System for Regional Development

    Despite existing tourism infrastructure and software, the development of tourism is hampered by a lack of information support covering the various aspects of travel planning. This paper highlights the need to integrate various approaches and methods into a universal tourism information recommender system for building individual tourist routes. The objective of this study is to propose a concept for a universal information recommender system that builds personalized tourist routes. The developed design concept involves a procedure for collecting and preparing data for tourism product synthesis; a methodology for forming a tourism product according to user preferences; and the main stages of implementing this methodology. To collect and store information from real travelers, the paper proposes using elements of blockchain technology to ensure information security. A model that specifies the key elements of a tourist route planning process is presented. This article can serve as a reference and knowledge base for digital business system analysts, system designers, and digital tourism business implementers seeking better digital business system design and implementation in the tourism sector.

    Methods for Estimating Mass-Sensitive Observables of Ultra-High Energy Cosmic Rays using Artificial Neural Networks

    Ultra-high-energy cosmic rays are the most energetic naturally occurring particles known to humankind. Since their energies lie two orders of magnitude beyond the energy scale of the current generation of particle accelerators, the questions of how cosmic rays attain their energy and where they originate remain a mystery. Moreover, the excess of muons among the decay products of ultra-high-energy cosmic rays in the Earth's atmosphere challenges our understanding of hadronic interactions at the highest energies. To solve the puzzles of cosmic rays, it is essential to determine the masses of the incoming cosmic rays. Separating heavy from light cosmic rays enables studies of the arrival directions that associate minimally deflected cosmic rays with potential sources. Furthermore, since the number of nucleons is directly related to the number of muons produced, this also allows us to test hadronic interaction models against real measurements. Owing to the low flux of ultra-high-energy cosmic rays, large detectors for indirect detection are required to collect sufficient statistics. The Pierre Auger Observatory is the largest cosmic-ray observatory in the world, covering an area of more than 3000 km². The decay of a high-energy particle in the atmosphere triggers a cascade of secondary particles known as an air shower. The observatory was designed specifically to detect such air showers: the atmosphere is used as a calorimeter to measure the longitudinal development of showers with fluorescence detectors.
In addition, a surface detector with regularly spaced detector stations measures the number of secondary particles produced; these secondary particles are commonly referred to as the shower footprint. The recent surge in popularity of artificial neural networks has given physicists new, easy-to-use tools for tackling physical problems in a data-driven way. Neural networks make it possible to relate detailed data, such as shower footprints, to physical observables, such as the energy of an air shower, without having to construct an analytical model. The main goal of this thesis is to investigate neural-network-based methods for extracting information correlated with the cosmic-ray mass from the data of the surface detector of the Pierre Auger Observatory. To this end, I considered two different approaches and examined the usefulness of neural networks for each: the first approach is based on extracting the muon content in each station of the surface detector, the second on measuring the entire shower footprint. Based on Monte Carlo simulation studies, I discarded the first approach in favor of the second, as the latter yielded more promising results. From this simulation study, I selected three different neural networks trained to predict the depth of shower maximum, the relative muon content, and the logarithmic mass from shower footprints. In a final step, I used these networks to determine the mass composition of ultra-high-energy cosmic rays from measurements of the Pierre Auger Observatory.
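As a schematic of the kind of regression described above (surface-detector footprint in, mass-sensitive observable out), here is a self-contained toy: a one-hidden-layer network trained by plain gradient descent on synthetic nine-station footprints. The data generator, network size, and training schedule are pure stand-ins for the real simulation-trained networks.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic stand-in data: 9-station "footprints" whose signals scale
# with a hidden observable (think: shower energy or muon content).
n = 256
target = rng.uniform(0.0, 1.0, size=(n, 1))
footprints = target * rng.uniform(0.5, 1.5, size=(n, 9))

# One-hidden-layer regression network, trained with gradient descent.
W1 = rng.normal(0.0, 0.5, size=(9, 16)); b1 = np.zeros(16)
W2 = rng.normal(0.0, 0.5, size=(16, 1)); b2 = np.zeros(1)

def forward(x):
    h = np.tanh(x @ W1 + b1)
    return h, h @ W2 + b2

_, pred0 = forward(footprints)
loss0 = float(np.mean((pred0 - target) ** 2))  # error before training

lr = 0.05
for _ in range(500):
    h, pred = forward(footprints)
    grad = 2.0 * (pred - target) / n          # dLoss/dpred (MSE)
    gW2 = h.T @ grad; gb2 = grad.sum(axis=0)
    gh = grad @ W2.T * (1.0 - h ** 2)         # backprop through tanh
    gW1 = footprints.T @ gh; gb1 = gh.sum(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

_, pred = forward(footprints)
loss = float(np.mean((pred - target) ** 2))
print(loss < loss0)  # training reduced the fit error
```

The appeal mirrored in the abstract: no analytical model relating footprint to observable is ever written down; the mapping is learned directly from (simulated) examples, and the same recipe scales to predicting the depth of shower maximum or the muon content from full footprints.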