Machine learning applications in search algorithms for gravitational waves from compact binary mergers
Gravitational waves from compact binary mergers are now routinely observed by Earth-bound detectors. These observations enable exciting new science, as they have opened a new window to the Universe.
However, extracting gravitational-wave signals from the noisy detector data is a challenging problem. The most sensitive search algorithms for compact binary mergers use matched filtering, an algorithm that compares the data with a set of expected template signals. As detectors are upgraded and more sophisticated signal models become available, the number of required templates will increase, which can make some sources computationally prohibitive to search for. The computational cost is of particular concern when low-latency alerts should be issued to maximize the time for electromagnetic follow-up observations. One potential solution to reduce computational requirements that has started to be explored in the last decade is machine learning. However, different proposed deep learning searches target varying parameter spaces and use metrics that are not always comparable to existing literature. Consequently, a clear picture of the capabilities of machine learning searches has been sorely missing.
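For readers unfamiliar with the technique, the core of matched filtering can be sketched in a few lines. This is a toy version assuming white noise and a made-up chirp template; real pipelines whiten the data and weight the filter by the detector's noise power spectral density:

```python
import numpy as np

def matched_filter_snr(data, template):
    """Slide a unit-normalized template across the data and return the
    peak absolute correlation, i.e. the matched-filter SNR for white noise."""
    template = template / np.linalg.norm(template)
    snr_series = np.correlate(data, template, mode="valid")
    return np.max(np.abs(snr_series))

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 512)
template = np.sin(2 * np.pi * 30 * t**2)   # toy "chirp" waveform
noise = rng.normal(size=4096)

# Inject the template at a known time with matched-filter SNR ~ 10.
data = noise.copy()
data[2000:2512] += 10 * template / np.linalg.norm(template)

snr = matched_filter_snr(data, template)
print(snr)  # peak SNR near the injected value of 10
```

The computational-cost problem mentioned above stems from repeating this correlation for every template in a large bank.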
In this thesis, we closely examine the sensitivity of various deep learning gravitational-wave search algorithms and introduce new methods to detect signals from binary black hole and binary neutron star mergers at previously untested statistical confidence levels. By using the sensitive distance as our core metric, we allow for a direct comparison of our algorithms to state-of-the-art search pipelines. As part of this thesis, we organized a global mock data challenge to create a benchmark for machine learning search algorithms targeting compact binaries. The tools developed in this thesis are made available to the greater community as open-source software.
Our studies show that, depending on the parameter space, deep learning gravitational-wave search algorithms are already competitive with current production search pipelines. We also find that strategies developed for traditional searches can be effectively adapted to their machine learning counterparts. In regions where matched filtering becomes computationally expensive, however, available deep learning algorithms are also limited in their capability. We find reduced sensitivity to long-duration signals compared to the excellent results for short-duration binary black hole signals.
Geometric Learning on Graph Structured Data
Graphs provide a ubiquitous and universal data structure that can be applied in many domains such as social networks, biology, chemistry, physics, and computer science. In this thesis we focus on two fundamental paradigms in graph learning: representation learning and similarity learning over graph-structured data. Graph representation learning aims to learn embeddings for nodes by integrating the topological and feature information of a graph. Graph similarity learning brings into play similarity functions that allow computing the similarity between pairs of graphs in a vector space. We address several challenging issues in these two paradigms, designing powerful, yet efficient, theoretically guaranteed machine learning models that can leverage the rich topological structural properties of real-world graphs.
This thesis is structured into two parts. In the first part of the thesis, we will present how to develop powerful Graph Neural Networks (GNNs) for graph representation learning from three different perspectives: (1) spatial GNNs, (2) spectral GNNs, and (3) diffusion GNNs. We will discuss the model architecture, representational power, and convergence properties of these GNN models. Specifically, we first study how to develop expressive, yet efficient and simple message-passing aggregation schemes that can go beyond the Weisfeiler-Leman test (1-WL). We propose a generalized message-passing framework by incorporating graph structural properties into an aggregation scheme. Then, we introduce a new local isomorphism hierarchy on neighborhood subgraphs. We further develop a novel neural model, namely GraphSNN, and theoretically prove that this model is more expressive than the 1-WL test. After that, we study how to build an effective and efficient graph convolution model with spectral graph filters. In this study, we propose a spectral GNN model, called DFNets, which incorporates a novel spectral graph filter, namely feedback-looped filters. As a result, this model can provide better localization on neighborhoods while achieving fast convergence and linear memory requirements. Finally, we study how to capture the rich topological information of a graph using graph diffusion. We propose a novel GNN architecture with dynamic PageRank, based on a learnable transition matrix. We explore two variants of this GNN architecture: a forward-Euler solution and an invariable-feature solution, and theoretically prove that our forward-Euler GNN architecture is guaranteed to converge to a stationary distribution.
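The idea of folding structural properties into message-passing aggregation can be illustrated with a small sketch. The edge weights below use plain neighborhood overlap, a hypothetical stand-in for the structural coefficients developed in the thesis, not the actual GraphSNN scheme:

```python
import numpy as np

def structural_message_passing(A, X):
    """One message-passing round where each edge (u, v) is weighted by the
    overlap of the endpoints' neighborhoods -- a toy stand-in for
    structure-aware aggregation."""
    n = A.shape[0]
    H = np.zeros_like(X)
    for u in range(n):
        msg = np.zeros(X.shape[1])
        for v in np.nonzero(A[u])[0]:
            common = np.sum(A[u] * A[v])            # shared neighbors of u and v
            w = 1.0 + common / (A[u].sum() + 1e-9)  # structure-aware edge weight
            msg += w * X[v]
        H[u] = X[u] + msg                           # combine self and messages
    return H

# Toy graph: triangle 0-1-2 plus a pendant node 3 attached to node 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.eye(4)
H = structural_message_passing(A, X)
print(H)
```

Edges inside the triangle receive larger weights than the pendant edge, so two nodes with identical degrees but different local structure can end up with different embeddings.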
In the second part of this thesis, we will introduce a new optimal transport distance metric on graphs in a regularized learning framework for graph kernels. This optimal transport distance metric can preserve both local and global structures between graphs during the transport, in addition to preserving features and their local variations. Furthermore, we propose two strongly convex regularization terms to theoretically guarantee the convergence and numerical stability in finding an optimal assignment between graphs. One regularization term is used to regularize a Wasserstein distance between graphs in the same ground space. This helps to preserve the local clustering structure on graphs by relaxing the optimal transport problem to be a cluster-to-cluster assignment between locally connected vertices. The other regularization term is used to regularize a Gromov-Wasserstein distance between graphs across different ground spaces based on degree-entropy KL divergence. This helps to improve the matching robustness of an optimal alignment to preserve the global connectivity structure of graphs. We have evaluated our optimal transport-based graph kernel using different benchmark tasks. The experimental results show that our models considerably outperform the state-of-the-art methods on all benchmark tasks.
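For illustration, a plain entropy-regularized optimal transport solver (Sinkhorn iterations) between two toy node-embedding sets looks as follows; the thesis's convex regularizers for cluster and connectivity preservation are not reproduced here:

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.1, n_iter=200):
    """Entropy-regularized optimal transport via Sinkhorn iterations.
    a, b: marginal weights; C: pairwise cost matrix; eps: regularization."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]         # transport plan
    return P, float(np.sum(P * C))          # plan and transport cost

# Two tiny "graphs" represented by 1-D node embeddings; cost = |x - y|.
X = np.array([0.0, 1.0, 2.0])
Y = np.array([0.1, 0.9, 2.2])
C = np.abs(X[:, None] - Y[None, :])
a = np.full(3, 1 / 3)
b = np.full(3, 1 / 3)
P, cost = sinkhorn(a, b, C)
print(P.round(3), cost)
```

The entropic term is what makes the assignment problem strongly convex and the iterations fast; the thesis's regularizers play an analogous stabilizing role for the graph-to-graph case.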
Effect of Adapting to Human Preferences on Trust in Human-Robot Teaming
We present the effect of adapting to human preferences on trust in a human-robot teaming task. The team performs a task in which the robot acts as an action recommender to the human. It is assumed that the behavior of the human and the robot is based on some reward function they try to optimize. We use a new human trust-behavior model that enables the robot to learn and adapt to the human's preferences in real time during their interaction using Bayesian Inverse Reinforcement Learning. We present three strategies for the robot to interact with a human: a non-learner strategy, in which the robot assumes that the human's reward function is the same as its own; a non-adaptive-learner strategy, which learns the human's reward function for performance estimation but still optimizes its own reward function; and an adaptive-learner strategy, which learns the human's reward function for performance estimation and also optimizes this learned reward function. Results show that adapting to the human's reward function results in the highest trust in the robot.
Comment: 6 pages, 6 figures, AAAI Fall Symposium on Agent Teaming in Mixed-Motive Situations
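The Bayesian preference-learning loop described above can be illustrated with a minimal discrete sketch. The Boltzmann-rational choice likelihood and the two candidate reward hypotheses below are illustrative assumptions, not the exact trust-behavior model from the paper:

```python
import numpy as np

def update_belief(belief, thetas, features, chosen, beta=5.0):
    """Bayesian update over candidate reward weights after observing the
    human pick action `chosen`; the likelihood is Boltzmann-rational."""
    likelihoods = []
    for theta in thetas:
        utilities = features @ theta
        p = np.exp(beta * utilities)
        p /= p.sum()                    # softmax choice probabilities
        likelihoods.append(p[chosen])
    posterior = belief * np.array(likelihoods)
    return posterior / posterior.sum()

# Two hypotheses about what the human values: feature 0 or feature 1.
thetas = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
belief = np.array([0.5, 0.5])
features = np.array([[1.0, 0.0],    # action 0 exercises feature 0
                     [0.0, 1.0]])   # action 1 exercises feature 1
for _ in range(3):                  # the human repeatedly picks action 1
    belief = update_belief(belief, thetas, features, chosen=1)
print(belief)
```

An adaptive-learner strategy would then recommend actions that maximize the reward under the hypothesis the posterior now favors, rather than the robot's own reward.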
Efficient and Explainable Neural Ranking
The recent availability of increasingly powerful hardware has caused a shift from traditional information retrieval (IR) approaches based on term matching, which remained the state of the art for several decades, to large pre-trained neural language models. These neural rankers achieve substantial improvements in performance, as their complexity and extensive pre-training give them a degree of natural language understanding. As a result, neural rankers go beyond term matching by performing relevance estimation based on the semantics of queries and documents.
However, these improvements in performance do not come without sacrifice. In this thesis, we focus on two fundamental challenges of neural ranking models, specifically those based on large language models. On the one hand, due to their complexity, the models are inefficient; they require considerable amounts of computational power, which often comes in the form of specialized hardware, such as GPUs or TPUs. Consequently, the carbon footprint of neural IR systems is an increasingly important concern. This effect is amplified when low latency is required, as in, for example, web search. On the other hand, neural models are known for being inherently unexplainable; in other words, it is often not comprehensible to humans why a neural model produced a specific output. In general, explainability is deemed important in order to identify undesired behavior, such as bias.
We tackle the efficiency challenge of neural rankers by proposing Fast-Forward indexes, which are simple vector forward indexes that heavily utilize pre-computation techniques. Our approach substantially reduces the computational load during query processing, enabling efficient ranking solely on CPUs without requiring hardware acceleration. Furthermore, we introduce BERT-DMN to show that the training efficiency of neural rankers can be improved by training only parts of the model.
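A minimal sketch of the pre-computation idea behind a Fast-Forward-style index: dense document vectors are looked up rather than computed at query time, and interpolated with the sparse retriever's scores. The index contents, names, and interpolation weight here are illustrative, not the thesis's implementation:

```python
import numpy as np

# Hypothetical pre-computed forward index: document id -> dense vector.
# In a real system these vectors come from a neural encoder, computed offline.
forward_index = {
    "d1": np.array([0.9, 0.1]),
    "d2": np.array([0.2, 0.8]),
}

def rerank(candidates, sparse_scores, q_vec, alpha=0.5):
    """Interpolate each candidate's sparse retrieval score with a dense
    dot-product score read from the pre-computed index -- no neural
    forward pass (and hence no GPU) needed per document at query time."""
    scored = []
    for doc_id in candidates:
        dense = float(q_vec @ forward_index[doc_id])
        scored.append((doc_id, alpha * sparse_scores[doc_id] + (1 - alpha) * dense))
    return sorted(scored, key=lambda x: x[1], reverse=True)

q_vec = np.array([0.1, 0.9])    # query embedding (illustrative)
ranking = rerank(["d1", "d2"], {"d1": 0.4, "d2": 0.3}, q_vec)
print(ranking)
```

Because the only query-time work is a lookup and a dot product per candidate, the ranking stage runs comfortably on CPUs.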
In order to improve the explainability of neural ranking, we propose the Select-and-Rank paradigm to make ranking models explainable by design: First, a query-dependent subset of the input document is extracted to serve as an explanation; second, the ranking model makes its decision based only on the extracted subset, rather than the complete document. We show that our models exhibit performance similar to models that are not explainable by design and conduct a user study to determine the faithfulness of the explanations.
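The two-stage Select-and-Rank idea can be sketched with a trivial lexical selector; a real system would replace both scoring functions below with neural models, but the structure (select an explanatory subset, then rank on that subset alone) is the same:

```python
import re

def select(query, document, k=2):
    """Selection stage: keep the k sentences sharing the most terms with the
    query; the extracted subset doubles as the explanation."""
    q_terms = set(query.lower().split())
    sentences = re.split(r"(?<=[.!?])\s+", document)
    ranked = sorted(sentences, key=lambda s: -len(q_terms & set(s.lower().split())))
    return ranked[:k]

def rank_score(query, selected):
    """Ranking stage stand-in: the score is computed from the selected subset
    only (a real system feeds the subset to a neural ranker instead)."""
    q_terms = set(query.lower().split())
    return sum(len(q_terms & set(s.lower().split())) for s in selected)

doc = "Cats sleep a lot. Dogs bark loudly. Cats chase mice at night."
explanation = select("cats mice", doc)
print(explanation, rank_score("cats mice", explanation))
```

Since the ranker never sees text outside the selection, the selection is a faithful explanation by construction.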
Finally, we introduce BoilerNet, a web content extraction technique that removes boilerplate from web pages, leaving only the main content in plain text. Our method requires no feature engineering and can be used to aid in the process of creating new document corpora from the web.
Optimization and coarse-grid selection for algebraic multigrid
Multigrid methods are often the most efficient approaches for solving the very large linear systems that arise from discretized PDEs and other problems. Algebraic multigrid (AMG) methods are used when the discretization lacks the structure needed to enable more efficient geometric multigrid techniques. AMG methods rely in part on heuristic graph algorithms to achieve their performance. Reduction-based AMG (AMGr) algorithms attempt to formalize these heuristics.
The main focus of this thesis is to develop effective algebraic multigrid methods. A key step in all AMG approaches is the choice of the coarse/fine partitioning, aiming to balance the convergence of the iteration with its cost. In past work (MacLachlan and Saad, A greedy strategy for coarse-grid selection, SISC 2007), a constrained combinatorial optimization problem was used to define the "best" coarse grid within the setting of two-level reduction-based AMG and was shown to be NP-complete. In the first part of the thesis, a new coarsening algorithm based on simulated annealing is developed to solve this problem. The new coarsening algorithm gives better results than the previously developed greedy algorithm.
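A simulated-annealing coarsener in the spirit described above can be sketched as follows. The objective is a toy stand-in (minimize the number of coarse points while penalizing any fine point without a coarse neighbor), not the constrained optimization problem from the thesis:

```python
import math
import random

def anneal_coarsening(adj, n, n_steps=2000, T0=1.0, seed=1):
    """Simulated annealing over coarse/fine splittings of an n-point graph.
    Toy objective: minimize |C| while penalizing any fine point that has
    no coarse neighbor (a stand-in for the real AMGr constraints)."""
    rng = random.Random(seed)

    def cost(C):
        uncovered = sum(1 for i in range(n) if i not in C and not (adj[i] & C))
        return len(C) + 10 * uncovered

    C = set(range(n))                       # start with every point coarse
    best, best_cost = set(C), cost(C)
    for step in range(n_steps):
        T = T0 * (1 - step / n_steps) + 1e-3   # cooling schedule
        i = rng.randrange(n)
        cand = C ^ {i}                      # flip point i between C and F
        d = cost(cand) - cost(C)
        if d < 0 or rng.random() < math.exp(-d / T):
            C = cand
            if cost(C) < best_cost:
                best, best_cost = set(C), cost(C)
    return best

# 1-D chain of 7 points; a good coarse grid keeps roughly every other point.
adj = {i: {j for j in (i - 1, i + 1) if 0 <= j < 7} for i in range(7)}
C = anneal_coarsening(adj, 7)
print(sorted(C))
```

Unlike a greedy pass, the annealer occasionally accepts uphill moves, which lets it escape locally optimal splittings.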
The goal of the second part of the thesis is to improve the classical AMGr method. Convergence factor bounds do not hold when AMGr algorithms are applied to matrices that are not diagonally dominant. In this part of our research, we present modifications to the classical AMGr algorithm that improve its performance on such matrices. For non-diagonally dominant matrices, we find that strength of connection plays a vital role in the performance of AMGr. To generalize the diagonal approximations of A_FF used in classical AMGr, we use a sparse approximate inverse (SPAI) method, with nonzero pattern determined by strong connections, to define the AMGr-style interpolation operator, coupled with rescaling based on relaxed vectors. We present numerical results demonstrating the robustness of this approach for non-diagonally dominant systems.
In the third part of this research, we have developed an improved deterministic coarsening algorithm that generalizes an existing technique known as Lloyd's algorithm. The improved algorithm provides better control of the number of clusters than classical approaches and attempts to provide more "compact" groupings.
Advances in Unsupervised Learning and Application Areas: Subspace Clustering with Background Knowledge, Semantic Password Guessing, and Learned Index Structures
Over the past few years, advances in data science, machine learning and, in particular, unsupervised learning have enabled significant progress in many scientific fields and even in everyday life. Unsupervised learning methods are usually successful whenever they can be tailored to specific applications using appropriate requirements based on domain expertise. This dissertation shows how purely theoretical research can lead to circumstances that favor overly optimistic results, and the advantages of application-oriented research based on specific background knowledge. These observations apply to traditional unsupervised learning problems such as clustering, anomaly detection and dimensionality reduction. Therefore, this thesis presents extensions of these classical problems, such as subspace clustering and principal component analysis, as well as several specific applications with relevant interfaces to machine learning. Examples include password guessing using semantic word embeddings and learning spatial index structures using statistical models. In essence, this thesis shows that application-oriented research has many advantages for current and future research.
On Expressiveness, Inference, and Parameter Estimation of Discrete Sequence Models
Huge neural autoregressive sequence models have achieved impressive performance across different applications, such as NLP, reinforcement learning, and bioinformatics. However, some lingering problems (e.g., consistency and coherency of generated texts) continue to exist, regardless of the parameter count. In the first part of this thesis, we chart a taxonomy of the expressiveness of various sequence model families (Ch 3). In particular, we put forth complexity-theoretic proofs that string latent-variable sequence models are strictly more expressive than energy-based sequence models, which in turn are more expressive than autoregressive sequence models. Based on these findings, we introduce residual energy-based sequence models, a family of energy-based sequence models (Ch 4) whose sequence weights can be evaluated efficiently, and also perform competitively against autoregressive models. However, we show how unrestricted energy-based sequence models can suffer from uncomputability; and how such a problem is generally unfixable without knowledge of the true sequence distribution (Ch 5).
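The expressiveness gap between locally and globally normalized models can be made concrete with a toy example: an autoregressive model is normalized by construction via the chain rule, while an energy-based model needs a partition function summed over all sequences (brute-forced here over a two-letter vocabulary). The specific conditional and energy functions are illustrative, not from the thesis:

```python
import itertools
import math

VOCAB = ("a", "b")
LENGTH = 3

def cond(prefix, tok):
    """Toy locally normalized model: after an 'a', another 'a' is likely."""
    if prefix and prefix[-1] == "a":
        return 0.8 if tok == "a" else 0.2
    return 0.5

def ar_prob(seq):
    """Autoregressive probability via the chain rule -- normalized by design."""
    p = 1.0
    for i, tok in enumerate(seq):
        p *= cond(seq[:i], tok)
    return p

def energy(seq):
    """Toy energy: reward repeated symbols (lower energy = higher score)."""
    return -sum(1.0 for x, y in zip(seq, seq[1:]) if x == y)

def ebm_prob(seq):
    """Energy-based probability needs the partition function Z: a sum over
    ALL sequences, which is exactly what makes exact inference costly."""
    Z = sum(math.exp(-energy(s)) for s in itertools.product(VOCAB, repeat=LENGTH))
    return math.exp(-energy(seq)) / Z

total_ar = sum(ar_prob(s) for s in itertools.product(VOCAB, repeat=LENGTH))
print(total_ar, ebm_prob(("a", "a", "a")))
```

The energy model can score global properties (here, runs of repeated symbols) that no per-token conditional expresses directly, but it pays for that flexibility with the intractable sum over the sequence space.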
In the second part of the thesis, we study practical sequence model families and algorithms based on the theoretical findings of the first part. We introduce neural particle smoothing (Ch 6), a family of approximate sampling methods that work with conditional latent-variable models. We also introduce neural finite-state transducers (Ch 7), which extend weighted finite-state transducers with mark strings, allowing transduction paths in a finite-state transducer to be scored with a neural network. Finally, we propose neural regular expressions (Ch 8), a family of neural sequence models that are easy to engineer, allowing a user to design flexible weighted relations using Marked FSTs and to combine these weighted relations with various operations.
A Design Concept for a Tourism Recommender System for Regional Development
Despite existing tourism infrastructure and software, the development of tourism is hampered by a lack of information support covering the various aspects of travel. This paper highlights the demand for integrating various approaches and methods to develop a universal tourism information recommender system for building individual tourist routes. The objective of the study is to propose a concept of a universal information recommender system for building a personalized tourist route. The developed design concept for such a system comprises a procedure for collecting and preparing data for tourism product synthesis; a methodology for forming a tourism product according to user preferences; and the main stages of implementing this methodology. To collect and store information from real travelers, the paper proposes using elements of blockchain technology to ensure information security. A model that specifies the key elements of a tourist route planning process is presented. This article can serve as a reference and knowledge base for digital business system analysts, system designers, and digital tourism business implementers, supporting better digital business system design and implementation in the tourism sector.
Methods for Estimating Mass-Sensitive Observables of Ultra-High Energy Cosmic Rays using Artificial Neural Networks
Ultra-high-energy cosmic rays are the most energetic naturally occurring particles known to humankind. Since their energies lie two orders of magnitude beyond the energy scale of the current generation of particle accelerators, the questions of how cosmic rays acquire their energy and where they originate remain a mystery. Moreover, the excess of muons among the decay products of ultra-high-energy cosmic rays in the Earth's atmosphere also challenges our understanding of hadronic interactions at the highest energies. To solve these riddles, it is essential to determine the masses of the incoming cosmic rays. Separating heavy from light cosmic rays enables studies of arrival directions that associate minimally deflected cosmic rays with potential sources. Furthermore, since the number of nucleons is directly related to the number of muons produced, it also allows us to test hadronic interaction models against real measurements.
Because of the low flux of ultra-high-energy cosmic rays, large detectors for indirect detection are required to collect sufficient statistics. The Pierre Auger Observatory is the largest cosmic-ray observatory in the world, covering an area of over 3000 km². The decay of a high-energy particle in the atmosphere triggers a cascade of secondary particles known as an air shower, and the observatory was designed specifically to detect such air showers. The atmosphere is used as a calorimeter to measure the longitudinal development of showers with fluorescence detectors. In addition, a surface detector with regularly spaced detector stations measures the number of secondary particles produced. These secondary particles are commonly referred to as the shower footprint.
The recent surge in popularity of artificial neural networks has given physicists new, easy-to-use tools to tackle physical problems in a data-driven way. Neural networks make it possible to relate detailed data, such as shower footprints, to physical observables, such as the energy of an air shower, without having to construct an analytical model.
The main goal of this thesis is to investigate neural-network-based methods for extracting information correlated with the mass of cosmic rays from the data of the surface detector of the Pierre Auger Observatory. To this end, I considered two different approaches and examined the usefulness of neural networks for each: the first approach is based on extracting the muon content at each surface-detector station, the second on measuring the entire shower footprint. Based on Monte Carlo simulation studies, I discarded the first approach in favor of the second, as the latter yielded more promising results. From this simulation study, I selected three different neural networks trained to predict the depth of shower maximum, the relative muon content, and the logarithmic mass from shower footprints. In a final step, I used these networks to determine the mass composition of ultra-high-energy cosmic rays from measurements of the Pierre Auger Observatory.