1,366 research outputs found

    Computational Analyses of Metagenomic Data

    Get PDF
    Metagenomics studies the collective microbial genomes extracted from a particular environment without requiring the culturing or isolation of individual genomes, addressing questions revolving around the composition, functionality, and dynamics of microbial communities. The intrinsic complexity of metagenomic data and the diversity of applications call for efficient and accurate computational methods in data handling. In this thesis, I present three primary projects that collectively focus on the computational analysis of metagenomic data, each addressing a distinct topic. In the first project, I designed and implemented an algorithm named Mapbin for reference-free genomic binning of metagenomic assemblies. Binning aims to group a mixture of genomic fragments based on their genome origin. Mapbin enhances binning results by building a multilayer network that combines the initial binning, assembly graph, and read-pairing information from paired-end sequencing data. The network is further partitioned by the community-detection algorithm, Infomap, to yield a new binning result. Mapbin was tested on multiple simulated and real datasets. The results indicated an overall improvement in the common binning quality metrics. The second and third projects are both derived from ImMiGeNe, a collaborative and multidisciplinary study investigating the interplay between gut microbiota, host genetics, and immunity in stem-cell transplantation (SCT) patients. In the second project, I conducted microbiome analyses for the metagenomic data. The workflow included the removal of contaminant reads and multiple taxonomic and functional profiling. The results revealed that the SCT recipients' samples yielded significantly fewer reads with heavy contamination of the host DNA, and their microbiomes displayed evident signs of dysbiosis. Finally, I discussed several inherent challenges posed by extremely low levels of target DNA and high levels of contamination in the recipient samples, which cannot be rectified solely through bioinformatics approaches. The primary goal of the third project is to design a set of primers that can be used to cover bacterial flagellin genes present in the human gut microbiota. Considering the notable diversity of flagellins, I incorporated a method to select representative bacterial flagellin gene sequences, a heuristic approach based on established primer design methods to generate a degenerate primer set, and a selection method to filter genes unlikely to occur in the human gut microbiome. As a result, I successfully curated a reduced yet representative set of primers that would be practical for experimental implementation

    Robust interventions in network epidemiology

    Get PDF
    Which individual should we vaccinate to minimize the spread of a disease? Designing optimal interventions of this kind can be formalized as an optimization problem on networks, in which we have to select a budgeted number of dynamically important nodes to receive treatment that optimizes a dynamical outcome. Describing this optimization problem requires specifying the network, a model of the dynamics, and an objective for the outcome of the dynamics. In real-world contexts, these inputs are vulnerable to misspecification---the network and dynamics must be inferred from data, and the decision-maker must operationalize some (potentially abstract) goal into a mathematical objective function. Moreover, the tools to make reliable inferences---on the dynamical parameters, in particular---remain limited due to computational problems and issues of identifiability. Given these challenges, models thus remain more useful for building intuition than for designing actual interventions. This thesis seeks to elevate complex dynamical models from intuition-building tools to methods for the practical design of interventions. First, we circumvent the inference problem by searching for robust decisions that are insensitive to model misspecification.If these robust solutions work well across a broad range of structural and dynamic contexts, the issues associated with accurately specifying the problem inputs are largely moot. We explore the existence of these solutions across three facets of dynamic importance common in network epidemiology. Second, we introduce a method for analytically calculating the expected outcome of a spreading process under various interventions. Our method is based on message passing, a technique from statistical physics that has received attention in a variety of contexts, from epidemiology to statistical inference.We combine several facets of the message-passing literature for network epidemiology.Our method allows us to test general probabilistic, temporal intervention strategies (such as seeding or vaccination). Furthermore, the method works on arbitrary networks without requiring the network to be locally tree-like .This method has the potential to improve our ability to discriminate between possible intervention outcomes. Overall, our work builds intuition about the decision landscape of designing interventions in spreading dynamics. This work also suggests a way forward for probing the decision-making landscape of other intervention contexts. More broadly, we provide a framework for exploring the boundaries of designing robust interventions with complex systems modeling tools

    LIPIcs, Volume 251, ITCS 2023, Complete Volume

    Get PDF
    LIPIcs, Volume 251, ITCS 2023, Complete Volum

    Digital Traces of the Mind::Using Smartphones to Capture Signals of Well-Being in Individuals

    Get PDF
    General context and questions Adolescents and young adults typically use their smartphone several hours a day. Although there are concerns about how such behaviour might affect their well-being, the popularity of these powerful devices also opens novel opportunities for monitoring well-being in daily life. If successful, monitoring well-being in daily life provides novel opportunities to develop future interventions that provide personalized support to individuals at the moment they require it (just-in-time adaptive interventions). Taking an interdisciplinary approach with insights from communication, computational, and psychological science, this dissertation investigated the relation between smartphone app use and well-being and developed machine learning models to estimate an individual’s well-being based on how they interact with their smartphone. To elucidate the relation between smartphone trace data and well-being and to contribute to the development of technologies for monitoring well-being in future clinical practice, this dissertation addressed two overarching questions:RQ1: Can we find empirical support for theoretically motivated relations between smartphone trace data and well-being in individuals? RQ2: Can we use smartphone trace data to monitor well-being in individuals?Aims The first aim of this dissertation was to quantify the relation between the collected smartphone trace data and momentary well-being at the sample level, but also for each individual, following recent conceptual insights and empirical findings in psychological, communication, and computational science. A strength of this personalized (or idiographic) approach is that it allows us to capture how individuals might differ in how smartphone app use is related to their well-being. Considering such interindividual differences is important to determine if some individuals might potentially benefit from spending more time on their smartphone apps whereas others do not or even experience adverse effects. The second aim of this dissertation was to develop models for monitoring well-being in daily life. The present work pursued this transdisciplinary aim by taking a machine learning approach and evaluating to what extent we might estimate an individual’s well-being based on their smartphone trace data. If such traces can be used for this purpose by helping to pinpoint when individuals are unwell, they might be a useful data source for developing future interventions that provide personalized support to individuals at the moment they require it (just-in-time adaptive interventions). With this aim, the dissertation follows current developments in psychoinformatics and psychiatry, where much research resources are invested in using smartphone traces and similar data (obtained with smartphone sensors and wearables) to develop technologies for detecting whether an individual is currently unwell or will be in the future. Data collection and analysis This work combined novel data collection techniques (digital phenotyping and experience sampling methodology) for measuring smartphone use and well-being in the daily lives of 247 student participants. For a period up to four months, a dedicated application installed on participants’ smartphones collected smartphone trace data. In the same time period, participants completed a brief smartphone-based well-being survey five times a day (for 30 days in the first month and 30 days in the fourth month; up to 300 assessments in total). At each measurement, this survey comprised questions about the participants’ momentary level of procrastination, stress, and fatigue, while sleep duration was measured in the morning. Taking a time-series and machine learning approach to analysing these data, I provide the following contributions: Chapter 2 investigates the person-specific relation between passively logged usage of different application types and momentary subjective procrastination, Chapter 3 develops machine learning methodology to estimate sleep duration using smartphone trace data, Chapter 4 combines machine learning and explainable artificial intelligence to discover smartphone-tracked digital markers of momentary subjective stress, Chapter 5 uses a personalized machine learning approach to evaluate if smartphone trace data contains behavioral signs of fatigue. Collectively, these empirical studies provide preliminary answers to the overarching questions of this dissertation.Summary of results With respect to the theoretically motivated relations between smartphone trace data and wellbeing (RQ1), we found that different patterns in smartphone trace data, from time spent on social network, messenger, video, and game applications to smartphone-tracked sleep proxies, are related to well-being in individuals. The strength and nature of this relation depends on the individual and app usage pattern under consideration. The relation between smartphone app use patterns and well-being is limited in most individuals, but relatively strong in a minority. Whereas some individuals might benefit from using specific app types, others might experience decreases in well-being when spending more time on these apps. With respect to the question whether we might use smartphone trace data to monitor well-being in individuals (RQ2), we found that smartphone trace data might be useful for this purpose in some individuals and to some extent. They appear most relevant in the context of sleep monitoring (Chapter 3) and have the potential to be included as one of several data sources for monitoring momentary procrastination (Chapter 2), stress (Chapter 4), and fatigue (Chapter 5) in daily life. Outlook Future interdisciplinary research is needed to investigate whether the relationship between smartphone use and well-being depends on the nature of the activities performed on these devices, the content they present, and the context in which they are used. Answering these questions is essential to unravel the complex puzzle of developing technologies for monitoring well-being in daily life.<br/

    Analog Photonics Computing for Information Processing, Inference and Optimisation

    Full text link
    This review presents an overview of the current state-of-the-art in photonics computing, which leverages photons, photons coupled with matter, and optics-related technologies for effective and efficient computational purposes. It covers the history and development of photonics computing and modern analogue computing platforms and architectures, focusing on optimization tasks and neural network implementations. The authors examine special-purpose optimizers, mathematical descriptions of photonics optimizers, and their various interconnections. Disparate applications are discussed, including direct encoding, logistics, finance, phase retrieval, machine learning, neural networks, probabilistic graphical models, and image processing, among many others. The main directions of technological advancement and associated challenges in photonics computing are explored, along with an assessment of its efficiency. Finally, the paper discusses prospects and the field of optical quantum computing, providing insights into the potential applications of this technology.Comment: Invited submission by Journal of Advanced Quantum Technologies; accepted version 5/06/202

    Coupled-Space Attacks against Random-Walk-based Anomaly Detection

    Full text link
    Random Walks-based Anomaly Detection (RWAD) is commonly used to identify anomalous patterns in various applications. An intriguing characteristic of RWAD is that the input graph can either be pre-existing or constructed from raw features. Consequently, there are two potential attack surfaces against RWAD: graph-space attacks and feature-space attacks. In this paper, we explore this vulnerability by designing practical coupled-space attacks, investigating the interplay between graph-space and feature-space attacks. To this end, we conduct a thorough complexity analysis, proving that attacking RWAD is NP-hard. Then, we proceed to formulate the graph-space attack as a bi-level optimization problem and propose two strategies to solve it: alternative iteration (alterI-attack) or utilizing the closed-form solution of the random walk model (cf-attack). Finally, we utilize the results from the graph-space attacks as guidance to design more powerful feature-space attacks (i.e., graph-guided attacks). Comprehensive experiments demonstrate that our proposed attacks are effective in enabling the target nodes from RWAD with a limited attack budget. In addition, we conduct transfer attack experiments in a black-box setting, which show that our feature attack significantly decreases the anomaly scores of target nodes. Our study opens the door to studying the coupled-space attack against graph anomaly detection in which the graph space relies on the feature space.Comment: 13 page

    Geometric Learning on Graph Structured Data

    Get PDF
    Graphs provide a ubiquitous and universal data structure that can be applied in many domains such as social networks, biology, chemistry, physics, and computer science. In this thesis we focus on two fundamental paradigms in graph learning: representation learning and similarity learning over graph-structured data. Graph representation learning aims to learn embeddings for nodes by integrating topological and feature information of a graph. Graph similarity learning brings into play with similarity functions that allow to compute similarity between pairs of graphs in a vector space. We address several challenging issues in these two paradigms, designing powerful, yet efficient and theoretical guaranteed machine learning models that can leverage rich topological structural properties of real-world graphs. This thesis is structured into two parts. In the first part of the thesis, we will present how to develop powerful Graph Neural Networks (GNNs) for graph representation learning from three different perspectives: (1) spatial GNNs, (2) spectral GNNs, and (3) diffusion GNNs. We will discuss the model architecture, representational power, and convergence properties of these GNN models. Specifically, we first study how to develop expressive, yet efficient and simple message-passing aggregation schemes that can go beyond the Weisfeiler-Leman test (1-WL). We propose a generalized message-passing framework by incorporating graph structural properties into an aggregation scheme. Then, we introduce a new local isomorphism hierarchy on neighborhood subgraphs. We further develop a novel neural model, namely GraphSNN, and theoretically prove that this model is more expressive than the 1-WL test. After that, we study how to build an effective and efficient graph convolution model with spectral graph filters. In this study, we propose a spectral GNN model, called DFNets, which incorporates a novel spectral graph filter, namely feedback-looped filters. As a result, this model can provide better localization on neighborhood while achieving fast convergence and linear memory requirements. Finally, we study how to capture the rich topological information of a graph using graph diffusion. We propose a novel GNN architecture with dynamic PageRank, based on a learnable transition matrix. We explore two variants of this GNN architecture: forward-euler solution and invariable feature solution, and theoretically prove that our forward-euler GNN architecture is guaranteed with the convergence to a stationary distribution. In the second part of this thesis, we will introduce a new optimal transport distance metric on graphs in a regularized learning framework for graph kernels. This optimal transport distance metric can preserve both local and global structures between graphs during the transport, in addition to preserving features and their local variations. Furthermore, we propose two strongly convex regularization terms to theoretically guarantee the convergence and numerical stability in finding an optimal assignment between graphs. One regularization term is used to regularize a Wasserstein distance between graphs in the same ground space. This helps to preserve the local clustering structure on graphs by relaxing the optimal transport problem to be a cluster-to-cluster assignment between locally connected vertices. The other regularization term is used to regularize a Gromov-Wasserstein distance between graphs across different ground spaces based on degree-entropy KL divergence. This helps to improve the matching robustness of an optimal alignment to preserve the global connectivity structure of graphs. We have evaluated our optimal transport-based graph kernel using different benchmark tasks. The experimental results show that our models considerably outperform all the state-of-the-art methods in all benchmark tasks

    ReviewerGPT? An Exploratory Study on Using Large Language Models for Paper Reviewing

    Full text link
    Given the rapid ascent of large language models (LLMs), we study the question: (How) can large language models help in reviewing of scientific papers or proposals? We first conduct some pilot studies where we find that (i) GPT-4 outperforms other LLMs (Bard, Vicuna, Koala, Alpaca, LLaMa, Dolly, OpenAssistant, StableLM), and (ii) prompting with a specific question (e.g., to identify errors) outperforms prompting to simply write a review. With these insights, we study the use of LLMs (specifically, GPT-4) for three tasks: 1. Identifying errors: We construct 13 short computer science papers each with a deliberately inserted error, and ask the LLM to check for the correctness of these papers. We observe that the LLM finds errors in 7 of them, spanning both mathematical and conceptual errors. 2. Verifying checklists: We task the LLM to verify 16 closed-ended checklist questions in the respective sections of 15 NeurIPS 2022 papers. We find that across 119 {checklist question, paper} pairs, the LLM had an 86.6% accuracy. 3. Choosing the "better" paper: We generate 10 pairs of abstracts, deliberately designing each pair in such a way that one abstract was clearly superior than the other. The LLM, however, struggled to discern these relatively straightforward distinctions accurately, committing errors in its evaluations for 6 out of the 10 pairs. Based on these experiments, we think that LLMs have a promising use as reviewing assistants for specific reviewing tasks, but not (yet) for complete evaluations of papers or proposals

    A Survey of Graph-based Deep Learning for Anomaly Detection in Distributed Systems

    Full text link
    Anomaly detection is a crucial task in complex distributed systems. A thorough understanding of the requirements and challenges of anomaly detection is pivotal to the security of such systems, especially for real-world deployment. While there are many works and application domains that deal with this problem, few have attempted to provide an in-depth look at such systems. In this survey, we explore the potentials of graph-based algorithms to identify anomalies in distributed systems. These systems can be heterogeneous or homogeneous, which can result in distinct requirements. One of our objectives is to provide an in-depth look at graph-based approaches to conceptually analyze their capability to handle real-world challenges such as heterogeneity and dynamic structure. This study gives an overview of the State-of-the-Art (SotA) research articles in the field and compare and contrast their characteristics. To facilitate a more comprehensive understanding, we present three systems with varying abstractions as use cases. We examine the specific challenges involved in anomaly detection within such systems. Subsequently, we elucidate the efficacy of graphs in such systems and explicate their advantages. We then delve into the SotA methods and highlight their strength and weaknesses, pointing out the areas for possible improvements and future works.Comment: The first two authors (A. Danesh Pazho and G. Alinezhad Noghre) have equal contribution. The article is accepted by IEEE Transactions on Knowledge and Data Engineerin
    • …
    corecore