24 research outputs found

    Clustering environmental flow cytometry data by searching density peaks

    Microbial single cells can be characterized by their phenotypic properties using flow cytometry. Flow cytometry can therefore be used to analyze various aspects of environmental microbial communities. In recent years, researchers have focused on fully exploiting the multivariate data that such analyses generate. Because the diversity of an environmental sample is of primary interest, we need a proper estimate of the number of species and their abundances. We modified a recently published algorithm to estimate microbial diversity from flow cytometry data. After giving a brief sketch of the problem setup, we review this algorithm alongside its various implementations. We then present our current implementation together with the future challenges we foresee.
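
    The abstract does not name the algorithm it modifies. Assuming, from the title, that it is the density-peak clustering idea of Rodriguez and Laio (cluster centers are points with high local density that lie far from any denser point), a minimal sketch could look like the following. The function name `density_peaks`, the cutoff `d_c`, the `n_clusters` parameter, and the toy data are illustrative assumptions, not the authors' implementation.

```python
import math

def density_peaks(points, d_c, n_clusters):
    """Toy density-peak clustering; returns one cluster label per point."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Local density rho: number of other points closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]
    # Visit points from densest to sparsest (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n    # distance to the nearest point visited earlier
    parent = [-1] * n    # index of that nearest earlier (denser) point
    for rank, i in enumerate(order):
        if rank == 0:
            # Densest point has no denser neighbour: use its largest distance.
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        parent[i] = j
    # Cluster centers: the n_clusters points with the largest rho * delta.
    centers = sorted(range(n), key=lambda i: (-(rho[i] * delta[i]), i))[:n_clusters]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    # Every other point inherits the label of its nearest denser neighbour.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels

# Two well-separated toy "populations" in a 2-D measurement space (assumed data).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
labels = density_peaks(points, d_c=1.0, n_clusters=2)
print(labels)
```

    In this sketch the number of clusters is passed in by hand. Estimating diversity, as the abstract describes, would instead require choosing the number of centers from the data, which is not shown here.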

    A Novel Machine Learning Classifier Based on a Qualia Modeling Agent (QMA)

    This dissertation addresses a problem found in supervised machine learning (ML) classification, that the target variable, i.e., the variable a classifier predicts, has to be identified before training begins and cannot change during training and testing. This research develops a computational agent, which overcomes this problem. The Qualia Modeling Agent (QMA) is modeled after two cognitive theories: Stanovich's tripartite framework, which proposes learning results from interactions between conscious and unconscious processes; and, the Integrated Information Theory (IIT) of Consciousness, which proposes that the fundamental structural elements of consciousness are qualia. By modeling the informational relationships of qualia, the QMA allows for retaining and reasoning-over data sets in a non-ontological, non-hierarchical qualia space (QS). This novel computational approach supports concept drift, by allowing the target variable to change ad infinitum without re-training while achieving classification accuracy comparable to or greater than benchmark classifiers. Additionally, the research produced a functioning model of Stanovich's framework, and a computationally tractable working solution for a representation of qualia, which when exposed to new examples, is able to match the causal structure and generate new inferences.

    Integration of multi-scale protein interactions for biomedical data analysis

    With the advancement of modern technologies, we observe an increasing accumulation of biomedical data about diseases. There is a need for computational methods to sift through and extract knowledge from the diverse data available in order to improve our mechanistic understanding of diseases and improve patient care. Biomedical data come in various forms, as exemplified by the various omics data. Existing studies have shown that each form of omics data gives only partial information on cell state, motivating the joint mining of multi-omics, multi-modal data to extract integrated system knowledge. The interactome is of particular importance as it enables the modelling of dependencies arising from molecular interactions. This thesis takes a special interest in the multi-scale protein interactome and its integration with computational models to extract relevant information from biomedical data. We define multi-scale interactions at different omics scales that involve proteins: pairwise protein-protein interactions, multi-protein complexes, and biological pathways. Using hypergraph representations, we motivate considering higher-order protein interactions, highlighting the complementary biological information contained in the multi-scale interactome. Based on those results, we further investigate how those multi-scale protein interactions can be used as either prior knowledge or auxiliary data to develop machine learning algorithms. First, we design a neural network using the multi-scale organization of proteins in a cell into biological pathways as prior knowledge and train it to predict a patient's diagnosis based on transcriptomics data. From the trained models, we develop a strategy to extract biomedical knowledge pertaining to the diseases investigated. Second, we propose a general framework based on Non-negative Matrix Factorization to integrate the multi-scale protein interactome with multi-omics data. We show that our approach outperforms the existing methods and provides biomedical insights and relevant hypotheses for specific cancer types.

    Advancing and Leveraging Tractable Likelihood Models

    The past decade has seen a remarkable improvement in a variety of machine learning applications thanks to numerous advances in deep neural networks (DNN). These models are now the de facto standard in fields ranging from image/speech recognition to driverless cars and have begun to permeate aspects of modern science and everyday life. The deep learning revolution has also resulted in highly effective generative models such as score matching models, diffusion models, VAEs, GANs, and tractable likelihood models. These models are best known for their ability to create novel samples of impressive quality but are usually limited to highly structured data modalities. Expanding the capabilities and applications of likelihood models beyond conventional data formats and generative applications can increase functionality, interpretability, and intuition compared to conventional methods. This dissertation addresses shortcomings in likelihood models over less structured data and explores methods to exploit a learned density as part of a larger application. We begin by advancing the performance of likelihood models outside the standard, ordered data regime by developing methods that are applicable to sets, e.g., point clouds. Many data sources contain instances that are a collection of unordered points, such as points on the surface of scans from human organs, sets of images from a web page, or LiDAR observations commonly used in driverless cars or (hyper-spectral) aerial surveys. We then explore several applications of density models. First, we consider a generative process over neural networks themselves and show that training over ensembles of these sampled models can lead to improved robustness to adversarial attacks. Next, we demonstrate how to use the transformative portion of a normalizing flow as a feature extractor in conjunction with a downstream task to estimate expectations over model performance in local and global regions. Finally, we propose a learnable, continuous parameterization of mixture models directly on the input space to improve model interpretability while simultaneously allowing for arbitrary marginalization or conditioning without the need to train new models or develop complex masking mechanisms. Doctor of Philosoph

    Dynamic Community Detection Method of a Social Network Based on Node Embedding Representation

    Copyright © 2022 by the authors. The node embedding method enables network structure feature learning and representation for social network community detection. However, the traditional node embedding method only focuses on a node’s individual feature representation and ignores the global topological feature representation of the network. Traditional community detection methods cannot use the static node vector from the traditional node embedding method to calculate the dynamic features of the topological structure. In this study, an incremental dynamic community detection model based on a graph neural network node embedding representation is proposed, comprising the following aspects. A node embedding model based on influence random walk improves the information enrichment of the node feature vector representation, which improves the performance of the initial static community detection, whose results are used as the original structure of dynamic community detection. By combining a cohesion coefficient and ordinary modularity, a new modularity calculation method is proposed that uses an incremental training method to obtain node vector representation to detect a dynamic community from the perspectives of coarse- and fine-grained adjustments. A performance analysis based on two dynamic network datasets shows that the proposed method performs better than benchmark algorithms based on time complexity, community detection accuracy, and other indicators. National Natural Science Foundation of China (61802258, 61572326); Natural Science Foundation of Shanghai (18ZR1428300)
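
    For orientation, the "ordinary modularity" mentioned above is presumably the standard Newman-Girvan modularity, Q = sum over communities c of (L_c / m - (d_c / 2m)^2), where m is the number of edges, L_c the number of edges inside community c, and d_c the total degree of its nodes. The sketch below computes only this standard quantity for an undirected, unweighted graph. The abstract does not define the cohesion coefficient or how the two are combined, so that part is not shown, and the function name and toy graph are assumptions.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman-Girvan modularity of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to its community label.
    """
    m = len(edges)
    internal = defaultdict(int)    # edges with both endpoints in community c
    degree_sum = defaultdict(int)  # total degree of the nodes in community c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)

# Toy graph (assumed data): two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
merged = {node: "a" for node in range(6)}

q_split = modularity(edges, split)    # 2 * (3/7 - (7/14)**2) = 5/14
q_merged = modularity(edges, merged)  # a single community scores 0
print(q_split, q_merged)
```

    Splitting the toy graph into its two triangles scores higher than lumping all nodes together, which is the behavior a modularity-based detection method relies on.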

    Multilayer Networks

    In most natural and engineered systems, a set of entities interact with each other in complicated patterns that can encompass multiple types of relationships, change in time, and include other types of complications. Such systems include multiple subsystems and layers of connectivity, and it is important to take such "multilayer" features into account to try to improve our understanding of complex systems. Consequently, it is necessary to generalize "traditional" network theory by developing (and validating) a framework and associated tools to study multilayer systems in a comprehensive fashion. The origins of such efforts date back several decades and arose in multiple disciplines, and now the study of multilayer networks has become one of the most important directions in network science. In this paper, we discuss the history of multilayer networks (and related concepts) and review the exploding body of work on such networks. To unify the disparate terminology in the large body of recent work, we discuss a general framework for multilayer networks, construct a dictionary of terminology to relate the numerous existing concepts to each other, and provide a thorough discussion that compares, contrasts, and translates between related notions such as multilayer networks, multiplex networks, interdependent networks, networks of networks, and many others. We also survey and discuss existing data sets that can be represented as multilayer networks. We review attempts to generalize single-layer-network diagnostics to multilayer networks. We also discuss the rapidly expanding research on multilayer-network models and notions like community structure, connected components, tensor decompositions, and various types of dynamical processes on multilayer networks. We conclude with a summary and an outlook. Comment: Working paper; 59 pages, 8 figure

    The structure and dynamics of multilayer networks

    In the past years, network theory has successfully characterized the interaction among the constituents of a variety of complex systems, ranging from biological to technological, and social systems. However, up until recently, attention was almost exclusively given to networks in which all components were treated on equivalent footing, while neglecting all the extra information about the temporal- or context-related properties of the interactions under study. Only in the last years, taking advantage of the enhanced resolution in real data sets, network scientists have directed their interest to the multiplex character of real-world systems, and explicitly considered the time-varying and multilayer nature of networks. We offer here a comprehensive review on both structural and dynamical organization of graphs made of diverse relationships (layers) between its constituents, and cover several relevant issues, from a full redefinition of the basic structural measures, to understanding how the multilayer nature of the network affects processes and dynamics. Comment: In Press, Accepted Manuscript, Physics Reports 201

    Beyond Flatland: exploring graphs in many dimensions

    Societies, technologies, economies, ecosystems, organisms, ... Our world is composed of complex networks: systems with many elements that interact in nontrivial ways. Graphs are natural models of these systems, and scientists have made tremendous progress in developing tools for their analysis. However, research has long focused on relatively simple graph representations and problem specifications, often discarding valuable real-world information in the process. In recent years, the limitations of this approach have become increasingly apparent, but we are just starting to comprehend how more intricate data representations and problem formulations might benefit our understanding of relational phenomena. Against this background, our thesis sets out to explore graphs in five dimensions: descriptivity, multiplicity, complexity, expressivity, and responsibility. Leveraging tools from graph theory, information theory, probability theory, geometry, and topology, we develop methods to (1) descriptively compare individual graphs, (2) characterize similarities and differences between groups of multiple graphs, (3) critically assess the complexity of relational data representations and their associated scientific culture, (4) extract expressive features from and for hypergraphs, and (5) responsibly mitigate the risks induced by graph-structured content recommendations. Thus, our thesis is naturally situated at the intersection of graph mining, graph learning, and network analysis.