402 research outputs found

    Design of new algorithms for gene network reconstruction applied to in silico modeling of biomedical data

    Doctoral Program in Biotechnology, Engineering and Chemical Technology. Research line: Engineering, Data Science and Bioinformatics. Program code: DBI. Line code: 111. The root causes of disease are still poorly understood. The success of current therapies is limited because persistent diseases are frequently treated based on their symptoms rather than the underlying cause of the disease. Therefore, biomedical research is experiencing a technology-driven shift to data-driven holistic approaches to better characterize the molecular mechanisms causing disease. Using omics data as an input, emerging disciplines like network biology attempt to model the relationships between biomolecules. To this effect, gene co-expression networks arise as a promising tool for deciphering the relationships between genes in large transcriptomic datasets. However, because of their low specificity and high false positive rate, they demonstrate a limited capacity to retrieve the disrupted mechanisms that lead to disease onset, progression, and maintenance. Within the context of statistical modeling, we dove deeper into the reconstruction of gene co-expression networks with the specific goal of discovering disease-specific features directly from expression data. Using ensemble techniques, which combine the results of various metrics, we were able to more precisely capture biologically significant relationships between genes. We were able to find de novo potential disease-specific features with the help of prior biological knowledge and the development of new network inference techniques. Through our different approaches, we analyzed large gene sets across multiple samples and used gene expression as a surrogate marker for the inherent biological processes, reconstructing robust gene co-expression networks that are simple to explore. By mining disease-specific gene co-expression networks, we arrive at a useful framework for identifying new omics-phenotype associations from conditional expression datasets. In this sense, understanding diseases from the perspective of biological network perturbations will improve personalized medicine, impacting rational biomarker discovery, patient stratification and drug design, and ultimately leading to more targeted therapies. Universidad Pablo de Olavide de Sevilla. Departamento de Deporte e Informática.

    Eigenvalue Interlacing of Bipartite Graphs and Construction of Expander Code using Vertex-split of a Bipartite Graph

    The second largest eigenvalue of a graph is an important algebraic parameter which is related to the expansion, connectivity and randomness properties of a graph. Expanders are highly connected sparse graphs. In coding theory, expander codes are error-correcting codes made up of bipartite expander graphs. In this paper, first we prove the interlacing of the eigenvalues of the adjacency matrix of the bipartite graph with the eigenvalues of the bipartite quotient matrices of the corresponding graph matrices. Then we obtain bounds for the second largest and second smallest eigenvalues. Since the graph is bipartite, the results for the Laplacian will also hold for the signless Laplacian matrix. We then introduce a new method called vertex-split of a bipartite graph to construct asymptotically good expander codes with expansion factor $\frac{D}{2} < \alpha < D$ and $\epsilon < \frac{1}{2}$, and prove a condition for the vertex-split of a bipartite graph to be $k$-connected with respect to $\lambda_{2}$. Further, we prove that the vertex-split of $G$ is a bipartite expander. Finally, we construct an asymptotically good expander code whose factor graph is a graph obtained by the vertex-split of a bipartite graph. Comment: 17 pages, 2 figures

    Learning and reasoning with graph data

    Reasoning about graphs, and learning from graph data is a field of artificial intelligence that has recently received much attention in the machine learning areas of graph representation learning and graph neural networks. Graphs are also the underlying structures of interest in a wide range of more traditional fields ranging from logic-oriented knowledge representation and reasoning to graph kernels and statistical relational learning. In this review we outline a broad map and inventory of the field of learning and reasoning with graphs that spans the spectrum from reasoning in the form of logical deduction to learning node embeddings. To obtain a unified perspective on such a diverse landscape we introduce a simple and general semantic concept of a model that covers logic knowledge bases, graph neural networks, kernel support vector machines, and many other types of frameworks. Still at a high semantic level, we survey common strategies for model specification using probabilistic factorization and standard feature construction techniques. Based on this semantic foundation we introduce a taxonomy of reasoning tasks that casts problems ranging from transductive link prediction to asymptotic analysis of random graph models as queries of different complexities for a given model. Similarly, we express learning in different frameworks and settings in terms of a common statistical maximum likelihood principle. Overall, this review aims to provide a coherent conceptual framework that provides a basis for further theoretical analyses of respective strengths and limitations of different approaches to handling graph data, and that facilitates combination and integration of different modeling paradigms

    On the spectra and spectral radii of token graphs

    Let $G$ be a graph on $n$ vertices. The $k$-token graph (or symmetric $k$-th power) of $G$, denoted by $F_k(G)$, has as vertices the $\binom{n}{k}$ $k$-subsets of vertices from $G$, and two vertices are adjacent when their symmetric difference is a pair of adjacent vertices in $G$. In particular, $F_k(K_n)$ is the Johnson graph $J(n,k)$, which is a distance-regular graph used in coding theory. In this paper, we present some results concerning the (adjacency and Laplacian) spectrum of $F_k(G)$ in terms of the spectrum of $G$. For instance, when $G$ is walk-regular, an exact value for the spectral radius $\rho$ (or maximum eigenvalue) of $F_k(G)$ is obtained. When $G$ is distance-regular, other eigenvalues of its $2$-token graph are derived using the theory of equitable partitions. A generalization of Aldous' spectral gap conjecture (which is now a theorem) is proposed.
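The token-graph definition in this abstract can be made concrete with a small sketch. The code below is illustrative only and is not taken from the paper: the function name `token_graph` and the encoding of vertices as integers `0..n-1` are assumptions made here. It builds the token graph by brute force straight from the definition and checks the stated special case that the token graph of a complete graph is a Johnson graph, using the known facts that J(n,k) has C(n,k) vertices and is regular of degree k(n-k).

```python
from itertools import combinations


def token_graph(n, edges, k):
    """Build the k-token graph of a graph G on vertices 0..n-1.

    Hypothetical helper for illustration: vertices are the k-subsets of
    {0..n-1}; two subsets are adjacent when their symmetric difference
    is exactly one edge of G.
    """
    base_edges = {frozenset(e) for e in edges}
    vertices = [frozenset(c) for c in combinations(range(n), k)]
    token_edges = set()
    for a, b in combinations(vertices, 2):
        diff = a ^ b  # symmetric difference of the two k-subsets
        if len(diff) == 2 and diff in base_edges:
            token_edges.add(frozenset((a, b)))
    return vertices, token_edges


# Special case from the abstract: F_k(K_n) is the Johnson graph J(n, k).
n, k = 4, 2
complete_edges = list(combinations(range(n), 2))
vertices, token_edges = token_graph(n, complete_edges, k)
degrees = {v: sum(1 for e in token_edges if v in e) for v in vertices}

# For k = 1 the token graph is a copy of G itself (here, a path on 3 vertices).
path_vertices, path_token_edges = token_graph(3, [(0, 1), (1, 2)], 1)
```

The brute-force pairwise comparison is quadratic in the number of k-subsets, so this is only suitable for small examples; it is meant to mirror the definition, not to be efficient.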

    Over-optimism in unsupervised microbiome analysis: Insights from network learning and clustering

    In recent years, unsupervised analysis of microbiome data, such as microbial network analysis and clustering, has increased in popularity. Many new statistical and computational methods have been proposed for these tasks. This multiplicity of analysis strategies poses a challenge for researchers, who are often unsure which method(s) to use and might be tempted to try different methods on their dataset to look for the “best” ones. However, if only the best results are selectively reported, this may cause over-optimism: the “best” method is overly fitted to the specific dataset, and the results might be non-replicable on validation data. Such effects will ultimately hinder research progress. Yet so far, these topics have been given little attention in the context of unsupervised microbiome analysis. In our illustrative study, we aim to quantify over-optimism effects in this context. We model the approach of a hypothetical microbiome researcher who undertakes four unsupervised research tasks: clustering of bacterial genera, hub detection in microbial networks, differential microbial network analysis, and clustering of samples. While these tasks are unsupervised, the researcher might still have certain expectations as to what constitutes interesting results. We translate these expectations into concrete evaluation criteria that the hypothetical researcher might want to optimize. We then randomly split an exemplary dataset from the American Gut Project into discovery and validation sets multiple times. For each research task, multiple method combinations (e.g., methods for data normalization, network generation, and/or clustering) are tried on the discovery data, and the combination that yields the best result according to the evaluation criterion is chosen. While the hypothetical researcher might only report this result, we also apply the “best” method combination to the validation dataset. The results are then compared between discovery and validation data. 
In all four research tasks, there are notable over-optimism effects; the results on the validation data set are worse compared to the discovery data, averaged over multiple random splits into discovery/validation data. Our study thus highlights the importance of validation and replication in microbiome analysis to obtain reliable results and demonstrates that the issue of over-optimism goes beyond the context of statistical testing and fishing for significance

    CIN++: Enhancing Topological Message Passing

    Graph Neural Networks (GNNs) have demonstrated remarkable success in learning from graph-structured data. However, they face significant limitations in expressive power, struggling with long-range interactions and lacking a principled approach to modeling higher-order structures and group interactions. Cellular Isomorphism Networks (CINs) recently addressed most of these challenges with a message passing scheme based on cell complexes. Despite their advantages, CINs make use only of boundary and upper messages, which do not consider a direct interaction between the rings present in the underlying complex. Accounting for these interactions might be crucial for learning representations of many real-world complex phenomena such as the dynamics of supramolecular assemblies, neural activity within the brain, and gene regulation processes. In this work, we propose CIN++, an enhancement of the topological message passing scheme introduced in CINs. Our message passing scheme addresses the aforementioned limitations by letting the cells also receive lower messages within each layer. By providing a more comprehensive representation of higher-order and long-range interactions, our enhanced topological message passing scheme achieves state-of-the-art results on large-scale and long-range chemistry benchmarks. Comment: 21 pages, 9 figures

    Sur la similarité spectrale des graphes par mesure de corrélation

    In this paper, we present a spectral similarity measure between two graphs based on a correlation measure between the spectra of their representation matrices $T_\alpha := \alpha D + (1 - 2\alpha)A$, parametrized by $0 \le \alpha \le 1$, where $A$ and $D$ are respectively the adjacency matrix and the degree matrix. We also show that $T_\alpha$ is positive semidefinite for $\alpha \ge 1/2$. This work tends to show the relevance of this measure, which, when an SVM is implemented using a Gaussian kernel, allows a powerful classification on well-known graph databases of the literature and classification of real signals transformed into a graph thanks to the so-called visibility method. The obtained results in terms of accuracy are similar to or even better than those obtained with structural kernels, with a much lower computation time, since only one spectrum is computed for each graph. Moreover, we show the contribution of $T_\alpha$ compared to the $\alpha$-adjacency matrix of Nikiforov for graph classification.
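One way to see the positive semidefiniteness claim on the range $1/2 \le \alpha \le 1$ is the short derivation below. It is a sketch that may differ from the authors' own proof, and it assumes a simple undirected, unweighted graph with edge set $E$ and vertex degrees $d_i$ (notation introduced here), so that the Laplacian $D - A$ has the usual quadratic form.

```latex
\begin{align*}
T_\alpha &= \alpha D + (1 - 2\alpha) A \\
         &= (2\alpha - 1)(D - A) + (1 - \alpha) D, \\[4pt]
x^\top T_\alpha x
         &= (2\alpha - 1) \sum_{\{i,j\} \in E} (x_i - x_j)^2
            + (1 - \alpha) \sum_{i} d_i x_i^2 \;\ge\; 0
         \qquad \text{for } \tfrac{1}{2} \le \alpha \le 1 .
\end{align*}
```

Both coefficients $2\alpha - 1$ and $1 - \alpha$ are nonnegative on this range, and each sum is nonnegative, so the quadratic form is nonnegative for every real vector $x$. The endpoints recover familiar cases: $\alpha = 1$ gives the Laplacian $D - A$ and $\alpha = 1/2$ gives $D/2$.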

    Web API evolution patterns: A usage-driven approach

    As the use of Application Programming Interfaces (APIs) is increasingly growing, their evolution becomes more challenging in terms of the service provided according to consumers' needs. In this paper, we address the role of consumers' needs in WAPIs evolution and introduce a process mining pattern-based method to support providers in WAPIs evolution by analyzing and understanding consumers' behavior, imprinted in WAPI usage logs. We take the position that WAPIs' evolution should be mainly usage-based, i.e., the way consumers use them should be one of the main drivers of their changes. We start by characterizing the structural relationships between endpoints, and next, we summarize these relationships into a set of behavioral patterns (i.e., usage patterns whose occurrences indicate specific consumers' behavior like repetitive or consecutive calls), that can potentially imply the need for changes (e.g., creating new parameters for endpoints, merging endpoints). We analyze the logs and extract several metrics for the endpoints and their relationships, to then detect the patterns. We apply our method in two real-world WAPIs from different domains, education, and health, respectively the WAPI of Barcelona School of Informatics at the Polytechnic University of Catalonia (Facultat d'Informàtica de Barcelona, FIB, UPC), and District Health Information Software 2 (DHIS2) WAPI. The feedback from consumers and providers of these WAPIs proved the effectiveness of the detected patterns and confirmed the promising potential of our approach. This paper has been funded by the Spanish Ministerio de Ciencia e Innovación under project/funding scheme PID2020-117191RB-I00/AEI/10.13039/501100011033. Peer Reviewed. Postprint (published version)

    Neural function approximation on graphs: shape modelling, graph discrimination & compression

    Graphs serve as a versatile mathematical abstraction of real-world phenomena in numerous scientific disciplines. This thesis is part of the Geometric Deep Learning subject area, a family of learning paradigms, that capitalise on the increasing volume of non-Euclidean data so as to solve real-world tasks in a data-driven manner. In particular, we focus on the topic of graph function approximation using neural networks, which lies at the heart of many relevant methods. In the first part of the thesis, we contribute to the understanding and design of Graph Neural Networks (GNNs). Initially, we investigate the problem of learning on signals supported on a fixed graph. We show that treating graph signals as general graph spaces is restrictive and conventional GNNs have limited expressivity. Instead, we expose a more enlightening perspective by drawing parallels between graph signals and signals on Euclidean grids, such as images and audio. Accordingly, we propose a permutation-sensitive GNN based on an operator analogous to shifts in grids and instantiate it on 3D meshes for shape modelling (Spiral Convolutions). Following, we focus on learning on general graph spaces and in particular on functions that are invariant to graph isomorphism. We identify a fundamental trade-off between invariance, expressivity and computational complexity, which we address with a symmetry-breaking mechanism based on substructure encodings (Graph Substructure Networks). Substructures are shown to be a powerful tool that provably improves expressivity while controlling computational complexity, and a useful inductive bias in network science and chemistry. In the second part of the thesis, we discuss the problem of graph compression, where we analyse the information-theoretic principles and the connections with graph generative models. We show that another inevitable trade-off surfaces, now between computational complexity and compression quality, due to graph isomorphism. 
We propose a substructure-based dictionary coder - Partition and Code (PnC) - with theoretical guarantees that can be adapted to different graph distributions by estimating its parameters from observations. Additionally, contrary to the majority of neural compressors, PnC is parameter and sample efficient and is therefore of wide practical relevance. Finally, within this framework, substructures are further illustrated as a decisive archetype for learning problems on graph spaces. Open Access

    On Spectrum of Neighbourhood Corona Product of Signed Graphs

    Given two signed graphs $\Gamma_1$ with nodes $\{u_1,u_2,\cdots,u_n\}$ and $\Gamma_2$, the neighbourhood corona $\Gamma_1*\Gamma_2$ is the signed graph obtained by taking one copy of $\Gamma_1$ and $n_1$ copies of $\Gamma_2$, and joining every neighbour of the $i^{th}$ node with each node of the $i^{th}$ copy of $\Gamma_2$ by a new signed edge. In this paper we determine the condition for $\Gamma_1*\Gamma_2$ to be balanced. We also determine the adjacency spectrum of $\Gamma_1*\Gamma_2$ for arbitrary $\Gamma_1$ and $\Gamma_2$, and the Laplacian and signless Laplacian spectrum of $\Gamma_1*\Gamma_2$ for regular $\Gamma_1$ and arbitrary $\Gamma_2$, in terms of the corresponding spectrum of $\Gamma_1$ and $\Gamma_2$.