8,359 research outputs found

    Computational Analyses of Metagenomic Data

    Metagenomics studies the collective microbial genomes extracted from a particular environment without requiring the culturing or isolation of individual genomes, addressing questions revolving around the composition, functionality, and dynamics of microbial communities. The intrinsic complexity of metagenomic data and the diversity of applications call for efficient and accurate computational methods for data handling. In this thesis, I present three primary projects that collectively focus on the computational analysis of metagenomic data, each addressing a distinct topic. In the first project, I designed and implemented an algorithm named Mapbin for reference-free genomic binning of metagenomic assemblies. Binning aims to group a mixture of genomic fragments based on their genome origin. Mapbin enhances binning results by building a multilayer network that combines the initial binning, assembly graph, and read-pairing information from paired-end sequencing data. The network is further partitioned by the community-detection algorithm, Infomap, to yield a new binning result. Mapbin was tested on multiple simulated and real datasets. The results indicated an overall improvement in the common binning quality metrics. The second and third projects are both derived from ImMiGeNe, a collaborative and multidisciplinary study investigating the interplay between gut microbiota, host genetics, and immunity in stem-cell transplantation (SCT) patients. In the second project, I conducted microbiome analyses for the metagenomic data. The workflow included the removal of contaminant reads and multiple taxonomic and functional profiling steps. The results revealed that the SCT recipients' samples yielded significantly fewer reads with heavy contamination by host DNA, and their microbiomes displayed evident signs of dysbiosis. Finally, I discussed several inherent challenges posed by extremely low levels of target DNA and high levels of contamination in the recipient samples, which cannot be rectified solely through bioinformatics approaches. The primary goal of the third project was to design a set of primers that can be used to cover bacterial flagellin genes present in the human gut microbiota. Considering the notable diversity of flagellins, I incorporated a method to select representative bacterial flagellin gene sequences, a heuristic approach based on established primer design methods to generate a degenerate primer set, and a selection method to filter genes unlikely to occur in the human gut microbiome. As a result, I successfully curated a reduced yet representative set of primers that would be practical for experimental implementation.
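
The abstract above mentions generating a degenerate primer set. As a rough illustration of what "degenerate" means here, the sketch below collapses a few aligned binding sites into a single primer written with IUPAC ambiguity codes and counts how many concrete sequences it encodes. This is not the thesis's heuristic: the function names and the toy sites are hypothetical, and the sites are assumed to be gap-free and of equal length.

```python
# IUPAC nucleotide ambiguity codes, keyed by the set of bases seen at a position.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def degenerate_consensus(sites):
    """Collapse equal-length, gap-free sites into one degenerate primer."""
    if len({len(site) for site in sites}) != 1:
        raise ValueError("all sites must have the same length")
    # zip(*sites) walks the alignment column by column.
    return "".join(IUPAC[frozenset(column)] for column in zip(*sites))


def degeneracy(primer):
    """Count the concrete sequences that a degenerate primer encodes."""
    sizes = {code: len(bases) for bases, code in IUPAC.items()}
    total = 1
    for code in primer:
        total *= sizes[code]
    return total


# Hypothetical aligned primer-binding sites (illustration only).
sites = ["ATGGCA", "ATGGCG", "ACGGCA"]
primer = degenerate_consensus(sites)
print(primer, degeneracy(primer))
```

A practical design would also bound the degeneracy, since highly degenerate primers dilute the effective concentration of each concrete sequence; that trade-off is presumably what a heuristic selection step has to manage.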

    Robustness, Heterogeneity and Structure Capturing for Graph Representation Learning and its Application

    Graph neural networks (GNNs) are potent methods for graph representation learning (GRL), which extract knowledge from complicated (graph) structured data in various real-world scenarios. However, GRL still faces many challenges. Firstly, GNN-based node classification may deteriorate substantially by overlooking the possibility of noisy data in graph structures, as models wrongly process the relation among nodes in the input graphs as the ground truth. Secondly, nodes and edges have different types in the real world and it is essential to capture this heterogeneity in graph representation learning. Next, relations among nodes are not restricted to pairwise relations and it is necessary to capture the complex relations accordingly. Finally, the absence of structural encodings, such as positional information, deteriorates the performance of GNNs. This thesis proposes novel methods to address the aforementioned problems:
    1. Bayesian Graph Attention Network (BGAT): Developed for situations with scarce data, this method addresses the influence of spurious edges. Incorporating Bayesian principles into the graph attention mechanism enhances robustness, leading to competitive performance against benchmarks (Chapter 3).
    2. Neighbour Contrastive Heterogeneous Graph Attention Network (NC-HGAT): By enhancing a cutting-edge self-supervised heterogeneous graph neural network model (HGAT) with neighbour contrastive learning, this method addresses heterogeneity and uncertainty simultaneously. Extra attention to edge relations in heterogeneous graphs also aids in subsequent classification tasks (Chapter 4).
    3. A novel ensemble learning framework is introduced for predicting stock price movements. It adeptly captures both group-level and pairwise relations, leading to notable advancements over the existing state of the art. The integration of hypergraph and graph models, coupled with the utilisation of auxiliary data via GNNs before a recurrent neural network (RNN), provides a deeper understanding of long-term dependencies between similar entities in multivariate time series analysis (Chapter 5).
    4. A novel framework for graph structure learning is introduced, segmenting graphs into distinct patches. By harnessing the capabilities of transformers and integrating other position encoding techniques, this approach robustly captures intricate structural information within a graph. This results in a more comprehensive understanding of its underlying patterns (Chapter 6).

    Routing schemes for hybrid communication networks

    We consider the problem of computing routing schemes in the HYBRID model of distributed computing where nodes have access to two fundamentally different communication modes. In this problem nodes have to compute small labels and routing tables that allow for efficient routing of messages in the local network, which typically offers the majority of the throughput. Recent work has shown that using the HYBRID model admits a significant speed-up compared to what would be possible if either communication mode were used in isolation. Nonetheless, if general graphs are used as the input graph the computation of routing schemes still takes polynomially many rounds in the HYBRID model. We bypass this lower bound by restricting the local graph to unit-disc graphs and solve the problem deterministically with running time $O(|H|^2 + \log n)$, label size $O(\log n)$, and size of routing tables $O(|H|^2 \cdot \log n)$, where $|H|$ is the number of “radio holes” in the network. Our work builds on recent work by Coy et al., who obtain this result in the much simpler setting where the input graph has no radio holes. We develop new techniques to achieve this, including a decomposition of the local graph into path-convex regions, where each region contains a shortest path for any pair of nodes in it.

    Machine learning-based characterisation of urban morphology with the street pattern

    Streets are a crucial part of the built environment, and their layouts, the street patterns, are widely researched and contribute to a quantitative understanding of urban morphology. However, traditional street pattern analysis only considers a few broadly defined characteristics. It uses administrative boundaries and grids as units of analysis that fail to encompass the diversity and complexity of street networks. To address these challenges, this research proposes a machine learning-based approach to automatically recognise street patterns that employs an adaptive analysis unit based on street-based local areas (SLAs). SLAs use a network partitioning technique that can adapt to distinct street networks, making it particularly suitable for different urban contexts. By calculating several street network metrics and performing a hierarchical clustering method, streets with similar characteristics are grouped under the same street pattern. A case study is carried out in six cities worldwide. The results show that street pattern types are rather diverse and hierarchical, and categorising them into a clearly demarcated taxonomy is challenging. The study derives a set of new morphometrics-based street patterns with four major types that resemble conventional street patterns and eleven sub-types to significantly increase their diversity for broader coverage of urban morphology. The new patterns capture urban structural differences across cities, such as the urban-suburban division and the number of urban centres present. In conclusion, the proposed machine learning-based morphometric street pattern to characterise urban morphology has an enhanced ability to encompass more information from the built environment while maintaining the intuitiveness of using patterns.

    LIPIcs, Volume 251, ITCS 2023, Complete Volume

    LIPIcs, Volume 251, ITCS 2023, Complete Volume

    On the Power of Threshold-Based Algorithms for Detecting Cycles in the CONGEST Model

    It is known that, for every $k \geq 2$, $C_{2k}$-freeness can be decided by a generic Monte-Carlo algorithm running in $n^{1-1/\Theta(k^2)}$ rounds in the CONGEST model. For $2 \leq k \leq 5$, faster Monte-Carlo algorithms do exist, running in $O(n^{1-1/k})$ rounds, based on upper bounding the number of messages to be forwarded, and aborting search sub-routines for which this number exceeds certain thresholds. We investigate the possible extension of these threshold-based algorithms, for the detection of larger cycles. We first show that, for every $k \geq 6$, there exists an infinite family of graphs containing a $2k$-cycle for which any threshold-based algorithm fails to detect that cycle. Hence, in particular, neither $C_{12}$-freeness nor $C_{14}$-freeness can be decided by threshold-based algorithms. Nevertheless, we show that $\{C_{12}, C_{14}\}$-freeness can still be decided by a threshold-based algorithm, running in $O(n^{1-1/7}) = O(n^{0.857\dots})$ rounds, which is faster than using the generic algorithm, which would run in $O(n^{1-1/22}) \simeq O(n^{0.954\dots})$ rounds. Moreover, we exhibit an infinite collection of families of cycles such that threshold-based algorithms can decide $\mathcal{F}$-freeness for every $\mathcal{F}$ in this collection. Comment: to be published in SIROCCO 202
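
As a quick check of the two round-complexity exponents quoted in the abstract, the sketch below evaluates 1 - 1/7 and 1 - 1/22 with exact fractions and truncates them to three decimals. The denominators 7 and 22 are taken from the abstract as given; the helper names are hypothetical, and nothing here models the algorithms themselves.

```python
from fractions import Fraction


def exponent(denominator):
    """Exponent 1 - 1/denominator of n in the round complexity."""
    return 1 - Fraction(1, denominator)


def truncate_3(value):
    """Truncate (not round) a non-negative fraction to three decimals."""
    return Fraction(value * 1000 // 1, 1000)


threshold_based = exponent(7)   # exponent quoted for the threshold-based algorithm
generic = exponent(22)          # exponent quoted for the generic algorithm

print(float(truncate_3(threshold_based)), float(truncate_3(generic)))
```

Truncation rather than rounding is used because the abstract writes the decimals with trailing dots, which indicates a cut-off expansion.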

    Reconfiguration of Digraph Homomorphisms

    For a fixed graph H, the H-Recoloring problem asks whether, given two homomorphisms from a graph G to H, one homomorphism can be transformed into the other by changing the image of a single vertex in each step and maintaining a homomorphism to H throughout. The most general algorithmic result for H-Recoloring so far was proposed by Wrochna in 2014, who introduced a topological approach to obtain a polynomial-time algorithm for any undirected loopless square-free graph H. We show that the topological approach can be used to recover essentially all previous algorithmic results for H-Recoloring and that it is applicable also in the more general setting of digraph homomorphisms. In particular, we show that H-Recoloring admits a polynomial-time algorithm i) if H is a loopless digraph that does not contain a 4-cycle of algebraic girth 0 and ii) if H is a reflexive digraph that contains no triangle of algebraic girth 1 and no 4-cycle of algebraic girth 0.
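
To make the problem statement concrete, the sketch below checks whether a vertex map is a homomorphism and whether a move between two maps is a legal single reconfiguration step. It only illustrates the definitions; it is not the topological algorithm from the paper. Graphs are assumed to be given as lists of arcs, an undirected edge is modelled as a pair of opposite arcs, and all function names are hypothetical.

```python
def symmetric(edges):
    """Model undirected edges as arcs in both directions."""
    return [(u, v) for u, v in edges] + [(v, u) for u, v in edges]


def is_homomorphism(g_arcs, h_arcs, mapping):
    """True if every arc (u, v) of G is sent to an arc of H."""
    h = set(h_arcs)
    return all((mapping[u], mapping[v]) in h for u, v in g_arcs)


def is_valid_step(g_arcs, h_arcs, before, after):
    """True if exactly one vertex changes image and both maps are homomorphisms."""
    changed = [v for v in before if before[v] != after[v]]
    return (
        len(changed) == 1
        and is_homomorphism(g_arcs, h_arcs, before)
        and is_homomorphism(g_arcs, h_arcs, after)
    )


# G is the path a - b - c; H is the triangle on {0, 1, 2} (a proper 3-colouring target).
g_arcs = symmetric([("a", "b"), ("b", "c")])
h_arcs = symmetric([(0, 1), (1, 2), (0, 2)])

start = {"a": 0, "b": 1, "c": 0}
recolour_c = {"a": 0, "b": 1, "c": 2}   # c moves from 0 to 2, still a homomorphism
clash = {"a": 0, "b": 1, "c": 1}        # b and c would share image 1; H has no loop

print(is_valid_step(g_arcs, h_arcs, start, recolour_c))
print(is_valid_step(g_arcs, h_arcs, start, clash))
```

The decision problem asks whether a sequence of such valid steps connects two given homomorphisms; a naive search over all maps is exponential, which is why polynomial-time results for restricted H are of interest.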

    HyperSparse Neural Networks: Shifting Exploration to Exploitation through Adaptive Regularization

    Sparse neural networks are a key factor in developing resource-efficient machine learning applications. We propose the novel and powerful sparse learning method Adaptive Regularized Training (ART) to compress dense into sparse networks. Instead of the commonly used binary mask during training to reduce the number of model weights, we inherently shrink weights close to zero in an iterative manner with increasing weight regularization. Our method compresses the pre-trained model knowledge into the weights of highest magnitude. Therefore, we introduce a novel regularization loss named HyperSparse that exploits the highest weights while conserving the ability of weight exploration. Extensive experiments on CIFAR and TinyImageNet show that our method leads to notable performance gains compared to other sparsification methods, especially in extremely high sparsity regimes up to 99.8 percent model sparsity. Additional investigations provide new insights into the patterns that are encoded in weights with high magnitudes. Comment: ICCV'23 Workshop

    Quantum Alternating Operator Ansatz (QAOA) beyond low depth with gradually changing unitaries

    The Quantum Approximate Optimization Algorithm and its generalization to Quantum Alternating Operator Ansatz (QAOA) is a promising approach for applying quantum computers to challenging problems such as combinatorial optimization and computational chemistry. In this paper, we study the underlying mechanisms governing the behavior of QAOA circuits beyond shallow depth in the practically relevant setting of gradually varying unitaries. We use the discrete adiabatic theorem, which complements and generalizes the insights obtained from the continuous-time adiabatic theorem primarily considered in prior work. Our analysis explains some general properties that are conspicuously depicted in the recently introduced QAOA performance diagrams. For parameter sequences derived from continuous schedules (e.g. linear ramps), these diagrams capture the algorithm's performance over different parameter sizes and circuit depths. Surprisingly, they have been observed to be qualitatively similar across different performance metrics and application domains. Our analysis explains this behavior and also entails some unexpected results, such as connections between the eigenstates of the cost and mixer QAOA Hamiltonians changing based on parameter size and the possibility of reducing circuit depth without sacrificing performance.
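
The abstract refers to parameter sequences derived from continuous schedules such as linear ramps, indexed by a parameter size and a circuit depth. One plausible way to discretise a linear ramp is sketched below. The sampling points `(layer + 1) / (depth + 1)`, the name `delta` for the parameter size, and the function name are all assumptions made for illustration; the paper may use a different convention.

```python
def linear_ramp(depth, delta):
    """Return (gammas, betas) for a `depth`-layer circuit.

    Assumed convention: the ramp is sampled at evenly spaced interior
    points of [0, 1], and `delta` scales every angle.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    fractions = [(layer + 1) / (depth + 1) for layer in range(depth)]
    gammas = [delta * f for f in fractions]          # cost-operator angles ramp up
    betas = [delta * (1.0 - f) for f in fractions]   # mixer angles ramp down
    return gammas, betas


gammas, betas = linear_ramp(depth=3, delta=1.0)
print(gammas, betas)
```

With this convention, consecutive layers differ by `delta / (depth + 1)`, so increasing the depth at fixed `delta` makes the unitaries change more gradually, which is the regime the abstract describes.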

    Design of new algorithms for gene network reconstruction applied to in silico modeling of biomedical data

    Doctoral Programme in Biotechnology, Engineering and Chemical Technology. Research line: Engineering, Data Science and Bioinformatics. Programme code: DBI. Line code: 111. The root causes of disease are still poorly understood. The success of current therapies is limited because persistent diseases are frequently treated based on their symptoms rather than the underlying cause of the disease. Therefore, biomedical research is experiencing a technology-driven shift to data-driven holistic approaches to better characterize the molecular mechanisms causing disease. Using omics data as an input, emerging disciplines like network biology attempt to model the relationships between biomolecules. To this effect, gene co-expression networks arise as a promising tool for deciphering the relationships between genes in large transcriptomic datasets. However, because of their low specificity and high false positive rate, they demonstrate a limited capacity to retrieve the disrupted mechanisms that lead to disease onset, progression, and maintenance. Within the context of statistical modeling, we dove deeper into the reconstruction of gene co-expression networks with the specific goal of discovering disease-specific features directly from expression data. Using ensemble techniques, which combine the results of various metrics, we were able to more precisely capture biologically significant relationships between genes. We were able to find de novo potential disease-specific features with the help of prior biological knowledge and the development of new network inference techniques. Through our different approaches, we analyzed large gene sets across multiple samples and used gene expression as a surrogate marker for the inherent biological processes, reconstructing robust gene co-expression networks that are simple to explore. By mining disease-specific gene co-expression networks we provide a useful framework for identifying new omics-phenotype associations from conditional expression datasets. In this sense, understanding diseases from the perspective of biological network perturbations will improve personalized medicine, impacting rational biomarker discovery, patient stratification and drug design, and ultimately leading to more targeted therapies. Universidad Pablo de Olavide de Sevilla. Departamento de Deporte e Informática.
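
The abstract describes ensemble techniques that combine several metrics when reconstructing gene co-expression networks. A minimal sketch of that general idea follows: an edge is kept only when two correlation measures (Pearson and Spearman) both exceed a threshold. The choice of metrics, the agreement rule, the threshold of 0.8, and the toy expression values are all assumptions for illustration and do not reproduce the thesis's inference methods.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation as Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def coexpression_edges(expression, threshold=0.8):
    """Keep gene pairs on which both metrics agree that |corr| >= threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        scores = (
            pearson(expression[g1], expression[g2]),
            spearman(expression[g1], expression[g2]),
        )
        if min(abs(s) for s in scores) >= threshold:
            edges.append((g1, g2))
    return edges


# Hypothetical toy expression profiles across five samples.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.8],
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],
}
edges = coexpression_edges(expression)
print(edges)
```

Requiring agreement between metrics is one simple way to trade sensitivity for specificity, which speaks to the high false positive rate the abstract attributes to single-metric co-expression networks.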