6,744 research outputs found

    Computational Analyses of Metagenomic Data

    Metagenomics studies the collective microbial genomes extracted from a particular environment without requiring the culturing or isolation of individual genomes, addressing questions revolving around the composition, functionality, and dynamics of microbial communities. The intrinsic complexity of metagenomic data and the diversity of applications call for efficient and accurate computational methods in data handling. In this thesis, I present three primary projects that collectively focus on the computational analysis of metagenomic data, each addressing a distinct topic. In the first project, I designed and implemented an algorithm named Mapbin for reference-free genomic binning of metagenomic assemblies. Binning aims to group a mixture of genomic fragments based on their genome origin. Mapbin enhances binning results by building a multilayer network that combines the initial binning, assembly graph, and read-pairing information from paired-end sequencing data. The network is further partitioned by the community-detection algorithm, Infomap, to yield a new binning result. Mapbin was tested on multiple simulated and real datasets. The results indicated an overall improvement in the common binning quality metrics. The second and third projects are both derived from ImMiGeNe, a collaborative and multidisciplinary study investigating the interplay between gut microbiota, host genetics, and immunity in stem-cell transplantation (SCT) patients. In the second project, I conducted microbiome analyses for the metagenomic data. The workflow included the removal of contaminant reads and multiple taxonomic and functional profiling. The results revealed that the SCT recipients' samples yielded significantly fewer reads with heavy contamination of the host DNA, and their microbiomes displayed evident signs of dysbiosis. 
Finally, I discussed several inherent challenges posed by extremely low levels of target DNA and high levels of contamination in the recipient samples, which cannot be rectified solely through bioinformatics approaches. The primary goal of the third project was to design a set of primers that cover the bacterial flagellin genes present in the human gut microbiota. Considering the notable diversity of flagellins, I combined a method to select representative bacterial flagellin gene sequences, a heuristic approach based on established primer design methods to generate a degenerate primer set, and a selection method to filter out genes unlikely to occur in the human gut microbiome. As a result, I curated a reduced yet representative set of primers that would be practical for experimental implementation.
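To make the term "degenerate primer" concrete, here is a minimal sketch that expands a primer written with IUPAC ambiguity codes into the concrete sequences it stands for. The IUPAC table is standard; the primer strings and the helper names `expand_primer` and `degeneracy` are hypothetical and are not taken from the thesis, whose actual design heuristic is not described in the abstract.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def degeneracy(primer):
    """Number of concrete sequences a degenerate primer represents."""
    count = 1
    for base in primer:
        count *= len(IUPAC[base])
    return count

def expand_primer(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer]
    return ["".join(combo) for combo in product(*choices)]

# Hypothetical example primer: R stands for A or G.
sequences = expand_primer("ACRT")
```

A primer set with lower total degeneracy is generally easier to synthesise and use, which is one reason a reduced yet representative set is attractive.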

    Cross-domain interactions confer stability to benthic biofilms in proglacial streams

    Cross-domain interactions are an integral part of the success of biofilms in natural environments but remain poorly understood. Here, we describe cross-domain interactions in stream biofilms draining proglacial floodplains in the Swiss Alps. These streams, as a consequence of the retreat of glaciers, are characterised by multiple environmental gradients and perturbations (e.g., changes in channel geomorphology, discharge) that depend on the time since deglaciation. We evaluate the co-occurrence of bacterial and eukaryotic communities along streams and show that key community members have disproportionate effects on the stability of community networks. The topology of the networks, here quantified as the arrangement of the constituent nodes formed by specific taxa, was independent of stream type and apparent environmental stability. However, network stability against fragmentation was higher in the streams draining proglacial terrain that was more recently deglaciated. We find that bacteria, eukaryotic photoautotrophs, and fungi are central to the stability of these networks, which fragment upon the removal of both pro- and eukaryotic taxa. Key taxa are not always abundant, suggesting an underlying functional component to their contributions. Thus, we show that individual taxa play a key role in determining the microbial community stability of glacier-fed streams.
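As a rough illustration of what "stability against fragmentation" can mean for a co-occurrence network, the sketch below removes one node at a time from a toy graph and counts the connected components that remain. The graph, the taxon labels, and the helper names `count_components` and `fragmentation_after_removal` are hypothetical; the abstract does not state which stability measure or software the authors used.

```python
from collections import deque

def count_components(adj, removed=frozenset()):
    """Count connected components of an undirected graph, skipping removed nodes."""
    seen = set()
    components = 0
    for start in adj:
        if start in removed or start in seen:
            continue
        components += 1
        seen.add(start)
        queue = deque([start])
        while queue:  # breadth-first search over the surviving nodes
            node = queue.popleft()
            for neighbour in adj[node]:
                if neighbour not in removed and neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
    return components

def fragmentation_after_removal(adj):
    """Map each node to the number of components left once it is removed."""
    return {node: count_components(adj, frozenset([node])) for node in adj}

# Hypothetical toy network: one bacterium links most other taxa.
toy = {
    "bacterium_A": {"alga_B", "fungus_C", "bacterium_D"},
    "alga_B": {"bacterium_A"},
    "fungus_C": {"bacterium_A"},
    "bacterium_D": {"bacterium_A", "protist_E"},
    "protist_E": {"bacterium_D"},
}
impact = fragmentation_after_removal(toy)
```

In this toy case, removing the hub `bacterium_A` splits the network into three pieces, while removing a peripheral node leaves it intact, which mirrors the idea that a few key taxa can matter disproportionately.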

    Documenting Knowledge Graph Embedding and Link Prediction using Knowledge Graphs

    In recent years, sub-symbolic learning, i.e., Knowledge Graph Embedding (KGE) incorporated with Knowledge Graphs (KGs), has gained significant attention in various downstream tasks (e.g., Link Prediction (LP)). These techniques learn a latent vector representation of a KG's semantic structure to infer missing links. Nonetheless, KGE models remain a black box, and the decision-making process behind them is not clear. Thus, the trustworthiness and reliability of the models' outcomes have been challenged. While many state-of-the-art approaches provide data-driven frameworks to address these issues, they do not always provide a complete understanding, and the interpretations are not machine-readable. That is why, in this work, we extend a hybrid interpretable framework, InterpretME, to KGE models, especially translation distance models, which include TransE, TransH, TransR, and TransD. The experimental evaluation on various benchmark KGs supports the validity of this approach, which we term Trace KGE. Trace KGE, in particular, contributes to increased interpretability and understanding of the perplexing behavior of KGE models.
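For readers unfamiliar with translation distance models, the following minimal sketch shows the TransE scoring idea: a triple (head, relation, tail) is considered plausible when head + relation lies close to tail in the embedding space. The two-dimensional embeddings, the entity names, and the functions `transe_score` and `rank_tails` are made-up illustrations; they are not part of InterpretME or Trace KGE.

```python
import math

def transe_score(head, relation, tail):
    """Negative L2 distance ||h + r - t||; a higher score means more plausible."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))

# Hypothetical toy embeddings (real models learn these from the KG).
entities = {
    "Paris": [1.0, 0.0],
    "France": [1.0, 1.0],
    "Berlin": [3.0, 0.0],
}
capital_of = [0.0, 1.0]

def rank_tails(head_name, relation):
    """Rank candidate tail entities for (head, relation, ?) by TransE score."""
    head = entities[head_name]
    scored = [
        (transe_score(head, relation, vector), name)
        for name, vector in entities.items()
        if name != head_name
    ]
    return [name for _, name in sorted(scored, reverse=True)]

ranking = rank_tails("Paris", capital_of)
```

Link prediction then amounts to reading off the top of such a ranking; the score alone does not explain why a tail was chosen, which is the gap that interpretability frameworks aim to fill.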

    Self-supervised learning for transferable representations

    Machine learning has undeniably achieved remarkable advances thanks to large labelled datasets and supervised learning. However, this progress is constrained by the labour-intensive annotation process. It is not feasible to generate extensive labelled datasets for every problem we aim to address. Consequently, there has been a notable shift in recent times toward approaches that solely leverage raw data. Among these, self-supervised learning has emerged as a particularly powerful approach, offering scalability to massive datasets and showcasing considerable potential for effective knowledge transfer. This thesis investigates self-supervised representation learning with a strong focus on computer vision applications. We provide a comprehensive survey of self-supervised methods across various modalities, introducing a taxonomy that categorises them into four distinct families while also highlighting practical considerations for real-world implementation. Our focus thenceforth is on the computer vision modality, where we perform a comprehensive benchmark evaluation of state-of-the-art self-supervised models against many diverse downstream transfer tasks. Our findings reveal that self-supervised models often outperform supervised learning across a spectrum of tasks, albeit with correlations weakening as tasks transition beyond classification, particularly for datasets with distribution shifts. Digging deeper, we investigate the influence of data augmentation on the transferability of contrastive learners, uncovering a trade-off between spatial and appearance-based invariances that generalise to real-world transformations. This begins to explain the differing empirical performances achieved by self-supervised learners on different downstream tasks, and it showcases the advantages of specialised representations produced with tailored augmentation.
Finally, we introduce a novel self-supervised pre-training algorithm for object detection, aligning pre-training with downstream architecture and objectives, leading to reduced localisation errors and improved label efficiency. In conclusion, this thesis contributes a comprehensive understanding of self-supervised representation learning and its role in enabling effective transfer across computer vision tasks.

    Exponential-time approximation schemes via compression

    In this paper, we give a framework to design exponential-time approximation schemes for basic graph partitioning problems such as k-way cut, Multiway Cut, Steiner k-cut and Multicut, where the goal is to minimize the number of edges going across the parts. Our motivation to focus on approximation schemes for these problems comes from the fact that while it is possible to solve them exactly in 2^nn^{

    Machine learning-based characterisation of urban morphology with the street pattern

    Streets are a crucial part of the built environment, and their layouts, the street patterns, are widely researched and contribute to a quantitative understanding of urban morphology. However, traditional street pattern analysis only considers a few broadly defined characteristics. It uses administrative boundaries and grids as units of analysis that fail to encompass the diversity and complexity of street networks. To address these challenges, this research proposes a machine learning-based approach to automatically recognise street patterns that employs an adaptive analysis unit based on street-based local areas (SLAs). SLAs use a network partitioning technique that can adapt to distinct street networks, making it particularly suitable for different urban contexts. By calculating several street network metrics and performing hierarchical clustering, streets with similar characteristics are grouped under the same street pattern. A case study is carried out in six cities worldwide. The results show that street pattern types are rather diverse and hierarchical, and categorising them into a clearly demarcated taxonomy is challenging. The study derives a set of new morphometrics-based street patterns with four major types that resemble conventional street patterns and eleven sub-types that significantly increase their diversity for broader coverage of urban morphology. The new patterns capture urban structural differences across cities, such as the urban-suburban division and the number of urban centres present. In conclusion, the proposed machine learning-based morphometric street patterns for characterising urban morphology have an enhanced ability to encompass more information from the built environment while maintaining the intuitiveness of using patterns.

    LIPIcs, Volume 251, ITCS 2023, Complete Volume

    LIPIcs, Volume 251, ITCS 2023, Complete Volume

    On the Power of Threshold-Based Algorithms for Detecting Cycles in the CONGEST Model

    It is known that, for every $k\geq 2$, $C_{2k}$-freeness can be decided by a generic Monte-Carlo algorithm running in $n^{1-1/\Theta(k^2)}$ rounds in the CONGEST model. For $2\leq k\leq 5$, faster Monte-Carlo algorithms do exist, running in $O(n^{1-1/k})$ rounds, based on upper bounding the number of messages to be forwarded, and aborting search sub-routines for which this number exceeds certain thresholds. We investigate the possible extension of these threshold-based algorithms, for the detection of larger cycles. We first show that, for every $k\geq 6$, there exists an infinite family of graphs containing a $2k$-cycle for which any threshold-based algorithm fails to detect that cycle. Hence, in particular, neither $C_{12}$-freeness nor $C_{14}$-freeness can be decided by threshold-based algorithms. Nevertheless, we show that $\{C_{12},C_{14}\}$-freeness can still be decided by a threshold-based algorithm, running in $O(n^{1-1/7})= O(n^{0.857\dots})$ rounds, which is faster than using the generic algorithm, which would run in $O(n^{1-1/22})\simeq O(n^{0.954\dots})$ rounds. Moreover, we exhibit an infinite collection of families of cycles such that threshold-based algorithms can decide $\mathcal{F}$-freeness for every $\mathcal{F}$ in this collection. Comment: to be published in SIROCCO 202
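The two round-complexity exponents quoted in this abstract can be checked with a few lines of arithmetic. The denominators 7 and 22 are taken directly from the abstract; how 22 arises from the $\Theta(k^2)$ term is not stated there, so the sketch does not attempt to derive it. The function name `round_exponent` is just an illustrative label.

```python
def round_exponent(denominator):
    """Exponent e in a bound of the form n^(1 - 1/denominator)."""
    return 1 - 1 / denominator

threshold_based = round_exponent(7)   # exponent quoted for the threshold-based algorithm
generic = round_exponent(22)          # exponent quoted for the generic algorithm

# The threshold-based bound has the smaller exponent, so it grows more slowly in n.
is_faster = threshold_based < generic
```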

    Solving graph problems with single-photons and linear optics

    An important challenge for current and near-term quantum devices is finding useful tasks that can be performed on them. We first show how to efficiently encode a bounded $n \times n$ matrix $A$ into a linear optical circuit with $2n$ modes. We then apply this encoding to the case where $A$ is a matrix containing information about a graph $G$. We show that a photonic quantum processor consisting of single-photon sources, a linear optical circuit encoding $A$, and single-photon detectors can solve a range of graph problems including finding the number of perfect matchings of bipartite graphs, computing permanental polynomials, determining whether two graphs are isomorphic, and the $k$-densest subgraph problem. We also propose pre-processing methods to boost the probabilities of observing the relevant detection events and thus improve performance. Finally, we present various numerical simulations which validate our findings. Comment: 6 pages + 9 pages appendix. Comments Welcome
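One of the graph problems listed here has a well-known classical formulation: the number of perfect matchings of a bipartite graph equals the permanent of its biadjacency matrix. The sketch below computes that permanent by brute force over permutations, which is only feasible for very small matrices. It is a classical reference calculation, not the photonic method of the paper, and the example matrices and the function name `permanent` are illustrative assumptions.

```python
from itertools import permutations

def permanent(matrix):
    """Permanent of a square matrix by summing over all permutations (O(n! * n))."""
    n = len(matrix)
    total = 0
    for perm in permutations(range(n)):
        term = 1
        for row, col in enumerate(perm):
            term *= matrix[row][col]
            if term == 0:  # this permutation uses a missing edge
                break
        total += term
    return total

# Biadjacency matrix of a 6-cycle viewed as a bipartite graph:
# left vertex i is joined to right vertices i and (i + 1) mod 3.
six_cycle = [
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 1],
]

# Complete bipartite graph K_{3,3}: every left vertex joins every right vertex.
k33 = [[1, 1, 1] for _ in range(3)]

matchings_six_cycle = permanent(six_cycle)
matchings_k33 = permanent(k33)
```

The factorial cost of this direct approach is part of why the permanent is a natural target for alternative computing platforms.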