506 research outputs found

    Computational Analyses of Metagenomic Data

    Metagenomics studies the collective microbial genomes extracted from a particular environment without requiring the culturing or isolation of individual genomes, addressing questions revolving around the composition, functionality, and dynamics of microbial communities. The intrinsic complexity of metagenomic data and the diversity of applications call for efficient and accurate computational methods in data handling. In this thesis, I present three primary projects that collectively focus on the computational analysis of metagenomic data, each addressing a distinct topic. In the first project, I designed and implemented an algorithm named Mapbin for reference-free genomic binning of metagenomic assemblies. Binning aims to group a mixture of genomic fragments based on their genome origin. Mapbin enhances binning results by building a multilayer network that combines the initial binning, assembly graph, and read-pairing information from paired-end sequencing data. The network is further partitioned by the community-detection algorithm, Infomap, to yield a new binning result. Mapbin was tested on multiple simulated and real datasets. The results indicated an overall improvement in the common binning quality metrics. The second and third projects are both derived from ImMiGeNe, a collaborative and multidisciplinary study investigating the interplay between gut microbiota, host genetics, and immunity in stem-cell transplantation (SCT) patients. In the second project, I conducted microbiome analyses for the metagenomic data. The workflow included the removal of contaminant reads and multiple taxonomic and functional profiling. The results revealed that the SCT recipients' samples yielded significantly fewer reads with heavy contamination of the host DNA, and their microbiomes displayed evident signs of dysbiosis. 
Finally, I discussed several inherent challenges posed by extremely low levels of target DNA and high levels of contamination in the recipient samples, which cannot be rectified solely through bioinformatics approaches. The primary goal of the third project is to design a set of primers that can be used to cover bacterial flagellin genes present in the human gut microbiota. Considering the notable diversity of flagellins, I incorporated a method to select representative bacterial flagellin gene sequences, a heuristic approach based on established primer design methods to generate a degenerate primer set, and a selection method to filter genes unlikely to occur in the human gut microbiome. As a result, I successfully curated a reduced yet representative set of primers that would be practical for experimental implementation.

    LIPIcs, Volume 251, ITCS 2023, Complete Volume

    LIPIcs, Volume 251, ITCS 2023, Complete Volume

    Mining Butterflies in Streaming Graphs

    This thesis introduces two main-memory systems, sGrapp and sGradd, for performing the fundamental analytic tasks of biclique counting and concept drift detection over a streaming graph. A data-driven heuristic is used to architect the systems. To this end, the growth patterns of bipartite streaming graphs are first mined and the emergence principles of streaming motifs are discovered. Next, the discovered principles are (a) explained by a graph generator called sGrow; and (b) utilized to establish the requirements for efficient, effective, explainable, and interpretable management and processing of streams. sGrow is used to benchmark stream analytics, particularly in the case of concept drift detection. sGrow displays robust realization of streaming growth patterns independent of initial conditions, scale and temporal characteristics, and model configurations. Extensive evaluations confirm the simultaneous effectiveness and efficiency of sGrapp and sGradd. sGrapp achieves mean absolute percentage error up to 0.05/0.14 for the cumulative butterfly count in streaming graphs with uniform/non-uniform temporal distribution and a processing throughput of 1.5 million data records per second. The throughput and estimation error of sGrapp are 160x higher and 0.02x lower than baselines. sGradd demonstrates an improving performance over time, achieves zero false detection rates when there is no drift and when drift is already detected, and detects sequential drifts in zero to a few seconds after their occurrence regardless of drift intervals.
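
The two quantities named in the abstract above, the butterfly count and the mean absolute percentage error, can be illustrated with a minimal sketch. This is not sGrapp: it is an exact, static count over a small in-memory edge list, and the function names `count_butterflies` and `mape` are hypothetical names chosen for this illustration. It assumes each edge is a `(left, right)` pair of a bipartite graph and that a butterfly is a 2x2 biclique.

```python
from collections import defaultdict
from itertools import combinations


def count_butterflies(edges):
    """Exact number of butterflies (2x2 bicliques) in a bipartite edge list.

    Illustrative only: a static, quadratic-per-vertex count, not a
    streaming approximation.
    """
    # Map each right-side vertex to the set of left-side vertices it touches.
    left_neighbors = defaultdict(set)
    for left, right in edges:
        left_neighbors[right].add(left)

    # For every pair of left vertices, count the right vertices they share.
    shared = defaultdict(int)
    for lefts in left_neighbors.values():
        for a, b in combinations(sorted(lefts), 2):
            shared[(a, b)] += 1

    # A pair with c common neighbors closes c * (c - 1) / 2 butterflies.
    return sum(c * (c - 1) // 2 for c in shared.values())


def mape(actual, estimated):
    """Mean absolute percentage error as a fraction (not multiplied by 100)."""
    return sum(abs(a - e) / a for a, e in zip(actual, estimated)) / len(actual)


if __name__ == "__main__":
    square = [("u1", "v1"), ("u1", "v2"), ("u2", "v1"), ("u2", "v2")]
    print(count_butterflies(square))
    print(mape([10, 20], [9, 22]))
```

Counting shared neighbors per pair of left vertices keeps the sketch short; a streaming system would need to avoid materializing every pair, which is the part this example deliberately leaves out.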

    Dynamic processes on networks and higher-order structures

    Higher-order interactions are increasingly recognized as a critical aspect in the modeling of complex systems. Higher-order networks provide a framework for studying the relationship between the structure of higher-order interactions and the function of the complex system. However, little is known about how higher-order interactions affect dynamic processes. In this thesis, we develop general frameworks of percolation aiming at understanding the interplay between higher-order network structures and the critical properties of dynamics. We reveal that degree correlations strongly affect the percolation threshold on higher-order networks and interestingly, the effect of correlations is different on ordinary percolation and higher-order percolation. We further elucidate the mechanisms responsible for the emergence of discontinuous transitions on higher-order networks. Moreover, we show that triadic regulatory interaction, as a general type of higher-order interaction found widely in nature, can turn percolation into a fully-fledged dynamic process that exhibits period doubling and a route to chaos. As an important example of dynamic processes, we further investigate the role of network topology on epidemic spreading. We show that higher-order interactions can induce a non-linear infection kernel in a pandemic, which results in a discontinuous phase transition, hysteresis, and superexponential spreading. Finally, we propose an epidemic model to evaluate the role of automated contact-and-tracing with mobile apps as a new containment measure to mitigate a pandemic. We reveal the non-linear effect on the reduction of the incidence provided by a certain fraction of app adoption in the population and we propose the optimal strategy to mitigate the pandemic with limited resources. Altogether, the thesis provides new insights into the interplay between the topology of higher-order networks and their dynamics. 
The results obtained may shed light on research in other areas of interest such as brain functions and epidemic spreading.

    Algorithms and Certificates for Boolean CSP Refutation: "Smoothed is no harder than Random"

    We present an algorithm for strongly refuting smoothed instances of all Boolean CSPs. The smoothed model is a hybrid between worst and average-case input models, where the input is an arbitrary instance of the CSP with only the negation patterns of the literals re-randomized with some small probability. For an $n$-variable smoothed instance of a $k$-arity CSP, our algorithm runs in $n^{O(\ell)}$ time, and succeeds with high probability in bounding the optimum fraction of satisfiable constraints away from $1$, provided that the number of constraints is at least $\tilde{O}(n) (\frac{n}{\ell})^{\frac{k}{2} - 1}$. This matches, up to polylogarithmic factors in $n$, the trade-off between running time and the number of constraints of the state-of-the-art algorithms for refuting fully random instances of CSPs [RRS17]. We also make a surprising new connection between our algorithm and even covers in hypergraphs, which we use to positively resolve Feige's 2008 conjecture, an extremal combinatorics conjecture on the existence of even covers in sufficiently dense hypergraphs that generalizes the well-known Moore bound for the girth of graphs. As a corollary, we show that polynomial-size refutation witnesses exist for arbitrary smoothed CSP instances with number of constraints a polynomial factor below the "spectral threshold" of $n^{k/2}$, extending the celebrated result for random 3-SAT of Feige, Kim and Ofek [FKO06].

    Parameterized Graph Modification Beyond the Natural Parameter


    Sum-of-squares representations for copositive matrices and independent sets in graphs

    A polynomial optimization problem asks for minimizing a polynomial function (cost) given a set of constraints (rules) represented by polynomial inequalities and equations. Many hard problems in combinatorial optimization and applications in operations research can be naturally encoded as polynomial optimization problems. A common approach for addressing such computationally hard problems is by considering variations of the original problem that give an approximate solution, and that can be solved efficiently. One such approach for attacking hard combinatorial problems and, more generally, polynomial optimization problems, is given by the so-called sum-of-squares approximations. This thesis focuses on studying whether these approximations find the optimal solution of the original problem. We investigate this question in two main settings: 1) copositive programs and 2) parameters dealing with independent sets in graphs. Among our main new results, we characterize the matrix sizes for which sum-of-squares approximations are able to capture all copositive matrices. In addition, we show finite convergence of the sum-of-squares approximations for maximum independent sets in graphs based on their continuous copositive reformulations. We also study sum-of-squares approximations for parameters asking for maximum balanced independent sets in bipartite graphs. In particular, we find connections with the Lovász theta number and we design eigenvalue bounds for several related parameters when the graphs satisfy some symmetry properties.

    On powers of Hamilton cycles in Ramsey-Turán Theory

    We prove that for $r\in \mathbb{N}$ with $r\geq 2$ and $\mu>0$, there exist $\alpha>0$ and $n_{0}$ such that for every $n\geq n_{0}$, every $n$-vertex graph $G$ with $\delta(G)\geq \left(1-\frac{1}{r}+\mu\right)n$ and $\alpha(G)\leq \alpha n$ contains an $r$-th power of a Hamilton cycle. We also show that the minimum degree condition is asymptotically sharp for $r=2, 3$ and the $r=2$ case was recently conjectured by Staden and Treglown. Comment: 19 pages, 4 figures

    LIPIcs, Volume 261, ICALP 2023, Complete Volume

    LIPIcs, Volume 261, ICALP 2023, Complete Volume