17 research outputs found

    Designing Efficient Spaced Seeds for SOLiD Read Mapping

    The advent of high-throughput sequencing technologies constituted a major advance in genomic studies, offering new prospects in a wide range of applications. We propose a rigorous and flexible algorithmic solution to mapping SOLiD color-space reads to a reference genome. The solution relies, on the one hand, on an advanced method of seed design that uses a faithful probabilistic model of read matches and, on the other hand, on a novel seeding principle especially adapted to read mapping. Our method can handle both lossy and lossless frameworks and is able to distinguish, at the level of seed design, between SNPs and reading errors. We illustrate our approach by several seed designs and demonstrate their efficiency.

    Joint Analysis of Multiple Metagenomic Samples

    The availability of metagenomic sequencing data, generated by sequencing DNA pooled from multiple microbes living jointly, has increased sharply in the last few years with developments in sequencing technology. Characterizing the contents of metagenomic samples is a challenging task, which has been extensively attempted by both supervised and unsupervised techniques, each with its own limitations. Common to practically all the methods is the processing of single samples only; when multiple samples are sequenced, each is analyzed separately and the results are combined. In this paper we propose to perform a combined analysis of a set of samples in order to obtain a better characterization of each of the samples, and provide two applications of this principle. First, we use an unsupervised probabilistic mixture model to infer hidden components shared across metagenomic samples. We incorporate the model in a novel framework for studying association of microbial sequence elements with phenotypes, analogous to the genome-wide association studies performed on human genomes: we demonstrate that stratification may result in false discoveries of such associations, and that the components inferred by the model can be used to correct for this stratification. Second, we propose a novel read clustering (also termed “binning”) algorithm which operates on multiple samples simultaneously, leveraging the assumption that the different samples contain the same microbial species, possibly in different proportions. We show that integrating information across multiple samples yields more precise binning on each of the samples. Moreover, for both applications we demonstrate that given a fixed depth of coverage, the average per-sample performance generally increases with the number of sequenced samples as long as the per-sample coverage is high enough.

    Calculo del clique-width en graficas simples de acuerdo a su estructura [Computing the clique-width of simple graphs according to their structure]

    The computation of clique-width, an integer-valued graph invariant, has been actively studied because some problems classified as NP-complete have low complexity when their graph representation has bounded clique-width. In a sense, this parameter measures how difficult it is to decompose a graph into a structure called a tree (after its topology). The importance of this invariant lies in the fact that, if a graph problem can be bounded by it, then the problem can be solved in polynomial time according to Courcelle's main theorem. Clique-width is also directly related to the tree-width invariant, with the distinction that the former is more general than the latter. To compute invariants of this kind, the literature proposes various procedures that divide the original graph into subgraphs, which determine the complexity. The research reported here therefore uses a particular decomposition of a simple graph into simple cycles and trees. Graphs consisting of simple cycles and trees are called cacti; for these we have proved that the clique-width is at most 4, which improves on the bound given by the relation between clique-width and tree-width, namely cwd(G) ≤ 3·2^(twd(G)−1). Another class of graphs, called polygonal graphs, has also been studied; these are formed by polygons with the same number of sides that share a single edge with one another. For this class, this research proves that the clique-width equals 5, again improving on the bound known from the relation between the two invariants mentioned above. Finally, by studying the behavior of union operations on these subgraphs, an approximation method is proposed for computing the clique-width of a simple graph in general.
The algorithm is based on the classic Dijkstra algorithm, which finds the shortest path between two vertices of a graph. The formulation of the algorithms mentioned above led to the publication of three articles, which include the proofs for computing the clique-width in the different scenarios studied. CONACy

    Fifth Biennial Report : June 1999 - August 2001

    No full text

    Sixth Biennial Report : August 2001 - May 2003

    No full text

    Improved Differentially Private Densest Subgraph: Local and Purely Additive

    We study the Densest Subgraph problem under the additional constraint of differential privacy. In the LEDP (local edge differential privacy) model, introduced recently by Dhulipala et al. [FOCS 2022], we give an (ϵ, δ)-differentially private algorithm with no multiplicative loss: the loss is purely additive. This is in contrast to every previous private algorithm for densest subgraph (local or centralized), all of which incur some multiplicative loss as well as some additive loss. Moreover, our additive loss matches the best-known previous additive loss (in any version of differential privacy) when 1/δ is at least polynomial in n, and in the centralized setting we can strengthen our result to provide better than the best-known additive loss. Additionally, we give a different algorithm that is ϵ-differentially private in the LEDP model which achieves a multiplicative ratio arbitrarily close to 2, along with an additional additive factor. This improves over the previous multiplicative 4-approximation in the LEDP model. Finally, we conclude with extensions of our techniques to both the node-weighted and the directed versions of the problem. Comment: 41 pages

    Metagenomics and functional genomics of bacterial symbionts of Spongia (Porifera, Dictyoceratida) specimens from the Algarvian shore (South Portugal)

    Sponges are early-branched, filter-feeding metazoans that usually harbor complex microbial communities comprised of diverse “uncultivable” symbiotic bacteria. In this thesis, the functional and taxonomic features of the marine sponge microbiome are determined, using Spongia officinalis as model host organism. Emphasis is given to adaptive and functional traits of the profuse and biotechnologically-relevant alphaproteobacterial symbionts of sponges. A metagenomics-centred approach was employed to reveal microbial taxa and genomic signatures enriched in the Spongia officinalis endosymbiotic consortium, and thus likely to play pivotal roles in holobiont functioning. Further, a comparative genomics study is presented unveiling the common and specific traits of ten Alphaproteobacteria genera isolated from S. officinalis with alternative symbiont cultivation methodology. Finally, a sequence composition-dependent binning approach is employed to assemble, from metagenomic sequences, the genome of an uncultured alphaproteobacterial symbiont of S. officinalis belonging to the family Rhodospirillaceae. High abundance of polyketide and terpene synthase-, eukaryotic-like protein- (ELPs), type IV secretion system-, plasmid- and ABC transporter-encoding genes, among others, characterized the sponge microbial metagenomes. In contrast, motility and chemotaxis genes were abundant in seawater and sediment microbiomes, but nearly absent in the S. officinalis symbiotic consortium. Much higher frequencies of anti-viral CRISPR-Cas and restriction-modification systems, along with much lower viral abundances, were observed in the sponge-associated metagenomes than in the environment and interpreted as true hallmarks of this symbiotic consortium.
In line with outcomes retrieved for the whole symbiotic community, alphaproteobacterial symbionts of marine sponges likely contribute the most to host fitness through nutritional exchange, cell detoxification processes and chemical defense, the latter being theoretically promoted by both polyketide and terpenoid biosynthesis. The several alphaproteobacterial cultures retrieved in this thesis, displaying high natural product biosynthesis capacities, can now be explored in studies aiming at revealing novel biological activities and chemical structures from these symbionts. Marine sponges (phylum Porifera) are considered one of the simplest groups of metazoans because they lack organization into true tissues and organs. However, these animals, relatively simple in terms of body plan, usually harbor very complex communities of microorganisms. Because of their basal emergence in the evolutionary history of the planet, knowledge of this “holobiont”, that is, the consortium of organisms formed by the host marine sponge and all its microbial symbionts, is highly relevant to advancing our understanding of host-microorganism interactions. In this doctoral thesis, I aimed to determine the functional and taxonomic features of the marine sponge microbiome in the context of its surrounding environment (seawater and marine sediments, and their respective microbiotas), with emphasis on the adaptive and functional traits of alphaproteobacteria associated with the model organism Spongia officinalis (“bath sponge”).