Designing Efficient Spaced Seeds for SOLiD Read Mapping
The advent of high-throughput sequencing technologies constituted
a major advance in genomic studies, offering new prospects in a
wide range of applications. We propose a rigorous and flexible algorithmic
solution to mapping SOLiD color-space reads to a reference genome. The
solution relies on an advanced method of seed design that uses a faithful
probabilistic model of read matches and, on the other hand, a novel
seeding principle especially adapted to read mapping. Our method can
handle both lossy and lossless frameworks and is able to distinguish, at
the level of seed design, between SNPs and reading errors. We illustrate
our approach by several seed designs and demonstrate their efficiency
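To make the idea of a spaced seed concrete, here is a minimal sketch in Python. It is not the seed design method of the paper and it works on plain nucleotide letters rather than SOLiD color space; the function name `seed_hits` and the seed pattern `1101` are assumptions chosen for illustration. A seed is a string of `1` (positions that must match) and `0` (positions that are ignored).

```python
def seed_hits(read: str, ref: str, seed: str) -> list[tuple[int, int]]:
    """Return (read_pos, ref_pos) pairs where read and ref agree
    at every '1' position of the seed (hypothetical helper)."""
    care = [i for i, c in enumerate(seed) if c == "1"]
    span = len(seed)

    # Index every reference window by the letters at the "care" positions.
    index: dict[str, list[int]] = {}
    for j in range(len(ref) - span + 1):
        key = "".join(ref[j + i] for i in care)
        index.setdefault(key, []).append(j)

    # Look up every read window in the index.
    hits = []
    for p in range(len(read) - span + 1):
        key = "".join(read[p + i] for i in care)
        for j in index.get(key, []):
            hits.append((p, j))
    return hits


# The read differs from the reference window "ACGT" at its third letter.
print(seed_hits("ACTT", "GGACGTGG", "1101"))  # spaced seed ignores that position
print(seed_hits("ACTT", "GGACGTGG", "1111"))  # contiguous seed requires all four
```

The spaced pattern still reports the window because the mismatch falls on an ignored position, while the contiguous pattern of the same span reports nothing.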
Joint Analysis of Multiple Metagenomic Samples
The availability of metagenomic sequencing data, generated by sequencing DNA pooled from multiple microbes living jointly, has increased sharply in the last few years with developments in sequencing technology. Characterizing the contents of metagenomic samples is a challenging task, which has been extensively attempted by both supervised and unsupervised techniques, each with its own limitations. Common to practically all the methods is the processing of single samples only; when multiple samples are sequenced, each is analyzed separately and the results are combined. In this paper we propose to perform a combined analysis of a set of samples in order to obtain a better characterization of each of the samples, and provide two applications of this principle. First, we use an unsupervised probabilistic mixture model to infer hidden components shared across metagenomic samples. We incorporate the model in a novel framework for studying association of microbial sequence elements with phenotypes, analogous to the genome-wide association studies performed on human genomes: we demonstrate that stratification may result in false discoveries of such associations, and that the components inferred by the model can be used to correct for this stratification. Second, we propose a novel read clustering (also termed “binning”) algorithm which operates on multiple samples simultaneously, leveraging the assumption that the different samples contain the same microbial species, possibly in different proportions. We show that integrating information across multiple samples yields more precise binning on each of the samples. Moreover, for both applications we demonstrate that given a fixed depth of coverage, the average per-sample performance generally increases with the number of sequenced samples as long as the per-sample coverage is high enough.
Computing the clique-width of simple graphs according to their structure
The computation of clique-width, an integer-valued graph invariant, has been actively studied because some problems classified as NP-complete have low complexity when their graph representation has bounded clique-width. In a sense, this parameter measures how difficult it is to decompose a graph into a structure called a tree (because of its topology). The importance of this invariant lies in the fact that if a graph problem can be bounded by it, then the problem can be solved in polynomial time, according to Courcelle's main theorem. Clique-width is also directly related to the tree-width invariant, with the distinction that the former is more general than the latter. To compute invariants of this kind, the literature proposes various procedures that divide the original graph into subgraphs, which determine the complexity. The research reported here therefore uses a particular decomposition of a simple graph into simple cycles and trees. Graphs that consist of simple cycles and trees are called cactus graphs, and for them we have shown that the clique-width is at most 4, which improves the bound given by the relation between clique-width and tree-width, namely cwd(G) ≤ 3·2^(twd(G)−1). Another class of graphs, called polygonal graphs, has also been studied; these are formed by polygons with the same number of sides that share exactly one edge with each other. For this class, this research has shown that the clique-width is equal to 5, again improving the bound known from the relation between the two invariants mentioned above. Finally, by studying the behavior of union operations on these subgraphs, an approximation method has been proposed for computing the clique-width of a general simple graph.
The algorithm is based on the classical Dijkstra algorithm, which finds the shortest path between two vertices of a graph. The formulation of the algorithms mentioned above led to the publication of three articles, which include the proofs for computing the clique-width in the different study scenarios. CONACy
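As a small worked check of why a clique-width bound of 4 for cactus graphs improves on the general relation cwd(G) ≤ 3·2^(twd(G)−1), the sketch below evaluates that relation numerically. It relies on one fact not stated in the abstract: cactus graphs are outerplanar and therefore have tree-width at most 2. The function and constant names are illustrative only.

```python
def cwd_upper_bound_from_treewidth(twd: int) -> int:
    """General bound cwd(G) <= 3 * 2**(twd(G) - 1), used here for twd >= 1."""
    if twd < 1:
        raise ValueError("this sketch expects a tree-width of at least 1")
    return 3 * 2 ** (twd - 1)


# Assumption (not from the abstract): a cactus graph has tree-width at most 2.
CACTUS_TREEWIDTH = 2
# Bound reported in the abstract for cactus graphs.
CACTUS_CWD_BOUND = 4

general_bound = cwd_upper_bound_from_treewidth(CACTUS_TREEWIDTH)
print(general_bound, CACTUS_CWD_BOUND)  # general bound vs. the reported bound
```

Under that assumption the general relation only guarantees a clique-width of at most 6, so the reported value of 4 is strictly tighter.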
Improved Differentially Private Densest Subgraph: Local and Purely Additive
We study the Densest Subgraph problem under the additional constraint of
differential privacy. In the LEDP (local edge differential privacy) model,
introduced recently by Dhulipala et al. [FOCS 2022], we give an -differentially private algorithm with no multiplicative loss: the loss
is purely additive. This is in contrast to every previous private algorithm for
densest subgraph (local or centralized), all of which incur some multiplicative
loss as well as some additive loss. Moreover, our additive loss matches the
best-known previous additive loss (in any version of differential privacy) when
is at least polynomial in , and in the centralized setting we can
strengthen our result to provide better than the best-known additive loss.
Additionally, we give a different algorithm that is -differentially
private in the LEDP model which achieves a multiplicative ratio arbitrarily
close to , along with an additional additive factor. This improves over the
previous multiplicative -approximation in the LEDP model. Finally, we
conclude with extensions of our techniques to both the node-weighted and the
directed versions of the problem.
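For readers unfamiliar with the underlying problem, the following is a minimal sketch of the classical non-private greedy peeling heuristic for densest subgraph, where the density of a vertex set S is the number of edges inside S divided by |S|. This is not the private algorithm of the paper: it adds no noise and offers no privacy guarantee. In the non-private setting, peeling is known to return a set whose density is within a factor of 2 of the optimum. The function name `greedy_peeling` is an assumption for illustration.

```python
from collections import defaultdict


def greedy_peeling(edges):
    """Non-private baseline: repeatedly delete a minimum-degree vertex and
    remember the densest intermediate vertex set.
    Returns (best_density, best_vertex_set)."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    alive = set(adj)
    m = sum(len(nb) for nb in adj.values()) // 2  # edges among alive vertices
    best_density, best_set = 0.0, set()

    while alive:
        density = m / len(alive)
        if density > best_density:
            best_density, best_set = density, set(alive)
        # Peel one vertex of minimum current degree (linear scan, for clarity).
        u = min(alive, key=lambda x: len(adj[x]))
        for w in adj[u]:
            adj[w].discard(u)
        m -= len(adj[u])
        alive.remove(u)
        del adj[u]

    return best_density, best_set


# A 4-clique on {0, 1, 2, 3} with one pendant vertex 4 attached to vertex 3.
k4_plus_tail = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
print(greedy_peeling(k4_plus_tail))
```

On this small example the pendant vertex is peeled first, leaving the 4-clique with 6 edges on 4 vertices as the densest set found.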
Metagenomics and functional genomics of bacterial symbionts of Spongia (Porifera, Dictyoceratida) specimens from the Algarvian shore (South Portugal)
Sponges are early-branched, filter-feeding metazoans that usually harbor complex microbial
communities comprised of diverse “uncultivable” symbiotic bacteria. In this thesis, the
functional and taxonomic features of the marine sponge microbiome are determined, using
Spongia officinalis as model host organism. Emphasis is given to adaptive and functional traits
of the profuse and biotechnologically-relevant alphaproteobacterial symbionts of sponges. A
metagenomics-centred approach was employed to reveal microbial taxa and genomic
signatures enriched in the Spongia officinalis endosymbiotic consortium, and thus likely to
play pivotal roles in holobiont functioning. Further, a comparative genomics study is presented
unveiling the common and specific traits of ten Alphaproteobacteria genera isolated from S.
officinalis with alternative symbiont cultivation methodology. Finally, a sequence
composition-dependent binning approach is employed to assemble, from metagenomic
sequences, the genome of an uncultured alphaproteobacterial symbiont of S. officinalis
belonging to the family Rhodospirillaceae.
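As an illustration of what a sequence composition feature can look like, the sketch below computes a normalized k-mer frequency vector for a contig. The abstract does not say which composition features or binning tool were used, so the choice of tetranucleotides (k = 4) and the function name `kmer_profile` are assumptions; reverse-complement merging, which binning tools commonly apply, is omitted for brevity.

```python
from collections import Counter
from itertools import product


def kmer_profile(seq: str, k: int = 4) -> list[float]:
    """Normalized k-mer frequency vector over the alphabet ACGT.
    Windows containing other symbols (for example N) are ignored."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[km] for km in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


# Toy example with dinucleotides so the vector stays short (16 entries).
profile = kmer_profile("ACGTACGT", k=2)
print(len(profile), round(sum(profile), 6))
```

Contigs whose profiles are close to each other under some distance would then be candidates for the same bin; the clustering step itself is not shown.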
High abundance of polyketide and terpene synthase-, eukaryotic-like protein- (ELPs),
type IV secretion system-, plasmid- and ABC transporter-encoding genes, among others,
characterized the sponge microbial metagenomes. In contrast, motility and chemotaxis genes
were abundant in seawater and sediment microbiomes, but nearly absent in the S. officinalis
symbiotic consortium. Much higher frequencies of anti-viral CRISPR-Cas and restriction-modification
systems, along with much lower viral abundances, were observed in the sponge-associated
metagenomes than in the environment and interpreted as true hallmarks of this
symbiotic consortium.
In line with outcomes retrieved for the whole symbiotic community,
alphaproteobacterial symbionts of marine sponges likely contribute the most to host fitness
through nutritional exchange, cell detoxification processes and chemical defense, the latter
being theoretically promoted by both polyketide and terpenoid biosynthesis. The several
alphaproteobacterial cultures retrieved in this thesis, displaying high natural product
biosynthesis capacities, can now be explored in studies aiming at revealing novel biological
activities and chemical structures from these symbionts.
Marine sponges (phylum Porifera) are considered one of the simplest groups among
metazoans because they lack organization into true tissues and organs. However,
these animals, relatively simple in terms of body plan, usually harbor
very complex communities of microorganisms. Because of their basal emergence in the
evolutionary history of the planet, knowledge about this “holobiont”, that is, the consortium
of organisms formed by the host marine sponge and all its microbial
symbionts, is highly relevant to advancing our understanding of
host-microorganism interactions. In this doctoral thesis, I aimed to determine
the functional and taxonomic characteristics of the marine sponge microbiome in the context
of its surrounding environment (seawater and marine sediments, and their respective microbiotas),
with emphasis on the adaptive and functional traits of alphaproteobacteria associated with the
model organism Spongia officinalis (“bath sponge”).