Automated group assignment in large phylogenetic trees using GRUNT: GRouping, Ungrouping, Naming Tool

Author: Andersen Gary L
Dalevi Daniel
DeSantis Todd Z
Fredslund Jakob
Hugenholtz Philip
Markowitz Victor M
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: Accurate taxonomy is best maintained if species are arranged as hierarchical groups in phylogenetic trees. This is especially important as trees grow larger as a consequence of a rapidly expanding sequence database. Hierarchical group names are typically assigned manually in trees, an approach that becomes infeasible for very large topologies. Results: We have developed an automated iterative procedure for delineating stable (monophyletic) hierarchical groups in large (or small) trees and naming those groups according to a set of sequentially applied rules. In addition, we have created an associated ungrouping tool for removing existing groups that do not meet user-defined criteria (such as monophyly). The procedure is implemented in a program called GRUNT (GRouping, Ungrouping, Naming Tool) and has been applied to the current release of the Greengenes (Hugenholtz) 16S rRNA gene taxonomy, which comprises more than 130,000 taxa. Conclusion: GRUNT will assist researchers who require comprehensive hierarchical grouping of large tree topologies in, for example, database curation, microarray design and pangenome assignments. The application is available at the Greengenes website [1].
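
Ungrouping by monophyly rests on one test: a group is monophyletic exactly when its member set equals the leaf set of some clade in the rooted tree. The following is a minimal sketch of that test. It assumes a hypothetical `Node` dataclass and group assignments given as sets of leaf names, and neither is GRUNT's actual data model. GRUNT's grouping and naming rules are not modelled here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical rooted-tree node: leaves carry taxon names, internal nodes may be unnamed."""
    name: str = ""
    children: list[Node] = field(default_factory=list)


def clade_leaf_sets(node: Node, out: set[frozenset[str]]) -> frozenset[str]:
    """Add the leaf set of every clade below `node` to `out`; return this node's leaf set."""
    if not node.children:
        leaves = frozenset([node.name])
    else:
        leaves = frozenset().union(*(clade_leaf_sets(c, out) for c in node.children))
    out.add(leaves)
    return leaves


def ungroup(tree: Node, groups: dict[str, set[str]]) -> tuple[dict[str, set[str]], list[str]]:
    """Keep groups whose members form exactly one clade (monophyly); list the rest as removed."""
    clades: set[frozenset[str]] = set()
    clade_leaf_sets(tree, clades)
    kept = {g: m for g, m in groups.items() if frozenset(m) in clades}
    removed = sorted(g for g in groups if g not in kept)
    return kept, removed


# Hypothetical tree (((A,B),(C,D)),E) and three candidate group labels.
tree = Node(children=[
    Node(children=[
        Node(children=[Node("A"), Node("B")]),
        Node(children=[Node("C"), Node("D")]),
    ]),
    Node("E"),
])
groups = {"g1": {"A", "B"}, "g2": {"B", "C"}, "g3": {"A", "B", "C", "D"}}
kept, removed = ungroup(tree, groups)
print(sorted(kept), removed)  # ['g1', 'g3'] ['g2']
```

Collecting all clade leaf sets once makes each monophyly check a single set lookup. That matters when, as in Greengenes, the tree has on the order of 10^5 leaves and many candidate groups.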

Estimating DNA coverage and abundance in metagenomes using a gamma approximation

Author: Amrita Pati
Daniel Dalevi
Konstantinos Mavromatis
Natalia N. Ivanova
Nikos C. Kyrpides
Sean D. Hooper
Publication venue: Oxford University Press
Publication date: 01/01/2010

Motivation: Shotgun sequencing generates large numbers of short DNA reads from either an isolated organism or, in the case of metagenomics projects, from the aggregate genome of a microbial community. These reads are then assembled, based on overlapping sequences, into larger contiguous sequences (contigs). The feasibility of assembly and the coverage achieved (reads per nucleotide or distinct sequence of nucleotides) depend on several factors: the number of reads sequenced, the read length and the relative abundances of their source genomes in the microbial community. Low coverage suggests that most of the genomic DNA in the sample has not been sequenced, but it is often difficult to estimate either the extent of the uncaptured diversity or the amount of additional sequencing that would be most efficacious. In this work, we regard a metagenome as a population of DNA fragments (bins), each of which may be covered by one or more reads. We employ a gamma distribution to model this bin population because of its flexibility and ease of use. When a gamma approximation can be found that adequately fits the data, we can estimate the number of bins that were not sequenced and that could potentially be revealed by additional sequencing. We evaluated the performance of this model using simulated metagenomes and demonstrated its applicability on three recent metagenomic datasets.
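
The following sketch shows one way to turn the bin model above into a number of unsequenced bins. The assumptions are not taken from the paper: each bin's coverage rate is gamma-distributed, and reads land in bins as Poisson draws. Under those assumptions, reads per bin follow a negative binomial, and the fitted parameters give the probability that a bin received no reads. The sketch fits a zero-truncated version (only bins with at least one read are observed) by a coarse grid search, which stands in for proper numerical optimization. The function names, grid and parameterization are hypothetical and may differ from the authors' model.

```python
import math
from collections import Counter


def nb_log_pmf(x: int, shape: float, scale: float) -> float:
    """log P(X = x) when X ~ Poisson(lam) and lam ~ Gamma(shape, scale).

    The mixture is a negative binomial with probability parameter p = 1 / (1 + scale).
    """
    p = 1.0 / (1.0 + scale)
    return (math.lgamma(x + shape) - math.lgamma(shape) - math.lgamma(x + 1)
            + shape * math.log(p) + x * math.log1p(-p))


def estimate_unseen_bins(read_counts, grid=None):
    """Fit a zero-truncated gamma-Poisson model to reads-per-bin counts (all >= 1).

    Returns (shape, scale, estimated_unseen_bins, estimated_total_bins),
    choosing (shape, scale) by maximum likelihood over a coarse grid.
    """
    freq = Counter(read_counts)
    if not freq or min(freq) < 1:
        raise ValueError("need at least one observed bin, each with >= 1 read")
    n_obs = sum(freq.values())
    if grid is None:
        grid = [0.1 * i for i in range(1, 51)]  # 0.1 .. 5.0 for both parameters
    best = None
    for shape in grid:
        for scale in grid:
            p0 = (1.0 + scale) ** (-shape)  # P(bin receives no reads)
            log_seen = math.log1p(-p0)      # log P(bin receives >= 1 read)
            ll = sum(f * (nb_log_pmf(x, shape, scale) - log_seen)
                     for x, f in freq.items())
            if best is None or ll > best[0]:
                best = (ll, shape, scale, p0)
    _, shape, scale, p0 = best
    unseen = n_obs * p0 / (1.0 - p0)        # expected zero-read bins given n_obs seen
    return shape, scale, unseen, n_obs + unseen


# Hypothetical noise-free input: expected counts for 100,000 bins with shape=2, scale=1.
counts = []
for x in range(1, 80):
    counts.extend([x] * round(100_000 * math.exp(nb_log_pmf(x, 2.0, 1.0))))
shape, scale, unseen, total = estimate_unseen_bins(counts)
print(f"shape={shape:.1f} scale={scale:.1f} total~{total:.0f}")
```

In this toy input about a quarter of the bins receive no reads, and the fit should recover a total close to 100,000. Real read counts are noisier, so whether the gamma fit is adequate has to be checked before its extrapolation is trusted, as the abstract stresses.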

ERBB3 is a marker of a ganglioneuroblastoma/ganglioneuroma-like expression profile in neuroblastic tumours

Author: Abel Frida
Dalevi Daniel
De Preter Katleen
Kogner Per
Kristiansson Erik
Krona Cecilie
Maris John
Nilsson Staffan
Stallings Raymond L.
Sveinbjørnsson Baldur
Versteeg Rogier
Wilzén Annika
Øra Ingrid
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

Background: Neuroblastoma (NB) tumours are commonly divided into three cytogenetic subgroups. However, using unsupervised principal components analysis of gene expression profiles, we recently identified four distinct subgroups, r1–r4. In the current study we characterized these subgroups in more detail, with a specific focus on the fourth, divergent tumour subgroup (r4). Methods: Expression microarray data from four international studies, corresponding to 148 neuroblastic tumour cases, were divided into four expression subgroups using a previously described 6-gene signature. Differentially expressed genes between groups were identified using Significance Analysis of Microarrays (SAM). Next, gene expression network modelling was performed to map the signalling pathways and cellular processes representing each subgroup. Findings were validated at the protein level by immunohistochemistry and immunoblot analyses. Results: We identified several significantly up-regulated genes in the r4 subgroup, of which the tyrosine kinase receptor ERBB3 was the most prominent (fold change: 132–240). By gene set enrichment analysis (GSEA), the constructed gene network of ERBB3 (n = 38 network partners) was significantly enriched in the r4 subgroup in all four independent data sets. ERBB3 was also positively correlated with the ErbB family members EGFR and ERBB2 in all data sets, and concurrent overexpression was seen in the r4 subgroup. Further studies of histopathology categories, using a fifth data set of 110 neuroblastic tumours, showed a striking similarity between the expression profile of r4 and those of ganglioneuroblastoma (GNB) and ganglioneuroma (GN) tumours. In contrast, the NB histopathological subtype was dominated by mitosis-regulating genes, which characterize unfavourable NB subgroups in particular. The high ErbB3 expression in GN tumour types was verified at the protein level and was found mainly in the mature ganglion cells.
Conclusions: This study demonstrates the importance of performing unsupervised clustering and subtype discovery on data sets prior to analysis, to avoid a mixture of tumour subtypes that may otherwise give distorted results and lead to incorrect conclusions. The current study identifies ERBB3 as a clear-cut marker of a GNB/GN-like expression profile, and we suggest a 7-gene expression signature (including ERBB3) as a complement to histopathology analysis of neuroblastic tumours. Further studies of ErbB3 and other ErbB family members and their role in neuroblastic differentiation and pathogenesis are warranted.
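
For readers unfamiliar with SAM, its per-gene score is a relative difference d = (mean1 - mean2) / (s + s0). Here s is a pooled standard error and s0 is a small "fudge factor" that SAM normally tunes across all genes. The following is a minimal sketch of that statistic, together with a fold change computed from log2 expression values. It assumes log2-scale input and a fixed s0. The permutation-based false discovery rate estimation of full SAM is omitted, and the data are hypothetical.

```python
import math
from statistics import mean


def sam_d(group1, group2, s0=0.0):
    """SAM-style relative difference d = (mean1 - mean2) / (s + s0) for one gene.

    s is the pooled standard error from the SAM formula; s0 is normally chosen
    from the data across all genes, but is a fixed argument in this sketch.
    """
    n1, n2 = len(group1), len(group2)
    m1, m2 = mean(group1), mean(group2)
    ss = sum((x - m1) ** 2 for x in group1) + sum((x - m2) ** 2 for x in group2)
    s = math.sqrt((1 / n1 + 1 / n2) / (n1 + n2 - 2) * ss)
    return (m1 - m2) / (s + s0)


def fold_change_from_log2(group1, group2):
    """Linear fold change of group1 over group2 when values are log2 expression levels."""
    return 2 ** (mean(group1) - mean(group2))


# Hypothetical log2 expression values for one gene in two tumour subgroups.
r4 = [5.0, 6.0, 7.0]
others = [1.0, 2.0, 3.0]
print(round(sam_d(r4, others), 3), fold_change_from_log2(r4, others))  # 4.899 16.0
```

A positive s0 shrinks d for genes with very small variance, which keeps genes with tiny but noisy differences from dominating the ranking.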

Bridging the gap between systems biology and medicine

Author: Aronow Bruce J
Auffray Charles
Benson Mikael
Clermont Gilles
Dalevi Daniel
Dehne Frank
Dubhashi Devdatt
Langston Michael A
Marshall Dana R
Moreau Yves
Provero Paolo
Raasch Peter
Rocke David M
Tegner Jesper
Publication venue: BioMed Central
Publication date: 01/01/2009

Systems biology has matured considerably as a discipline over the last decade, yet some of the key challenges separating current research efforts in systems biology from clinically useful results are only now becoming apparent. As these gaps are better defined, the new discipline of systems medicine is emerging as a translational extension of systems biology. How is systems medicine defined? What are relevant ontologies for systems medicine? What are the key theoretic and methodologic challenges facing computational disease modeling? How are inaccurate and incomplete data, and uncertain biologic knowledge, best synthesized in useful computational models? Does network analysis provide clinically useful insight? We discuss the outstanding difficulties in translating a rapidly growing body of data into knowledge usable at the bedside. Although core-specific challenges are best met by specialized groups, it appears fundamental that such efforts should be guided by a roadmap for systems medicine drafted by a coalition of scientists from the clinical, experimental, computational, and theoretic domains.

Integration of phenotypic metadata and protein similarity in Archaea using a spectral bipartitioning approach

Author: Amrita Pati
Daniel Dalevi
Iain J. Anderson
Konstantinos Mavromatis
Nikos C. Kyrpides
Sean D. Hooper
Publication venue: Oxford University Press

In order to simplify and meaningfully categorize large sets of protein sequence data, it is commonplace to cluster proteins based on the similarity of their sequences. However, it quickly becomes clear that the sequence flexibility allowed for a given protein varies significantly among protein families. The degree to which sequences are conserved not only differs for each protein family but is also affected by the phylogenetic divergence of the source organisms. Clustering techniques that use similarity thresholds for protein families do not always allow for these variations and thus cannot be confidently used for applications such as automated annotation and phylogenetic profiling. In this work, we applied a spectral bipartitioning technique to all proteins from 53 archaeal genomes. Comparisons between different taxonomic levels allowed us to study the effects of phylogenetic distance on cluster structure. Likewise, by associating functional annotations and phenotypic metadata with each protein, we could compare our protein similarity clusters with both protein function and associated phenotype. Our clusters can be analyzed graphically and interactively online.
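
A common form of spectral bipartitioning splits a similarity graph by the signs of the Fiedler vector. That vector is the eigenvector of the graph Laplacian L = D - W belonging to the second-smallest eigenvalue. The following is a minimal pure-Python sketch using the unnormalized Laplacian and power iteration. It is not necessarily the variant used in the paper, which may use a normalized Laplacian or a different similarity measure. The input matrix is hypothetical, and the graph is assumed to be connected.

```python
import math


def fiedler_bipartition(weights, iters=500):
    """Split nodes 0..n-1 by the sign of the Fiedler vector of a similarity graph.

    weights: symmetric n x n list of non-negative similarities (diagonal ignored).
    Power iteration runs on M = c*I - L, where L = D - W is the unnormalized
    Laplacian and c bounds L's largest eigenvalue, so M's top eigenvector
    orthogonal to the constant vector is the Fiedler vector.
    """
    n = len(weights)
    deg = [sum(weights[i][j] for j in range(n) if j != i) for i in range(n)]
    c = 2.0 * max(deg)  # Gershgorin bound: every Laplacian eigenvalue is <= 2 * max degree

    def apply_m(v):
        # M v = c*v - D v + W v
        return [(c - deg[i]) * v[i]
                + sum(weights[i][j] * v[j] for j in range(n) if j != i)
                for i in range(n)]

    v = [float(i) for i in range(n)]  # deterministic, non-constant starting vector
    for _ in range(iters):
        avg = sum(v) / n
        v = apply_m([x - avg for x in v])  # project out the constant eigenvector, then multiply
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            break
        v = [x / norm for x in v]
    left = [i for i in range(n) if v[i] < 0]
    right = [i for i in range(n) if v[i] >= 0]
    return left, right


# Hypothetical similarities: two tight triangles {0,1,2} and {3,4,5} joined by one weak edge.
W = [[0.0] * 6 for _ in range(6)]
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[a][b] = W[b][a] = 1.0
W[2][3] = W[3][2] = 0.1
print(fiedler_bipartition(W))  # expected: the two triangles end up on opposite sides
```

Applying the split recursively to each side yields a hierarchy of clusters. Because the cut follows the graph's own structure rather than a fixed similarity threshold, it can adapt to families that differ in how conserved their sequences are.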

Inferring evolution in bacteria using Markov chains and genomic signatures

Author: Dalevi Daniel
Publication date: 01/01/2006

This thesis concerns the development of methods and models in evolutionary molecular biology. The techniques are also applicable to other, similar biological problems. The first contribution is a novel classifier using fixed and variable length Markov chains that can discriminate between bacterial DNA of different species. The classifier assumes that the composition of oligomers (DNA words) is species-specific and represents global features of the species, a so-called genomic signature. Direct applications of such a classifier are the identification of horizontal gene transfer and the binning of metagenomic data. The former has been the primary focus, as it is one of the central processes in the evolution of bacteria. We suggest a new method for locking the number of parameters in a variable length Markov model and propose a method for rejecting false candidates of horizontal gene transfer events. The second contribution is a novel estimator for finding the prediction suffix tree of a variable length Markov chain. This new estimator is highly efficient in finding the correct state space, and we show that it compares favorably to a popular estimator in terms of predictive likelihood. The third contribution is to the analysis of gene order rearrangements in bacteria. We recapitulate previous results on expected distances and derive new ones for cases that have recently gained support in the literature, such as symmetrical and short reversals. We also describe new categories of gene order patterns and show how these can be explained with models using short, symmetric and uniformly distributed transpositions and reversals. The fourth contribution is part of the Greengenes project, a chimera-free database of 16S rDNA sequences.
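
The genomic-signature classifier can be illustrated in its simplest form. Train one fixed-order Markov chain per species on oligomer counts, then assign a query sequence to the species whose model gives it the highest log-likelihood. The following is a minimal sketch under simplifying assumptions: a fixed order, additive pseudocount smoothing and toy training sequences. The variable length chains, the prediction suffix tree estimator and the parameter-locking method described above are not shown, and all names are hypothetical.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def train_markov(seqs, order=2, pseudo=1.0):
    """Fixed-order Markov chain: counts of each base following each `order`-mer context."""
    counts = defaultdict(lambda: defaultdict(float))
    for s in seqs:
        for i in range(order, len(s)):
            counts[s[i - order:i]][s[i]] += 1.0
    return {"order": order, "pseudo": pseudo, "counts": counts}


def log_likelihood(model, seq):
    """Sum of log P(base | preceding context), with additive (pseudocount) smoothing."""
    k, pseudo, counts = model["order"], model["pseudo"], model["counts"]
    total_ll = 0.0
    for i in range(k, len(seq)):
        row = counts.get(seq[i - k:i], {})
        denom = sum(row.values()) + pseudo * len(ALPHABET)
        total_ll += math.log((row.get(seq[i], 0.0) + pseudo) / denom)
    return total_ll


def classify(models, seq):
    """Return the label whose model assigns `seq` the highest log-likelihood."""
    return max(models, key=lambda label: log_likelihood(models[label], seq))


# Hypothetical toy "genomes" with very different oligomer composition.
models = {
    "species_A": train_markov(["ACACACACACACGTACACACAC"]),
    "species_B": train_markov(["GGTTGGTTGGTTAGGTTGGTT"]),
}
print(classify(models, "ACACACAC"), classify(models, "GTTGGTTG"))  # species_A species_B
```

To screen for horizontal gene transfer, the same log-likelihood can be computed over sliding windows of a genome under its own model. Windows that score unusually low are then candidate foreign segments.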

Expected Gene Order Distances and Model Selection in Bacteria

Author: Daniel Dalevi
Niklas Eriksen
Publication date: 01/01/2007

The most parsimonious distances calculated in pairwise gene order comparisons cannot accurately reflect the true number of events separating two species unless the number of changes is small. It is better to use expected distances. In this study we recapitulate previous results and derive new expected distances for models that have gained support in other studies, such as symmetrical reversals and short reversals. Further, we investigate the patterns of dotplots between species of bacteria with the purpose of model selection in gene order problems. We find several categories of data that can be explained by carefully weighing the contributions of reversals, transpositions, symmetric reversals, single gene transpositions, and single gene reversals.
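
Simulation shows why parsimonious distances underestimate the number of events. Each reversal can create at most two breakpoints, so k reversals leave at most 2k breakpoints, and the parsimony bound is at least half the breakpoint count. Once breakpoints saturate, however, many more events are needed to reach a given count. An expected-distance estimate instead asks how many events produce the observed breakpoint count on average. The following is a Monte Carlo sketch under stated assumptions: signed linear gene orders, reversals only, and segment endpoints drawn uniformly (single-gene reversals included). The paper derives its expected distances analytically and for richer models, and nothing here reproduces its formulas.

```python
import random


def breakpoints(order):
    """Breakpoints in a signed linear gene order of genes 1..n, framed by 0 and n+1."""
    ext = [0] + list(order) + [len(order) + 1]
    return sum(1 for a, b in zip(ext, ext[1:]) if b - a != 1)


def random_reversal(order, rng):
    """Reverse (and flip the signs of) a segment with uniformly drawn endpoints, in place."""
    i, j = sorted((rng.randrange(len(order)), rng.randrange(len(order))))
    order[i:j + 1] = [-g for g in reversed(order[i:j + 1])]


def mean_breakpoints(n, k, trials=100, seed=0):
    """Monte Carlo estimate of E[breakpoints] after k random reversals of genes 1..n."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        order = list(range(1, n + 1))
        for _ in range(k):
            random_reversal(order, rng)
        total += breakpoints(order)
    return total / trials


def estimate_events(n, observed_bp, k_max=300, trials=100, seed=0):
    """Smallest k whose simulated mean breakpoint count reaches the observed count."""
    for k in range(k_max + 1):
        if mean_breakpoints(n, k, trials, seed) >= observed_bp:
            return k
    return None  # observed_bp exceeds what k_max reversals produce on average


print(breakpoints([1, 2, 3]), breakpoints([-2, -1, 3]))  # 0 2
# Hypothetical comparison: 50 genes, 30 observed breakpoints (parsimony bound: 15).
print(estimate_events(50, 30))
```

Because each reversal adds at most two breakpoints, the estimate can never fall below the parsimony bound of half the observed breakpoints. The gap between the two grows as the breakpoint count approaches saturation.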

The Peres-Shields Order Estimator for Fixed and Variable Length Markov Models with Applications to DNA Sequence Similarity

Author: Dalevi Daniel
Dubhashi Devdatt
Publication date: 01/01/2005