The inference of gene trees with species trees
Molecular phylogeny has focused mainly on improving models for the
reconstruction of gene trees based on sequence alignments. Yet, most
phylogeneticists seek to reveal the history of species. Although the histories
of genes and species are tightly linked, they are seldom identical, because
genes duplicate, are lost or horizontally transferred, and because alleles can
co-exist in populations for periods that may span several speciation events.
Building models describing the relationship between gene and species trees can
thus improve the reconstruction of gene trees when a species tree is known, and
vice versa. Several approaches have been proposed to solve the problem in one
direction or the other, but in general neither gene trees nor species trees are
known. Only a few studies have attempted to jointly infer gene trees and
species trees. In this article we review the various models that have been used
to describe the relationship between gene trees and species trees. These models
account for gene duplication and loss, transfer or incomplete lineage sorting.
Some of them consider several types of events together, but none exists
currently that considers the full repertoire of processes that generate gene
trees along the species tree. Simulations as well as empirical studies on
genomic data show that combining gene tree-species tree models with models of
sequence evolution improves gene tree reconstruction. In turn, these better
gene trees provide a better basis for studying genome evolution or
reconstructing ancestral chromosomes and ancestral gene sequences. We predict
that gene tree-species tree methods that can deal with genomic data sets will
be instrumental to advancing our understanding of genomic evolution.
Comment: Review article in relation to the "Mathematical and Computational
Evolutionary Biology" conference, Montpellier, 201
Incorporating molecular data in fungal systematics: a guide for aspiring researchers
The last twenty years have witnessed molecular data emerge as a primary
research instrument in most branches of mycology. Fungal systematics, taxonomy,
and ecology have all seen tremendous progress and have undergone rapid,
far-reaching changes as disciplines in the wake of continual improvement in DNA
sequencing technology. A taxonomic study that draws from molecular data
involves a long series of steps, ranging from taxon sampling through the
various laboratory procedures and data analysis to the publication process. All
steps are important and influence the results and the way they are perceived by
the scientific community. The present paper provides a reflective overview of
all major steps in such a project, with the aim of assisting research students
about to begin their first study using DNA-based methods. We also take the
opportunity to discuss the role of taxonomy in biology and the life sciences in
general in the light of molecular data. While the best way to learn molecular
methods is to work side by side with someone experienced, we hope that the
present paper will serve to lower the learning threshold for the reader.
Comment: Submitted to Current Research in Environmental and Applied Mycology -
comments most welcome
Integration and mining of malaria molecular, functional and pharmacological data: how far are we from a chemogenomic knowledge space?
The organization and mining of malaria genomic and post-genomic data are
strongly motivated by the need to predict and characterize new biological
targets and new drugs. Biological targets are sought in a biological space
designed from the genomic data of Plasmodium falciparum, but also using the
millions of genomic data records from other species. Drug candidates are sought in a
chemical space containing the millions of small molecules stored in public and
private chemolibraries. Data management should therefore be as reliable and
versatile as possible. In this context, we examined five aspects of the
organization and mining of malaria genomic and post-genomic data: 1) the
comparison of protein sequences including compositionally atypical malaria
sequences, 2) the high throughput reconstruction of molecular phylogenies, 3)
the representation of biological processes particularly metabolic pathways, 4)
the versatile methods to integrate genomic data, biological representations and
functional profiling obtained from X-omic experiments after drug treatments and
5) the determination and prediction of protein structures and their molecular
docking with drug candidate structures. Progress toward a grid-enabled
chemogenomic knowledge space is discussed.
Comment: 43 pages, 4 figures, to appear in Malaria Journal
CARE1, a TY3-gypsy long terminal repeat retrotransposon in the food legume chickpea (Cicer arietinum L)
We report a novel Ty3-gypsy long terminal repeat retrotransposon, CARE1 (_Cicer arietinum_ retro-element 1), in chickpea. This 5920-bp AT-rich (63%) element carries a 723-bp 5' LTR and an 897-bp 3' LTR flanking an internal region of 4300 bp. The LTRs of CARE1 show 93.9% nucleotide identity to each other and have 4-bp (ACTA) terminal inverted repeats. A 17-bp potential tRNAmet primer binding site downstream of the 5' LTR and a 13-bp polypurine tract upstream of the 3' LTR have been identified. The order of domains (Gag-proteinase-reverse transcriptase-RNaseH-integrase) in the deduced amino acid sequence and a phylogenetic tree constructed using reverse transcriptase sequences place CARE1 in the gypsy group of retrotransposons. Homologues of a number of _cis_-elements, including CCAAT, TATA and GT-1, have been detected in the regulatory region or the 5' LTR of CARE1. Transgenic tobacco plants containing a 5' LTR:GUS construct show that the 5' LTR is inactive in a heterologous system under normal as well as tissue culture conditions. Genomic Southern blot experiments using the 5' LTR of the element as a probe show that CARE1 or its related elements are present in the genomes of various chickpea accessions from various geographic regions.
High resolution crystal structure of the Endo-N-acetyl-beta-D-glucosaminidase responsible for the deglycosylation of Hypocrea jecorina cellulases
Endo-N-acetyl-beta-D-glucosaminidases (ENGases) hydrolyze the glycosidic linkage between the two N-acetylglucosamine units that make up the chitobiose core of N-glycans. The endo-N-acetyl-beta-D-glucosaminidases classified into glycoside hydrolase family 18 are small, bacterial proteins with different substrate specificities. Recently two eukaryotic family 18 deglycosylating enzymes have been identified. Here, the expression, purification and the 1.3 angstrom resolution structure of the ENGase (Endo T) from the mesophilic fungus Hypocrea jecorina (anamorph Trichoderma reesei) are reported. Although the mature protein is C-terminally processed with removal of a 46 amino acid peptide, the protein has a complete (beta/alpha)8 TIM-barrel topology. In the active site, the proton donor (E131) and the residue stabilizing the transition state (D129) in the substrate-assisted catalysis mechanism are found in almost the same positions as in the bacterial GH18 ENGases: Endo H, Endo F1, Endo F3, and Endo BT. However, the loops defining the substrate-binding cleft vary greatly from the previously known ENGase structures, and the structures also differ in some of the alpha-helices forming the barrel. This could reflect the variation in substrate specificity between the five enzymes. This is the first three-dimensional structure of a eukaryotic endo-N-acetyl-beta-D-glucosaminidase from glycoside hydrolase family 18. A glycosylation analysis of the cellulases secreted by a Hypocrea jecorina Endo T knock-out strain shows the in vivo function of the protein. A homology search and phylogenetic analysis show that the two known enzymes and their homologues form a large but separate cluster in subgroup B of the fungal chitinases. Therefore the future use of a uniform nomenclature is proposed.
Strategies for Reliable Exploitation of Evolutionary Concepts in High Throughput Biology
The recent availability of the complete genome sequences of a large number of model organisms, together with the immense amount of data being produced by the new high-throughput technologies, means that we can now begin comparative analyses to understand the mechanisms involved in the evolution of the genome and their consequences in the study of biological systems. Phylogenetic approaches provide a unique conceptual framework for performing comparative analyses of all this data, for propagating information between different systems and for predicting or inferring new knowledge. As a result, phylogeny-based inference systems are now playing an increasingly important role in most areas of high throughput genomics, including studies of promoters (phylogenetic footprinting), interactomes (based on the presence and degree of conservation of interacting proteins), and in comparisons of transcriptomes or proteomes (phylogenetic proximity and co-regulation/co-expression). Here we review the recent developments aimed at making automatic, reliable phylogeny-based inference feasible in large-scale projects. We also discuss how evolutionary concepts and phylogeny-based inference strategies are now being exploited in order to understand the evolution and function of biological systems. Such advances will be fundamental for the success of the emerging disciplines of systems biology and synthetic biology, and will have wide-reaching effects in applied fields such as biotechnology, medicine and pharmacology.
Genome-wide analysis of the Emigrant family of MITEs: amplification dynamics and evolution of genes in Arabidopsis thaliana
MITEs are structurally similar to defective class II elements but
their high copy number and the size and sequence conservation of most
MITE families suggest that they can be amplified by a replicative
mechanism. Here we present a genome-wide analysis of the Emigrant
family of MITEs from Arabidopsis thaliana. In order to be able to
detect divergent ancient copies and low copy number subfamilies with a
different internal sequence we have developed a computer program
(http://www.lsi.upc.es/~alggen) that searches for Emigrant
elements based solely on their TIR sequences. Our results show that
different bursts of amplification of one or very few active, or
master, elements have occurred at different times during Arabidopsis
evolution, with insertion dynamics similar to those of some
SINEs. The analysis of the insertion sites of the Emigrant elements
shows that, although Emigrant elements tend to integrate far from ORFs,
the elements inserted within or close to genes are preferentially
maintained during evolution.
Postprint (published version)
BiologicalNetworks - tools enabling the integration of multi-scale data for the host-pathogen studies
Background: Understanding the immune response mechanisms of a pathogen-infected host requires multi-scale analysis of genome-wide data. Data integration methods have proved useful for the study of biological processes in model organisms, but their systematic application to the study of host immune system response to a pathogen and human disease is still at an initial stage.
Results: To study host-pathogen interaction at the systems biology level, an extension to the previously described BiologicalNetworks system is proposed. The developed methods and the data integration and querying tools simplify and streamline the integration of diverse experimental data types, including molecular interactions and phylogenetic classifications, genomic sequences and protein structure information, and gene expression and virulence data for pathogen-related studies. The data can be integrated from databases and users' files for both public and private use.
Conclusions: The developed system can be used for the systems-level analysis of host-pathogen interactions, including host molecular pathways that are induced or repressed during infection, co-expressed genes, and conserved transcription factor binding sites. Genes not previously known to be associated with influenza infection were identified and suggested for further investigation as potential drug targets. The developed methods and data are available through a Java application (the BiologicalNetworks program at http://www.biologicalnetworks.org) and a web interface (at http://flu.sdsc.edu).