6,251 research outputs found

    The inference of gene trees with species trees

    Molecular phylogeny has focused mainly on improving models for the reconstruction of gene trees based on sequence alignments. Yet, most phylogeneticists seek to reveal the history of species. Although the histories of genes and species are tightly linked, they are seldom identical, because genes duplicate, are lost or horizontally transferred, and because alleles can co-exist in populations for periods that may span several speciation events. Building models describing the relationship between gene and species trees can thus improve the reconstruction of gene trees when a species tree is known, and vice versa. Several approaches have been proposed to solve the problem in one direction or the other, but in general neither gene trees nor species trees are known. Only a few studies have attempted to jointly infer gene trees and species trees. In this article we review the various models that have been used to describe the relationship between gene trees and species trees. These models account for gene duplication and loss, transfer or incomplete lineage sorting. Some of them consider several types of events together, but none exists currently that considers the full repertoire of processes that generate gene trees along the species tree. Simulations as well as empirical studies on genomic data show that combining gene tree-species tree models with models of sequence evolution improves gene tree reconstruction. In turn, these better gene trees provide a better basis for studying genome evolution or reconstructing ancestral chromosomes and ancestral gene sequences. We predict that gene tree-species tree methods that can deal with genomic data sets will be instrumental to advancing our understanding of genomic evolution.
    Comment: Review article in relation to the "Mathematical and Computational Evolutionary Biology" conference, Montpellier, 201

    Incorporating molecular data in fungal systematics: a guide for aspiring researchers

    The last twenty years have witnessed molecular data emerge as a primary research instrument in most branches of mycology. Fungal systematics, taxonomy, and ecology have all seen tremendous progress and have undergone rapid, far-reaching changes as disciplines in the wake of continual improvement in DNA sequencing technology. A taxonomic study that draws from molecular data involves a long series of steps, ranging from taxon sampling through the various laboratory procedures and data analysis to the publication process. All steps are important and influence the results and the way they are perceived by the scientific community. The present paper provides a reflective overview of all major steps in such a project, with the aim of assisting research students about to begin their first study using DNA-based methods. We also take the opportunity to discuss the role of taxonomy in biology and the life sciences in general in the light of molecular data. While the best way to learn molecular methods is to work side by side with someone experienced, we hope that the present paper will serve to lower the learning threshold for the reader.
    Comment: Submitted to Current Research in Environmental and Applied Mycology - comments most welcome

    Integration and mining of malaria molecular, functional and pharmacological data: how far are we from a chemogenomic knowledge space?

    The organization and mining of malaria genomic and post-genomic data is motivated by the need to predict and characterize new biological targets and new drugs. Biological targets are sought in a biological space designed from the genomic data from Plasmodium falciparum, but also using the millions of genomic data from other species. Drug candidates are sought in a chemical space containing the millions of small molecules stored in public and private chemolibraries. Data management should therefore be as reliable and versatile as possible. In this context, we examined five aspects of the organization and mining of malaria genomic and post-genomic data: 1) the comparison of protein sequences, including compositionally atypical malaria sequences, 2) the high-throughput reconstruction of molecular phylogenies, 3) the representation of biological processes, particularly metabolic pathways, 4) the versatile methods to integrate genomic data, biological representations and functional profiling obtained from X-omic experiments after drug treatments and 5) the determination and prediction of protein structures and their molecular docking with drug candidate structures. Progress toward a grid-enabled chemogenomic knowledge space is discussed.
    Comment: 43 pages, 4 figures, to appear in Malaria Journal

    CARE1, a TY3-gypsy long terminal repeat retrotransposon in the food legume chickpea (Cicer arietinum L)

    We report a novel Ty3-gypsy long terminal repeat retrotransposon CARE1 (_Cicer arietinum_ retro-element 1) in chickpea. This 5920-bp AT-rich (63%) element carries 723-bp 5' and 897-bp 3' LTRs, respectively, flanking an internal region of 4300 bp. The LTRs of CARE1 show 93.9% nucleotide identity to each other and have 4-bp (ACTA) terminal inverted repeats. A 17-bp potential tRNAmet primer binding site downstream of the 5' LTR and a 13-bp polypurine tract upstream of the 3' LTR have been identified. The order of domains (Gag-proteinase-reverse transcriptase-RNaseH-integrase) in the deduced amino acid sequence and a phylogenetic tree constructed using reverse transcriptase sequences place CARE1 in the gypsy group of retrotransposons. Homologues of a number of _cis_-elements including CCAAT, TATA and GT-1 have been detected in the regulatory region or the 5' LTR of CARE1. Transgenic tobacco plants containing a 5' LTR:GUS construct show that its 5' LTR is inactive in a heterologous system under normal as well as tissue culture conditions. Genomic Southern blot experiments using the 5' LTR of the element as a probe show that CARE1 or its related elements are present in the genomes of various chickpea accessions from various geographic regions.

    High resolution crystal structure of the Endo-N-acetyl-beta-D-glucosaminidase responsible for the deglycosylation of Hypocrea jecorina cellulases

    Endo-N-acetyl-beta-D-glucosaminidases (ENGases) hydrolyze the glycosidic linkage between the two N-acetylglucosamine units that make up the chitobiose core of N-glycans. The endo-N-acetyl-beta-D-glucosaminidases classified into glycoside hydrolase family 18 are small, bacterial proteins with different substrate specificities. Recently two eukaryotic family 18 deglycosylating enzymes have been identified. Here, the expression, purification and the 1.3 angstrom resolution structure of the ENGase (Endo T) from the mesophilic fungus Hypocrea jecorina (anamorph Trichoderma reesei) are reported. Although the mature protein is C-terminally processed with removal of a 46 amino acid peptide, the protein has a complete (beta/alpha)8 TIM-barrel topology. In the active site, the proton donor (E131) and the residue stabilizing the transition state (D129) in the substrate-assisted catalysis mechanism are found in almost the same positions as in the bacterial GH18 ENGases: Endo H, Endo F1, Endo F3, and Endo BT. However, the loops defining the substrate-binding cleft vary greatly from the previously known ENGase structures, and the structures also differ in some of the alpha-helices forming the barrel. This could reflect the variation in substrate specificity between the five enzymes. This is the first three-dimensional structure of a eukaryotic endo-N-acetyl-beta-D-glucosaminidase from glycoside hydrolase family 18. A glycosylation analysis of the cellulases secreted by a Hypocrea jecorina Endo T knock-out strain shows the in vivo function of the protein. A homology search and phylogenetic analysis show that the two known enzymes and their homologues form a large but separate cluster in subgroup B of the fungal chitinases. Therefore the future use of a uniform nomenclature is proposed.

    Strategies for Reliable Exploitation of Evolutionary Concepts in High Throughput Biology

    The recent availability of the complete genome sequences of a large number of model organisms, together with the immense amount of data being produced by the new high-throughput technologies, means that we can now begin comparative analyses to understand the mechanisms involved in the evolution of the genome and their consequences in the study of biological systems. Phylogenetic approaches provide a unique conceptual framework for performing comparative analyses of all this data, for propagating information between different systems and for predicting or inferring new knowledge. As a result, phylogeny-based inference systems are now playing an increasingly important role in most areas of high-throughput genomics, including studies of promoters (phylogenetic footprinting), interactomes (based on the presence and degree of conservation of interacting proteins), and in comparisons of transcriptomes or proteomes (phylogenetic proximity and co-regulation/co-expression). Here we review the recent developments aimed at making automatic, reliable phylogeny-based inference feasible in large-scale projects. We also discuss how evolutionary concepts and phylogeny-based inference strategies are now being exploited in order to understand the evolution and function of biological systems. Such advances will be fundamental for the success of the emerging disciplines of systems biology and synthetic biology, and will have wide-reaching effects in applied fields such as biotechnology, medicine and pharmacology.

    Genome-wide analysis of the emigrant family of MITEs: amplification dynamics and evolution of genes in Arabidopsis thaliana

    MITEs are structurally similar to defective class II elements, but their high copy number and the size and sequence conservation of most MITE families suggest that they can be amplified by a replicative mechanism. Here we present a genome-wide analysis of the Emigrant family of MITEs from Arabidopsis thaliana. To detect divergent ancient copies and low copy number subfamilies with a different internal sequence, we have developed a computer program (http://www.lsi.upc.es/~alggen) that searches for Emigrant elements based solely on their TIR sequence. Our results show that different bursts of amplification of one or very few active, or master, elements have occurred at different times during Arabidopsis evolution, with insertion dynamics similar to those of some SINEs. The analysis of the insertion sites of the Emigrant elements shows that, although Emigrant elements tend to integrate far from ORFs, the elements inserted within or close to genes are preferentially maintained during evolution.

    The study of plant genome evolution by means of phylogenomics

    BiologicalNetworks - tools enabling the integration of multi-scale data for the host-pathogen studies

    Background: Understanding the immune response mechanisms of a pathogen-infected host requires multi-scale analysis of genome-wide data. Data integration methods have proved useful to the study of biological processes in model organisms, but their systematic application to the study of host immune system response to a pathogen and human disease is still in the initial stage.
    Results: To study host-pathogen interaction on the systems biology level, an extension to the previously described BiologicalNetworks system is proposed. The developed methods and data integration and querying tools simplify and streamline the integration of diverse experimental data types, including molecular interactions and phylogenetic classifications, genomic sequences and protein structure information, gene expression and virulence data for pathogen-related studies. The data can be integrated from the databases and user's files for both public and private use.
    Conclusions: The developed system can be used for the systems-level analysis of host-pathogen interactions, including host molecular pathways that are induced/repressed during the infections, co-expressed genes, and conserved transcription factor binding sites. Genes not previously known to be associated with influenza infection were identified and suggested for further investigation as potential drug targets. Developed methods and data are available through the Java application (from the BiologicalNetworks program at http://www.biologicalnetworks.org) and web interface (at http://flu.sdsc.edu).