5,836 research outputs found

    Integration and mining of malaria molecular, functional and pharmacological data: how far are we from a chemogenomic knowledge space?

    Get PDF
    The organization and mining of malaria genomic and post-genomic data is highly motivated by the necessity to predict and characterize new biological targets and new drugs. Biological targets are sought in a biological space designed from the genomic data from Plasmodium falciparum, but using also the millions of genomic data from other species. Drug candidates are sought in a chemical space containing the millions of small molecules stored in public and private chemolibraries. Data management should therefore be as reliable and versatile as possible. In this context, we examined five aspects of the organization and mining of malaria genomic and post-genomic data: 1) the comparison of protein sequences including compositionally atypical malaria sequences, 2) the high throughput reconstruction of molecular phylogenies, 3) the representation of biological processes particularly metabolic pathways, 4) the versatile methods to integrate genomic data, biological representations and functional profiling obtained from X-omic experiments after drug treatments and 5) the determination and prediction of protein structures and their molecular docking with drug candidate structures. Progresses toward a grid-enabled chemogenomic knowledge space are discussed.Comment: 43 pages, 4 figures, to appear in Malaria Journa

    The Echinococcus canadensis (G7) genome: A key knowledge of parasitic platyhelminth human diseases

    Get PDF
    Background: The parasite Echinococcus canadensis (G7) (phylum Platyhelminthes, class Cestoda) is one of the causative agents of echinococcosis. Echinococcosis is a worldwide chronic zoonosis affecting humans as well as domestic and wild mammals, which has been reported as a prioritized neglected disease by the World Health Organisation. No genomic data, comparative genomic analyses or efficient therapeutic and diagnostic tools are available for this severe disease. The information presented in this study will help to understand the peculiar biological characters and to design species-specific control tools. Results: We sequenced, assembled and annotated the 115-Mb genome of E. canadensis (G7). Comparative genomic analyses using whole genome data of three Echinococcus species not only confirmed the status of E. canadensis (G7) as a separate species but also demonstrated a high nucleotide sequences divergence in relation to E. granulosus (G1). The E. canadensis (G7) genome contains 11,449 genes with a core set of 881 orthologs shared among five cestode species. Comparative genomics revealed that there are more single nucleotide polymorphisms (SNPs) between E. canadensis (G7) and E. granulosus (G1) than between E. canadensis (G7) and E. multilocularis. This result was unexpected since E. canadensis (G7) and E. granulosus (G1) were considered to belong to the species complex E. granulosus sensu lato. We described SNPs in known drug targets and metabolism genes in the E. canadensis (G7) genome. Regarding gene regulation, we analysed three particular features: CpG island distribution along the three Echinococcus genomes, DNA methylation system and small RNA pathway. The results suggest the occurrence of yet unknown gene regulation mechanisms in Echinococcus. Conclusions: This is the first work that addresses Echinococcus comparative genomics. The resources presented here will promote the study of mechanisms of parasite development as well as new tools for drug discovery. The availability of a high-quality genome assembly is critical for fully exploring the biology of a pathogenic organism. The E. canadensis (G7) genome presented in this study provides a unique opportunity to address the genetic diversity among the genus Echinococcus and its particular developmental features. At present, there is no unequivocal taxonomic classification of Echinococcus species; however, the genome-wide SNPs analysis performed here revealed the phylogenetic distance among these three Echinococcus species. Additional cestode genomes need to be sequenced to be able to resolve their phylogeny.Fil: Maldonado, Lucas Luciano. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones en Microbiología y Parasitología Médica. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones en Microbiología y Parasitología Médica; ArgentinaFil: Assis, Juliana. Fundación Oswaldo Cruz; BrasilFil: Gomes Araújo, Flávio M.. Fundación Oswaldo Cruz; BrasilFil: Salim, Anna C. M.. Fundación Oswaldo Cruz; BrasilFil: Macchiaroli, Natalia. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones en Microbiología y Parasitología Médica. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones en Microbiología y Parasitología Médica; ArgentinaFil: Cucher, Marcela Alejandra. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones en Microbiología y Parasitología Médica. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones en Microbiología y Parasitología Médica; ArgentinaFil: Camicia, Federico. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones en Microbiología y Parasitología Médica. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones en Microbiología y Parasitología Médica; ArgentinaFil: Fox, Adolfo. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones en Microbiología y Parasitología Médica. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones en Microbiología y Parasitología Médica; ArgentinaFil: Rosenzvit, Mara Cecilia. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones en Microbiología y Parasitología Médica. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones en Microbiología y Parasitología Médica; ArgentinaFil: Oliveira, Guilherme. Instituto Tecnológico Vale; Brasil. Fundación Oswaldo Cruz; BrasilFil: Kamenetzky, Laura. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones en Microbiología y Parasitología Médica. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones en Microbiología y Parasitología Médica; Argentin

    De novo assembly and characterization of leaf transcriptome for the development of functional molecular markers of the extremophile multipurpose tree species Prosopis alba

    Get PDF
    Background: Prosopis alba (Fabaceae) is an important native tree adapted to arid and semiarid regions of north-western Argentina which is of great value as multipurpose species. Despite its importance, the genomic resources currently available for the entire Prosopis genus are still limited. Here we describe the development of a leaf transcriptome and the identification of new molecular markers that could support functional genetic studies in natural and domesticated populations of this genus. Results: Next generation DNA pyrosequencing technology applied to P. alba transcripts produced a total of 1,103,231 raw reads with an average length of 421 bp. De novo assembling generated a set of 15,814 isotigs and 71,101 non-assembled sequences (singletons) with an average of 991 bp and 288 bp respectively. A total of 39,000 unique singletons were identified after clustering natural and artificial duplicates from pyrosequencing reads. Regarding the non-redundant sequences or unigenes, 22,095 out of 54,814 were successfully annotated with Gene Ontology terms. Moreover, simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs) were searched, resulting in 5,992 and 6,236 markers, respectively, throughout the genome. For the validation of the the predicted SSR markers, a subset of 87 SSRs selected through functional annotation evidence was successfully amplified from six DNA samples of seedlings. From this analysis, 11 of these 87 SSRs were identified as polymorphic. Additionally, another set of 123 nuclear polymorphic SSRs were determined in silico, of which 50% have the probability of being effectively polymorphic. Conclusions: This study generated a successful global analysis of the P. alba leaf transcriptome after bioinformatic and wet laboratory validations of RNA-Seq data. The limited set of molecular markers currently available will be significantly increased with the thousands of new markers that were identified in this study. This information will strongly contribute to genomics resources for P. alba functional analysis and genetics. Finally, it will also potentially contribute to the development of population-based genome studies in the genera.Fil: Torales, Susana. Instituto Nacional de Tecnología Agropecuaria. Centro de Investigación de Recursos Naturales. Instituto de Recursos Biológicos; ArgentinaFil: Rivarola, Maximo Lisandro. Instituto Nacional de Tecnología Agropecuaria. Centro de Investigación en Ciencias Veterinarias y Agronómicas. Instituto de Biotecnología; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; ArgentinaFil: Pomponio, María Florencia. Instituto Nacional de Tecnología Agropecuaria. Centro de Investigación de Recursos Naturales. Instituto de Recursos Biológicos; ArgentinaFil: González, Sergio Alberto. Instituto Nacional de Tecnología Agropecuaria. Centro de Investigación en Ciencias Veterinarias y Agronómicas. Instituto de Biotecnología; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; ArgentinaFil: Acuña, Cintia Vanesa. Instituto Nacional de Tecnología Agropecuaria. Centro de Investigación en Ciencias Veterinarias y Agronómicas. Instituto de Biotecnología; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; ArgentinaFil: Fernández, Paula del Carmen. Instituto Nacional de Tecnología Agropecuaria. Centro de Investigación en Ciencias Veterinarias y Agronómicas. Instituto de Biotecnología; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; ArgentinaFil: López Lauenstein, Diego. Instituto Nacional de Tecnología Agropecuaria. Centro de Investigaciones Agropecuarias. Instituto de Fisiología y Recursos Genéticos Vegetales; ArgentinaFil: Verga, Aníbal Ramón. Instituto Nacional de Tecnología Agropecuaria. Centro de Investigaciones Agropecuarias. Instituto de Fisiología y Recursos Geneticos Vegetales; ArgentinaFil: Hopp, Horacio Esteban. Instituto Nacional de Tecnología Agropecuaria. Centro de Investigación en Ciencias Veterinarias y Agronómicas. Instituto de Biotecnología; Argentina. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales; ArgentinaFil: Paniego, Norma Beatriz. Instituto Nacional de Tecnología Agropecuaria. Centro de Investigación en Ciencias Veterinarias y Agronómicas. Instituto de Biotecnología; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; ArgentinaFil: Marcucci Poltri, Susana Noemí. Instituto Nacional de Tecnología Agropecuaria. Centro de Investigación en Ciencias Veterinarias y Agronómicas. Instituto de Biotecnología; Argentin

    Motif Discovery in Protein Sequences

    Get PDF
    Biology has become a data‐intensive research field. Coping with the flood of data from the new genome sequencing technologies is a major area of research. The exponential increase in the size of the datasets produced by “next‐generation sequencing” (NGS) poses unique computational challenges. In this context, motif discovery tools are widely used to identify important patterns in the sequences produced. Biological sequence motifs are defined as short, usually fixed length, sequence patterns that may represent important structural or functional features in nucleic acid and protein sequences such as transcription binding sites, splice junctions, active sites, or interaction interfaces. They can occur in an exact or approximate form within a family or a subfamily of sequences. Motif discovery is therefore an important field in bioinformatics, and numerous methods have been developed for the identification of motifs shared by a set of functionally related sequences. This chapter will review the existing motif discovery methods for protein sequences and their ability to discover biologically important features as well as their limitations for the discovery of new motifs. Finally, we will propose new horizons for motif discovery in order to address the short comings of the existent methods

    Towards protein function annotations for matching remote homologs

    Get PDF
    Identifying functional similarities for proteins with low sequence identity and low structure similarity often suffers from high false positives and false negatives results. To improve the functional prediction ability based on the local protein structures, we proposed two different refinement and filtering approaches. We built a statistical model (known as Markov Random Field) to describe protein functional site structure. We also developed filters that consider the local environment around the active sites to remove the false positives. Our experimental results, as evaluated in five sets of enzyme families with less than 40% sequence identity, demonstrated that our methods can obtain more remote homologs that could not be detected by traditional sequence-based methods. At the same time, our method could reduce large amount of random matches. Our methods could improve up to 70% of the functional annotation ability (measured by their Area under the ROC curve) in extended motif method

    The Complex Evolutionary History of Aminoacyl-tRNA Synthetases

    Get PDF
    Aminoacyl-tRNA synthetases (AARSs) are a superfamily of enzymes responsible for the faithful translation of the genetic code and have lately become a prominent target for synthetic biologists. Our large-scale analysis of \u3e2500 prokaryotic genomes reveals the complex evolutionary history of these enzymes and their paralogs, in which horizontal gene transfer played an important role. These results show that a widespread belief in the evolutionary stability of this superfamily is misconceived. Although AlaRS, GlyRS, LeuRS, IleRS, ValRS are the most stable members of the family, GluRS, LysRS and CysRS often have paralogs, whereas AsnRS, GlnRS, PylRS and SepRS are often absent from many genomes. In the course of this analysis, highly conserved protein motifs and domains within each of the AARS loci were identified and used to build a web-based computational tool for the genome-wide detection of AARS coding sequences. This is based on hidden Markov models (HMMs) and is available together with a cognate database that may be used for specific analyses. The bioinformatics tools that we have developed may also help to identify new antibiotic agents and targets using these essential enzymes. These tools also may help to identify organisms with alternative pathways that are involved in maintaining the fidelity of the genetic code

    De novo transcriptome sequencing and SSR markers development for Cedrela balansae C. DC., a native tree species of northwest Argentina

    Get PDF
    The endangered Cedrela balansae C.DC. (Meliaceae) is a high-value timber species with great potential for forest plantations that inhabits the tropical forests in Northwestern Argentina. Research on this species is scarce because of the limited genetic and genomic information available. Here, we explored the transcriptome of C. balansae using 454 GS FLX Titanium next-generation sequencing (NGS) technology. Following de novo assembling, we identified 27,111 non-redundant unigenes longer than 200 bp, and considered these transcripts for further downstream analysis. The functional annotation was performed searching the 27,111 unigenes against the NR-Protein and the Interproscan databases. This analysis revealed 26,977 genes with homology in at least one of the Database analyzed. Furthermore, 7,774 unigenes in 142 different active biological pathways in C. balansae were identified with the KEGG database. Moreover, after in silico analyses, we detected 2,663 simple sequence repeats (SSRs) markers. A subset of 70 SSRs related to important “stress tolerance” traits based on functional annotation evidence, were selected for wet PCR-validation in C. balansae and other Cedrela species inhabiting in northwest and northeast of Argentina (C. fissilis, C. saltensis and C. angustifolia). Successful transferability was between 77% and 93% and thanks to this study, 32 polymorphic functional SSRs for all analyzed Cedrela species are now available. The gene catalog and molecular markers obtained here represent a starting point for further research, which will assist genetic breeding programs in the Cedrela genus and will contribute to identifying key populations for its preservation.Fil: Torales, Susana. Instituto Nacional de Tecnología Agropecuaria. Centro de Investigación de Recursos Naturales. Instituto de Recursos Biológicos; ArgentinaFil: Rivarola, Maximo Lisandro. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Instituto Nacional de Tecnología Agropecuaria. Centro de Investigación en Ciencias Veterinarias y Agronómicas. Instituto de Biotecnología; ArgentinaFil: Gonzalez, Sergio. Instituto Nacional de Tecnología Agropecuaria. Centro de Investigación en Ciencias Veterinarias y Agronómicas. Instituto de Biotecnología; ArgentinaFil: Inza, María Virginia. Instituto Nacional de Tecnología Agropecuaria. Centro de Investigación de Recursos Naturales. Instituto de Recursos Biológicos; ArgentinaFil: Pomponio, María Florencia. Instituto Nacional de Tecnología Agropecuaria. Centro de Investigación de Recursos Naturales. Instituto de Recursos Biológicos; ArgentinaFil: Fernández, Paula. Instituto Nacional de Tecnología Agropecuaria. Centro de Investigación en Ciencias Veterinarias y Agronómicas. Instituto de Biotecnología; ArgentinaFil: Acuña, Cintia Vanesa. Instituto Nacional de Tecnología Agropecuaria. Centro de Investigación en Ciencias Veterinarias y Agronómicas. Instituto de Biotecnología; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; ArgentinaFil: Zelener, Noga. Instituto Nacional de Tecnología Agropecuaria. Centro de Investigación de Recursos Naturales. Instituto de Recursos Biológicos; ArgentinaFil: Fornes, Luis Fernando. Instituto Nacional de Tecnología Agropecuaria. Centro Regional Tucuman-Santiago del Estero; ArgentinaFil: Hopp, Horacio Esteban. Universidad de Belgrano. Facultad de Ciencias Exactas y Naturales; Argentina. Instituto Nacional de Tecnología Agropecuaria. Centro de Investigación en Ciencias Veterinarias y Agronómicas. Instituto de Biotecnología; ArgentinaFil: Paniego, Norma Beatriz. Instituto Nacional de Tecnología Agropecuaria. Centro de Investigación en Ciencias Veterinarias y Agronómicas. Instituto de Biotecnología; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; ArgentinaFil: Marcucci Poltri, Susana Noemí. Instituto Nacional de Tecnología Agropecuaria. Centro de Investigación en Ciencias Veterinarias y Agronómicas. Instituto de Biotecnología; Argentin

    Transcriptome survey of Patagonian southern beech Nothofagus nervosa (= N. Alpina): assembly, annotation and molecular marker discovery

    Get PDF
    Nothofagus nervosa is one of the most emblematic native tree species of Patagonian temperate forests. Here, the shotgun RNA-sequencing (RNA-Seq) of the transcriptome of N. nervosa, including de novo assembly, functional annotation, and in silico discovery of potential molecular markers to support population and associations genetic studies, are described.Fil: Torales, Susana Leonor. Instituto Nacional de Tecnología Agropecuaria (INTA). Instituto de Recursos Biológicos; ArgentinaFil: Rivarola, Maximo Lisandro. Instituto Nacional de Tecnología Agropecuaria (INTA). Instituto de Biotecnología; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; ArgentinaFil: Pomponio, Maria Florencia. Instituto Nacional de Tecnología Agropecuaria (INTA). Instituto de Recursos Biológicos; ArgentinaFil: Fernandez, Paula Del Carmen. Instituto Nacional de Tecnología Agropecuaria (INTA). Instituto de Biotecnología; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; ArgentinaFil: Acuña, Cintia Vanesa. Instituto Nacional de Tecnología Agropecuaria (INTA). Instituto de Biotecnología; ArgentinaFil: Marchelli, Paula Instituto Nacional de Tecnología Agropecuaria (INTA). Estación Experimental Agropecuaria Bariloche. ArgentinaFil: Gonzalez, Sergio Alberto. Instituto Nacional de Tecnología Agropecuaria (INTA). Instituto de Biotecnología; ArgentinaFil: Azpilicueta, María M. Instituto Nacional de Tecnología Agropecuaria (INTA). Estación Experimental Agropecuaria Bariloche. ArgentinaFil: Hopp, Horacio Esteban. Instituto Nacional de Tecnología Agropecuaria (INTA). Instituto de Biotecnología; Argentina. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales; ArgentinaFil: Gallo, Leonardo A. Instituto Nacional de Tecnología Agropecuaria (INTA). Estación Experimental Agropecuaria Bariloche. ArgentinaFil: Paniego, Norma Beatriz. Instituto Nacional de Tecnología Agropecuaria (INTA). Instituto de Biotecnología; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; ArgentinaFil: Marcucci Poltri, Susana Noemi. Instituto Nacional de Tecnología Agropecuaria (INTA). Instituto de Biotecnología; Argentin

    Automatic discovery of cross-family sequence features associated with protein function

    Get PDF
    BACKGROUND: Methods for predicting protein function directly from amino acid sequences are useful tools in the study of uncharacterised protein families and in comparative genomics. Until now, this problem has been approached using machine learning techniques that attempt to predict membership, or otherwise, to predefined functional categories or subcellular locations. A potential drawback of this approach is that the human-designated functional classes may not accurately reflect the underlying biology, and consequently important sequence-to-function relationships may be missed. RESULTS: We show that a self-supervised data mining approach is able to find relationships between sequence features and functional annotations. No preconceived ideas about functional categories are required, and the training data is simply a set of protein sequences and their UniProt/Swiss-Prot annotations. The main technical aspect of the approach is the co-evolution of amino acid-based regular expressions and keyword-based logical expressions with genetic programming. Our experiments on a strictly non-redundant set of eukaryotic proteins reveal that the strongest and most easily detected sequence-to-function relationships are concerned with targeting to various cellular compartments, which is an area already well studied both experimentally and computationally. Of more interest are a number of broad functional roles which can also be correlated with sequence features. These include inhibition, biosynthesis, transcription and defence against bacteria. Despite substantial overlaps between these functions and their corresponding cellular compartments, we find clear differences in the sequence motifs used to predict some of these functions. For example, the presence of polyglutamine repeats appears to be linked more strongly to the "transcription" function than to the general "nuclear" function/location. CONCLUSION: We have developed a novel and useful approach for knowledge discovery in annotated sequence data. The technique is able to identify functionally important sequence features and does not require expert knowledge. By viewing protein function from a sequence perspective, the approach is also suitable for discovering unexpected links between biological processes, such as the recently discovered role of ubiquitination in transcription

    Topological network alignment uncovers biological function and phylogeny

    Full text link
    Sequence comparison and alignment has had an enormous impact on our understanding of evolution, biology, and disease. Comparison and alignment of biological networks will likely have a similar impact. Existing network alignments use information external to the networks, such as sequence, because no good algorithm for purely topological alignment has yet been devised. In this paper, we present a novel algorithm based solely on network topology, that can be used to align any two networks. We apply it to biological networks to produce by far the most complete topological alignments of biological networks to date. We demonstrate that both species phylogeny and detailed biological function of individual proteins can be extracted from our alignments. Topology-based alignments have the potential to provide a completely new, independent source of phylogenetic information. Our alignment of the protein-protein interaction networks of two very different species--yeast and human--indicate that even distant species share a surprising amount of network topology with each other, suggesting broad similarities in internal cellular wiring across all life on Earth.Comment: Algorithm explained in more details. Additional analysis adde
    corecore