5,836 research outputs found

Integration and mining of malaria molecular, functional and pharmacological data: how far are we from a chemogenomic knowledge space?

Author: Bastien Olivier
Birkholtz Lyn-Marie
Breton Vincent
Grando Delphine
Hofmann-Apitius Martin
Jacq Nicolas
Joubert Fourie
Kasam Vinod
Louw Abraham I
Maréchal Eric
Ortet Philippe
Roy Sylvaine
Saïdani Nadia
Wells Gordon
Zimmermann Marc
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2006

The organization and mining of malaria genomic and post-genomic data is strongly motivated by the need to predict and characterize new biological targets and new drugs. Biological targets are sought in a biological space designed from the genomic data of Plasmodium falciparum, but also using the millions of genomic data from other species. Drug candidates are sought in a chemical space containing the millions of small molecules stored in public and private chemolibraries. Data management should therefore be as reliable and versatile as possible. In this context, we examined five aspects of the organization and mining of malaria genomic and post-genomic data: 1) the comparison of protein sequences, including compositionally atypical malaria sequences, 2) the high-throughput reconstruction of molecular phylogenies, 3) the representation of biological processes, particularly metabolic pathways, 4) versatile methods to integrate genomic data, biological representations and functional profiling obtained from X-omic experiments after drug treatments and 5) the determination and prediction of protein structures and their molecular docking with drug candidate structures. Progress toward a grid-enabled chemogenomic knowledge space is discussed.
Comment: 43 pages, 4 figures, to appear in Malaria Journa

Hal - Université Grenoble Alpes

HAL AMU

Fraunhofer-ePrints

HAL Clermont Université

HAL Descartes

HAL-CEA

ProdInra

arXiv.org e-Print Archive

HAL-IN2P3

Springer - Publisher Connector

PubMed Central

UPSpace at the University of Pretoria

The Echinococcus canadensis (G7) genome: A key knowledge of parasitic platyhelminth human diseases

Author: Adolfo Fox
Anna C. M. Salim
Federico Camicia
Flávio M. Gomes Araújo
Guilherme Oliveira
Juliana Assis
Laura Kamenetzky
Lucas L. Maldonado
Mara Rosenzvit
Marcela Cucher
Natalia Macchiaroli
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/02/2017

Background: The parasite Echinococcus canadensis (G7) (phylum Platyhelminthes, class Cestoda) is one of the causative agents of echinococcosis. Echinococcosis is a worldwide chronic zoonosis affecting humans as well as domestic and wild mammals, which has been reported as a prioritized neglected disease by the World Health Organisation. No genomic data, comparative genomic analyses or efficient therapeutic and diagnostic tools are available for this severe disease. The information presented in this study will help to understand the peculiar biological characters and to design species-specific control tools. Results: We sequenced, assembled and annotated the 115-Mb genome of E. canadensis (G7). Comparative genomic analyses using whole genome data of three Echinococcus species not only confirmed the status of E. canadensis (G7) as a separate species but also demonstrated a high nucleotide sequence divergence in relation to E. granulosus (G1). The E. canadensis (G7) genome contains 11,449 genes with a core set of 881 orthologs shared among five cestode species. Comparative genomics revealed that there are more single nucleotide polymorphisms (SNPs) between E. canadensis (G7) and E. granulosus (G1) than between E. canadensis (G7) and E. multilocularis. This result was unexpected since E. canadensis (G7) and E. granulosus (G1) were considered to belong to the species complex E. granulosus sensu lato. We described SNPs in known drug targets and metabolism genes in the E. canadensis (G7) genome. Regarding gene regulation, we analysed three particular features: CpG island distribution along the three Echinococcus genomes, DNA methylation system and small RNA pathway. The results suggest the occurrence of yet unknown gene regulation mechanisms in Echinococcus. Conclusions: This is the first work that addresses Echinococcus comparative genomics. The resources presented here will promote the study of mechanisms of parasite development as well as new tools for drug discovery. The availability of a high-quality genome assembly is critical for fully exploring the biology of a pathogenic organism. The E. canadensis (G7) genome presented in this study provides a unique opportunity to address the genetic diversity among the genus Echinococcus and its particular developmental features. At present, there is no unequivocal taxonomic classification of Echinococcus species; however, the genome-wide SNP analysis performed here revealed the phylogenetic distance among these three Echinococcus species. Additional cestode genomes need to be sequenced to be able to resolve their phylogeny.
Affiliation: Maldonado, Lucas Luciano. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones en Microbiología y Parasitología Médica. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones en Microbiología y Parasitología Médica; Argentina
Affiliation: Assis, Juliana. Fundación Oswaldo Cruz; Brazil
Affiliation: Gomes Araújo, Flávio M. Fundación Oswaldo Cruz; Brazil
Affiliation: Salim, Anna C. M. Fundación Oswaldo Cruz; Brazil
Affiliation: Macchiaroli, Natalia. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones en Microbiología y Parasitología Médica. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones en Microbiología y Parasitología Médica; Argentina
Affiliation: Cucher, Marcela Alejandra. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones en Microbiología y Parasitología Médica. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones en Microbiología y Parasitología Médica; Argentina
Affiliation: Camicia, Federico. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones en Microbiología y Parasitología Médica. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones en Microbiología y Parasitología Médica; Argentina
Affiliation: Fox, Adolfo. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones en Microbiología y Parasitología Médica. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones en Microbiología y Parasitología Médica; Argentina
Affiliation: Rosenzvit, Mara Cecilia. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones en Microbiología y Parasitología Médica. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones en Microbiología y Parasitología Médica; Argentina
Affiliation: Oliveira, Guilherme. Instituto Tecnológico Vale; Brazil. Fundación Oswaldo Cruz; Brazil
Affiliation: Kamenetzky, Laura. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones en Microbiología y Parasitología Médica. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones en Microbiología y Parasitología Médica; Argentin
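
The abstract above mentions analysing CpG island distribution but does not say how islands were scored. As a minimal sketch, assuming the widely used observed/expected CpG ratio (number of CpG dinucleotides times window length, divided by the product of the C and G counts), a per-window statistic could be computed as follows. The function name `cpg_stats` and the toy window are hypothetical and are not taken from the study.

```python
def cpg_stats(window):
    """Return (GC content, observed/expected CpG ratio) for one sequence window.

    Assumption: obs/exp = (#CpG * N) / (#C * #G), a common convention;
    the study's actual criteria are not stated in the abstract.
    """
    seq = window.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_content = (c + g) / n if n else 0.0
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_content, obs_exp


# Toy window (hypothetical data, far shorter than a real scanning window)
gc, ratio = cpg_stats("CGCGCGAT")
print(f"GC content = {gc:.2f}, obs/exp CpG = {ratio:.2f}")
```

In practice such a function would be applied to sliding windows along each genome, with thresholds on window length, GC content and the ratio chosen by the analyst.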

De novo assembly and characterization of leaf transcriptome for the development of functional molecular markers of the extremophile multipurpose tree species Prosopis alba

Author: Acuña Cintia Vanesa
Fernández Paula del Carmen
González Sergio Alberto
Hopp Horacio Esteban
López Lauenstein Diego
Marcucci Poltri Susana Noemí
Paniego Norma Beatriz
Pomponio María Florencia
Rivarola Maximo Lisandro
Torales Susana
Verga Aníbal Ramón
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/10/2013

Background: Prosopis alba (Fabaceae) is an important native tree adapted to arid and semiarid regions of north-western Argentina which is of great value as a multipurpose species. Despite its importance, the genomic resources currently available for the entire Prosopis genus are still limited. Here we describe the development of a leaf transcriptome and the identification of new molecular markers that could support functional genetic studies in natural and domesticated populations of this genus. Results: Next generation DNA pyrosequencing technology applied to P. alba transcripts produced a total of 1,103,231 raw reads with an average length of 421 bp. De novo assembly generated a set of 15,814 isotigs and 71,101 non-assembled sequences (singletons) with an average of 991 bp and 288 bp respectively. A total of 39,000 unique singletons were identified after clustering natural and artificial duplicates from pyrosequencing reads. Regarding the non-redundant sequences or unigenes, 22,095 out of 54,814 were successfully annotated with Gene Ontology terms. Moreover, simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs) were searched, resulting in 5,992 and 6,236 markers, respectively, throughout the genome. For the validation of the predicted SSR markers, a subset of 87 SSRs selected through functional annotation evidence was successfully amplified from six DNA samples of seedlings. From this analysis, 11 of these 87 SSRs were identified as polymorphic. Additionally, another set of 123 nuclear polymorphic SSRs were determined in silico, of which 50% have the probability of being effectively polymorphic. Conclusions: This study generated a successful global analysis of the P. alba leaf transcriptome after bioinformatic and wet laboratory validations of RNA-Seq data. The limited set of molecular markers currently available will be significantly increased with the thousands of new markers that were identified in this study. This information will strongly contribute to genomics resources for P. alba functional analysis and genetics. Finally, it will also potentially contribute to the development of population-based genome studies in the genus.
Affiliation: Torales, Susana. Instituto Nacional de Tecnología Agropecuaria. Centro de Investigación de Recursos Naturales. Instituto de Recursos Biológicos; Argentina
Affiliation: Rivarola, Maximo Lisandro. Instituto Nacional de Tecnología Agropecuaria. Centro de Investigación en Ciencias Veterinarias y Agronómicas. Instituto de Biotecnología; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Affiliation: Pomponio, María Florencia. Instituto Nacional de Tecnología Agropecuaria. Centro de Investigación de Recursos Naturales. Instituto de Recursos Biológicos; Argentina
Affiliation: González, Sergio Alberto. Instituto Nacional de Tecnología Agropecuaria. Centro de Investigación en Ciencias Veterinarias y Agronómicas. Instituto de Biotecnología; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Affiliation: Acuña, Cintia Vanesa. Instituto Nacional de Tecnología Agropecuaria. Centro de Investigación en Ciencias Veterinarias y Agronómicas. Instituto de Biotecnología; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Affiliation: Fernández, Paula del Carmen. Instituto Nacional de Tecnología Agropecuaria. Centro de Investigación en Ciencias Veterinarias y Agronómicas. Instituto de Biotecnología; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Affiliation: López Lauenstein, Diego. Instituto Nacional de Tecnología Agropecuaria. Centro de Investigaciones Agropecuarias. Instituto de Fisiología y Recursos Genéticos Vegetales; Argentina
Affiliation: Verga, Aníbal Ramón. Instituto Nacional de Tecnología Agropecuaria. Centro de Investigaciones Agropecuarias. Instituto de Fisiología y Recursos Genéticos Vegetales; Argentina
Affiliation: Hopp, Horacio Esteban. Instituto Nacional de Tecnología Agropecuaria. Centro de Investigación en Ciencias Veterinarias y Agronómicas. Instituto de Biotecnología; Argentina. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales; Argentina
Affiliation: Paniego, Norma Beatriz. Instituto Nacional de Tecnología Agropecuaria. Centro de Investigación en Ciencias Veterinarias y Agronómicas. Instituto de Biotecnología; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Affiliation: Marcucci Poltri, Susana Noemí. Instituto Nacional de Tecnología Agropecuaria. Centro de Investigación en Ciencias Veterinarias y Agronómicas. Instituto de Biotecnología; Argentin
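
The abstract above reports an in silico search for simple sequence repeats (SSRs) without naming the tool or thresholds used. As an illustration only, a perfect-repeat search can be sketched with a backreferencing regular expression. The function name `find_ssrs`, the unit-length range (2 to 6 bp) and the minimum of four repeats are assumptions chosen for the example, not parameters from the study.

```python
import re


def find_ssrs(seq, min_unit=2, max_unit=6, min_repeats=4):
    """Find perfect tandem repeats; return (0-based start, unit, repeat count).

    The lazy quantifier tries the shortest unit first, so 'CACACACA' is
    reported as unit 'CA' rather than 'CACA'. With min_unit=2 a homopolymer
    run would be reported with a two-base unit such as 'AA'.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits


# Toy sequence (hypothetical): (CA)x5 followed later by (AAG)x4
for start, unit, count in find_ssrs("TTCACACACACAGGTAAGAAGAAGAAGC"):
    print(start, unit, count)
```

Dedicated SSR finders also handle imperfect and compound repeats, which this sketch deliberately ignores.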

CONICET Digital

Springer - Publisher Connector

PubMed Central

Repositorio Institucional – Biblioteca Digital

Motif Discovery in Protein Sequences

Author: Elloumi Mourad
Haj Mohamed Salma Aouled El
Thompson Julie D.
Publication venue: 'IntechOpen'
Publication date: 14/12/2016

Biology has become a data‐intensive research field. Coping with the flood of data from the new genome sequencing technologies is a major area of research. The exponential increase in the size of the datasets produced by “next‐generation sequencing” (NGS) poses unique computational challenges. In this context, motif discovery tools are widely used to identify important patterns in the sequences produced. Biological sequence motifs are defined as short, usually fixed-length, sequence patterns that may represent important structural or functional features in nucleic acid and protein sequences such as transcription factor binding sites, splice junctions, active sites, or interaction interfaces. They can occur in an exact or approximate form within a family or a subfamily of sequences. Motif discovery is therefore an important field in bioinformatics, and numerous methods have been developed for the identification of motifs shared by a set of functionally related sequences. This chapter will review the existing motif discovery methods for protein sequences and their ability to discover biologically important features as well as their limitations for the discovery of new motifs. Finally, we will propose new horizons for motif discovery in order to address the shortcomings of the existing methods.
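
To make the notion of a short, fixed-length protein motif concrete, the sketch below scans sequences for one well-known pattern, the PROSITE N-glycosylation site N-{P}-[ST]-{P}, translated by hand into a regular expression. This illustrates matching a known motif only; it is not one of the discovery methods the chapter reviews. The function name `find_motif` and the toy sequences are hypothetical.

```python
import re

# PROSITE-style pattern N-{P}-[ST]-{P}: Asn, any residue except Pro,
# Ser or Thr, any residue except Pro. The lookahead keeps overlapping hits.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def find_motif(sequence, pattern=N_GLYCOSYLATION):
    """Return (1-based start, matched residues) for every occurrence."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence)]


# Toy protein sequences (hypothetical data)
toy_sequences = {"seq1": "MKNATLLNGSAPNPSV", "seq2": "MNNTSQ"}
for name, seq in toy_sequences.items():
    print(name, find_motif(seq))
```

Motif discovery proper works in the opposite direction: it infers such a pattern from a set of related sequences instead of starting from a known one.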

IntechOpen

Crossref

Towards protein function annotations for matching remote homologs

Author: Lei Seak Fei
Publication venue: 'Paleontological Institute at The University of Kansas'
Publication date: 01/01/2008

Identifying functional similarities between proteins with low sequence identity and low structural similarity often suffers from high false-positive and false-negative rates. To improve functional prediction based on local protein structures, we proposed two different refinement and filtering approaches. We built a statistical model (a Markov Random Field) to describe protein functional site structure. We also developed filters that consider the local environment around the active sites to remove false positives. Our experimental results, evaluated on five sets of enzyme families with less than 40% sequence identity, demonstrated that our methods can recover more remote homologs that could not be detected by traditional sequence-based methods. At the same time, our methods reduced a large number of random matches. Our methods improved the functional annotation ability (measured by the area under the ROC curve) by up to 70% in the extended motif method.
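
The abstract above reports performance as the area under the ROC curve (AUC). As a minimal sketch of how that number is defined, the AUC equals the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as one half. The function name `roc_auc` and the toy scores are hypothetical; the thesis's own evaluation code is not described in the abstract.

```python
def roc_auc(scores, labels):
    """AUC via pairwise comparison of positive and negative scores.

    labels: 1 for a true functional match, 0 for a random match.
    Quadratic in the number of examples, which is fine for small sets.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy match scores (hypothetical data)
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
print(round(roc_auc(scores, labels), 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect separation of true matches from random ones.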

KU ScholarWorks

The Complex Evolutionary History of Aminoacyl-tRNA Synthetases

Author: Amoutzias Grigorios D.
Becker Hubert D.
Chaliotis Anargyros
Ibba Michael
Mossialos Dimitris
Stathopoulos Constantinos
Vlastaridis Panayotis
Publication venue: Chapman University Digital Commons
Publication date: 28/11/2016

Aminoacyl-tRNA synthetases (AARSs) are a superfamily of enzymes responsible for the faithful translation of the genetic code and have lately become a prominent target for synthetic biologists. Our large-scale analysis of >2500 prokaryotic genomes reveals the complex evolutionary history of these enzymes and their paralogs, in which horizontal gene transfer played an important role. These results show that a widespread belief in the evolutionary stability of this superfamily is misconceived. Although AlaRS, GlyRS, LeuRS, IleRS, ValRS are the most stable members of the family, GluRS, LysRS and CysRS often have paralogs, whereas AsnRS, GlnRS, PylRS and SepRS are often absent from many genomes. In the course of this analysis, highly conserved protein motifs and domains within each of the AARS loci were identified and used to build a web-based computational tool for the genome-wide detection of AARS coding sequences. This is based on hidden Markov models (HMMs) and is available together with a cognate database that may be used for specific analyses. The bioinformatics tools that we have developed may also help to identify new antibiotic agents and targets using these essential enzymes. These tools also may help to identify organisms with alternative pathways that are involved in maintaining the fidelity of the genetic code.

PubMed Central

Chapman University Digital Commons

De novo transcriptome sequencing and SSR markers development for Cedrela balansae C. DC., a native tree species of northwest Argentina

Author: Acuña Cintia Vanesa
Fernández Paula
Fornes Luis Fernando
Gonzalez Sergio
Hopp Horacio Esteban
Inza María Virginia
Marcucci Poltri Susana Noemí
Paniego Norma Beatriz
Pomponio María Florencia
Rivarola Maximo Lisandro
Torales Susana
Zelener Noga
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2018

The endangered Cedrela balansae C.DC. (Meliaceae) is a high-value timber species with great potential for forest plantations that inhabits the tropical forests in Northwestern Argentina. Research on this species is scarce because of the limited genetic and genomic information available. Here, we explored the transcriptome of C. balansae using 454 GS FLX Titanium next-generation sequencing (NGS) technology. Following de novo assembly, we identified 27,111 non-redundant unigenes longer than 200 bp, and considered these transcripts for further downstream analysis. Functional annotation was performed by searching the 27,111 unigenes against the NR-Protein and the Interproscan databases. This analysis revealed 26,977 genes with homology in at least one of the databases analyzed. Furthermore, 7,774 unigenes in 142 different active biological pathways in C. balansae were identified with the KEGG database. Moreover, after in silico analyses, we detected 2,663 simple sequence repeat (SSR) markers. A subset of 70 SSRs related to important “stress tolerance” traits, based on functional annotation evidence, was selected for wet PCR validation in C. balansae and other Cedrela species inhabiting the northwest and northeast of Argentina (C. fissilis, C. saltensis and C. angustifolia). Successful transferability was between 77% and 93% and, thanks to this study, 32 polymorphic functional SSRs for all analyzed Cedrela species are now available. The gene catalog and molecular markers obtained here represent a starting point for further research, which will assist genetic breeding programs in the Cedrela genus and will contribute to identifying key populations for its preservation.
Affiliation: Torales, Susana. Instituto Nacional de Tecnología Agropecuaria. Centro de Investigación de Recursos Naturales. Instituto de Recursos Biológicos; Argentina
Affiliation: Rivarola, Maximo Lisandro. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Instituto Nacional de Tecnología Agropecuaria. Centro de Investigación en Ciencias Veterinarias y Agronómicas. Instituto de Biotecnología; Argentina
Affiliation: Gonzalez, Sergio. Instituto Nacional de Tecnología Agropecuaria. Centro de Investigación en Ciencias Veterinarias y Agronómicas. Instituto de Biotecnología; Argentina
Affiliation: Inza, María Virginia. Instituto Nacional de Tecnología Agropecuaria. Centro de Investigación de Recursos Naturales. Instituto de Recursos Biológicos; Argentina
Affiliation: Pomponio, María Florencia. Instituto Nacional de Tecnología Agropecuaria. Centro de Investigación de Recursos Naturales. Instituto de Recursos Biológicos; Argentina
Affiliation: Fernández, Paula. Instituto Nacional de Tecnología Agropecuaria. Centro de Investigación en Ciencias Veterinarias y Agronómicas. Instituto de Biotecnología; Argentina
Affiliation: Acuña, Cintia Vanesa. Instituto Nacional de Tecnología Agropecuaria. Centro de Investigación en Ciencias Veterinarias y Agronómicas. Instituto de Biotecnología; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Affiliation: Zelener, Noga. Instituto Nacional de Tecnología Agropecuaria. Centro de Investigación de Recursos Naturales. Instituto de Recursos Biológicos; Argentina
Affiliation: Fornes, Luis Fernando. Instituto Nacional de Tecnología Agropecuaria. Centro Regional Tucuman-Santiago del Estero; Argentina
Affiliation: Hopp, Horacio Esteban. Universidad de Belgrano. Facultad de Ciencias Exactas y Naturales; Argentina. Instituto Nacional de Tecnología Agropecuaria. Centro de Investigación en Ciencias Veterinarias y Agronómicas. Instituto de Biotecnología; Argentina
Affiliation: Paniego, Norma Beatriz. Instituto Nacional de Tecnología Agropecuaria. Centro de Investigación en Ciencias Veterinarias y Agronómicas. Instituto de Biotecnología; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Affiliation: Marcucci Poltri, Susana Noemí. Instituto Nacional de Tecnología Agropecuaria. Centro de Investigación en Ciencias Veterinarias y Agronómicas. Instituto de Biotecnología; Argentin

CONICET Digital

Directory of Open Access Journals

Repositorio Institucional – Biblioteca Digital

FigShare

Transcriptome survey of Patagonian southern beech Nothofagus nervosa (= N. Alpina): assembly, annotation and molecular marker discovery

Author: Acuña Cintia Vanesa
Azpilicueta Maria Marta
Fernandez Paula Del Carmen
Gallo Leonardo Ariel
Gonzalez Sergio
Hopp Horacio Esteban
Marchelli Paula
Marcucci Poltri Susana Noemi
Paniego Norma Beatriz
Pomponio Maria Florencia
Rivarola Maximo Lisandro
Torales Susana Leonor
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

Nothofagus nervosa is one of the most emblematic native tree species of Patagonian temperate forests. Here, the shotgun RNA-sequencing (RNA-Seq) of the transcriptome of N. nervosa, including de novo assembly, functional annotation, and in silico discovery of potential molecular markers to support population and association genetic studies, is described.
Affiliation: Torales, Susana Leonor. Instituto Nacional de Tecnología Agropecuaria (INTA). Instituto de Recursos Biológicos; Argentina
Affiliation: Rivarola, Maximo Lisandro. Instituto Nacional de Tecnología Agropecuaria (INTA). Instituto de Biotecnología; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Affiliation: Pomponio, Maria Florencia. Instituto Nacional de Tecnología Agropecuaria (INTA). Instituto de Recursos Biológicos; Argentina
Affiliation: Fernandez, Paula Del Carmen. Instituto Nacional de Tecnología Agropecuaria (INTA). Instituto de Biotecnología; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Affiliation: Acuña, Cintia Vanesa. Instituto Nacional de Tecnología Agropecuaria (INTA). Instituto de Biotecnología; Argentina
Affiliation: Marchelli, Paula. Instituto Nacional de Tecnología Agropecuaria (INTA). Estación Experimental Agropecuaria Bariloche; Argentina
Affiliation: Gonzalez, Sergio Alberto. Instituto Nacional de Tecnología Agropecuaria (INTA). Instituto de Biotecnología; Argentina
Affiliation: Azpilicueta, María M. Instituto Nacional de Tecnología Agropecuaria (INTA). Estación Experimental Agropecuaria Bariloche; Argentina
Affiliation: Hopp, Horacio Esteban. Instituto Nacional de Tecnología Agropecuaria (INTA). Instituto de Biotecnología; Argentina. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales; Argentina
Affiliation: Gallo, Leonardo A. Instituto Nacional de Tecnología Agropecuaria (INTA). Estación Experimental Agropecuaria Bariloche; Argentina
Affiliation: Paniego, Norma Beatriz. Instituto Nacional de Tecnología Agropecuaria (INTA). Instituto de Biotecnología; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Affiliation: Marcucci Poltri, Susana Noemi. Instituto Nacional de Tecnología Agropecuaria (INTA). Instituto de Biotecnología; Argentin

Crossref

CONICET Digital

Springer - Publisher Connector

PubMed Central

Repositorio Institucional – Biblioteca Digital

Automatic discovery of cross-family sequence features associated with protein function

Author: Brameier Markus
Haan Josien
Krings Andrea
MacCallum Robert M
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Methods for predicting protein function directly from amino acid sequences are useful tools in the study of uncharacterised protein families and in comparative genomics. Until now, this problem has been approached using machine learning techniques that attempt to predict membership, or otherwise, to predefined functional categories or subcellular locations. A potential drawback of this approach is that the human-designated functional classes may not accurately reflect the underlying biology, and consequently important sequence-to-function relationships may be missed. RESULTS: We show that a self-supervised data mining approach is able to find relationships between sequence features and functional annotations. No preconceived ideas about functional categories are required, and the training data is simply a set of protein sequences and their UniProt/Swiss-Prot annotations. The main technical aspect of the approach is the co-evolution of amino acid-based regular expressions and keyword-based logical expressions with genetic programming. Our experiments on a strictly non-redundant set of eukaryotic proteins reveal that the strongest and most easily detected sequence-to-function relationships are concerned with targeting to various cellular compartments, which is an area already well studied both experimentally and computationally. Of more interest are a number of broad functional roles which can also be correlated with sequence features. These include inhibition, biosynthesis, transcription and defence against bacteria. Despite substantial overlaps between these functions and their corresponding cellular compartments, we find clear differences in the sequence motifs used to predict some of these functions. For example, the presence of polyglutamine repeats appears to be linked more strongly to the "transcription" function than to the general "nuclear" function/location. 
CONCLUSION: We have developed a novel and useful approach for knowledge discovery in annotated sequence data. The technique is able to identify functionally important sequence features and does not require expert knowledge. By viewing protein function from a sequence perspective, the approach is also suitable for discovering unexpected links between biological processes, such as the recently discovered role of ubiquitination in transcription.

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Topological network alignment uncovers biological function and phylogeny

Author: Nataša Pržulj
Oleksii Kuchaiev
Tijana Milenković
Vesna Memišević
Wayne Hayes
Publication date: 07/10/2009

Sequence comparison and alignment has had an enormous impact on our understanding of evolution, biology, and disease. Comparison and alignment of biological networks will likely have a similar impact. Existing network alignments use information external to the networks, such as sequence, because no good algorithm for purely topological alignment has yet been devised. In this paper, we present a novel algorithm based solely on network topology that can be used to align any two networks. We apply it to biological networks to produce by far the most complete topological alignments of biological networks to date. We demonstrate that both species phylogeny and detailed biological function of individual proteins can be extracted from our alignments. Topology-based alignments have the potential to provide a completely new, independent source of phylogenetic information. Our alignment of the protein-protein interaction networks of two very different species (yeast and human) indicates that even distant species share a surprising amount of network topology with each other, suggesting broad similarities in internal cellular wiring across all life on Earth.
Comment: Algorithm explained in more details. Additional analysis adde

arXiv.org e-Print Archive

Crossref

PubMed Central

UCL Discovery