Turning gold into 'junk': transposable elements utilize central proteins of cellular networks
The numerous discovered cases of domesticated
transposable element (TE) proteins led to the recognition
that TEs are a significant source of evolutionary
innovation. However, much less is known about
the reverse process, whether and to what degree
the evolution of TEs is influenced by the genome
of their hosts. We addressed this issue by searching
for cases of incorporation of host genes into the
sequence of TEs and examined the systems-level
properties of these genes using the Saccharomyces
cerevisiae and Drosophila melanogaster genomes.
We identified 51 cases where the evolutionary
scenario was the incorporation of a host gene
fragment into a TE consensus sequence, and we
show that both the yeast and fly homologues of
the incorporated protein sequences have central
positions in the cellular networks. An analysis of selective
pressure (Ka/Ks ratio) detected significant
selection in 37% of the cases. Recent research on
retrovirus-host interactions shows that viral
proteins preferentially target hubs of the host interaction
networks, enabling them to take over the host
cell using only a few proteins. We propose that TEs
face a similar evolutionary pressure to evolve proteins
with high interacting capacities and take some
of the necessary protein domains directly from their
hosts.
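The abstract does not say which centrality measure was used to establish that the incorporated proteins hold central positions. As a minimal sketch, assuming normalized degree centrality on an undirected interaction network, hubs could be identified as below. The function names and the toy edge list are hypothetical and are not taken from the study.

```python
from collections import Counter


def degree_centrality(edges):
    """Return {node: degree / (n - 1)} for an undirected edge list.

    Self-interactions and duplicate edges are ignored (an assumption
    made here for simplicity, not something the abstract specifies).
    """
    seen = set()
    degree = Counter()
    for a, b in edges:
        if a == b:
            continue  # skip self-interactions
        key = frozenset((a, b))
        if key in seen:
            continue  # skip duplicate edges
        seen.add(key)
        degree[a] += 1
        degree[b] += 1
    n = len(degree)
    if n < 2:
        return {}
    return {node: d / (n - 1) for node, d in degree.items()}


def top_hubs(edges, k=1):
    """Return the k nodes with the highest degree centrality."""
    cent = degree_centrality(edges)
    # Sort by descending centrality, breaking ties by node name.
    return sorted(cent, key=lambda node: (-cent[node], node))[:k]


# Hypothetical toy network: P1 interacts with every other protein.
toy_edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3")]
print(top_hubs(toy_edges, k=1))
```

In this toy network P1 is the only protein connected to all others, so it is reported as the hub. Real analyses often combine degree with other measures such as betweenness.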
BUSCA: An integrative web server to predict subcellular localization of proteins
Here, we present BUSCA (http://busca.biocomp.unibo.it), a novel web server that integrates different computational tools for predicting protein subcellular localization. BUSCA combines methods for identifying signal and transit peptides (DeepSig and TPpred3), GPI-anchors (PredGPI) and transmembrane domains (ENSEMBLE3.0 and BetAware) with tools for discriminating subcellular localization of both globular and membrane proteins (BaCelLo, MemLoci and SChloro). Outcomes from the different tools are processed and integrated for annotating subcellular localization of both eukaryotic and bacterial protein sequences. We benchmark BUSCA against protein targets derived from recent CAFA experiments and other specific data sets, reporting state-of-the-art performance. BUSCA scores better than all other evaluated methods on 2732 targets from CAFA2, with an F1 value equal to 0.49, and is among the best methods when predicting targets from CAFA3. We propose BUSCA as an integrated and accurate resource for the annotation of protein subcellular localization.
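For readers unfamiliar with the metric, the sketch below computes F1 as the harmonic mean of precision and recall from raw counts. It uses only the standard definition. The benchmark's exact evaluation protocol is not described in the abstract and is not reproduced here, and the counts in the example are hypothetical.

```python
def f1_score(tp, fp, fn):
    """F1 = 2 * precision * recall / (precision + recall).

    tp, fp, fn are counts of true positives, false positives and
    false negatives. Returns 0.0 when there are no true positives.
    """
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, not taken from the BUSCA benchmark:
# precision = 8 / 10 = 0.8, recall = 8 / 16 = 0.5
print(round(f1_score(tp=8, fp=2, fn=8), 3))
```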
The Parallelism Motifs of Genomic Data Analysis
Genomic data sets are growing dramatically as the cost of sequencing
continues to decline and small sequencing devices become available. Enormous
community databases store and share this data with the research community, but
some of these genomic data analysis problems require large scale computational
platforms to meet both the memory and computational requirements. These
applications differ from scientific simulations that dominate the workload on
high end parallel systems today and place different requirements on programming
support, software libraries, and parallel architectural design. For example,
they involve irregular communication patterns such as asynchronous updates to
shared data structures. We consider several problems in high performance
genomics analysis, including alignment, profiling, clustering, and assembly for
both single genomes and metagenomes. We identify some of the common
computational patterns or motifs that help inform parallelization strategies
and compare our motifs to some of the established lists, arguing that at least
two key patterns, sorting and hashing, are missing.
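To make the two patterns concrete, here is a minimal serial sketch that counts k-mers in a set of reads, once with a hash table and once by sorting and grouping. K-mer counting is chosen here only as a familiar genomics kernel. The abstract does not name it, and the function names and toy reads are hypothetical.

```python
from collections import Counter
from itertools import groupby


def kmers(seq, k):
    """Yield every length-k substring of seq (none if len(seq) < k)."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def count_by_hashing(reads, k):
    """Hashing motif: update a hash table once per k-mer occurrence."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return dict(counts)


def count_by_sorting(reads, k):
    """Sorting motif: sort all k-mers, then count runs of equal keys."""
    all_kmers = sorted(km for read in reads for km in kmers(read, k))
    return {km: sum(1 for _ in group) for km, group in groupby(all_kmers)}


toy_reads = ["ACGTAC", "GTACGT"]
# Both strategies must produce the same counts.
print(count_by_hashing(toy_reads, 3) == count_by_sorting(toy_reads, 3))
```

In a distributed setting the hashing variant corresponds to irregular, asynchronous updates to a shared table, while the sorting variant corresponds to a bulk data exchange followed by local scans. This sketch shows only the serial logic.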
Structural Prediction of Protein–Protein Interactions by Docking: Application to Biomedical Problems
A huge amount of genetic information is available thanks to the recent advances in sequencing technologies and the larger computational capabilities, but the interpretation of such genetic data at phenotypic level remains elusive. One of the reasons is that proteins are not acting alone, but are specifically interacting with other proteins and biomolecules, forming intricate interaction networks that are essential for the majority of cell processes and pathological conditions. Thus, characterizing such interaction networks is an important step in understanding how information flows from gene to phenotype. Indeed, structural characterization of protein–protein interactions at atomic resolution has many applications in biomedicine, from diagnosis and vaccine design, to drug discovery. However, despite the advances of experimental structural determination, the number of interactions for which there is available structural data is still very small. In this context, a complementary approach is computational modeling of protein interactions by docking, which is usually composed of two major phases: (i) sampling of the possible binding modes between the interacting molecules and (ii) scoring for the identification of the correct orientations. In addition, prediction of interface and hot-spot residues is very useful in order to guide and interpret mutagenesis experiments, as well as to understand functional and mechanistic aspects of the interaction. Computational docking is already being applied to specific biomedical problems within the context of personalized medicine, for instance, helping to interpret pathological mutations involved in protein–protein interactions, or providing modeled structural data for drug discovery targeting protein–protein interactions.
Spanish Ministry of Economy grant number BIO2016-79960-R; D.B.B. is supported by a predoctoral fellowship from CONACyT; M.R. is supported by an FPI fellowship from the Severo Ochoa program. We are grateful to the Joint BSC-CRG-IRB Programme in Computational Biology.
Peer Reviewed. Postprint (author's final draft)
Bioinformatics: new tools and applications in life science and personalized medicine
While we have a basic understanding of how genes encode the sequences of specific proteins, we still lack information on the role that DNA plays in specific diseases and on the functions of the thousands of proteins that are produced. Bioinformatics combines the methods used in the collection, storage, identification, analysis, and correlation of this huge and complex body of information. All this work produces an “ocean” of information that can only be “sailed” with the help of computerized methods. The goal is to provide scientists with the right means to explain normal biological processes, the dysfunctions of these processes that give rise to disease, and approaches that allow the discovery of new medical cures. Recently, sequencing platforms, which produce genomes and transcriptomes on a large scale, have created new challenges not only for genomics but especially for bioinformatics. The intent of this article is to compile a list of tools and information resources used by scientists to handle the information produced by massive sequencing, from recent platforms to new generations, and the applications of this information in different areas of the life sciences, including medicine.
The authors are grateful to the Foundation for Science and Technology (FCT, Portugal) and FEDER under Programme PT2020 for financial support to CIMO (UID/AGR/00690/2019).