29,734 research outputs found
Cancer3D: understanding cancer mutations through protein structures.
The new era of cancer genomics is providing us with extensive knowledge of mutations and other alterations in cancer. The Cancer3D database at http://www.cancer3d.org gives an open and user-friendly way to analyze cancer missense mutations in the context of structures of proteins in which they are found. The database also helps users analyze the distribution patterns of the mutations as well as their relationship to changes in drug activity through two algorithms: e-Driver and e-Drug. These algorithms use knowledge of modular structure of genes and proteins to separately study each region. This approach allows users to find novel candidate driver regions or drug biomarkers that cannot be found when similar analyses are done on the whole-gene level. The Cancer3D database provides access to the results of such analyses based on data from The Cancer Genome Atlas (TCGA) and the Cancer Cell Line Encyclopedia (CCLE). In addition, it displays mutations from over 14,700 proteins mapped to more than 24,300 structures from PDB. This helps users visualize the distribution of mutations and identify novel three-dimensional patterns in their distribution
Machine learning-guided directed evolution for protein engineering
Machine learning (ML)-guided directed evolution is a new paradigm for
biological design that enables optimization of complex functions. ML methods
use data to predict how sequence maps to function without requiring a detailed
model of the underlying physics or biological pathways. To demonstrate
ML-guided directed evolution, we introduce the steps required to build ML
sequence-function models and use them to guide engineering, making
recommendations at each stage. This review covers basic concepts relevant to
using ML for protein engineering as well as the current literature and
applications of this new engineering paradigm. ML methods accelerate directed
evolution by learning from information contained in all measured variants and
using that information to select sequences that are likely to be improved. We
then provide two case studies that demonstrate the ML-guided directed evolution
process. We also look to future opportunities where ML will enable discovery of
new protein functions and uncover the relationship between protein sequence and
function.Comment: Made significant revisions to focus on aspects most relevant to
applying machine learning to speed up directed evolutio
Safe Mutations for Deep and Recurrent Neural Networks through Output Gradients
While neuroevolution (evolving neural networks) has a successful track record
across a variety of domains from reinforcement learning to artificial life, it
is rarely applied to large, deep neural networks. A central reason is that
while random mutation generally works in low dimensions, a random perturbation
of thousands or millions of weights is likely to break existing functionality,
providing no learning signal even if some individual weight changes were
beneficial. This paper proposes a solution by introducing a family of safe
mutation (SM) operators that aim within the mutation operator itself to find a
degree of change that does not alter network behavior too much, but still
facilitates exploration. Importantly, these SM operators do not require any
additional interactions with the environment. The most effective SM variant
capitalizes on the intriguing opportunity to scale the degree of mutation of
each individual weight according to the sensitivity of the network's outputs to
that weight, which requires computing the gradient of outputs with respect to
the weights (instead of the gradient of error, as in conventional deep
learning). This safe mutation through gradients (SM-G) operator dramatically
increases the ability of a simple genetic algorithm-based neuroevolution method
to find solutions in high-dimensional domains that require deep and/or
recurrent neural networks (which tend to be particularly brittle to mutation),
including domains that require processing raw pixels. By improving our ability
to evolve deep neural networks, this new safer approach to mutation expands the
scope of domains amenable to neuroevolution
Structural Prediction of Protein–Protein Interactions by Docking: Application to Biomedical Problems
A huge amount of genetic information is available thanks to the recent advances in sequencing technologies and the larger computational capabilities, but the interpretation of such genetic data at phenotypic level remains elusive. One of the reasons is that proteins are not acting alone, but are specifically interacting with other proteins and biomolecules, forming intricate interaction networks that are essential for the majority of cell processes and pathological conditions. Thus, characterizing such interaction networks is an important step in understanding how information flows from gene to phenotype. Indeed, structural characterization of protein–protein interactions at atomic resolution has many applications in biomedicine, from diagnosis and vaccine design, to drug discovery. However, despite the advances of experimental structural determination, the number of interactions for which there is available structural data is still very small. In this context, a complementary approach is computational modeling of protein interactions by docking, which is usually composed of two major phases: (i) sampling of the possible binding modes between the interacting molecules and (ii) scoring for the identification of the correct orientations. In addition, prediction of interface and hot-spot residues is very useful in order to guide and interpret mutagenesis experiments, as well as to understand functional and mechanistic aspects of the interaction. Computational docking is already being applied to specific biomedical problems within the context of personalized medicine, for instance, helping to interpret pathological mutations involved in protein–protein interactions, or providing modeled structural data for drug discovery targeting protein–protein interactions.Spanish Ministry of Economy grant number BIO2016-79960-R; D.B.B. is supported by a
predoctoral fellowship from CONACyT; M.R. is supported by an FPI fellowship from the
Severo Ochoa program. We are grateful to the Joint BSC-CRG-IRB Programme in
Computational Biology.Peer ReviewedPostprint (author's final draft
Exploring tradeoffs in pleiotropy and redundancy using evolutionary computing
Evolutionary computation algorithms are increasingly being used to solve
optimization problems as they have many advantages over traditional
optimization algorithms. In this paper we use evolutionary computation to study
the trade-off between pleiotropy and redundancy in a client-server based
network. Pleiotropy is a term used to describe components that perform multiple
tasks, while redundancy refers to multiple components performing one same task.
Pleiotropy reduces cost but lacks robustness, while redundancy increases
network reliability but is more costly, as together, pleiotropy and redundancy
build flexibility and robustness into systems. Therefore it is desirable to
have a network that contains a balance between pleiotropy and redundancy. We
explore how factors such as link failure probability, repair rates, and the
size of the network influence the design choices that we explore using genetic
algorithms.Comment: 10 pages, 6 figure
Computational protein design with backbone plasticity
The computational algorithms used in the design of artificial proteins have become increasingly sophisticated in recent years, producing a series of remarkable successes. The most dramatic of these is the de novo design of artificial enzymes. The majority of these designs have reused naturally occurring protein structures as “scaffolds” onto which novel functionality can be grafted without having to redesign the backbone structure. The incorporation of backbone flexibility into protein design is a much more computationally challenging problem due to the greatly increase search space but promises to remove the limitations of reusing natural protein scaffolds. In this review, we outline the principles of computational protein design methods and discuss recent efforts to consider backbone plasticity in the design process
Using GWAS Data to Identify Copy Number Variants Contributing to Common Complex Diseases
Copy number variants (CNVs) account for more polymorphic base pairs in the
human genome than do single nucleotide polymorphisms (SNPs). CNVs encompass
genes as well as noncoding DNA, making these polymorphisms good candidates for
functional variation. Consequently, most modern genome-wide association studies
test CNVs along with SNPs, after inferring copy number status from the data
generated by high-throughput genotyping platforms. Here we give an overview of
CNV genomics in humans, highlighting patterns that inform methods for
identifying CNVs. We describe how genotyping signals are used to identify CNVs
and provide an overview of existing statistical models and methods used to
infer location and carrier status from such data, especially the most commonly
used methods exploring hybridization intensity. We compare the power of such
methods with the alternative method of using tag SNPs to identify CNV carriers.
As such methods are only powerful when applied to common CNVs, we describe two
alternative approaches that can be informative for identifying rare CNVs
contributing to disease risk. We focus particularly on methods identifying de
novo CNVs and show that such methods can be more powerful than case-control
designs. Finally we present some recommendations for identifying CNVs
contributing to common complex disorders.Comment: Published in at http://dx.doi.org/10.1214/09-STS304 the Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org
- …