Epitope prediction improved by multitask support vector machines
Motivation: In silico methods for the prediction of antigenic peptides
binding to MHC class I molecules play an increasingly important role in the
identification of T-cell epitopes. Statistical and machine learning methods, in
particular, are widely used to score candidate epitopes based on their
similarity with known epitopes and non-epitopes. The genes coding for the MHC
molecules, however, are highly polymorphic, and statistical methods have
difficulty building models for alleles with few known epitopes. In such cases,
recent work has demonstrated the utility of leveraging information across
alleles to improve prediction performance. Results: We design a
support vector machine algorithm that is able to learn epitope models for all
alleles simultaneously, by sharing information across similar alleles. The
sharing of information across alleles is controlled by a user-defined measure
of similarity between alleles. We show that this similarity can be defined in
terms of supertypes, or more directly by comparing key residues known to play a
role in the peptide-MHC binding. We illustrate the potential of this approach
on various benchmark experiments where it outperforms other state-of-the-art
methods.
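The information sharing described above can be pictured as a multitask kernel that multiplies a peptide kernel by an allele-similarity kernel. The sketch below is a minimal illustration with synthetic data; the allele-similarity matrix `S` is made up for the example, standing in for the supertype- or residue-based similarities the abstract mentions:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Toy setup: 3 MHC alleles, 60 peptides encoded as random feature vectors.
n, d, n_alleles = 60, 10, 3
X = rng.normal(size=(n, d))                   # peptide features
alleles = rng.integers(0, n_alleles, size=n)  # allele of each example
y = np.sign(X[:, 0] + rng.normal(scale=0.5, size=n))  # synthetic labels

# Hypothetical allele-similarity matrix (1 on the diagonal, partial
# sharing off-diagonal); in practice it would be derived from supertypes
# or from key binding-site residues.
S = np.array([[1.0, 0.5, 0.1],
              [0.5, 1.0, 0.1],
              [0.1, 0.1, 1.0]])

# Multitask kernel: K((x, a), (x', a')) = S[a, a'] * <x, x'>,
# so alleles with few epitopes borrow strength from similar alleles.
K_pep = X @ X.T
K = S[np.ix_(alleles, alleles)] * K_pep

clf = SVC(kernel="precomputed").fit(K, y)
print(clf.score(K, y))
```

Setting `S` to the identity recovers one independent SVM per allele; an all-ones `S` pools every allele into a single model.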
Kernel methods for in silico chemogenomics
Predicting interactions between small molecules and proteins is a crucial
ingredient of the drug discovery process. In particular, accurate predictive
models are increasingly used to preselect potential lead compounds from large
molecule databases, or to screen for side-effects. While classical in silico
approaches focus on predicting interactions with a given specific target, new
chemogenomics approaches adopt cross-target views. Building on recent
developments in the use of kernel methods in bio- and chemoinformatics, we
present a systematic framework to screen the chemical space of small molecules
for interaction with the biological space of proteins. We show that this
framework allows information sharing across the targets, resulting in a
dramatic improvement of ligand prediction accuracy for three important classes
of drug targets: enzymes, GPCRs, and ion channels.
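A common way to realize such a cross-target kernel framework is a product (Kronecker) kernel between a protein kernel and a ligand kernel, so a (target, molecule) pair gets similarity K_prot(t, t') * K_lig(m, m'). A minimal sketch with random toy descriptors (the feature dimensions and data here are placeholders, not the descriptors used in the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy descriptors: 4 protein targets, 5 small molecules.
P = rng.normal(size=(4, 8))    # e.g. sequence-derived features
M = rng.normal(size=(5, 16))   # e.g. fingerprint-like features

K_prot = P @ P.T               # kernel on the biological space
K_lig = M @ M.T                # kernel on the chemical space

# Kernel on (target, molecule) pairs:
#   K((t, m), (t', m')) = K_prot[t, t'] * K_lig[m, m'],
# which couples the two spaces so that ligands of similar targets
# share information across targets.
K_pairs = np.kron(K_prot, K_lig)   # shape (4*5, 4*5)
print(K_pairs.shape)
```

Any kernel machine (SVM, kernel ridge regression) can then be trained on known interaction labels using `K_pairs` as a precomputed kernel.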
Clustered Multi-Task Learning: A Convex Formulation
In multi-task learning several related tasks are considered simultaneously,
with the hope that by an appropriate sharing of information across tasks, each
task may benefit from the others. In the context of learning linear functions
for supervised classification or regression, this can be achieved by including
a priori information about the weight vectors associated with the tasks, and
how they are expected to be related to each other. In this paper, we assume
that tasks are clustered into groups, which are unknown beforehand, and that
tasks within a group have similar weight vectors. We design a new spectral norm
that encodes this a priori assumption, without the prior knowledge of the
partition of tasks into groups, resulting in a new convex optimization
formulation for multi-task learning. We show in simulations on synthetic
examples and on the IEDB MHC-I binding dataset, that our approach outperforms
well-known convex methods for multi-task learning, as well as related
non-convex methods dedicated to the same problem.
Increasing stability and interpretability of gene expression signatures
Motivation: Molecular signatures for diagnosis or prognosis estimated from
large-scale gene expression data often lack robustness and stability, rendering
their biological interpretation challenging. Increasing the signature's
interpretability and stability across perturbations of a given dataset and, if
possible, across datasets, is urgently needed to ease the discovery of
important biological processes and, eventually, new drug targets. Results: We
propose a new method to construct signatures with increased stability and
easier interpretability. The method uses a gene network as side information
and enforces large connectivity among the genes in the signature, leading to
signatures typically made of genes clustered in a few subnetworks. It combines
the recently proposed graph Lasso procedure with a stability selection
procedure. We evaluate its relevance for the estimation of a prognostic
signature in breast cancer, and highlight in particular the increase in
interpretability and stability of the signature.
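The stability-selection half of the procedure is easy to sketch: refit a sparse model on many random half-samples of the data and keep the genes selected most frequently. The toy example below uses a plain Lasso for brevity (the paper combines stability selection with a graph Lasso penalty that additionally favors connected genes; the data, `alpha`, and the 80% threshold are illustrative choices):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)

# Toy expression data: 100 samples, 30 genes, outcome driven by genes 0-2.
n, p = 100, 30
X = rng.normal(size=(n, p))
y = X[:, 0] + X[:, 1] - X[:, 2] + 0.5 * rng.normal(size=n)

# Stability selection: count how often each gene is selected across
# sparse fits on random half-samples.
n_runs, freq = 100, np.zeros(p)
for _ in range(n_runs):
    idx = rng.choice(n, size=n // 2, replace=False)
    coef = Lasso(alpha=0.2).fit(X[idx], y[idx]).coef_
    freq += coef != 0
freq /= n_runs

stable = np.flatnonzero(freq >= 0.8)  # genes selected in >= 80% of runs
print(stable)
```

Genes that survive most subsamples form the stable signature, which is far less sensitive to perturbations of the dataset than a single Lasso fit.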
Wigner function negativity and contextuality in quantum computation on rebits
We describe a universal scheme of quantum computation by state injection on
rebits (states with real density matrices). For this scheme, we establish
contextuality and Wigner function negativity as computational resources,
extending results of [M. Howard et al., Nature 510, 351--355 (2014)] to
two-level systems. For this purpose, we define a Wigner function suited to
systems of rebits, and prove a corresponding discrete Hudson's theorem. We
introduce contextuality witnesses for rebit states, and discuss the
compatibility of our result with state-independent contextuality.
Simulation of fermionic lattice models in two dimensions with Projected Entangled-Pair States: Next-nearest neighbor Hamiltonians
In a recent contribution [Phys. Rev. B 81, 165104 (2010)] fermionic Projected
Entangled-Pair States (PEPS) were used to approximate the ground state of free
and interacting spinless fermion models, as well as the t-J model. This
paper revisits these three models in the presence of an additional next-nearest
hopping amplitude in the Hamiltonian. First we explain how to account for
next-nearest neighbor Hamiltonian terms in the context of fermionic PEPS
algorithms based on simulating time evolution. Then we present benchmark
calculations for the three models of fermions, and compare our results against
analytical, mean-field, and variational Monte Carlo results, respectively.
Consistent with previous computations restricted to nearest-neighbor
Hamiltonians, we systematically obtain more accurate (or better converged)
results for gapped phases than for gapless ones.
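For the free spinless-fermion case, the exact reference energies such benchmarks are compared against follow from diagonalizing the single-particle hopping matrix. A minimal sketch on a small periodic square lattice, with nearest-neighbor amplitude `t` and next-nearest-neighbor amplitude `tp` chosen as illustrative values (not those of the paper):

```python
import numpy as np

# Free spinless fermions on an L x L periodic square lattice:
#   H = -t sum_<ij> c_i^dag c_j - tp sum_<<ij>> c_i^dag c_j + h.c.
L, t, tp = 6, 1.0, 0.5   # lattice size and hopping amplitudes (illustrative)

def site(x, y):
    return (x % L) * L + (y % L)

H = np.zeros((L * L, L * L))
for x in range(L):
    for y in range(L):
        i = site(x, y)
        # Each undirected bond is generated exactly once.
        for dx, dy, amp in [(1, 0, t), (0, 1, t),       # nearest neighbors
                            (1, 1, tp), (1, -1, tp)]:   # next-nearest neighbors
            j = site(x + dx, y + dy)
            H[i, j] -= amp
            H[j, i] -= amp

# Ground state at half filling: occupy the lowest single-particle modes.
eps = np.linalg.eigvalsh(H)
N = L * L // 2
e0 = eps[:N].sum() / (L * L)   # exact ground-state energy per site
print(e0)
```

The interacting models have no such shortcut, which is where variational tensor-network methods like PEPS come in.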
A New Approach to Collaborative Filtering: Operator Estimation with Spectral Regularization
We present a general approach for collaborative filtering (CF) using spectral
regularization to learn linear operators from "users" to the "objects" they
rate. Recent low-rank type matrix completion approaches to CF are shown to be
special cases. Unlike existing regularization-based CF methods, however, our
approach can also incorporate side information such as attributes of the
users or the objects. We then provide novel representer theorems that we use to develop new
estimation methods. We provide learning algorithms based on low-rank
decompositions, and test them on a standard CF dataset. The experiments
indicate the advantages of generalizing existing regularization-based CF
methods to incorporate related information about users and objects. Finally, we
show that certain multi-task learning methods can be also seen as special cases
of our proposed approach.
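The low-rank special case mentioned above can be sketched as trace-norm-regularized matrix completion via iterated SVD soft-thresholding (soft-impute style). The toy data and the regularization weight `lam` below are illustrative, and the operator-valued-kernel machinery that would carry user/object attributes is omitted:

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy ratings: a rank-2 users x items matrix, 50% of entries observed.
U, V = rng.normal(size=(20, 2)), rng.normal(size=(15, 2))
R = U @ V.T
mask = rng.random(R.shape) < 0.5

# Trace-norm regularized completion by iterative SVD soft-thresholding,
# a simple instance of spectral regularization on the learned operator.
lam, Z = 0.5, np.zeros_like(R)
for _ in range(200):
    filled = np.where(mask, R, Z)                 # keep observed entries
    u, s, vt = np.linalg.svd(filled, full_matrices=False)
    Z = u @ np.diag(np.maximum(s - lam, 0)) @ vt  # shrink singular values

rmse = np.sqrt(np.mean((Z[~mask] - R[~mask]) ** 2))
print(rmse)
```

Shrinking singular values is what makes the estimated operator low-rank; other spectral penalties in the framework simply replace the soft-threshold with a different function of the spectrum.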
Machine Learning for In Silico Virtual Screening and Chemical Genomics: New Strategies
Support vector machines and kernel methods belong to the same class of machine learning algorithms that has recently become prominent in both computational biology and chemistry, although both fields have largely ignored each other. These methods are based on a sound mathematical and computationally efficient framework that implicitly embeds the data of interest (respectively, proteins and small molecules) in high-dimensional feature spaces where various classification or regression tasks can be performed with linear algorithms. In this review, we present the main ideas underlying these approaches, survey how both the "biological" and the "chemical" spaces have been separately constructed using the same mathematical framework and tricks, and suggest different avenues to unify both spaces for the purpose of in silico chemogenomics.