Research summary

My research deals with machine learning methods, in particular multi-task learning and regularization-based methods, and their application to biological problems.

1 Pairwise learning for interaction prediction

In vaccine design, immunologists want accurate predictions of which peptides bind to MHC molecules. This is crucial for discovering which peptides of a pathogen can trigger an immune response and therefore confer protection against that pathogen. Different MHC alleles bind different peptides. In drug discovery, biologists try to find small molecules that interact with given therapeutic targets such as enzymes or GPCRs. The goal is to use these molecules as drugs to regulate a target whose abnormal behavior causes a disease. In both cases, traditional prediction methods build a separate classifier for each target (MHC molecule or drug target). Using a kernel-based approach that casts the problem as predicting whether each pair, e.g. (peptide, MHC) or (molecule, target), interacts or not [1], we obtained significant improvements in prediction accuracy for targets with few known binders. We have proposed specific kernels for each problem and shown that this approach improves prediction accuracy for both the MHC [2] and drug discovery [3] problems. In [4], we propose additional kernels for the GPCR case.

2 Clustered multi-task learning

Multi-task learning involves considering several related problems simultaneously, in the hope of improving performance by sharing information across these problems, or “tasks”. A common strategy is to penalize the variance across the classification functions of all the tasks, which can help guide learning when little data is available. In more realistic settings, however, some inference problems may be related while others are quite different.
In such cases, penalizing the overall variance may harm performance, and one would rather penalize the variance only within clusters of related problems. Since these clusters are not known a priori, we proposed in [5] a criterion that penalizes the variance of the functions within clusters, and optimized it with respect to both the classification functions and the clustering. Because clustering is a non-convex problem, we proposed a convex relaxation, which we showed improves prediction performance.
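To make the two penalties described above concrete, here is a minimal sketch that compares the overall-variance penalty with a within-cluster variance penalty for a fixed clustering. It assumes linear task functions f_t(x) = w_t · x, so that the variance of the functions reduces to the variance of the weight vectors w_t. The function names, the toy weights, and the fixed cluster labels are hypothetical illustrations; the sketch does not implement the joint optimization over functions and clustering, nor the convex relaxation from [5].

```python
import numpy as np


def overall_variance_penalty(W):
    """Sum of squared distances from each task's weight vector to the mean over all tasks.

    W has shape (n_tasks, n_features); row t is the (assumed linear) weight vector w_t.
    """
    W = np.asarray(W, dtype=float)
    return float(np.sum((W - W.mean(axis=0)) ** 2))


def within_cluster_variance_penalty(W, clusters):
    """Sum over clusters of squared distances from each w_t to its own cluster mean.

    clusters is a length-n_tasks sequence of cluster labels. Here it is fixed
    (hypothetical); in the clustered setting it would be optimized jointly with W.
    """
    W = np.asarray(W, dtype=float)
    clusters = np.asarray(clusters)
    total = 0.0
    for c in np.unique(clusters):
        W_c = W[clusters == c]  # weight vectors of the tasks assigned to cluster c
        total += float(np.sum((W_c - W_c.mean(axis=0)) ** 2))
    return total


# Toy example: four tasks forming two groups with opposite weight vectors.
W = np.array([[1.0, 0.0], [1.2, 0.0], [-1.0, 0.0], [-1.2, 0.0]])
clusters = [0, 0, 1, 1]
print(overall_variance_penalty(W))                   # about 4.88
print(within_cluster_variance_penalty(W, clusters))  # about 0.04
```

With two groups of tasks pointing in opposite directions, the overall penalty is large and would pull all tasks toward a single common mean, whereas the within-cluster penalty stays small because each task only needs to remain close to its own cluster mean. Treating the cluster labels as unknowns to be optimized together with W is what makes the full problem non-convex.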

Year: 2014
OAI identifier: oai:CiteSeerX.psu:
Provided by: CiteSeerX
Full text available at:
  • http://citeseerx.ist.psu.edu/v...
  • http://cbio.ensmp.fr/~ljacob/d...