
    Protein Function Prediction by Integrating Multiple Kernels

    Determining protein function constitutes an exercise in integrating information derived from several heterogeneous high-throughput experiments. To use the information spread across multiple sources in a combined fashion, these data sources are transformed into kernels. Several protein function prediction methods follow a two-phase approach: they first optimize the weights on the individual kernels to produce a composite kernel, and then train a classifier on the composite kernel. As such, these methods yield an optimal composite kernel, but not necessarily an optimal classifier. Other methods instead optimize the loss of binary classifiers and learn weights for the different kernels iteratively. A protein has multiple functions, and each function can be viewed as a label; these methods optimize the weights on the input kernels separately for each label, which is computationally expensive and ignores inter-label correlations. In this paper, we propose a method called Protein Function Prediction by Integrating Multiple Kernels (ProMK). ProMK iteratively alternates between learning optimal kernel weights and reducing the empirical loss of a multi-label classifier for all labels simultaneously, under a single combined objective function. ProMK can assign larger weights to smooth kernels and down-weight noisy kernels. We evaluate the ability of ProMK to predict protein function on several standard benchmarks and show that our approach performs better than previously proposed protein function prediction approaches that integrate data from multiple networks, as well as multi-label multiple kernel learning methods.
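
    As a rough illustration of the two ingredients the abstract contrasts, the sketch below forms a composite kernel as a weighted sum of per-source kernel matrices and alternates between refitting a multi-label classifier and re-weighting the kernels. The ridge-style solver, the softmax re-weighting rule, and the function names are illustrative assumptions, not ProMK's actual objective.

```python
import numpy as np

def composite_kernel(kernels, weights):
    """Weighted sum of precomputed kernel (Gram) matrices."""
    return sum(w * K for w, K in zip(weights, kernels))

def alternate_fit(kernels, Y, iters=10, lam=1.0):
    """Toy alternating scheme: fit a multi-label kernel ridge classifier on
    the composite kernel, then re-weight each kernel by how well it fits the
    labels on its own, so noisy kernels receive smaller weights.
    Illustrative stand-in only, not the ProMK objective."""
    n, m = Y.shape[0], len(kernels)
    weights = np.full(m, 1.0 / m)
    for _ in range(iters):
        K = composite_kernel(kernels, weights)
        # One column of dual coefficients per label (multi-label ridge).
        alpha = np.linalg.solve(K + lam * np.eye(n), Y)
        # Score each kernel by the (negative) residual of its own fit.
        scores = np.array([
            -np.linalg.norm(Ki @ np.linalg.solve(Ki + lam * np.eye(n), Y) - Y)
            for Ki in kernels
        ])
        s = np.exp(scores - scores.max())   # softmax keeps weights positive
        weights = s / s.sum()
    return weights, alpha
```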

    Label-Based Multiple Kernel Learning for Classification

    This paper provides a novel technique for multiple kernel learning within the Support Vector Machine framework. The problem of combining different sources of information arises in several situations, for instance, the classification of data with asymmetric similarity matrices or the construction of an optimal classifier from a collection of kernels. Often, each source of information can be expressed as a similarity matrix. In this paper we propose a new method to produce a single optimal kernel matrix from a collection of kernel (similarity) matrices, using the label information, for classification purposes. The constructed kernel matrix is then used to train a Support Vector Machine. The key ideas of the kernel construction are twofold: quantifying, relative to the classification labels, the difference of information among the similarities; and extending the linear combination of similarity matrices to a functional combination of similarity matrices. The proposed method has been successfully evaluated and compared with other powerful classifiers on a variety of real classification problems.
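
    One standard way to inject label information when combining kernels, in the spirit of what the abstract describes, is kernel-target alignment: weight each candidate similarity matrix by its alignment with the ideal kernel built from the labels, then train an SVM on the precomputed combination. The alignment heuristic and helper names below are a sketch under that assumption, not the paper's exact construction.

```python
import numpy as np
from sklearn.svm import SVC

def target_alignment(K, y):
    """Alignment between a kernel matrix K and the ideal label kernel y y^T."""
    Ky = np.outer(y, y)
    return np.sum(K * Ky) / (np.linalg.norm(K) * np.linalg.norm(Ky))

def combine_by_alignment(kernels, y):
    """Weight each similarity matrix by its (clipped) alignment with the
    labels and return the normalized convex combination plus the weights."""
    w = np.array([max(target_alignment(K, y), 0.0) for K in kernels])
    w = w / w.sum() if w.sum() > 0 else np.full(len(kernels), 1.0 / len(kernels))
    return sum(wi * K for wi, K in zip(w, kernels)), w

# Usage (K1, K2, K3 are n x n similarity matrices, y holds labels in {-1, +1}):
# K_combined, weights = combine_by_alignment([K1, K2, K3], y)
# clf = SVC(kernel="precomputed").fit(K_combined, y)
```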

    Scalarization for Multi-Task and Multi-Domain Learning at Scale

    Training a single model on multiple input domains and/or output tasks allows information from multiple sources to be compressed into a unified backbone, hence improving model efficiency. It also enables potential positive knowledge transfer across tasks/domains, leading to improved accuracy and data-efficient training. However, optimizing such networks is a challenge, in particular due to discrepancies between the different tasks or domains. Despite several hypotheses and solutions proposed over the years, recent work has shown that uniform scalarization training, i.e., simply minimizing the average of the task losses, yields on-par performance with more costly SotA optimization methods. This raises the issue of how well we understand the training dynamics of multi-task and multi-domain networks. In this work, we first devise a large-scale unified analysis of multi-domain and multi-task learning to better understand the dynamics of scalarization across varied task/domain combinations and model sizes. Following these insights, we then propose to leverage population-based training to efficiently search for the optimal scalarization weights when dealing with a large number of tasks or domains. (NeurIPS 2023; https://openreview.net/forum?id=TSuq3debn)
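
    For reference, the uniform scalarization baseline discussed in the abstract amounts to backpropagating a single scalar, the (possibly weighted) average of per-task losses, through a shared backbone. The PyTorch sketch below assumes placeholder `backbone`, `heads`, `batches`, and `criterion` objects; it illustrates the scalarized objective only, not the paper's population-based weight search.

```python
import torch

def scalarized_loss(task_losses, weights=None):
    """Uniform (or weighted) scalarization: one scalar objective equal to the
    average of the per-task losses."""
    if weights is None:
        weights = [1.0 / len(task_losses)] * len(task_losses)
    return sum(w * l for w, l in zip(weights, task_losses))

def train_step(backbone, heads, batches, criterion, optimizer, weights=None):
    """One optimization step over a shared backbone with per-task heads.
    `backbone`, `heads`, `batches`, and `criterion` are placeholders for an
    actual model and per-task data loaders."""
    optimizer.zero_grad()
    task_losses = [criterion(head(backbone(x)), y)
                   for head, (x, y) in zip(heads, batches)]
    loss = scalarized_loss(task_losses, weights)
    loss.backward()
    optimizer.step()
    return loss.item()
```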

    Active Learning of Multiple Source Multiple Destination Topologies

    We consider the problem of inferring the topology of a network with M sources and N receivers (hereafter referred to as an M-by-N network) by sending probes between the sources and receivers. Prior work has shown that this problem can be decomposed into two parts: first, infer smaller subnetwork components (i.e., 1-by-N's or 2-by-2's), and then merge these components to identify the M-by-N topology. In this paper, we focus on the second part, which had previously received less attention in the literature. In particular, we assume that a 1-by-N topology is given and that all 2-by-2 components can be queried and learned using end-to-end probes. The problem is which 2-by-2's to query and how to merge them with the given 1-by-N, so as to exactly identify the 2-by-N topology, and optimize a number of performance metrics, including the number of queries (which directly translates into measurement bandwidth), time complexity, and memory usage. We provide a lower bound, ⌈N/2⌉, on the number of 2-by-2's required by any active learning algorithm and propose two greedy algorithms. The first algorithm follows the framework of multiple hypothesis testing, in particular Generalized Binary Search (GBS), since our problem is one of active learning from 2-by-2 queries. The second algorithm is called the Receiver Elimination Algorithm (REA) and follows a bottom-up approach: at every step, it selects two receivers, queries the corresponding 2-by-2, and merges it with the given 1-by-N; it requires exactly N−1 steps, which is much less than all N(N−1)/2 possible 2-by-2's. Simulation results over synthetic and realistic topologies demonstrate that both algorithms correctly identify the 2-by-N topology and are near-optimal, but REA is more efficient in practice.
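
    The Receiver Elimination Algorithm summarized at the end of the abstract lends itself to a short sketch: starting from the given 1-by-N tree, repeatedly query one 2-by-2 component for a pair of receivers, merge it into the current topology, and eliminate one receiver, for exactly N−1 queries. The `query_2x2` and `merge` callables and the receiver-pairing rule below are placeholder assumptions; only the query-and-merge loop reflects the abstract.

```python
def receiver_elimination(receivers, query_2x2, merge, base_topology):
    """Bottom-up sketch of the Receiver Elimination Algorithm: exactly N-1
    iterations, each issuing one 2-by-2 query and folding the result into the
    current topology. `query_2x2(r1, r2)` and
    `merge(topology, component, removed_receiver)` are assumed to be supplied
    by the caller; the pairing rule is a placeholder, not the paper's rule."""
    receivers = list(receivers)
    topology = base_topology                  # the given 1-by-N tree
    while len(receivers) > 1:
        r1, r2 = receivers[0], receivers[1]   # pick two receivers (placeholder rule)
        component = query_2x2(r1, r2)         # end-to-end probe measurement
        topology = merge(topology, component, r2)
        receivers.remove(r2)                  # eliminate one receiver per step
    return topology
```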