Learning Active Learning from Data

Author: Fua Pascal
Konyushkova Ksenia
Sznitman Raphael
Publication date: 01/01/2017

In this paper, we suggest a novel data-driven approach to active learning (AL). The key idea is to train a regressor that predicts the expected error reduction for a candidate sample in a particular learning state. By formulating the query selection procedure as a regression problem, we are not restricted to working with existing AL heuristics; instead, we learn strategies based on experience from previous AL outcomes. We show that a strategy can be learnt either from simple synthetic 2D datasets or from a subset of domain-specific data. Our method yields strategies that work well on real data from a wide range of domains.

arXiv.org e-Print Archive

Bern Open Repository and Information System (BORIS)

Using Column Generation to Solve Extensions to the Markowitz Model

Author: Roebers Lorenz M.
Selvi Aras
Vera Juan C.
Publication date: 21/06/2019

We introduce a solution scheme for portfolio optimization problems with cardinality constraints. Typical portfolio optimization problems are extensions of the classical Markowitz mean-variance portfolio optimization model. We solve such problems using a method similar to column generation. In this scheme, the original problem is restricted to a subset of the assets, resulting in a master convex quadratic problem. Then the dual information of the master problem is used in a sub-problem to propose more assets to consider. We also consider other extensions to the Markowitz model to diversify the portfolio selection within the given intervals for active weights. Comment: 16 pages, 3 figures, 2 tables, 1 pseudocod
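The scheme described in this abstract can be written out roughly as follows. This is a sketch under assumptions, not the paper's formulation: the symbols (covariance matrix \Sigma, expected returns \mu, risk-tolerance weight \gamma, cardinality bound k, restricted asset set S, budget multiplier \nu) and the long-only constraint are introduced here for illustration, and the restricted master is shown without the cardinality constraint.

```latex
% Cardinality-constrained mean-variance model (assumed standard form)
\min_{x \in \mathbb{R}^n} \; x^\top \Sigma x - \gamma\, \mu^\top x
\quad \text{s.t.} \quad \mathbf{1}^\top x = 1,\; x \ge 0,\; \|x\|_0 \le k .

% Restricted master problem over a subset S of the assets (convex QP)
\min_{x_S \ge 0} \; x_S^\top \Sigma_{SS}\, x_S - \gamma\, \mu_S^\top x_S
\quad \text{s.t.} \quad \mathbf{1}^\top x_S = 1 .

% Pricing step: with \nu the multiplier of the budget constraint at the
% master optimum x_S^*, an asset j \notin S is a candidate to add when
% its reduced cost is negative
r_j = 2\, \Sigma_{jS}\, x_S^* - \gamma\, \mu_j - \nu < 0 .
```

The reduced cost follows from the stationarity condition of the relaxed problem: at an optimum, the gradient component of every asset held at zero weight must be at least \nu, so a negative r_j signals that adding asset j could lower the objective.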

arXiv.org e-Print Archive

Tilburg University Repository

Non-Negative Sparse Regression and Column Subset Selection with L1 Error

Author: Bhaskara Aditya
Lattanzi Silvio
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 9th Innovations in Theoretical Computer Science Conference (ITCS 2018)
Publication date: 01/01/2018

We consider the problems of sparse regression and column subset selection under L1 error. For both problems, we show that in the non-negative setting it is possible to obtain tight and efficient approximations, without any additional structural assumptions (such as restricted isometry, incoherence, expansion, etc.). For sparse regression, given a matrix A and a vector b with non-negative entries, we give an efficient algorithm to output a vector x of sparsity O(k), for which |Ax - b|_1 is comparable to the smallest error possible using non-negative k-sparse x. We then use this technique to obtain our main result: an efficient algorithm for column subset selection under L1 error for non-negative matrices.
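For reference, the sparse regression benchmark mentioned in this abstract can be written as below. The name \mathrm{OPT}_k is an assumption introduced here, and the precise sense in which the algorithm's error is "comparable" to it is not stated in the abstract, so it is left unspecified.

```latex
% Best achievable L1 error with a non-negative k-sparse coefficient vector
\mathrm{OPT}_k \;=\; \min_{\substack{x \ge 0 \\ \|x\|_0 \le k}} \; \|Ax - b\|_1 ,
\qquad \|Ax - b\|_1 = \sum_{i} \bigl| (Ax)_i - b_i \bigr| .
```

The algorithm is allowed to return a vector with O(k) non-zero entries, that is, somewhat more than k, while its error is measured against \mathrm{OPT}_k.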

Dagstuhl Research Online Publication Server

Parameterized Inapproximability of Target Set Selection and Generalizations

Author: A. Aazami
C. Bazgan
C.-L. Chang
D. Marx
D. Peleg
F. Cicalese
L. Cai
M. Chopin
N. Chen
O. Ben-Zwi
P.A. Dreyer
R.G. Downey
T.V.T. Reddy
Y. Chen
Publication venue: IOS Press
Publication date: 01/01/2014

In this paper, we consider the Target Set Selection problem: given a graph and a threshold value $thr(v)$ for any vertex $v$ of the graph, find a minimum size vertex-subset to "activate" such that all the vertices of the graph are activated at the end of the propagation process. A vertex $v$ is activated during the propagation process if at least $thr(v)$ of its neighbors are activated. This problem models several practical issues like faults in distributed networks or word-of-mouth recommendations in social networks. We show that for any functions $f$ and $\rho$ this problem cannot be approximated within a factor of $\rho(k)$ in $f(k) \cdot n^{O(1)}$ time, unless FPT = W[P], even for restricted thresholds (namely constant and majority thresholds). We also study the cardinality constraint maximization and minimization versions of the problem for which we prove similar hardness results.
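The propagation process defined in this abstract can be simulated directly. The sketch below is illustrative only: the function names and the small path graph are assumptions, and the brute-force search is exponential, which is in keeping with the hardness results stated above rather than a practical method.

```python
from itertools import combinations

def propagate(adj, thr, seeds):
    # Repeatedly activate any vertex with at least thr[v] active neighbours
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for v in adj:
            if v not in active and sum(u in active for u in adj[v]) >= thr[v]:
                active.add(v)
                changed = True
    return active

def is_target_set(adj, thr, seeds):
    # A target set is a seed set whose propagation activates every vertex
    return propagate(adj, thr, seeds) == set(adj)

def min_target_set(adj, thr):
    # Exhaustive search over seed sets of increasing size (exponential time)
    for size in range(len(adj) + 1):
        for seeds in combinations(sorted(adj), size):
            if is_target_set(adj, thr, seeds):
                return set(seeds)

# Hypothetical example: the path a - b - c - d, where b needs two active neighbours
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
thr = {"a": 1, "b": 2, "c": 1, "d": 1}

stuck = propagate(adj, thr, {"a"})   # b never reaches its threshold
best = min_target_set(adj, thr)      # seeding b alone activates a, c, then d
```

Because activation is monotone (an active vertex never deactivates), the order in which vertices are examined does not change the final active set.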

arXiv.org e-Print Archive

CiteSeerX

Feature selection for splice site prediction: A new method using EDA-based feature ranking

Author: Aeyels Dirk
Degroeve Sven
Rouzé Pierre
Saeys Yvan
Van de Peer Yves
Publication venue: BioMed Central
Publication date: 01/01/2004

BACKGROUND: The identification of relevant biological features in large and complex datasets is an important step towards gaining insight into the processes underlying the data. Other advantages of feature selection include the ability of the classification system to attain good or even better solutions using a restricted subset of features, and a faster classification. Thus, robust methods for fast feature selection are of key importance in extracting knowledge from complex biological data. RESULTS: In this paper we present a novel method for feature subset selection applied to splice site prediction, based on estimation of distribution algorithms, a more general framework of genetic algorithms. From the estimated distribution of the algorithm, a feature ranking is derived. Afterwards, this ranking is used to iteratively discard features. We apply this technique to the problem of splice site prediction, and show how it can be used to gain insight into the underlying biological process of splicing. CONCLUSION: We show that this technique proves to be more robust than the traditional use of estimation of distribution algorithms for feature selection: instead of returning a single best subset of features (as they normally do), this method provides a dynamical view of the feature selection process, like the traditional sequential wrapper methods. However, the method is faster than the traditional techniques, and scales better to datasets described by a large number of features.
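To make the ranking idea concrete, the sketch below uses a univariate estimation of distribution algorithm over binary feature masks and ranks features by their estimated inclusion probability. The choice of a univariate model, the parameter values, the toy fitness function, and all names are assumptions for illustration; the abstract does not specify which EDA variant or classifier the authors used.

```python
import random

def umda_feature_ranking(fitness, n_features, pop_size=60, n_keep=20,
                         generations=15, seed=0):
    # Univariate EDA: one independent inclusion probability per feature
    rng = random.Random(seed)
    probs = [0.5] * n_features
    for _ in range(generations):
        # Sample a population of feature masks from the current distribution
        pop = [[int(rng.random() < p) for p in probs] for _ in range(pop_size)]
        # Keep the fittest masks and re-estimate the distribution from them
        pop.sort(key=fitness, reverse=True)
        best = pop[:n_keep]
        probs = [sum(ind[i] for ind in best) / n_keep for i in range(n_features)]
    # Rank features by how often the selected masks include them
    ranking = sorted(range(n_features), key=lambda i: probs[i], reverse=True)
    return ranking, probs

def toy_fitness(mask):
    # Stand-in for classifier accuracy: features 0 and 2 help, the rest hurt
    relevant = {0, 2}
    return sum(1.0 if i in relevant else -0.5 for i, bit in enumerate(mask) if bit)

ranking, probs = umda_feature_ranking(toy_fitness, n_features=6)
```

In a real setting the fitness would be the cross-validated performance of a classifier trained on the masked features, and the lowest-ranked features would be discarded iteratively, as the abstract describes.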

Springer - Publisher Connector

Directory of Open Access Journals

Point and interval estimation in two-stage adaptive designs with time to event data and biomarker-driven subpopulation selection

Author: Glimm Ekkehard
Kairalla John A.
Khan Josephine N.
Kimani Peter K.
Renfro Lindsay A.
Stallard Nigel
Todd Susan
Publication venue: Wiley
Publication date: 30/08/2020

In personalized medicine, it is often desired to determine if all patients or only a subset of them benefit from a treatment. We consider estimation in two‐stage adaptive designs that in stage 1 recruit patients from the full population. In stage 2, patient recruitment is restricted to the part of the population that, based on stage 1 data, benefits from the experimental treatment. Existing estimators, which adjust for using stage 1 data for selecting the part of the population from which stage 2 patients are recruited, as well as for the confirmatory analysis after stage 2, do not consider time to event patient outcomes. In this work, for time to event data, we have derived a new asymptotically unbiased estimator for the log hazard ratio and a new interval estimator with good coverage probabilities and probabilities that the upper bounds are below the true values. The estimators are appropriate for several selection rules that are based on a single or multiple biomarkers, which can be categorical or continuous.

Warwick Research Archives Portal Repository