    Learning Active Learning from Data

    In this paper, we suggest a novel data-driven approach to active learning (AL). The key idea is to train a regressor that predicts the expected error reduction for a candidate sample in a particular learning state. By formulating the query selection procedure as a regression problem, we are not restricted to working with existing AL heuristics; instead, we learn strategies based on experience from previous AL outcomes. We show that a strategy can be learnt either from simple synthetic 2D datasets or from a subset of domain-specific data. Our method yields strategies that work well on real data from a wide range of domains.
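
The query-selection step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `select_query`, `state_features`, `candidate_features`, and `predict_reduction` are hypothetical, and the regressor is assumed to be any callable that maps a concatenated (learning state, candidate) feature vector to a predicted error reduction.

```python
from typing import Callable, Sequence


def select_query(
    state_features: Sequence[float],
    candidate_features: Sequence[Sequence[float]],
    predict_reduction: Callable[[Sequence[float]], float],
) -> int:
    """Return the index of the candidate with the largest predicted error reduction."""
    # Score each candidate by feeding the regressor the learning-state
    # features concatenated with that candidate's features.
    scores = [
        predict_reduction(list(state_features) + list(candidate))
        for candidate in candidate_features
    ]
    # Greedy choice: query the candidate the regressor rates highest.
    return max(range(len(scores)), key=scores.__getitem__)


# Toy stand-in for a trained regressor (an assumption for illustration only):
# it prefers candidates whose last feature is close to 0.5.
def toy_regressor(features: Sequence[float]) -> float:
    return -abs(features[-1] - 0.5)


chosen = select_query([0.2, 0.8], [[0.9], [0.55], [0.1]], toy_regressor)
```

In practice the callable would be a regressor fitted on recorded outcomes of earlier active-learning runs; the greedy argmax is the only part taken from the abstract.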

    Using Column Generation to Solve Extensions to the Markowitz Model

    We introduce a solution scheme for portfolio optimization problems with cardinality constraints. Typical portfolio optimization problems are extensions of the classical Markowitz mean-variance portfolio optimization model. We solve problems of this type using a method similar to column generation. In this scheme, the original problem is restricted to a subset of the assets, resulting in a convex quadratic master problem. The dual information of the master problem is then used in a sub-problem to propose more assets to consider. We also consider other extensions to the Markowitz model that diversify the portfolio selection within given intervals for active weights. Comment: 16 pages, 3 figures, 2 tables, 1 pseudocode.
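
For orientation, one common way to write a cardinality-constrained mean-variance problem is given below. The abstract does not state the exact model, so every symbol here is an assumption: $w$ are portfolio weights, $\Sigma$ the return covariance matrix, $\mu$ the expected returns, $\lambda \ge 0$ a risk-return trade-off parameter, $z_i$ binary indicators of whether asset $i$ is held, and $K$ the maximum number of assets. The long-only bound $w_i \ge 0$ is also an assumption.

```latex
\begin{aligned}
\min_{w,\,z} \quad & w^\top \Sigma\, w \;-\; \lambda\, \mu^\top w \\
\text{s.t.} \quad & \mathbf{1}^\top w = 1, \\
& 0 \le w_i \le z_i, \quad z_i \in \{0, 1\}, \quad i = 1, \dots, n, \\
& \sum_{i=1}^{n} z_i \le K .
\end{aligned}
```

Under this reading, restricting the problem to a subset $S$ of the assets amounts to fixing $w_i = 0$ for all $i \notin S$, which leaves a smaller convex quadratic program over the weights in $S$.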

    Non-Negative Sparse Regression and Column Subset Selection with L1 Error

    We consider the problems of sparse regression and column subset selection under L1 error. For both problems, we show that in the non-negative setting it is possible to obtain tight and efficient approximations, without any additional structural assumptions (such as restricted isometry, incoherence, expansion, etc.). For sparse regression, given a matrix A and a vector b with non-negative entries, we give an efficient algorithm to output a vector x of sparsity O(k), for which |Ax - b|_1 is comparable to the smallest error possible using non-negative k-sparse x. We then use this technique to obtain our main result: an efficient algorithm for column subset selection under L1 error for non-negative matrices.

    Parameterized Inapproximability of Target Set Selection and Generalizations

    In this paper, we consider the Target Set Selection problem: given a graph and a threshold value thr(v) for each vertex v of the graph, find a minimum-size vertex subset to "activate" such that all the vertices of the graph are activated at the end of the propagation process. A vertex v is activated during the propagation process if at least thr(v) of its neighbors are activated. This problem models several practical issues, such as faults in distributed networks or word-of-mouth recommendations in social networks. We show that for any functions f and ρ this problem cannot be approximated within a factor of ρ(k) in f(k) · n^{O(1)} time, unless FPT = W[P], even for restricted thresholds (namely constant and majority thresholds). We also study the cardinality-constrained maximization and minimization versions of the problem, for which we prove similar hardness results.
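
The propagation process defined in this abstract can be simulated directly. The sketch below is an illustration only: the function names `propagate` and `is_target_set`, the adjacency-list representation, and the assumption that every threshold is at least 1 are choices made here, not details from the paper.

```python
from collections import deque


def propagate(graph, thresholds, seeds):
    """Return the set of vertices that are active once propagation stabilizes.

    graph: dict mapping each vertex to a list of its neighbors (undirected).
    thresholds: dict mapping each vertex v to thr(v), assumed to be >= 1.
    seeds: iterable of initially activated vertices.
    """
    active = set(seeds)
    # Number of already-processed active neighbors seen by each vertex.
    active_neighbors = {v: 0 for v in graph}
    queue = deque(active)
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w in active:
                continue
            active_neighbors[w] += 1
            # w becomes active once thr(w) of its neighbors are active.
            if active_neighbors[w] >= thresholds[w]:
                active.add(w)
                queue.append(w)
    return active


def is_target_set(graph, thresholds, seeds):
    """Check whether activating `seeds` eventually activates every vertex."""
    return propagate(graph, thresholds, seeds) == set(graph)


# Path a - b - c where the middle vertex needs both neighbors active.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
thr = {"a": 1, "b": 2, "c": 1}
```

On this path, `{"a"}` alone leaves `b` and `c` inactive, whereas `{"a", "c"}` activates the whole graph. The simulation runs in time linear in the number of edges; the hardness results concern finding a minimum seed set, not checking a given one.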

    Feature selection for splice site prediction: A new method using EDA-based feature ranking

    BACKGROUND: The identification of relevant biological features in large and complex datasets is an important step towards gaining insight into the processes underlying the data. Other advantages of feature selection include the ability of the classification system to attain good or even better solutions using a restricted subset of features, and faster classification. Thus, robust methods for fast feature selection are of key importance in extracting knowledge from complex biological data. RESULTS: In this paper we present a novel method for feature subset selection applied to splice site prediction, based on estimation of distribution algorithms, a framework that generalizes genetic algorithms. From the distribution estimated by the algorithm, a feature ranking is derived. This ranking is then used to iteratively discard features. We apply this technique to the problem of splice site prediction, and show how it can be used to gain insight into the underlying biological process of splicing. CONCLUSION: We show that this technique proves to be more robust than the traditional use of estimation of distribution algorithms for feature selection: instead of returning a single best subset of features (as they normally do), this method provides a dynamic view of the feature selection process, like the traditional sequential wrapper methods. However, the method is faster than the traditional techniques, and scales better to datasets described by a large number of features.
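
The idea of deriving a feature ranking from an estimated distribution can be illustrated with a univariate marginal distribution algorithm (UMDA), one simple member of the EDA family. This is a sketch under stated assumptions, not the authors' method: the abstract does not say which EDA or which ranking rule is used, and the function `eda_feature_ranking`, its parameters, and the toy fitness below are all hypothetical.

```python
import random


def eda_feature_ranking(n_features, fitness, pop_size=200, n_select=100,
                        n_generations=15, seed=0):
    """Rank features by their estimated selection probability under a UMDA.

    fitness: callable taking a list of booleans (feature mask) and
    returning a score, where higher is better.
    """
    rng = random.Random(seed)
    probs = [0.5] * n_features  # start with every feature equally likely
    for _ in range(n_generations):
        # Sample feature subsets from the current product distribution.
        population = [[rng.random() < p for p in probs]
                      for _ in range(pop_size)]
        # Keep the best-scoring subsets.
        population.sort(key=fitness, reverse=True)
        elite = population[:n_select]
        # Re-estimate each marginal from the elite, clipped so that no
        # feature becomes permanently fixed in or out.
        for j in range(n_features):
            freq = sum(ind[j] for ind in elite) / n_select
            probs[j] = min(0.95, max(0.05, freq))
    # Features with higher estimated probability rank first.
    ranking = sorted(range(n_features), key=lambda j: probs[j], reverse=True)
    return ranking, probs


# Toy fitness (an assumption): features 0 and 3 help, all others hurt.
RELEVANT = {0, 3}


def toy_fitness(mask):
    return sum(1.0 if j in RELEVANT else -0.5
               for j, used in enumerate(mask) if used)


ranking, probs = eda_feature_ranking(6, toy_fitness)
```

In a real setting the fitness would be the cross-validated performance of a classifier trained on the masked features, and features would then be discarded from the bottom of `ranking` step by step.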

    Point and interval estimation in two-stage adaptive designs with time to event data and biomarker-driven subpopulation selection

    In personalized medicine, it is often desirable to determine whether all patients or only a subset of them benefit from a treatment. We consider estimation in two‐stage adaptive designs that in stage 1 recruit patients from the full population. In stage 2, patient recruitment is restricted to the part of the population that, based on stage 1 data, benefits from the experimental treatment. Existing estimators, which adjust for using stage 1 data both for selecting the part of the population from which stage 2 patients are recruited and for the confirmatory analysis after stage 2, do not consider time to event patient outcomes. In this work, for time to event data, we have derived a new asymptotically unbiased estimator for the log hazard ratio and a new interval estimator with good coverage probabilities and probabilities that the upper bounds are below the true values. The estimators are appropriate for several selection rules that are based on a single or multiple biomarkers, which can be categorical or continuous.