1,785 research outputs found
Massively-Parallel Feature Selection for Big Data
We present the Parallel, Forward-Backward with Pruning (PFBP) algorithm for
feature selection (FS) in Big Data settings (high dimensionality and/or sample
size). To tackle the challenges of Big Data FS, PFBP partitions the data matrix
both in terms of rows (samples, training examples) and columns
(features). By employing the concepts of p-values of conditional independence
tests and meta-analysis techniques, PFBP manages to rely only on computations
local to a partition while minimizing communication costs. Then, it employs
powerful and safe (asymptotically sound) heuristics to make early, approximate
decisions, such as Early Dropping of features from consideration in subsequent
iterations, Early Stopping of consideration of features within the same
iteration, or Early Return of the winner in each iteration. PFBP provides
asymptotic guarantees of optimality for data distributions faithfully
representable by a causal network (Bayesian network or maximal ancestral
graph). Our empirical analysis confirms a super-linear speedup of the algorithm
with increasing sample size and linear scalability with respect to the number of
features and processing cores, while the algorithm dominates other competitive
algorithms in its class.
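The idea of combining p-values computed locally on each data partition can be illustrated with Fisher's combined probability test, a classical meta-analysis technique. The sketch below is only an illustration under stated assumptions: the abstract does not say which combination rule PFBP uses, the function name `fisher_combine` is hypothetical, and the local p-values are assumed to be independent and to lie in (0, 1].

```python
import math

def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method (illustrative sketch).

    Assumes every p-value lies in (0, 1]; math.log raises ValueError for 0.
    """
    m = len(p_values)
    # Under the null hypothesis, -2 * sum(log p) follows a chi-square
    # distribution with 2m degrees of freedom.
    stat = -2.0 * sum(math.log(p) for p in p_values)
    half = stat / 2.0
    # For even degrees of freedom 2m the chi-square survival function has
    # the closed form exp(-x/2) * sum_{i=0}^{m-1} (x/2)^i / i!.
    term, total = 1.0, 1.0
    for i in range(1, m):
        term *= half / i
        total += term
    return math.exp(-half) * total

# Hypothetical p-values for one feature, one per row partition.
local_p = [0.05, 0.05]
combined = fisher_combine(local_p)
```

Each partition only has to communicate a single number per tested feature, which is what makes a combination rule of this kind attractive when communication is expensive.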
Scalable large margin pairwise learning algorithms
2019 Summer. Includes bibliographical references. Classification is a major task in machine learning and data mining applications. Many of these applications involve building a classification model using a large volume of imbalanced data. In such an imbalanced learning scenario, the area under the ROC curve (AUC) has proven to be a reliable performance measure to evaluate a classifier. Therefore, it is desirable to develop scalable learning algorithms that maximize the AUC metric directly. The kernelized AUC maximization machines have established a superior generalization ability compared to linear AUC machines. However, the computational cost of the kernelized machines hinders their scalability. To address this problem, we propose a large-scale nonlinear AUC maximization algorithm that learns a batch linear classifier on an approximate feature space computed via the k-means Nyström method. The proposed algorithm is shown empirically to achieve AUC classification performance comparable to, or even better than, that of the kernel AUC machines, while its training is faster by several orders of magnitude. However, the computational complexity of the linear batch model compromises its scalability when training on sizable datasets. Hence, we develop second-order online AUC maximization algorithms based on a confidence-weighted model. The proposed algorithms exploit the second-order information to improve the convergence rate and implement a fixed-size buffer to address the multivariate nature of the AUC objective function. We also extend our online linear algorithms to consider an approximate feature map constructed using random Fourier features in an online setting. The results show that our proposed algorithms outperform or are at least comparable to the competing online AUC maximization methods. Despite their scalability, we notice that first- and second-order online AUC maximization methods are prone to suboptimal convergence.
This can be attributed to the limitation of the hypothesis space. A potential improvement can be attained by learning stochastic online variants. However, the vanilla stochastic methods also suffer from slow convergence because of the high variance introduced by the stochastic process. We address the problem of slow convergence by developing a fast-converging stochastic AUC maximization algorithm. The proposed stochastic algorithm is accelerated using a unique combination of scheduled regularization update and scheduled averaging. The experimental results show that the proposed algorithm performs better than the state-of-the-art online and stochastic AUC maximization methods in terms of AUC classification accuracy. Moreover, we develop a proximal variant of our accelerated stochastic AUC maximization algorithm. The proposed method applies the proximal operator to the hinge loss function. Therefore, it evaluates the gradient of the loss function at the approximated weight vector. Experiments on several benchmark datasets show that our proximal algorithm converges to the optimal solution faster than the previous AUC maximization algorithms.
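To make the pairwise nature of the AUC objective concrete, the sketch below computes the empirical AUC as the fraction of correctly ordered positive-negative pairs, together with a pairwise hinge loss for a linear scorer. It is a minimal illustration, not the algorithms described in the abstract: the function names are hypothetical, the margin of 1 is an assumption, and the loops are quadratic in the number of examples, which is exactly the cost that buffered online and stochastic methods try to avoid.

```python
def auc(scores_pos, scores_neg):
    """Empirical AUC: share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def pairwise_hinge_loss(w, positives, negatives):
    """Average hinge loss over all positive-negative pairs for a linear scorer.

    Each pair is penalized unless the positive example outscores the
    negative one by a margin of at least 1 (assumed margin).
    """
    def score(x):
        return sum(wi * xi for wi, xi in zip(w, x))

    total = 0.0
    for xp in positives:
        for xn in negatives:
            total += max(0.0, 1.0 - (score(xp) - score(xn)))
    return total / (len(positives) * len(negatives))
```

Because the loss is a sum over pairs rather than over single examples, an online learner cannot evaluate it from one incoming example alone, which is the motivation for the fixed-size buffer mentioned above.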
Maximum Volume Subset Selection for Anchored Boxes
Let B be a set of n axis-parallel boxes in d dimensions such that each box has a corner at the origin and the other corner in the positive quadrant, and let k be a positive integer. We study the problem of selecting k boxes in B that maximize the volume of the union of the selected boxes. The research is motivated by applications in skyline queries for databases and in multicriteria optimization, where the problem is known as the hypervolume subset selection problem. It is known that the problem can be solved in polynomial time in the plane, while the best known algorithms in any dimension d>2 enumerate all size-k subsets. We show that:
* The problem is NP-hard already in 3 dimensions.
* In 3 dimensions, we avoid enumerating all size-k subsets by providing an n^O(sqrt(k)) algorithm.
* For any constant dimension d, we give an efficient polynomial-time approximation scheme.
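The enumeration baseline mentioned in the abstract can be sketched as follows. This is a brute-force illustration only, not the n^O(sqrt(k)) algorithm or the approximation scheme: the function names are hypothetical, each box is represented by its non-origin corner, and the union volume is computed by inclusion-exclusion, which is exponential in the number of boxes passed in and therefore only usable for small k.

```python
from itertools import combinations
from math import prod

def union_volume(boxes):
    """Volume of the union of anchored boxes via inclusion-exclusion.

    Each box is a tuple of positive coordinates (its non-origin corner).
    The intersection of anchored boxes is again an anchored box whose
    corner is the coordinate-wise minimum.
    """
    total = 0.0
    for r in range(1, len(boxes) + 1):
        sign = 1.0 if r % 2 == 1 else -1.0
        for subset in combinations(boxes, r):
            total += sign * prod(min(coords) for coords in zip(*subset))
    return total

def best_subset(boxes, k):
    """Enumerate all size-k subsets and return one with maximum union volume."""
    return max(combinations(boxes, k), key=union_volume)

# Hypothetical 2D instance: the box (1, 1) is covered by the other two.
boxes = [(2, 1), (1, 2), (1, 1)]
chosen = best_subset(boxes, 2)
```

The outer enumeration visits every size-k subset, which is the cost that the 3-dimensional algorithm in the abstract improves on.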
Effective anytime algorithm for multiobjective combinatorial optimization problems
In multiobjective optimization, the result of an optimization algorithm is a set of efficient solutions from which the decision maker selects one. It is common that not all the efficient solutions can be computed in a short time and the search algorithm has to be stopped prematurely to analyze the solutions found so far. A set of efficient solutions that are well-spread in the objective space is preferred to provide the decision maker with a great variety of solutions. However, only a few exact algorithms in the literature can provide such a well-spread set of solutions at any moment: we call them anytime algorithms. We propose a new exact anytime algorithm for multiobjective combinatorial optimization combining three novel ideas to enhance the anytime behavior. We compare the proposed algorithm with the state-of-the-art algorithms for anytime multiobjective combinatorial optimization using a set of 480 instances from different well-known benchmarks and four different performance measures: the overall non-dominated vector generation ratio, the hypervolume, the general spread and the additive epsilon indicator. A comprehensive experimental study reveals that our proposal outperforms the previous algorithms in most of the instances. This research has been partially funded by the Spanish Ministry of Economy and Competitiveness (MINECO) and the European Regional Development Fund (FEDER) under contract TIN2017-88213-R (6city project), the European Research Council under contract H2020-ICT-2019-3 (TAILOR project), the University of Málaga, Consejería de Economía y Conocimiento de la Junta de Andalucía and FEDER under contract UMA18-FEDERJA-003 (PRECOG project), the Ministry of Science, Innovation and Universities and FEDER under contract RTC-2017-6714-5, and the University of Málaga under contract PPIT.UMA.B1.2017/07 (EXHAURO Project).
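Of the four performance measures named in the abstract above, the additive epsilon indicator is the simplest to write down. The sketch below uses its standard definition and assumes that every objective is minimized. The function name and the example points are hypothetical, and nothing here reproduces the experimental setup of the paper.

```python
def additive_epsilon(approx, reference):
    """Additive epsilon indicator of `approx` with respect to `reference`.

    Smallest value eps such that every reference point r is weakly
    dominated by some point a of the approximation after subtracting
    eps from each objective of a. Assumes minimization of all objectives.
    """
    return max(
        min(max(a_i - r_i for a_i, r_i in zip(a, r)) for a in approx)
        for r in reference
    )

# Hypothetical bi-objective fronts: the approximation misses the point (2, 2).
reference_front = [(1, 3), (2, 2), (3, 1)]
partial_front = [(1, 3), (3, 1)]
gap = additive_epsilon(partial_front, reference_front)
```

A value of 0 means the approximation already covers the reference front, so an anytime algorithm should drive this value down as early as possible during the search.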
Evolutionary diversity optimization using multi-objective indicators
Evolutionary diversity optimization aims to compute a set of solutions that are diverse in the search space or instance feature space, and where all solutions meet a given quality criterion. With this paper, we bridge the areas of evolutionary diversity optimization and evolutionary multi-objective optimization. We show how popular indicators frequently used in the area of multi-objective optimization can be used for evolutionary diversity optimization. Our experimental investigations for evolving diverse sets of TSP instances and images according to various features show that two of the most prominent multi-objective indicators, namely the hypervolume indicator and the inverted generational distance, provide excellent results in terms of visualization and various diversity indicators. Authors: Aneta Neumann, Wanru Gao, Markus Wagner, Frank Neumann.
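The inverted generational distance mentioned above can be sketched in a few lines. This uses the common definition (the mean distance from each reference point to its nearest point in the evaluated set) with Euclidean distance. The function name is hypothetical, and how the reference set is chosen in feature space is not stated in the abstract, so the example points are purely illustrative.

```python
import math

def igd(points, reference):
    """Inverted generational distance of `points` with respect to `reference`.

    Averages, over all reference points, the Euclidean distance to the
    closest point in the evaluated set. Lower values mean the set covers
    the reference points more closely.
    """
    return sum(
        min(math.dist(p, r) for p in points) for r in reference
    ) / len(reference)

# Hypothetical feature vectors: one evaluated point, two reference points.
value = igd([(0.0, 0.0)], [(0.0, 0.0), (3.0, 4.0)])
```

Used as a diversity measure, a set that spreads over the whole reference region scores lower than one clustered in a single corner.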
Maximum Volume Subset Selection for Anchored Boxes
Let B be a set of n axis-parallel boxes in R^d such that each box has a corner at the origin and the other corner in the positive quadrant of R^d, and let k be a positive integer. We study the problem of selecting k boxes in B that maximize the volume of the union of the selected boxes. This research is motivated by applications in skyline queries for databases and in multicriteria optimization, where the problem is known as the hypervolume subset selection problem. It is known that the problem can be solved in polynomial time in the plane, while the best known running time in any dimension d >= 3 is Omega(binom(n, k)). We show that:
* The problem is NP-hard already in 3 dimensions.
* In 3 dimensions, we break the bound Omega(binom(n, k)) by providing an n^O(sqrt(k)) algorithm.
* For any constant dimension d, we present an efficient polynomial-time approximation scheme.