
    Massively-Parallel Feature Selection for Big Data

    We present the Parallel, Forward-Backward with Pruning (PFBP) algorithm for feature selection (FS) in Big Data settings (high dimensionality and/or sample size). To tackle the challenges of Big Data FS, PFBP partitions the data matrix both in terms of rows (samples, training examples) and columns (features). By employing the concepts of p-values of conditional independence tests and meta-analysis techniques, PFBP manages to rely only on computations local to a partition while minimizing communication costs. It then employs powerful and safe (asymptotically sound) heuristics to make early, approximate decisions, such as Early Dropping of features from consideration in subsequent iterations, Early Stopping of consideration of features within the same iteration, and Early Return of the winner in each iteration. PFBP provides asymptotic guarantees of optimality for data distributions faithfully representable by a causal network (Bayesian network or maximal ancestral graph). Our empirical analysis confirms a super-linear speedup of the algorithm with increasing sample size and linear scalability with respect to the number of features and processing cores, while dominating other competitive algorithms in its class.
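
    The meta-analysis step described above can be illustrated with Fisher's method, a standard way to pool independent p-values; the following is a minimal sketch (the function names, the fixed threshold, and the dictionary layout are illustrative assumptions, not PFBP's actual implementation):

```python
import math

def fisher_combine(pvalues):
    """Fisher's method: pool k independent p-values into one.
    The statistic -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom; for even df its survival function has a
    closed form, so no statistics library is needed."""
    k = len(pvalues)
    half = -sum(math.log(p) for p in pvalues)   # statistic / 2
    term, sf = 1.0, 0.0
    for i in range(k):
        sf += term
        term *= half / (i + 1)
    return math.exp(-half) * sf

def early_drop(local_pvalues, alpha=0.05):
    """Keep only features whose pooled p-value (combined across data
    partitions) stays below alpha; the rest are dropped from
    consideration in subsequent iterations."""
    pooled = {f: fisher_combine(ps) for f, ps in local_pvalues.items()}
    return {f: p for f, p in pooled.items() if p <= alpha}
```

    Each partition contributes one local p-value per feature, so only these scalars, not the raw rows, cross partition boundaries.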

    A stopping criterion for multi-objective optimization evolutionary algorithms

    This paper puts forward a comprehensive study of the design of global stopping criteria for multi-objective optimization. In this study we propose a global stopping criterion, termed MGBM after the authors' surnames. MGBM combines a novel progress indicator, called the mutual domination rate (MDR) indicator, with a simplified Kalman filter, which is used for evidence-gathering purposes. The MDR indicator, which is also introduced here, is a special-purpose progress indicator designed for the purpose of stopping a multi-objective optimization. As part of the paper we describe the criterion from a theoretical perspective and examine its performance on a number of test problems. We also compare this method with similar approaches to the issue. The results of these experiments suggest that MGBM is a valid and accurate approach. (C) 2016 Elsevier Inc. All rights reserved. This work was funded in part by CNPq BJT Project 407851/2012-7 and CNPq PVE Project 314017/2013-
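
    The two ingredients can be sketched as follows; the MDR definition and the filter constants here are simplified assumptions for illustration, not the paper's exact formulation:

```python
def dominates(a, b):
    """Pareto dominance between objective vectors (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def mdr(prev_pop, curr_pop):
    """Mutual domination rate: fraction of the previous population
    dominated by the current one, minus the fraction of the current
    population dominated by the previous one. Values near zero suggest
    the optimizer has stopped making measurable progress."""
    fwd = sum(any(dominates(c, p) for c in curr_pop) for p in prev_pop) / len(prev_pop)
    bwd = sum(any(dominates(p, c) for p in prev_pop) for c in curr_pop) / len(curr_pop)
    return fwd - bwd

class ScalarKalman:
    """One-dimensional Kalman filter used only to smooth the noisy
    per-generation MDR signal before testing it against a threshold."""
    def __init__(self, q=1e-4, r=1e-2):
        self.x, self.p, self.q, self.r = 1.0, 1.0, q, r
    def update(self, z):
        self.p += self.q                  # predict: inflate uncertainty
        k = self.p / (self.p + self.r)    # Kalman gain
        self.x += k * (z - self.x)        # correct toward the measurement
        self.p *= (1.0 - k)
        return self.x
```

    A stopping rule would then feed each generation's MDR value through `update` and halt once the filtered estimate stays below a small threshold.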

    An Iterative Optimization Method Using Genetic Algorithms and Gaussian Process Based Regression in Nuclear Reactor Design Applications

    The optimization of a complex system involves the determination of optimum values for a set of design parameters. The optimization search is carried out to meet a specific set of objectives concerning the quantities of interest (QOIs); the design parameters are a subset of the input parameters, and the QOIs are determined from the output parameters. In particular, when the parameter space is large, optimization necessitates a significant number of executions of the simulator to obtain a solution within tolerance limits. When the simulations are expensive in terms of computation time, an emulator based on regression methods is useful for predictions. This work presents a novel methodology that uses an iterative hybrid global optimization method (GOM) combining genetic algorithms (GA) and simulated annealing (SA) (HYBGASA), coupled with a Gaussian process regression method based emulator (GPMEM), to optimize a set of input parameters with respect to a set of defined objectives in a nuclear reactor power system. Hereafter this iterative hybrid method, comprising HYBGASA and GPMEM, is referred to as "IHGOM". In addition to optimization, IHGOM iteratively augments the training data for the GPMEM with trial points from the neighborhood of the near-optimal solution in order to reduce regression errors. The objective is to develop, model, and analyze IHGOM and apply it to an optimization problem in the design of a nuclear reactor. The development and analysis of IHGOM and its implementation in a nuclear reactor power system problem are a significant contribution to the optimization and nuclear engineering communities.
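
    The iterative surrogate-refinement loop can be sketched in miniature as follows. Everything here is a toy stand-in: the quadratic objective replaces the expensive reactor simulator, an inverse-distance surrogate replaces the Gaussian-process emulator, and plain simulated annealing replaces the GA/SA hybrid; only the loop structure (optimize the cheap surrogate, spend one expensive run at its optimum, retrain) mirrors the described method:

```python
import math, random

def simulator(x):
    """Stand-in for the expensive simulation (hypothetical objective)."""
    return (x - 1.7) ** 2 + 0.1 * math.sin(5 * x)

def emulator_predict(x, data):
    """Toy surrogate: inverse-distance-weighted average of observed runs."""
    num = den = 0.0
    for xi, yi in data:
        w = 1.0 / (abs(x - xi) + 1e-9)
        num += w * yi
        den += w
    return num / den

def anneal_on_emulator(data, steps=500, t0=1.0):
    """Simulated-annealing search over the cheap surrogate only."""
    x = random.uniform(-5, 5)
    best_x, best_y = x, emulator_predict(x, data)
    for i in range(steps):
        t = t0 * (1 - i / steps) + 1e-9
        cand = min(5, max(-5, x + random.gauss(0, 0.5)))
        dy = emulator_predict(cand, data) - emulator_predict(x, data)
        if dy < 0 or random.random() < math.exp(-dy / t):
            x = cand
            y = emulator_predict(x, data)
            if y < best_y:
                best_x, best_y = x, y
    return best_x

random.seed(0)
data = [(x, simulator(x)) for x in [-4, -2, 0, 2, 4]]
for _ in range(10):                         # iterative refinement loop
    x_new = anneal_on_emulator(data)
    data.append((x_new, simulator(x_new)))  # one expensive run per iteration
best = min(data, key=lambda p: p[1])
```

    The point of the loop is that new expensive evaluations cluster around the incumbent optimum, so the surrogate becomes most accurate exactly where accuracy matters.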

    Practical Block-wise Neural Network Architecture Generation

    Convolutional neural networks have achieved remarkable success in computer vision. However, most usable network architectures are hand-crafted and usually require expertise and elaborate design. In this paper, we provide a block-wise network generation pipeline called BlockQNN which automatically builds high-performance networks using the Q-Learning paradigm with an epsilon-greedy exploration strategy. The optimal network block is constructed by the learning agent, which is trained sequentially to choose component layers. We stack the blocks to construct the whole auto-generated network. To accelerate the generation process, we also propose a distributed asynchronous framework and an early-stop strategy. The block-wise generation brings unique advantages: (1) it achieves competitive results compared with the hand-crafted state-of-the-art networks on image classification; additionally, the best network generated by BlockQNN achieves a 3.54% top-1 error rate on CIFAR-10, which beats all existing auto-generated networks; (2) it offers a tremendous reduction of the search space in designing networks, requiring only 3 days with 32 GPUs; and (3) it has strong generalizability: the network built on CIFAR also performs well on the larger-scale ImageNet dataset. Comment: Accepted to CVPR 201
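
    The core idea, an epsilon-greedy Q-learning agent sequentially choosing component layers and receiving a terminal reward, can be sketched with a tabular toy. The layer menu and the hand-made reward are illustrative assumptions standing in for the validation accuracy of an actually trained block:

```python
import random

LAYERS = ["conv3x3", "conv5x5", "maxpool", "identity"]

def block_reward(seq):
    """Hypothetical stand-in reward; BlockQNN uses the validation
    accuracy of the network built from the generated block."""
    return sum({"conv3x3": 0.3, "conv5x5": 0.2,
                "maxpool": 0.1, "identity": 0.0}[l] for l in seq)

def train(episodes=2000, depth=3, eps=0.1, alpha=0.2):
    """Epsilon-greedy tabular Q-learning over sequential layer choices.
    Each episode builds one block; the terminal reward is propagated
    back to every (position, layer) decision in the episode."""
    q = {}                                     # (step, layer) -> value
    random.seed(1)
    for _ in range(episodes):
        seq = []
        for step in range(depth):
            if random.random() < eps:          # explore
                a = random.choice(LAYERS)
            else:                              # exploit current estimates
                a = max(LAYERS, key=lambda l: q.get((step, l), 0.0))
            seq.append(a)
        r = block_reward(seq)                  # terminal reward only
        for step, a in enumerate(seq):
            key = (step, a)
            q[key] = q.get(key, 0.0) + alpha * (r - q.get(key, 0.0))
    return [max(LAYERS, key=lambda l: q.get((s, l), 0.0)) for s in range(depth)]
```

    With this reward the greedy policy converges to choosing `conv3x3` at every position; in the real pipeline each "episode" costs a full network training run, which is why the paper needs a distributed asynchronous framework and early stopping.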

    Power System Parameters Forecasting Using Hilbert-Huang Transform and Machine Learning

    A novel hybrid data-driven approach is developed for forecasting power system parameters with the goal of increasing the efficiency of short-term forecasting studies for non-stationary time series. The proposed approach is based on mode decomposition and a feature analysis of initial retrospective data using the Hilbert-Huang transform and machine learning algorithms. The random forests and gradient boosting trees learning techniques were examined. The decision tree techniques were used to rank the importance of variables employed in the forecasting models, with the Mean Decrease Gini index as the impurity function. The resulting hybrid forecasting models employ the radial basis function neural network and support vector regression. Apart from the introduction and references, the paper is organized as follows. Section 2 presents the background and a review of several approaches to short-term forecasting of power system parameters. In Section 3, a hybrid machine-learning-based algorithm using the Hilbert-Huang transform is developed for short-term forecasting of power system parameters. Section 4 describes the decision tree learning algorithms used to assess variable importance. Finally, Section 6 presents experimental results for the following electric power problems: active power flow forecasting, electricity price forecasting, and wind speed and direction forecasting.
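
    The decompose-forecast-recombine structure of such hybrid models can be sketched as follows. This is a deliberately crude stand-in: a moving-average split replaces the Hilbert-Huang empirical mode decomposition, and a persistence forecast replaces the RBF-network/SVR component models; only the pipeline shape is retained:

```python
import math

def moving_average(x, w):
    """Crude trend extraction; stands in for the empirical mode
    decomposition of the Hilbert-Huang transform."""
    half = w // 2
    return [sum(x[max(0, i - half):i + half + 1]) /
            len(x[max(0, i - half):i + half + 1]) for i in range(len(x))]

def decompose(x, w=5):
    """Split the series into a slow trend and a fast residual mode."""
    trend = moving_average(x, w)
    residual = [xi - ti for xi, ti in zip(x, trend)]
    return trend, residual

def persistence_forecast(component):
    """Placeholder per-component model (the paper fits an RBF neural
    network or support vector regression to each mode)."""
    return component[-1]

# Toy non-stationary series: oscillation on top of a drift.
series = [math.sin(0.3 * t) + 0.1 * t for t in range(50)]
trend, residual = decompose(series)
forecast = persistence_forecast(trend) + persistence_forecast(residual)
```

    The key property preserved here is that the component forecasts are produced independently and summed, so each mode can get the model best suited to its dynamics.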

    Cost-Sensitive Classification: Empirical Evaluation of a Hybrid Genetic Decision Tree Induction Algorithm

    This paper introduces ICET, a new algorithm for cost-sensitive classification. ICET uses a genetic algorithm to evolve a population of biases for a decision tree induction algorithm. The fitness function of the genetic algorithm is the average cost of classification when using the decision tree, including both the costs of tests (features, measurements) and the costs of classification errors. ICET is compared here with three other algorithms for cost-sensitive classification - EG2, CS-ID3, and IDX - and also with C4.5, which classifies without regard to cost. The five algorithms are evaluated empirically on five real-world medical datasets. Three sets of experiments are performed. The first set examines the baseline performance of the five algorithms on the five datasets and establishes that ICET performs significantly better than its competitors. The second set tests the robustness of ICET under a variety of conditions and shows that ICET maintains its advantage. The third set looks at ICET's search in bias space and discovers a way to improve the search. Comment: See http://www.jair.org/ for any accompanying file
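
    The outer loop, a genetic algorithm evolving a bias vector whose fitness is the average cost of the resulting classifier, can be sketched as follows. The fitness function here is a hypothetical stand-in with a known optimum so the GA's behaviour is visible; in ICET it would induce a decision tree under the given biases and measure test costs plus misclassification costs:

```python
import random

def average_cost(biases):
    """Hypothetical stand-in for: induce a tree with these feature-cost
    biases, then measure average test cost plus error cost per case."""
    target = [0.2, 0.8, 0.5]                  # assumed best bias vector
    return sum((b - t) ** 2 for b, t in zip(biases, target))

def evolve(pop_size=30, gens=60, mut=0.1):
    """Minimal GA over bias vectors: truncation selection, one-point
    crossover, occasional Gaussian mutation."""
    random.seed(2)
    pop = [[random.random() for _ in range(3)] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=average_cost)            # lower cost = fitter
        parents = pop[:pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, 3)      # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < mut:         # Gaussian mutation, clipped
                i = random.randrange(3)
                child[i] = min(1.0, max(0.0, child[i] + random.gauss(0, 0.1)))
            children.append(child)
        pop = parents + children
    return min(pop, key=average_cost)
```

    Because the fitness is the total cost rather than the error rate alone, the search naturally trades cheaper tests against occasional extra misclassifications.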

    All mixed up? Finding the optimal feature set for general readability prediction and its application to English and Dutch

    Readability research has a long and rich tradition, but there has been too little focus on general readability prediction without targeting a specific audience or text genre. Moreover, though NLP-inspired research has focused on adding more complex readability features, there is still no consensus on which features contribute most to the prediction. In this article, we investigate in close detail the feasibility of constructing a readability prediction system for English and Dutch generic text using supervised machine learning. Based on readability assessments by both experts and a crowd, we implement different types of text characteristics, ranging from easy-to-compute superficial text characteristics to features requiring deep linguistic processing, resulting in ten different feature groups. Both a regression and a classification setup are investigated, reflecting the two possible readability prediction tasks: scoring individual texts or comparing two texts. We show that going beyond correlation calculations for readability optimization using a wrapper-based genetic algorithm optimization approach is promising and provides considerable insight into which feature combinations contribute to the overall readability prediction. Since we also have gold-standard information available for those features requiring deep processing, we are able to investigate the true upper bound of our Dutch system. Interestingly, we observe that the performance of our fully automatic readability prediction pipeline is on par with the pipeline using gold-standard deep syntactic and semantic information.
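
    The wrapper-based genetic search over feature-group combinations can be sketched as follows. The group names, the additive scoring function with one redundant pair, and the mutation-only GA are all illustrative assumptions; in the article the objective would be the cross-validated performance of a readability model trained on the selected groups:

```python
import random

GROUPS = ["superficial", "lexical", "syntactic", "semantic", "discourse"]

def wrapped_score(mask):
    """Hypothetical wrapper objective: additive gains per selected
    group, a penalty for one overlapping (redundant) pair, and a small
    extraction cost per group."""
    gains = [0.05, 0.10, 0.15, 0.12, 0.08]
    score = 0.6 + sum(g for g, m in zip(gains, mask) if m)
    if mask[2] and mask[3]:              # syntactic + semantic overlap
        score -= 0.12
    return score - 0.02 * sum(mask)      # cost of computing each group

def ga_search(gens=100, pop_size=20, flip=0.1):
    """Simplified wrapper-based genetic search over feature-group
    subsets (elitist, mutation-only; the article's GA is richer)."""
    random.seed(3)
    pop = [[random.randint(0, 1) for _ in GROUPS] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=wrapped_score, reverse=True)
        elite = pop[:pop_size // 2]
        pop = elite + [[1 - g if random.random() < flip else g
                        for g in random.choice(elite)] for _ in elite]
    return max(pop, key=wrapped_score)
```

    The wrapper setup is what lets the search discover interactions such as the redundant pair above, which simple per-feature correlation rankings would miss.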

    Evaluation of scalarization methods and NSGA-II/SPEA2 genetic algorithms for multi-objective optimization of green supply chain design

    This paper considers supply chain design in green logistics. We formulate the choice of an environmentally conscious chain design as a multi-objective optimization (MOO) problem and approximate the Pareto front using the weighted sum and epsilon constraint scalarization methods as well as two popular genetic algorithms, NSGA-II and SPEA2. We extend an existing case study of green supply chain design in the South Eastern Europe region by simultaneously optimizing costs, CO2 emissions, and fine dust (also known as PM, particulate matter) emissions. The results show that in the considered case the scalarization methods outperform the genetic algorithms in finding efficient solutions, and that the CO2 and PM emissions can be lowered by accepting a marginal increase of costs over their global minimum.
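
    The two scalarization methods can be illustrated on a toy bi-objective problem; the quadratic objectives and the one-dimensional decision grid are stand-ins for the case study's cost and emission models:

```python
def f1(x): return x ** 2             # first objective, e.g. cost
def f2(x): return (x - 2) ** 2       # second objective, e.g. emissions

XS = [i / 100 for i in range(0, 201)]    # decision grid on [0, 2]

def weighted_sum(w):
    """Weighted-sum scalarization: minimize w*f1 + (1-w)*f2.
    Sweeping w in [0, 1] traces out Pareto-optimal solutions."""
    return min(XS, key=lambda x: w * f1(x) + (1 - w) * f2(x))

def epsilon_constraint(eps):
    """Epsilon-constraint scalarization: minimize f1 subject to
    f2 <= eps. Sweeping eps traces the front from the other side."""
    feasible = [x for x in XS if f2(x) <= eps]
    return min(feasible, key=f1)

front_ws = sorted({weighted_sum(w / 10) for w in range(11)})
front_ec = sorted({epsilon_constraint(e) for e in [0.25, 1.0, 2.25, 4.0]})
```

    For this convex problem both sweeps recover points on the same Pareto front; the epsilon-constraint method has the advantage of also reaching non-convex regions of a front, which the weighted sum cannot.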