65,268 research outputs found
On the design of an ECOC-compliant genetic algorithm
Genetic Algorithms (GA) have been previously applied to Error-Correcting Output Codes (ECOC) in state-of-the-art works in order to find a suitable coding matrix. Nevertheless, none of the presented techniques directly take into account the properties of the ECOC matrix. As a result the considered search space is unnecessarily large. In this paper, a novel Genetic strategy to optimize the ECOC coding step is presented. This novel strategy redefines the usual crossover and mutation operators in order to take into account the theoretical properties of the ECOC framework. Thus, it reduces the search space and lets the algorithm to converge faster. In addition, a novel operator that is able to enlarge the code in a smart way is introduced. The novel methodology is tested on several UCI datasets and four challenging computer vision problems. Furthermore, the analysis of the results done in terms of performance, code length and number of Support Vectors shows that the optimization process is able to find very efficient codes, in terms of the trade-off between classification performance and the number of classifiers. Finally, classification performance per dichotomizer results shows that the novel proposal is able to obtain similar or even better results while defining a more compact number of dichotomies and SVs compared to state-of-the-art approaches
Local-Aggregate Modeling for Big-Data via Distributed Optimization: Applications to Neuroimaging
Technological advances have led to a proliferation of structured big data
that have matrix-valued covariates. We are specifically motivated to build
predictive models for multi-subject neuroimaging data based on each subject's
brain imaging scans. This is an ultra-high-dimensional problem that consists of
a matrix of covariates (brain locations by time points) for each subject; few
methods currently exist to fit supervised models directly to this tensor data.
We propose a novel modeling and algorithmic strategy to apply generalized
linear models (GLMs) to this massive tensor data in which one set of variables
is associated with locations. Our method begins by fitting GLMs to each
location separately, and then builds an ensemble by blending information across
locations through regularization with what we term an aggregating penalty. Our
so called, Local-Aggregate Model, can be fit in a completely distributed manner
over the locations using an Alternating Direction Method of Multipliers (ADMM)
strategy, and thus greatly reduces the computational burden. Furthermore, we
propose to select the appropriate model through a novel sequence of faster
algorithmic solutions that is similar to regularization paths. We will
demonstrate both the computational and predictive modeling advantages of our
methods via simulations and an EEG classification problem.Comment: 41 pages, 5 figures and 3 table
SCOPE: Scalable Composite Optimization for Learning on Spark
Many machine learning models, such as logistic regression~(LR) and support
vector machine~(SVM), can be formulated as composite optimization problems.
Recently, many distributed stochastic optimization~(DSO) methods have been
proposed to solve the large-scale composite optimization problems, which have
shown better performance than traditional batch methods. However, most of these
DSO methods are not scalable enough. In this paper, we propose a novel DSO
method, called \underline{s}calable \underline{c}omposite
\underline{op}timization for l\underline{e}arning~({SCOPE}), and implement it
on the fault-tolerant distributed platform \mbox{Spark}. SCOPE is both
computation-efficient and communication-efficient. Theoretical analysis shows
that SCOPE is convergent with linear convergence rate when the objective
function is convex. Furthermore, empirical results on real datasets show that
SCOPE can outperform other state-of-the-art distributed learning methods on
Spark, including both batch learning methods and DSO methods
Recommended from our members
State-of-the-art on research and applications of machine learning in the building life cycle
Fueled by big data, powerful and affordable computing resources, and advanced algorithms, machine learning has been explored and applied to buildings research for the past decades and has demonstrated its potential to enhance building performance. This study systematically surveyed how machine learning has been applied at different stages of building life cycle. By conducting a literature search on the Web of Knowledge platform, we found 9579 papers in this field and selected 153 papers for an in-depth review. The number of published papers is increasing year by year, with a focus on building design, operation, and control. However, no study was found using machine learning in building commissioning. There are successful pilot studies on fault detection and diagnosis of HVAC equipment and systems, load prediction, energy baseline estimate, load shape clustering, occupancy prediction, and learning occupant behaviors and energy use patterns. None of the existing studies were adopted broadly by the building industry, due to common challenges including (1) lack of large scale labeled data to train and validate the model, (2) lack of model transferability, which limits a model trained with one data-rich building to be used in another building with limited data, (3) lack of strong justification of costs and benefits of deploying machine learning, and (4) the performance might not be reliable and robust for the stated goals, as the method might work for some buildings but could not be generalized to others. Findings from the study can inform future machine learning research to improve occupant comfort, energy efficiency, demand flexibility, and resilience of buildings, as well as to inspire young researchers in the field to explore multidisciplinary approaches that integrate building science, computing science, data science, and social science
- …