Feature Selection via Binary Simultaneous Perturbation Stochastic Approximation
Feature selection (FS) has become an indispensable task in dealing with
today's highly complex pattern recognition problems with a massive number of
features. In this study, we propose a new wrapper approach for FS based on
binary simultaneous perturbation stochastic approximation (BSPSA). This
pseudo-gradient descent stochastic algorithm starts with an initial feature
vector and moves toward the optimal feature vector via successive iterations.
In each iteration, the current feature vector's individual components are
perturbed simultaneously by random offsets from a qualified probability
distribution. We present computational experiments on datasets with numbers of
features ranging from a few dozen to thousands using three widely used
classifiers as wrappers: nearest neighbor, decision tree, and linear support
vector machine. We compare our methodology against the full set of features as
well as a binary genetic algorithm and sequential FS methods using
cross-validated classification error rate and AUC as the performance criteria.
Our results indicate that features selected by BSPSA compare favorably to
alternative methods in general and BSPSA can yield superior feature sets for
datasets with tens of thousands of features by examining an extremely small
fraction of the solution space. We are not aware of any other wrapper FS
methods that are computationally feasible with good convergence properties for
such large datasets.
Comment: This is the Istanbul Sehir University Technical Report
#SHR-ISE-2016.01. A short version of this report has been accepted for
publication in Pattern Recognition Letters.
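The iteration described in this abstract (perturb all components of the current feature vector at once, then step along a pseudo-gradient) can be illustrated with a generic SPSA-style loop over binary feature masks. This is a minimal sketch under stated assumptions, not the authors' implementation: the function `bspsa_select`, the gain values `a` and `c`, the 0.5 rounding threshold, and the toy loss are all hypothetical choices made for illustration. In a real wrapper, the loss would be a cross-validated classification error rate.

```python
import random


def bspsa_select(loss, n_features, iterations=100, a=0.05, c=0.05, seed=0):
    """Generic SPSA-style search over binary feature masks (illustrative only)."""
    rng = random.Random(seed)
    w = [0.5] * n_features  # continuous "inclusion scores", one per feature

    def to_mask(scores):
        # Round each score to a binary decision: keep the feature if score >= 0.5.
        return [s >= 0.5 for s in scores]

    # Start from the mask implied by the initial scores.
    best_mask = to_mask(w)
    best_loss = loss(best_mask)

    for _ in range(iterations):
        # Perturb every component at the same time with a random +/-1 offset.
        delta = [rng.choice((-1.0, 1.0)) for _ in range(n_features)]
        plus = to_mask([wi + c * di for wi, di in zip(w, delta)])
        minus = to_mask([wi - c * di for wi, di in zip(w, delta)])
        loss_plus, loss_minus = loss(plus), loss(minus)

        # Remember the best mask evaluated so far.
        for mask, value in ((plus, loss_plus), (minus, loss_minus)):
            if value < best_loss:
                best_mask, best_loss = mask, value

        # Pseudo-gradient from just two loss evaluations, then a clipped step.
        diff = loss_plus - loss_minus
        w = [min(1.0, max(0.0, wi - a * diff / (2.0 * c * di)))
             for wi, di in zip(w, delta)]

    return best_mask, best_loss


# Hypothetical stand-in for a cross-validated error rate: the fraction of
# features whose selection differs from a known "ideal" subset.
IDEAL = [True, False, True, False, False, True, False, False]


def toy_loss(mask):
    return sum(m != t for m, t in zip(mask, IDEAL)) / len(IDEAL)


mask, value = bspsa_select(toy_loss, n_features=len(IDEAL))
print(mask, value)
```

Each iteration costs two loss evaluations regardless of the number of features, which is why a method of this kind can examine only a small fraction of the solution space.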
UPC++ v1.0 Programmer’s Guide, Revision 2020.3.0
UPC++ is a C++11 library that provides Partitioned Global Address Space (PGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The PGAS model is single-program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. However, PGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are asynchronous by default, to enable programmers to write code that scales well even on hundreds of thousands of cores.
Lemon: an MPI parallel I/O library for data encapsulation using LIME
We introduce Lemon, an MPI parallel I/O library that is intended to allow for
efficient parallel I/O of both binary data and metadata on massively parallel
architectures. Motivated by the demands of the Lattice Quantum Chromodynamics
community, the data is stored in the SciDAC Lattice QCD Interchange Message
Encapsulation format. This format allows for storing large blocks of binary
data and corresponding metadata in the same file. Although designed for LQCD
needs, this format might be useful for any application with this type of data
profile. The design, implementation and application of Lemon are described. We
conclude by presenting the excellent scaling properties of Lemon on
state-of-the-art high-performance computers.
Semantic Mediation of Environmental Observation Datasets through Sensor Observation Services
A large volume of environmental observation data is being generated as a result of the observation
of many properties at the Earth's surface. In parallel, there is a clear interest in accessing data about the same property from different data providers in order to solve concrete problems. Consequently, there is also an increasing interest in publishing such data through open interfaces in the scope of Spatial Data Infrastructures. There have been important advances in the definition of open standards of the Open Geospatial Consortium (OGC) that enable interoperable access to sensor data. Among the proposed interfaces, the Sensor Observation Service (SOS) is having an important impact. We have found that no solution is currently available to provide integrated access to various data sources through an SOS interface. This problem has two main facets. On the one hand, the heterogeneity among different data sources has to be resolved. On the other hand, semantic conflicts that arise during the integration process must also be resolved with the help of relevant domain expert knowledge. To solve these problems, the main goal of this thesis is to design and develop a semantic data mediation framework to access any kind of environmental observation dataset, including both relational data sources and multidimensional arrays.