A cluster based hybrid feature selection approach
Data collection and storage capacities have increased significantly in the past decades. To cope with the increasing complexity of data, feature selection methods have become an omnipresent preprocessing step in data analysis. In this paper we present a hybrid (filter-wrapper) feature selection method tailored for data classification problems. Our hybrid approach is composed of two stages. In the first stage, a filter clusters features to identify and remove redundancy. In the second stage, a wrapper evaluates different feature subsets produced by the filter, determining the one that produces the best classification performance in terms of accuracy. The effectiveness of our method is demonstrated through an empirical evaluation performed on real-world datasets coming from various sources. FAPESP (Grant #2011/04247-5 and #2013/18698-4); CNPq (Grant #304137/2013-8
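The two-stage idea in this abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the filter groups features by absolute Pearson correlation with a hypothetical `threshold` and keeps one representative per group, and the wrapper runs greedy forward selection scored by leave-one-out accuracy of a 1-nearest-neighbour classifier. The toy data is made up.

```python
from statistics import mean, pstdev


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    sa, sb = pstdev(a), pstdev(b)
    if sa == 0 or sb == 0:
        return 0.0
    ma, mb = mean(a), mean(b)
    return mean([(x - ma) * (y - mb) for x, y in zip(a, b)]) / (sa * sb)


def filter_stage(X, threshold=0.9):
    """Assumed filter: greedily cluster correlated features, keep one index per cluster."""
    columns = list(zip(*X))
    representatives = []
    for j, col in enumerate(columns):
        # A feature is redundant if it correlates strongly with an existing representative.
        if not any(abs(pearson(col, columns[r])) >= threshold for r in representatives):
            representatives.append(j)
    return representatives


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on the given features."""
    correct = 0
    for i, row in enumerate(X):
        best_dist, best_label = None, None
        for k, other in enumerate(X):
            if k == i:
                continue
            dist = sum((row[f] - other[f]) ** 2 for f in features)
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, y[k]
        correct += best_label == y[i]
    return correct / len(X)


def wrapper_stage(X, y, candidates):
    """Assumed wrapper: greedy forward selection over the filter's representatives."""
    selected, best = [], 0.0
    while True:
        choice = None
        for f in candidates:
            if f in selected:
                continue
            score = loo_accuracy(X, y, selected + [f])
            if score > best:
                best, choice = score, f
        if choice is None:
            return selected, best
        selected.append(choice)


# Toy data (made up): feature 1 is feature 0 rescaled, feature 2 carries no class signal.
X = [
    [1.0, 2.0, 5.0],
    [1.2, 2.4, 1.0],
    [0.9, 1.8, 3.0],
    [3.0, 6.0, 4.0],
    [3.1, 6.2, 2.0],
    [2.9, 5.8, 6.0],
]
y = [0, 0, 0, 1, 1, 1]

candidates = filter_stage(X)
selected, accuracy = wrapper_stage(X, y, candidates)
```

On this toy input the filter drops the rescaled duplicate and the wrapper then keeps only the feature that separates the classes. A different clustering criterion or classifier could be swapped in without changing the overall structure.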
A hybrid supervised/unsupervised machine learning approach to solar flare prediction
We introduce a hybrid approach to solar flare prediction, whereby a
supervised regularization method is used to realize feature importance and an
unsupervised clustering method is used to realize the binary flare/no-flare
decision. The approach is validated against NOAA SWPC data
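One possible reading of this two-step scheme is sketched below, under stated assumptions: the supervised regularization step is taken to be LASSO (fitted here by plain coordinate descent), and the unsupervised step is taken to be 1-D two-means clustering of the regression outputs, with the higher-valued cluster labelled as "flare". The abstract names neither algorithm, and the `alpha` value and toy data are invented for illustration; they are not NOAA SWPC data.

```python
def soft_threshold(value, alpha):
    """Shrink value toward zero by alpha (the LASSO proximal step)."""
    if value > alpha:
        return value - alpha
    if value < -alpha:
        return value + alpha
    return 0.0


def lasso_fit(X, y, alpha=0.05, sweeps=100):
    """Coordinate-descent LASSO on centred data for (1/2n)||y - Xw||^2 + alpha*||w||_1."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [t - y_mean for t in y]
    w = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            # Correlate feature j with the residual computed without feature j.
            rho = sum(
                Xc[i][j] * (yc[i] - sum(w[k] * Xc[i][k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            z = sum(Xc[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, alpha) / z if z > 0 else 0.0
    return w, x_mean, y_mean


def lasso_predict(X, w, x_mean, y_mean):
    """Real-valued regression output for each row."""
    return [y_mean + sum(wj * (xj - mj) for wj, xj, mj in zip(w, row, x_mean)) for row in X]


def two_means_labels(scores, iterations=50):
    """1-D k-means with k=2; the cluster with the larger centre is labelled 1."""
    low, high = min(scores), max(scores)
    labels = [0] * len(scores)
    for _ in range(iterations):
        labels = [int(abs(s - high) < abs(s - low)) for s in scores]
        highs = [s for s, lab in zip(scores, labels) if lab == 1]
        lows = [s for s, lab in zip(scores, labels) if lab == 0]
        if not highs or not lows:
            break
        low, high = sum(lows) / len(lows), sum(highs) / len(highs)
    return labels


# Toy data (made up): feature 0 tracks the label, feature 1 is unrelated to it.
X = [
    [0.1, 1.0], [0.2, 1.0], [0.0, -1.0], [0.3, -1.0],
    [1.0, 1.0], [1.1, 1.0], [0.9, -1.0], [1.2, -1.0],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

weights, x_mean, y_mean = lasso_fit(X, y)
labels = two_means_labels(lasso_predict(X, weights, x_mean, y_mean))
```

The nonzero entries of `weights` play the role of feature importance, and the clustering step replaces a hand-picked decision threshold on the regression output.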
A niching memetic algorithm for simultaneous clustering and feature selection
Clustering is inherently a difficult task, and is made even more difficult when the selection of relevant features is also an issue. In this paper we propose an approach for simultaneous clustering and feature selection using a niching memetic algorithm. Our approach (which we call NMA_CFS) makes feature selection an integral part of the global clustering search procedure and attempts to overcome the problem of identifying less promising locally optimal solutions in both clustering and feature selection, without making any a priori assumption about the number of clusters. Within the NMA_CFS procedure, a variable composite representation is devised to encode both feature selection and cluster centers with different numbers of clusters. Further, local search operations are introduced to refine feature selection and cluster centers encoded in the chromosomes. Finally, a niching method is integrated to preserve the population diversity and prevent premature convergence. In an experimental evaluation we demonstrate the effectiveness of the proposed approach and compare it with other related approaches, using both synthetic and real data
Submodular Load Clustering with Robust Principal Component Analysis
Traditional load analysis is facing challenges with the new electricity usage
patterns due to demand response as well as increasing deployment of distributed
generations, including photovoltaics (PV), electric vehicles (EV), and energy
storage systems (ESS). At the transmission level, despite irregular load
behaviors in different areas, highly aggregated load shapes still share similar
characteristics. Load clustering aims to discover such intrinsic patterns and
provide useful information to other load applications, such as load forecasting
and load modeling. This paper proposes an efficient submodular load clustering
method for transmission-level load areas. Robust principal component analysis
(R-PCA) first decomposes the annual load profiles into low-rank components
and sparse components to extract key features. A novel submodular cluster
center selection technique is then applied to determine the optimal cluster
centers through a constructed similarity graph. Following the selection results,
load areas are efficiently assigned to different clusters for further load
analysis and applications. Numerical results obtained from PJM load data demonstrate
the effectiveness of the proposed approach. Comment: Accepted by 2019 IEEE PES General Meeting, Atlanta, G
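The center-selection and assignment steps can be sketched as follows, with several assumptions: the R-PCA decomposition is omitted and feature vectors are taken as given, the similarity graph is a dense matrix built from a made-up `1 / (1 + distance)` kernel, and the submodular objective is assumed to be the standard facility-location function maximised greedily. The abstract does not state which objective or kernel the authors use, and the profiles below are invented.

```python
import math


def similarity_matrix(profiles):
    """Assumed kernel: similarity decays with Euclidean distance, in (0, 1]."""
    return [[1.0 / (1.0 + math.dist(a, b)) for b in profiles] for a in profiles]


def greedy_facility_location(similarity, k):
    """Greedily pick k centres maximising sum_i max_{c in centres} similarity[i][c].

    Assumes non-negative similarities, which makes the objective monotone submodular.
    """
    n = len(similarity)
    centres = []
    coverage = [0.0] * n  # best similarity of each item to any chosen centre so far
    for _ in range(k):
        best_gain, best_c = -1.0, None
        for c in range(n):
            if c in centres:
                continue
            # Marginal gain: how much candidate c improves each item's coverage.
            gain = sum(max(similarity[i][c] - coverage[i], 0.0) for i in range(n))
            if gain > best_gain:
                best_gain, best_c = gain, c
        centres.append(best_c)
        coverage = [max(coverage[i], similarity[i][best_c]) for i in range(n)]
    return centres


def assign_to_centres(similarity, centres):
    """Assign every item to its most similar selected centre."""
    return [max(centres, key=lambda c: similarity[i][c]) for i in range(len(similarity))]


# Toy feature vectors (made up): three areas with one load shape, two with another.
profiles = [
    [1.0, 2.0, 1.0],
    [1.1, 2.1, 1.0],
    [0.9, 1.9, 1.1],
    [5.0, 1.0, 5.0],
    [5.1, 1.1, 4.9],
]

similarity = similarity_matrix(profiles)
centres = greedy_facility_location(similarity, k=2)
assignment = assign_to_centres(similarity, centres)
```

Because the selected centres are actual load areas rather than averaged centroids, each cluster is represented by a real profile, which is one common motivation for exemplar-based selection.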
StackInsights: Cognitive Learning for Hybrid Cloud Readiness
Hybrid cloud is an integrated cloud computing environment utilizing a mix of
public cloud, private cloud, and on-premise traditional IT infrastructures.
Workload awareness, defined as a detailed, full-range understanding of each
individual workload, is essential in implementing the hybrid cloud. While it is
critical to perform an accurate analysis to determine which workloads are
appropriate for on-premise deployment versus which workloads can be migrated to
a cloud off-premise, the assessment is mainly performed by rule- or policy-based
approaches. In this paper, we introduce StackInsights, a novel cognitive system
to automatically analyze and predict the cloud readiness of workloads for an
enterprise. Our system harnesses the critical metrics across the entire stack:
1) infrastructure metrics, 2) data relevance metrics, and 3) application
taxonomy, to identify workloads that have characteristics of a) low sensitivity
with respect to business security, criticality and compliance, and b) low
response time requirements and access patterns. Since the capture of the data
relevance metrics involves an intrusive and in-depth scanning of the content of
storage objects, a machine learning model is applied to perform the business
relevance classification by learning from the meta-level metrics harnessed
across the stack. In contrast to traditional methods, StackInsights significantly
reduces the total time for hybrid cloud readiness assessment by orders of
magnitude