An Effective Feature Selection Method Based on Pair-Wise Feature Proximity for High Dimensional Low Sample Size Data
Feature selection has been studied widely in the literature. However, the
efficacy of the selection criteria for low sample size applications is
neglected in most cases. Most of the existing feature selection criteria are
based on the sample similarity. However, the distance measures become
insignificant for high dimensional low sample size (HDLSS) data. Moreover, the
variance of a feature with a few samples is pointless unless it represents the
data distribution efficiently. Instead of looking at the samples in groups, we
evaluate their efficiency in a pairwise fashion. In our investigation, we
noticed that considering a pair of samples at a time and selecting the features
that bring them closer or put them far away is a better choice for feature
selection. Experimental results on benchmark data sets demonstrate the
effectiveness of the proposed method with low sample size, which outperforms
many other state-of-the-art feature selection methods.
Comment: European Signal Processing Conference 201
Automated Classification of Periodic Variable Stars detected by the Wide-field Infrared Survey Explorer
We describe a methodology to classify periodic variable stars identified
using photometric time-series measurements constructed from the Wide-field
Infrared Survey Explorer (WISE) full-mission single-exposure Source Databases.
This will assist in the future construction of a WISE Variable Source Database
that assigns variables to specific science classes as constrained by the WISE
observing cadence with statistically meaningful classification probabilities.
We have analyzed the WISE light curves of 8273 variable stars identified in
previous optical variability surveys (MACHO, GCVS, and ASAS) and show that
Fourier decomposition techniques can be extended into the mid-IR to assist with
their classification. Combined with other periodic light-curve features, this
sample is then used to train a machine-learned classifier based on the random
forest (RF) method. Consistent with previous classification studies of variable
stars in general, the RF machine-learned classifier is superior to other
methods in terms of accuracy, robustness against outliers, and relative
immunity to features that carry little or redundant class information. For the
three most common classes identified by WISE: Algols, RR Lyrae, and W Ursae
Majoris type variables, we obtain classification efficiencies of 80.7%, 82.7%,
and 84.5% respectively using cross-validation analyses, with 95% confidence
intervals of approximately +/-2%. These accuracies are achieved at purity (or
reliability) levels of 88.5%, 96.2%, and 87.8% respectively, similar to those
achieved in previous automated classification studies of periodic variable
stars.
Comment: 48 pages, 17 figures, 1 table, accepted by A
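The Fourier decomposition step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the period is already known and fits a truncated harmonic series to the light curve by ordinary least squares; the function name `fit_fourier_series`, the number of harmonics, and the derived ratio and phase-difference features are illustrative choices, not details taken from the paper.

```python
import numpy as np

def fit_fourier_series(times, mags, period, n_harmonics=2):
    """Least-squares fit of m(t) = m0 + sum_k A_k * cos(k*w*t + phi_k), w = 2*pi/period.

    Hypothetical helper for illustration; returns (m0, amplitudes, phases).
    """
    phase = 2.0 * np.pi * np.asarray(times, dtype=float) / period
    # Design matrix: a constant column, then a cos and a sin column per harmonic.
    columns = [np.ones_like(phase)]
    for k in range(1, n_harmonics + 1):
        columns.append(np.cos(k * phase))
        columns.append(np.sin(k * phase))
    design = np.column_stack(columns)
    coeffs, *_ = np.linalg.lstsq(design, np.asarray(mags, dtype=float), rcond=None)
    mean_mag = coeffs[0]
    a = coeffs[1::2]  # cosine coefficients, one per harmonic
    b = coeffs[2::2]  # sine coefficients, one per harmonic
    amplitudes = np.hypot(a, b)
    # A*cos(x + phi) = A*cos(phi)*cos(x) - A*sin(phi)*sin(x), hence phi = atan2(-b, a).
    phases = np.arctan2(-b, a)
    return mean_mag, amplitudes, phases

def fourier_features(amplitudes, phases):
    """Amplitude ratio and phase difference of the first two harmonics (assumed features)."""
    r21 = amplitudes[1] / amplitudes[0]
    phi21 = (phases[1] - 2.0 * phases[0]) % (2.0 * np.pi)
    return r21, phi21
```

Features of this kind could then be passed, together with other light-curve statistics, to a classifier such as a random forest; the feature set actually used in the study is not given in the abstract.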
Visualising the structure of document search results: A comparison of graph theoretic approaches
This is the post-print of the article - Copyright @ 2010 Sage Publications
Previous work has shown that distance-similarity visualisation or ‘spatialisation’ can provide a potentially useful context in which to browse the results of a query search, enabling the user to adopt a simple local foraging or ‘cluster growing’ strategy to navigate through the retrieved document set. However, faithfully mapping feature-space models to visual space can be problematic owing to their inherent high dimensionality and non-linearity. Conventional linear approaches to dimension reduction tend to fail at this kind of task, sacrificing local structure in order to preserve a globally optimal mapping. In this paper the clustering performance of a recently proposed algorithm called isometric feature mapping (Isomap), which deals with non-linearity by transforming dissimilarities into geodesic distances, is compared to that of non-metric multidimensional scaling (MDS). Various graph pruning methods, for geodesic distance estimation, are also compared. Results show that Isomap is significantly better at preserving local structural detail than MDS, suggesting it is better suited to cluster growing and other semantic navigation tasks. Moreover, it is shown that applying a minimum-cost graph pruning criterion can provide a parameter-free alternative to the traditional K-neighbour method, resulting in spatial clustering that is equivalent to or better than that achieved using an optimal-K criterion.
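The geodesic distance estimation that Isomap relies on can be sketched as follows. This is a minimal illustration of the traditional K-neighbour variant only: it prunes the full distance graph to each point's nearest neighbours and then runs Floyd-Warshall shortest paths. The function name and the default `n_neighbors` are assumptions, and the minimum-cost pruning criterion studied in the article is not reproduced here.

```python
import numpy as np

def geodesic_distances(points, n_neighbors=2):
    """Approximate geodesic distances over a symmetrised K-nearest-neighbour graph.

    Hypothetical helper for illustration; assumes no duplicate points.
    """
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    # Pairwise Euclidean distances between all points.
    diff = pts[:, None, :] - pts[None, :, :]
    euclid = np.sqrt((diff ** 2).sum(axis=-1))
    # Start with no edges (infinite cost) and zero self-distance.
    graph = np.full((n, n), np.inf)
    np.fill_diagonal(graph, 0.0)
    for i in range(n):
        # Index 0 of the sorted order is the point itself, so skip it.
        nearest = np.argsort(euclid[i])[1:n_neighbors + 1]
        graph[i, nearest] = euclid[i, nearest]
        graph[nearest, i] = euclid[i, nearest]
    # Floyd-Warshall: allow paths through each intermediate node k in turn.
    for k in range(n):
        graph = np.minimum(graph, graph[:, k, None] + graph[None, k, :])
    return graph
```

For points sampled along a curved manifold, the resulting distances follow the curve rather than cutting across it, which is what lets a subsequent MDS step preserve local structure. If the neighbour graph is disconnected, some entries remain infinite.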
Explaining Machine Learning Classifiers through Diverse Counterfactual Explanations
Post-hoc explanations of machine learning models are crucial for people to
understand and act on algorithmic predictions. An intriguing class of
explanations is through counterfactuals, hypothetical examples that show people
how to obtain a different prediction. We posit that effective counterfactual
explanations should satisfy two properties: feasibility of the counterfactual
actions given user context and constraints, and diversity among the
counterfactuals presented. To this end, we propose a framework for generating
and evaluating a diverse set of counterfactual explanations based on
determinantal point processes. To evaluate the actionability of
counterfactuals, we provide metrics that enable comparison of
counterfactual-based methods to other local explanation methods. We further
address necessary tradeoffs and point to causal implications in optimizing for
counterfactuals. Our experiments on four real-world datasets show that our
framework can generate a set of counterfactuals that are diverse and well
approximate local decision boundaries, outperforming prior approaches to
generating diverse counterfactuals. We provide an implementation of the
framework at https://github.com/microsoft/DiCE.
Comment: 13 page
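To illustrate how a determinantal point process can score diversity among counterfactuals, the sketch below computes the determinant of a similarity kernel over a candidate set. The kernel form 1 / (1 + distance), the L1 distance, and the function name `dpp_diversity` are assumptions made for this example; the abstract does not specify them, and this is not the API of the linked repository.

```python
import numpy as np

def dpp_diversity(counterfactuals):
    """Determinant of a similarity kernel over a set of counterfactual vectors.

    Hypothetical helper for illustration. Values near 1 indicate well-separated
    counterfactuals; values near 0 indicate near-duplicates.
    """
    cfs = np.asarray(counterfactuals, dtype=float)
    # Pairwise L1 distances (an assumed choice of distance).
    diff = cfs[:, None, :] - cfs[None, :, :]
    dist = np.abs(diff).sum(axis=-1)
    # Assumed kernel: similarity decays from 1 (identical) towards 0 (far apart).
    kernel = 1.0 / (1.0 + dist)
    return float(np.linalg.det(kernel))
```

A set of identical counterfactuals yields an all-ones kernel with determinant 0, while widely separated counterfactuals yield a kernel close to the identity with determinant close to 1. In an optimisation setting, a term like this would be traded off against proximity to the original input and validity of the prediction change.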
LATTE: Application Oriented Social Network Embedding
In recent years, many research works propose to embed the network structured
data into a low-dimensional feature space, where each node is represented as a
feature vector. However, due to the detachment of the embedding process from
external tasks, the embedding results learned by most existing embedding models
can be ineffective for application tasks with specific objectives, e.g.,
community detection or information diffusion. In this paper, we propose to study
the application oriented heterogeneous social network embedding problem.
Significantly different from the existing works, besides the network structure
preservation, the problem should also incorporate the objectives of external
applications in the objective function. To resolve the problem, in this paper,
we propose a novel network embedding framework, namely the "appLicAtion
orienTed neTwork Embedding" (Latte) model. In Latte, the heterogeneous network
structure can be applied to compute the node "diffusive proximity" scores,
which capture both local and global network structures. Based on these computed
scores, Latte learns the network representation feature vectors by extending
the autoencoder model to the heterogeneous network scenario, which can
also effectively unite the objectives of network embedding and external
application tasks. Extensive experiments have been done on real-world
heterogeneous social network datasets, and the experimental results have
demonstrated the outstanding performance of Latte in learning the
representation vectors for specific application tasks.
Comment: 11 Pages, 12 Figures, 1 Tabl