Detecting Variability in Massive Astronomical Time-Series Data I: application of an infinite Gaussian mixture model
We present a new framework to detect various types of variable objects within
massive astronomical time-series data. Assuming that the dominant population of
objects is non-variable, we find outliers from this population by using a
non-parametric Bayesian clustering algorithm based on an infinite
Gaussian Mixture Model (GMM) and the Dirichlet process. The algorithm extracts
information from a given dataset, which is described by six variability
indices. The GMM uses those variability indices to recover clusters that are
described by six-dimensional multivariate Gaussian distributions, allowing our
approach to consider the sampling pattern of time-series data, systematic
biases, the number of data points for each light curve, and photometric
quality. Using the Northern Sky Variability Survey data, we test our approach
and show that the infinite GMM is useful for detecting variable objects, while
providing statistical inference that suppresses false detections. The
proposed approach will be effective in the exploration of future surveys such
as Gaia, Pan-STARRS, and LSST, which will produce massive time-series data.
Comment: accepted for publication in MNRAS
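The infinite GMM described above can be approximated with a truncated Dirichlet-process mixture, as implemented in scikit-learn. The sketch below is illustrative only, not the authors' pipeline: the six "variability indices" are synthetic stand-ins, and the 0.05 weight cut for minor components is a hypothetical threshold.

```python
# Sketch (assumptions: synthetic 6-D indices, hypothetical 0.05 weight cut):
# approximate an infinite GMM with a Dirichlet-process BayesianGaussianMixture
# and flag objects in low-weight (minor) components as variability candidates.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
non_variable = rng.normal(0.0, 1.0, size=(500, 6))   # dominant population
variable = rng.normal(6.0, 1.0, size=(10, 6))        # a few true variables
X = np.vstack([non_variable, variable])              # six variability indices

# Truncated Dirichlet-process mixture: surplus components receive ~zero
# weight, so the effective number of clusters is inferred from the data.
dpgmm = BayesianGaussianMixture(
    n_components=10,
    weight_concentration_prior_type="dirichlet_process",
    covariance_type="full",
    random_state=0,
).fit(X)

# Objects assigned to minor components are outliers from the dominant,
# non-variable population, i.e. candidate variables.
labels = dpgmm.predict(X)
minor = dpgmm.weights_ < 0.05            # hypothetical weight threshold
candidates = np.flatnonzero(minor[labels])
print(len(candidates))
```

In this toy setup the ten injected outliers end up in a small component of their own and are recovered by the weight cut; in practice the threshold would have to be tuned against the survey's false-detection budget.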
Hierarchical Subquery Evaluation for Active Learning on a Graph
To train good supervised and semi-supervised object classifiers, it is
critical that we not waste the time of the human experts who are providing the
training labels. Existing active learning strategies can have uneven
performance, being efficient on some datasets but wasteful on others, or
inconsistent just between runs on the same dataset. We propose perplexity-based
graph construction and a new hierarchical subquery evaluation algorithm to
combat this variability, and to release the potential of Expected Error
Reduction.
Under some specific circumstances, Expected Error Reduction has been one of
the strongest-performing informativeness criteria for active learning. Until
now, it has also been prohibitively costly to compute for sizeable datasets. We
demonstrate our highly practical algorithm, comparing it to other active
learning measures on classification datasets that vary in sparsity,
dimensionality, and size. Our algorithm is consistent over multiple runs and
achieves high accuracy, while querying the human expert for labels at a
frequency that matches their desired time budget.
Comment: CVPR 201
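Expected Error Reduction, the criterion this paper accelerates, can be stated in a few lines of code. The brute-force sketch below retrains a classifier once per candidate query and hypothesised label, which is exactly the cost that makes naive EER prohibitive on sizeable datasets; the dataset and logistic-regression model are illustrative assumptions, not the paper's graph-based setup.

```python
# Toy brute-force Expected Error Reduction (illustrative, not the paper's
# hierarchical subquery algorithm): pick the query whose labelling is
# expected to most reduce the classifier's error over the unlabeled pool.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 1, (40, 2)), rng.normal(2, 1, (40, 2))])
y = np.array([0] * 40 + [1] * 40)
labeled = [0, 40]                        # one seed label per class
unlabeled = [i for i in range(80) if i not in labeled]

def expected_error(train_idx, train_y, pool_idx):
    # Expected 0/1 error of a model retrained with the hypothesised label.
    clf = LogisticRegression().fit(X[train_idx], train_y)
    return float(np.sum(1.0 - clf.predict_proba(X[pool_idx]).max(axis=1)))

current = LogisticRegression().fit(X[labeled], y[labeled])
best_q, best_score = None, np.inf
for q in unlabeled:
    pool = [i for i in unlabeled if i != q]
    # Average the post-query pool error over the possible labels of q,
    # weighted by the current model's belief about q.
    score = sum(
        p * expected_error(labeled + [q], np.append(y[labeled], c), pool)
        for c, p in enumerate(current.predict_proba(X[[q]])[0])
    )
    if score < best_score:
        best_q, best_score = q, score
print(best_q)
```

Each outer iteration triggers one retraining per class, so the loop is quadratic in pool size for a fixed label set; the paper's hierarchical subquery evaluation is aimed at avoiding precisely this blow-up.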
CUSBoost: Cluster-based Under-sampling with Boosting for Imbalanced Classification
Class imbalance classification is a challenging research problem in data
mining and machine learning, as many real-life datasets are imbalanced in
nature. Existing learning algorithms maximise classification accuracy by
correctly classifying the majority class instances but misclassify the
minority class. In real-life applications, however, the minority class
instances often represent the concept of greater interest. Several techniques
based on sampling (under-sampling the majority class and over-sampling the
minority class), cost-sensitive learning, and ensemble learning have been used
in the literature for classifying imbalanced datasets. In this paper, we introduce a
new clustering-based under-sampling approach with boosting (AdaBoost)
algorithm, called CUSBoost, for effective imbalanced classification. The
proposed algorithm provides an alternative to RUSBoost (random under-sampling
with AdaBoost) and SMOTEBoost (synthetic minority over-sampling with AdaBoost)
algorithms. We evaluated the performance of CUSBoost against state-of-the-art
ensemble methods such as AdaBoost, RUSBoost, and SMOTEBoost on 13 imbalanced
binary and multi-class datasets with various imbalance ratios. The
experimental results show that CUSBoost is a promising and effective approach
for dealing with highly imbalanced datasets.
Comment: CSITSS-201
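The core idea of cluster-based under-sampling can be sketched briefly. The code below is a hedged illustration, not the published CUSBoost implementation: synthetic data, an assumed number of majority clusters, and equal per-cluster draws so the reduced majority sample retains its internal structure before AdaBoost is trained on the balanced set.

```python
# Sketch of cluster-based under-sampling with boosting (assumptions: toy
# data, k=5 majority clusters, equal draws per cluster): k-means partitions
# the majority class, a share is sampled from each cluster, and AdaBoost is
# trained on the resulting balanced set.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(2)
X_maj = rng.normal(0, 1, (300, 4))      # majority class (label 0)
X_min = rng.normal(3, 1, (30, 4))       # minority class (label 1)

k = 5                                    # assumed number of majority clusters
per_cluster = len(X_min) // k            # draw equally from each cluster
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_maj)

keep = []
for c in range(k):
    members = np.flatnonzero(km.labels_ == c)
    take = min(per_cluster, len(members))
    keep.extend(rng.choice(members, size=take, replace=False))

# Balanced training set: sampled majority plus the full minority class.
X_bal = np.vstack([X_maj[keep], X_min])
y_bal = np.array([0] * len(keep) + [1] * len(X_min))

clf = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X_bal, y_bal)
print(X_bal.shape[0])
```

Sampling per cluster rather than uniformly at random (as RUSBoost does) is the point of the method: every region of the majority class stays represented even after heavy under-sampling.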
Nonparametric Hierarchical Clustering of Functional Data
In this paper, we deal with the problem of curve clustering. We propose a
nonparametric method which partitions the curves into clusters and discretizes
the dimensions of the curve points into intervals. The cross-product of these
partitions forms a data-grid which is obtained using a Bayesian model selection
approach while making no assumptions regarding the curves. Finally, a
post-processing technique, aiming at reducing the number of clusters in order
to improve the interpretability of the clustering, is proposed. It consists of
optimally merging the clusters step by step, which corresponds to an
agglomerative hierarchical classification whose dissimilarity measure is the
variation of the criterion. Interestingly, this measure is none other than the
sum of the Kullback-Leibler divergences between the cluster distributions
before and after the merge. The practical interest of the approach for
functional data exploratory analysis is presented and compared with an
alternative approach on an artificial and a real-world data set.
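The merge criterion above admits a compact numerical sketch. The code below is a simplified illustration, not the paper's Bayesian model-selection criterion: each cluster is summarised by a toy histogram over the same discretized intervals, and the cost of a merge is the size-weighted sum of KL divergences between each cluster's distribution and the merged distribution.

```python
# Sketch (toy histograms, not the paper's criterion): one greedy step of
# agglomerative merging where the dissimilarity is the summed KL divergence
# between the cluster distributions before and after the merge.
import numpy as np

def kl(p, q):
    # Kullback-Leibler divergence KL(p || q) over the support of p.
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def merge_cost(p1, n1, p2, n2):
    pm = (n1 * p1 + n2 * p2) / (n1 + n2)   # distribution after the merge
    return n1 * kl(p1, pm) + n2 * kl(p2, pm)

# Three clusters of curves, each summarised by a histogram over the same
# discretized intervals, together with the cluster size.
clusters = [
    (np.array([0.7, 0.2, 0.1]), 50),
    (np.array([0.6, 0.3, 0.1]), 40),   # similar to the first cluster
    (np.array([0.1, 0.2, 0.7]), 45),
]

# Greedy step: merge the pair whose fusion degrades the criterion the least.
pairs = [(i, j) for i in range(3) for j in range(i + 1, 3)]
costs = {(i, j): merge_cost(clusters[i][0], clusters[i][1],
                            clusters[j][0], clusters[j][1])
         for i, j in pairs}
best = min(costs, key=costs.get)
print(best)
```

Repeating this step until one cluster remains yields the agglomerative hierarchy; the two near-identical distributions are merged first, as expected.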
Evaluating Merging Strategies for Sampling-based Uncertainty Techniques in Object Detection
There has been a recent emergence of sampling-based techniques for estimating
epistemic uncertainty in deep neural networks. While these methods can be
applied to classification or semantic segmentation tasks by simply averaging
samples, this is not the case for object detection, where detection sample
bounding boxes must be accurately associated and merged. A weak merging
strategy can significantly degrade the performance of the detector and yield an
unreliable uncertainty measure. This paper provides the first in-depth
investigation of the effect of different association and merging strategies. We
compare different combinations of three spatial and two semantic affinity
measures with four clustering methods for MC Dropout with a Single Shot
MultiBox Detector. Our results show that the correct choice of
affinity-clustering combination can greatly improve the effectiveness of the
classification and spatial uncertainty estimation and the resulting object
detection performance. We base our evaluation on a new mix of datasets that
emulate near open-set conditions (semantically similar unknown classes),
distant open-set conditions (semantically dissimilar unknown classes) and the
common closed-set conditions (only known classes).
Comment: to appear in IEEE International Conference on Robotics and Automation 2019 (ICRA 2019)
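One point in the affinity/merging design space the paper studies can be sketched concretely. The code below is a simplified assumption-laden illustration, not any of the paper's evaluated combinations: greedy clustering of sampled detections by IoU, then averaging box corners and class scores within each cluster, with the score spread as a crude uncertainty proxy. The IoU threshold and the `(x1, y1, x2, y2)` box format are assumptions.

```python
# Sketch (assumed 0.5 IoU threshold, corner box format): greedily cluster
# MC-Dropout detection samples by IoU with each cluster's first box, then
# merge by averaging boxes and class scores; the score spread per cluster
# serves as a simple uncertainty measure.
import numpy as np

def iou(a, b):
    # Intersection-over-union of two (x1, y1, x2, y2) boxes.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def cluster_and_merge(boxes, scores, iou_thr=0.5):
    clusters = []                            # each: ([boxes], [scores])
    for box, s in zip(boxes, scores):
        for c in clusters:
            if iou(box, c[0][0]) >= iou_thr:  # affinity to the anchor box
                c[0].append(box); c[1].append(s)
                break
        else:
            clusters.append(([box], [s]))
    merged = []
    for bs, ss in clusters:
        bs, ss = np.array(bs), np.array(ss)
        # (mean box, mean class scores, score spread = uncertainty proxy)
        merged.append((bs.mean(axis=0), ss.mean(axis=0), ss.std(axis=0)))
    return merged

# Detection samples from two underlying objects across MC-Dropout passes.
boxes = np.array([[10, 10, 50, 50], [11, 9, 52, 49], [100, 100, 140, 150],
                  [12, 11, 51, 51], [99, 101, 141, 149]], dtype=float)
scores = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9],
                   [0.85, 0.15], [0.2, 0.8]])
out = cluster_and_merge(boxes, scores)
print(len(out))
```

A weak affinity choice here (e.g. too low an IoU threshold) would fuse the two objects into one cluster, which is exactly the degradation mode the paper measures across its spatial and semantic affinity combinations.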