Exploring instance correlation for advanced active learning
University of Technology, Sydney. Faculty of Engineering and Information Technology.
Active learning (AL) aims to construct an accurate classifier at minimum labeling cost by actively selecting a small number of the most informative instances for labeling. AL traditionally relies on instance-based utility measures that assess individual instances and label the ones with the maximum values for training. However, such approaches cannot produce good labeling subsets, because instances are related to one another, explicitly or implicitly, while an instance-based utility measure evaluates the informativeness of each instance independently, without considering their interactions. Accordingly, this thesis explores instance correlation in AL and uses it to make AL more accurate and applicable. Specifically, our objective is to explore instance correlation from different views and apply it to three tasks: (1) reducing redundancy for optimal subset selection, (2) reducing labeling cost with a non-expert labeler, and (3) discovering class spaces for dynamic data.
First, the thesis reviews existing work on active learning from an instance-correlation perspective. It then summarizes the technical strengths and weaknesses of these methods, followed by an analysis of runtime and label complexity and a discussion of emerging active learning applications and the instance-selection challenges they raise.
Second, we propose three AL paradigms by integrating different instance correlations into three major issues of AL. 1) The first is an optimal instance subset selection method (ALOSS), in which an expert provides accurate class labels for the queried data. Because instance-based utility measures assess instances individually and label the ones with the maximum values, the selected subset may contain redundant instances. To address this issue, ALOSS simultaneously considers the importance of individual instances and the disparity between instances for subset selection. 2) The second method introduces pairwise label homogeneity into the AL setting, in which a non-expert labeler is only asked "whether a pair of instances belong to the same class". We explore label homogeneity information obtained from a non-expert labeler, aiming to further reduce the labeling cost of AL. 3) The last method also uses pairwise label homogeneity, this time for active class discovery and exploration in dynamic data, where new classes may rapidly emerge and evolve, leaving the labeler unable to label the instances because of limited knowledge. Accordingly, we use pairwise label homogeneity information to uncover the hidden class spaces and find new classes in a timely manner. Empirical studies show that the proposed methods significantly outperform state-of-the-art AL methods.
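The idea of weighing individual importance against disparity between instances can be illustrated with a minimal sketch. This is not the ALOSS algorithm from the thesis, whose details the abstract does not give. It is a generic greedy heuristic that assumes entropy as the importance score and Euclidean distance to the nearest already-selected instance as the disparity term. The names `select_batch` and `trade_off` are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy of a predicted class distribution (assumed importance score)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def euclidean(a, b):
    """Euclidean distance between two feature vectors (assumed disparity measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_batch(features, probs, k, trade_off=1.0):
    """Greedily pick k instances that are uncertain and far from those already chosen."""
    chosen = []
    candidates = list(range(len(features)))
    while candidates and len(chosen) < k:
        def score(i):
            # Disparity: distance to the nearest already-selected instance (0 if none yet).
            disparity = min(
                (euclidean(features[i], features[j]) for j in chosen), default=0.0
            )
            return entropy(probs[i]) + trade_off * disparity
        best = max(candidates, key=score)
        chosen.append(best)
        candidates.remove(best)
    return chosen

# Two near-duplicate uncertain points and one distant, slightly less uncertain point.
features = [(0.0, 0.0), (0.0, 0.1), (5.0, 5.0)]
probs = [[0.5, 0.5], [0.5, 0.5], [0.6, 0.4]]
diverse = select_batch(features, probs, k=2)                  # importance + disparity
redundant = select_batch(features, probs, k=2, trade_off=0.0)  # importance only
```

With `trade_off=0.0` the sketch falls back to a purely instance-based measure and selects the two near-duplicates, which is the redundancy problem the abstract describes.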
Decision table for classifying point sources based on FIRST and 2MASS databases
With the availability of multiwavelength, multiscale and multiepoch
astronomical catalogues, the number of features to describe astronomical
objects has increased. The better the features we select to classify objects, the
higher the classification accuracy is. In this paper, we have used data sets of
stars and quasars from the near-infrared and radio bands. A best-first
search method was then applied to select features. For the data with the selected
features, the decision table algorithm was implemented. The classification
accuracy is more than 95.9%. As a result, the feature selection method improves
the effectiveness and efficiency of the classification method. Moreover, the
result shows that the decision table is robust and effective for discriminating
celestial objects and can be used to preselect quasar candidates for large survey
projects.
Comment: 10 pages. Accepted by Advances in Space Researc
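The pipeline described in this abstract, best-first feature selection followed by a decision table, can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes features are already discretized, scores each feature subset by accuracy on a held-out validation set, and stops after `patience` non-improving expansions. All function names and the toy data are hypothetical.

```python
import heapq
from collections import Counter, defaultdict

def fit_table(rows, labels, subset):
    """Map each combination of selected feature values to its majority class."""
    cells = defaultdict(Counter)
    for row, y in zip(rows, labels):
        cells[tuple(row[f] for f in subset)][y] += 1
    default = Counter(labels).most_common(1)[0][0]  # fallback for unseen combinations
    table = {key: counts.most_common(1)[0][0] for key, counts in cells.items()}
    return table, default

def accuracy(rows, labels, subset, val_rows, val_labels):
    """Validation accuracy of a decision table built on the given feature subset."""
    table, default = fit_table(rows, labels, subset)
    hits = sum(
        table.get(tuple(r[f] for f in subset), default) == y
        for r, y in zip(val_rows, val_labels)
    )
    return hits / len(val_labels)

def best_first_select(rows, labels, val_rows, val_labels, patience=3):
    """Best-first search over feature subsets, expanding the best open node first."""
    n_features = len(rows[0])
    start_score = accuracy(rows, labels, (), val_rows, val_labels)
    frontier = [(-start_score, ())]  # min-heap on negated score
    seen = {()}
    best, best_score, stale = (), -1.0, 0
    while frontier and stale < patience:
        neg_score, subset = heapq.heappop(frontier)
        if -neg_score > best_score:
            best, best_score, stale = subset, -neg_score, 0
        else:
            stale += 1
        for f in range(n_features):
            if f in subset:
                continue
            child = tuple(sorted(subset + (f,)))
            if child not in seen:
                seen.add(child)
                s = accuracy(rows, labels, child, val_rows, val_labels)
                heapq.heappush(frontier, (-s, child))
    return list(best), best_score

# Toy data: feature 0 determines the class, feature 1 is noise.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = ["star", "star", "quasar", "quasar"]
val_rows = [(0, 1), (1, 0), (0, 0), (1, 1)]
val_labels = ["star", "quasar", "star", "quasar"]
selected, score = best_first_select(rows, labels, val_rows, val_labels)
```

On real catalogue data the continuous magnitudes and fluxes would need to be binned before a table lookup of this kind makes sense.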
Computation-Communication Trade-offs and Sensor Selection in Real-time Estimation for Processing Networks
Recent advances in electronics are enabling substantial processing to be
performed at each node (robots, sensors) of a networked system. Local
processing enables data compression and may mitigate measurement noise, but it
is still slower than processing on a central computer (it entails a larger
computational delay). However, while nodes can process the data in parallel,
the centralized computation is sequential in nature. On the other hand, if a
node sends raw data to a central computer for processing, it incurs
communication delay. This leads to a fundamental communication-computation
trade-off, where each node has to decide on the optimal amount of preprocessing
in order to maximize the network performance. We consider a network in charge
of estimating the state of a dynamical system and provide three contributions.
First, we provide a rigorous problem formulation for optimal real-time
estimation in processing networks in the presence of delays. Second, we show
that, in the case of a homogeneous network (where all sensors have the same
computation) that monitors a continuous-time scalar linear system, the optimal
amount of local preprocessing maximizing the network estimation performance can
be computed analytically. Third, we consider the realistic case of a
heterogeneous network monitoring a discrete-time multi-variate linear system
and provide algorithms to decide on suitable preprocessing at each node, and to
select a sensor subset when computational constraints make using all sensors
suboptimal. Numerical simulations show that selecting the sensors is crucial.
Moreover, we show that if the nodes apply the preprocessing policy suggested by
our algorithms, they can largely improve the network estimation performance.
Comment: 15 pages, 16 figures. Accepted journal versio
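The flavor of the trade-off for a single sensor can be shown with a toy model. This is not the paper's formulation, which the abstract does not state. The sketch assumes that processing for time `tau` reduces measurement noise variance to `noise / tau`, while the monitored state drifts with variance `drift * tau` during that delay, so the total error variance is `noise / tau + drift * tau`, minimized at `tau = sqrt(noise / drift)`. All names and numbers are hypothetical.

```python
import math

def error_variance(tau, noise=4.0, drift=1.0):
    """Toy cost: residual noise shrinks with processing time, drift grows with it."""
    return noise / tau + drift * tau

def best_delay(noise=4.0, drift=1.0, steps=1000, tau_max=10.0):
    """Grid search for the processing time that minimizes the toy error variance."""
    grid = [tau_max * (i + 1) / steps for i in range(steps)]
    return min(grid, key=lambda tau: error_variance(tau, noise, drift))

tau_star = best_delay()
closed_form = math.sqrt(4.0 / 1.0)  # analytic minimizer of noise / tau + drift * tau
```

Too little preprocessing leaves the estimate noisy, and too much leaves it stale. The grid search and the closed-form minimizer agree in this toy case.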