102 research outputs found
DeepTopPush: Simple and Scalable Method for Accuracy at the Top
Accuracy at the top is a special class of binary classification problems
where the performance is evaluated only on a small number of relevant (top)
samples. Applications include information retrieval systems or processes with
manual (expensive) postprocessing. The task then amounts to minimizing the
number of irrelevant samples scored above a threshold. We consider classifiers
in the form of an arbitrary
(deep) network and propose a new method DeepTopPush for minimizing the top loss
function. Since the threshold depends on all samples, the problem is
non-decomposable. We modify the stochastic gradient descent to handle the
non-decomposability in an end-to-end training manner and propose a way to
estimate the threshold only from values on the current minibatch. We
demonstrate the good performance of DeepTopPush on visual recognition datasets
and on a real-world application of selecting a small number of molecules for
further drug testing.
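The minibatch threshold estimation described above can be sketched roughly as follows. This is a minimal illustration, not the authors' exact loss: it assumes the threshold is taken as the highest score among negative (irrelevant) samples in the current minibatch, with a hinge penalty on positives that fall below it.

```python
import numpy as np

def deep_top_push_loss(scores, labels):
    """Hypothetical sketch (not the authors' exact formulation): estimate
    the threshold as the highest score among the negative samples in the
    current minibatch, and penalise each positive scored below that
    threshold with a hinge term."""
    neg = scores[labels == 0]
    pos = scores[labels == 1]
    threshold = neg.max()  # threshold estimated from the minibatch only
    return np.maximum(0.0, threshold - pos + 1.0).mean()
```

In an actual deep-learning setting the same expression would be written with differentiable tensor operations so that gradients flow through the scores during end-to-end training.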
Top Rank Optimization in Linear Time
Bipartite ranking aims to learn a real-valued ranking function that orders
positive instances before negative instances. Recent efforts of bipartite
ranking are focused on optimizing ranking accuracy at the top of the ranked
list. Most existing approaches either optimize task-specific metrics or extend
the ranking loss to place more emphasis on the errors associated with the
top-ranked instances, leading to a high computational cost that is super-linear
in the number of training instances. We propose a highly efficient approach,
titled TopPush, for optimizing accuracy at the top that has computational
complexity linear in the number of training instances. We present a novel
analysis that bounds the generalization error for the top ranked instances for
the proposed approach. An empirical study shows that the proposed approach is
highly competitive with state-of-the-art approaches while being 10-100 times
faster.
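The source of the linear-time complexity can be sketched as follows (an assumed loss form, for illustration only): if each positive instance is compared only against the single highest-scoring negative, one pass over the negatives replaces all pairwise comparisons.

```python
import numpy as np

def top_push_loss(pos_scores, neg_scores):
    """Sketch of the linear-time idea (assumed loss form): every positive
    is compared only with the top-scoring negative, so the cost is
    O(n_neg) + O(n_pos) instead of O(n_pos * n_neg) pairwise terms."""
    top_neg = neg_scores.max()  # one O(n_neg) pass finds the "top" negative
    return np.maximum(0.0, 1.0 - (pos_scores - top_neg)).mean()  # O(n_pos)
```

A naive pairwise formulation would instead sum a hinge term over every (positive, negative) pair, which is what makes many top-emphasizing losses super-linear in practice.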
Amortising the Cost of Mutation Based Fault Localisation using Statistical Inference
Mutation analysis can effectively capture the dependency between source code
and test results. This has been exploited by Mutation Based Fault Localisation
(MBFL) techniques. However, MBFL techniques must incur the high cost of
mutation analysis after a failure is observed, which hinders their practical
adoption. We introduce SIMFL (Statistical
Inference for Mutation-based Fault Localisation), an MBFL technique that allows
users to perform the mutation analysis in advance against an earlier version of
the system. SIMFL uses mutants as artificial faults and aims to learn the
failure patterns among test cases against different locations of mutations.
Once a failure is observed, SIMFL requires little or no additional analysis
cost, depending on the inference model used. An
empirical evaluation of SIMFL using 355 faults in Defects4J shows that SIMFL
can successfully localise up to 103 faults at the top, and 152 faults within
the top five, on par with state-of-the-art alternatives. The cost of mutation
analysis can be further reduced by mutation sampling: SIMFL retains over 80% of
its localisation accuracy at the top rank when using only 10% of generated
mutants, compared to results obtained without sampling.
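The inference step described above can be sketched in simplified form. The data layout and scoring rule here are assumptions for illustration: each code location is associated with the sets of tests that failed for the mutants generated there, and a location is scored by how often its mutants reproduced the observed failure pattern.

```python
from collections import defaultdict

def rank_locations(mutant_failures, observed_failures):
    """Hypothetical sketch of SIMFL-style inference. `mutant_failures`
    maps a code location to a list of failing-test sets, one per mutant
    generated at that location (assumed data layout). A location scores
    higher the more often its mutants reproduced the observed failures."""
    scores = defaultdict(float)
    for location, failure_sets in mutant_failures.items():
        matches = sum(1 for fs in failure_sets if fs == observed_failures)
        scores[location] = matches / len(failure_sets)
    # most suspicious locations first
    return sorted(scores, key=scores.get, reverse=True)
```

Because the mutant failure data is collected in advance, only this lightweight scoring remains to be done once a real failure is observed.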
Nonlinear classifiers for ranking problems based on kernelized SVM
Many classification problems focus on maximizing the performance only on the
samples with the highest relevance rather than on all samples. Examples include
ranking problems, accuracy at the top, and search engines where only
the top few queries matter. In our previous work, we derived a general
framework including several classes of these linear classification problems. In
this paper, we extend the framework to nonlinear classifiers. Utilizing a
similarity to SVM, we dualize the problems, add kernels and propose a
componentwise dual ascent method. This allows us to perform one iteration in
less than 20 milliseconds on relatively large datasets such as FashionMNIST.
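A componentwise dual ascent iteration can be sketched on a simplified box-constrained dual. The objective and constraints below are assumptions for illustration; the paper's exact dual may differ.

```python
import numpy as np

def dual_ascent_step(alpha, Q, C, i):
    """One componentwise ascent step on a simplified dual of the form
    max_a sum(a) - 0.5 * a^T Q a  subject to  0 <= a <= C
    (a sketch; the paper's dual and constraints may differ). Q is the
    kernel-derived matrix, and only coordinate i is updated."""
    grad = 1.0 - Q[i] @ alpha              # partial derivative w.r.t. alpha_i
    alpha[i] = np.clip(alpha[i] + grad / Q[i, i], 0.0, C)
    return alpha
```

Updating one dual variable at a time only touches a single row of the kernel matrix, which is what keeps each iteration cheap even on larger datasets.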
A Batch Learning Framework for Scalable Personalized Ranking
In designing personalized ranking algorithms, it is desirable to encourage a
high precision at the top of the ranked list. Existing methods either seek a
smooth convex surrogate for a non-smooth ranking metric or directly modify
updating procedures to encourage top accuracy. In this work we point out that
these methods do not scale well to a large-scale setting, and this is partly
due to the inaccurate pointwise or pairwise rank estimation. We propose a new
framework for personalized ranking. It uses batch-based rank estimators and
smooth rank-sensitive loss functions. This new batch learning framework leads
to more stable and accurate rank approximations compared to previous work.
Moreover, it enables explicit use of parallel computation to speed up training.
We conduct empirical evaluation on three item recommendation tasks. Our method
shows consistent accuracy improvements over state-of-the-art methods.
Additionally, we observe time efficiency advantages when data scale increases.
Comment: AAAI 2018, Feb 2-7, New Orleans, US
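A batch-based rank estimator of the kind described above can be sketched as follows. The exact form is an assumption: the rank of an item over the full catalog is extrapolated from the fraction of a sampled batch that scores at least as high.

```python
import numpy as np

def estimate_rank(item_score, batch_scores, catalog_size):
    """Sketch of a batch-based rank estimator (assumed form): the rank of
    an item over the full catalog is extrapolated from the fraction of a
    sampled batch of scored items that ties or beats the item's score."""
    frac_at_or_above = np.mean(batch_scores >= item_score)
    return 1.0 + frac_at_or_above * (catalog_size - 1)
```

Compared with pointwise or pairwise estimates, averaging over a batch gives a lower-variance rank approximation, and the batches can be scored in parallel.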