24,767 research outputs found
Cascade evaluation of clustering algorithm
International audienceThis paper is about the evaluation of the results of clustering algorithms, and the comparison of such algorithms. We propose a new method based on the enrichment of a set of independent labeled datasets by the results of clustering, and the use of a supervised method to evaluate the interest of adding such new information to the datasets. We thus adapt the cascade generalization paradigm in the case where we combine an unsupervised and a supervised learner. We also consider the case where independent supervised learnings are performed on the different groups of data objects created by the clustering. We then conduct experiments using different supervised algorithms to compare various clustering algorithms. And we thus show that our proposed method exhibits a coherent behavior, pointing out, for example, that the algorithms based on the use of complex probabilistic models outperform algorithms based on the use of simpler models
Balancing Speed and Quality in Online Learning to Rank for Information Retrieval
In Online Learning to Rank (OLTR) the aim is to find an optimal ranking model
by interacting with users. When learning from user behavior, systems must
interact with users while simultaneously learning from those interactions.
Unlike other Learning to Rank (LTR) settings, existing research in this field
has been limited to linear models. This is due to the speed-quality tradeoff
that arises when selecting models: complex models are more expressive and can
find the best rankings but need more user interactions to do so, a requirement
that risks frustrating users during training. Conversely, simpler models can be
optimized on fewer interactions and thus provide a better user experience, but
they will converge towards suboptimal rankings. This tradeoff creates a
deadlock, since novel models will not be able to improve either the user
experience or the final convergence point, without sacrificing the other. Our
contribution is twofold. First, we introduce a fast OLTR model called Sim-MGD
that addresses the speed aspect of the speed-quality tradeoff. Sim-MGD ranks
documents based on similarities with reference documents. It converges rapidly
and, hence, gives a better user experience but it does not converge towards the
optimal rankings. Second, we contribute Cascading Multileave Gradient Descent
(C-MGD) for OLTR that directly addresses the speed-quality tradeoff by using a
cascade that enables combinations of the best of two worlds: fast learning and
high quality final convergence. C-MGD can provide the better user experience of
Sim-MGD while maintaining the same convergence as the state-of-the-art MGD
model. This opens the door for future work to design new models for OLTR
without having to deal with the speed-quality tradeoff.Comment: CIKM 2017, Proceedings of the 2017 ACM on Conference on Information
and Knowledge Managemen
Rejection-Cascade of Gaussians: Real-time adaptive background subtraction framework
Background-Foreground classification is a well-studied problem in computer
vision. Due to the pixel-wise nature of modeling and processing in the
algorithm, it is usually difficult to satisfy real-time constraints. There is a
trade-off between the speed (because of model complexity) and accuracy.
Inspired by the rejection cascade of Viola-Jones classifier, we decompose the
Gaussian Mixture Model (GMM) into an adaptive cascade of Gaussians(CoG). We
achieve a good improvement in speed without compromising the accuracy with
respect to the baseline GMM model. We demonstrate a speed-up factor of 4-5x and
17 percent average improvement in accuracy over Wallflowers surveillance
datasets. The CoG is then demonstrated to over the latent space representation
of images of a convolutional variational autoencoder(VAE). We provide initial
results over CDW-2014 dataset, which could speed up background subtraction for
deep architectures.Comment: Accepted for National Conference on Computer Vision, Pattern
Recognition, Image Processing and Graphics (NCVPRIPG 2019
Benchmark of structured machine learning methods for microbial identification from mass-spectrometry data
Microbial identification is a central issue in microbiology, in particular in
the fields of infectious diseases diagnosis and industrial quality control. The
concept of species is tightly linked to the concept of biological and clinical
classification where the proximity between species is generally measured in
terms of evolutionary distances and/or clinical phenotypes. Surprisingly, the
information provided by this well-known hierarchical structure is rarely used
by machine learning-based automatic microbial identification systems.
Structured machine learning methods were recently proposed for taking into
account the structure embedded in a hierarchy and using it as additional a
priori information, and could therefore allow to improve microbial
identification systems. We test and compare several state-of-the-art machine
learning methods for microbial identification on a new Matrix-Assisted Laser
Desorption/Ionization Time-of-Flight mass spectrometry (MALDI-TOF MS) dataset.
We include in the benchmark standard and structured methods, that leverage the
knowledge of the underlying hierarchical structure in the learning process. Our
results show that although some methods perform better than others, structured
methods do not consistently perform better than their "flat" counterparts. We
postulate that this is partly due to the fact that standard methods already
reach a high level of accuracy in this context, and that they mainly confuse
species close to each other in the tree, a case where using the known hierarchy
is not helpful
- …