58,354 research outputs found
Visual and computational analysis of structure-activity relationships in high-throughput screening data
Novel analytic methods are required to assimilate the large volumes of structural and bioassay data generated by combinatorial chemistry and high-throughput screening programmes in the pharmaceutical and agrochemical industries. This paper reviews recent work in visualisation and data mining that can be used to develop structure-activity relationships from such chemical/biological datasets
A Survey on Metric Learning for Feature Vectors and Structured Data
The need for appropriate ways to measure the distance or similarity between
data is ubiquitous in machine learning, pattern recognition and data mining,
but handcrafting such good metrics for specific problems is generally
difficult. This has led to the emergence of metric learning, which aims at
automatically learning a metric from data and has attracted a lot of interest
in machine learning and related fields for the past ten years. This survey
paper proposes a systematic review of the metric learning literature,
highlighting the pros and cons of each approach. We pay particular attention to
Mahalanobis distance metric learning, a well-studied and successful framework,
but additionally present a wide range of methods that have recently emerged as
powerful alternatives, including nonlinear metric learning, similarity learning
and local metric learning. Recent trends and extensions, such as
semi-supervised metric learning, metric learning for histogram data and the
derivation of generalization guarantees, are also covered. Finally, this survey
addresses metric learning for structured data, in particular edit distance
learning, and attempts to give an overview of the remaining challenges in
metric learning for the years to come.Comment: Technical report, 59 pages. Changes in v2: fixed typos and improved
presentation. Changes in v3: fixed typos. Changes in v4: fixed typos and new
method
MirBot: A collaborative object recognition system for smartphones using convolutional neural networks
MirBot is a collaborative application for smartphones that allows users to
perform object recognition. This app can be used to take a photograph of an
object, select the region of interest and obtain the most likely class (dog,
chair, etc.) by means of similarity search using features extracted from a
convolutional neural network (CNN). The answers provided by the system can be
validated by the user so as to improve the results for future queries. All the
images are stored together with a series of metadata, thus enabling a
multimodal incremental dataset labeled with synset identifiers from the WordNet
ontology. This dataset grows continuously thanks to the users' feedback, and is
publicly available for research. This work details the MirBot object
recognition system, analyzes the statistics gathered after more than four years
of usage, describes the image classification methodology, and performs an
exhaustive evaluation using handcrafted features, convolutional neural codes
and different transfer learning techniques. After comparing various models and
transformation methods, the results show that the CNN features maintain the
accuracy of MirBot constant over time, despite the increasing number of new
classes. The app is freely available at the Apple and Google Play stores.Comment: Accepted in Neurocomputing, 201
Evaluation of Output Embeddings for Fine-Grained Image Classification
Image classification has advanced significantly in recent years with the
availability of large-scale image sets. However, fine-grained classification
remains a major challenge due to the annotation cost of large numbers of
fine-grained categories. This project shows that compelling classification
performance can be achieved on such categories even without labeled training
data. Given image and class embeddings, we learn a compatibility function such
that matching embeddings are assigned a higher score than mismatching ones;
zero-shot classification of an image proceeds by finding the label yielding the
highest joint compatibility score. We use state-of-the-art image features and
focus on different supervised attributes and unsupervised output embeddings
either derived from hierarchies or learned from unlabeled text corpora. We
establish a substantially improved state-of-the-art on the Animals with
Attributes and Caltech-UCSD Birds datasets. Most encouragingly, we demonstrate
that purely unsupervised output embeddings (learned from Wikipedia and improved
with fine-grained text) achieve compelling results, even outperforming the
previous supervised state-of-the-art. By combining different output embeddings,
we further improve results.Comment: @inproceedings {ARWLS15, title = {Evaluation of Output Embeddings for
Fine-Grained Image Classification}, booktitle = {IEEE Computer Vision and
Pattern Recognition}, year = {2015}, author = {Zeynep Akata and Scott Reed
and Daniel Walter and Honglak Lee and Bernt Schiele}
Approximated and User Steerable tSNE for Progressive Visual Analytics
Progressive Visual Analytics aims at improving the interactivity in existing
analytics techniques by means of visualization as well as interaction with
intermediate results. One key method for data analysis is dimensionality
reduction, for example, to produce 2D embeddings that can be visualized and
analyzed efficiently. t-Distributed Stochastic Neighbor Embedding (tSNE) is a
well-suited technique for the visualization of several high-dimensional data.
tSNE can create meaningful intermediate results but suffers from a slow
initialization that constrains its application in Progressive Visual Analytics.
We introduce a controllable tSNE approximation (A-tSNE), which trades off speed
and accuracy, to enable interactive data exploration. We offer real-time
visualization techniques, including a density-based solution and a Magic Lens
to inspect the degree of approximation. With this feedback, the user can decide
on local refinements and steer the approximation level during the analysis. We
demonstrate our technique with several datasets, in a real-world research
scenario and for the real-time analysis of high-dimensional streams to
illustrate its effectiveness for interactive data analysis
- …