DeViL: Decoding Vision features into Language
Post-hoc explanation methods have often been criticised for abstracting away
the decision-making process of deep neural networks. In this work, we would
like to provide natural language descriptions for what different layers of a
vision backbone have learned. Our DeViL method decodes vision features into
language, not only highlighting the attribution locations but also generating
textual descriptions of visual features at different layers of the network. We
train a transformer network to translate individual image features of any
vision layer into a prompt that a separate off-the-shelf language model decodes
into natural language. By employing dropout both per-layer and
per-spatial-location, our model can generalize training on image-text pairs to
generate localized explanations. As it uses a pre-trained language model, our
approach is fast to train, can be applied to any vision backbone, and produces
textual descriptions at different layers of the vision network. Moreover, DeViL
can create open-vocabulary attribution maps corresponding to words or phrases
even outside the training scope of the vision model. We demonstrate that DeViL
generates textual descriptions relevant to the image content on CC3M, surpassing
previous lightweight captioning models, and produces attribution maps uncovering the
learned concepts of the vision backbone. Finally, we show DeViL also
outperforms the current state-of-the-art on the neuron-wise descriptions of the
MILANNOTATIONS dataset. Code available at
https://github.com/ExplainableML/DeViL
Comment: Accepted at GCPR 2023 (Oral)
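The per-layer and per-spatial-location dropout described above can be sketched as follows. Function name, shapes, and rates are assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def mask_features(feats_per_layer, p_layer=0.5, p_loc=0.5, seed=0):
    """Hypothetical sketch of DeViL-style feature dropout.

    feats_per_layer: list of (num_locations, dim) arrays, one per vision
    layer. Whole layers are dropped with probability p_layer, and individual
    spatial locations with probability p_loc, so the translator learns to
    describe any single layer or location on its own at inference time.
    """
    rng = np.random.default_rng(seed)
    kept = []
    for feats in feats_per_layer:
        if rng.random() < p_layer:              # per-layer dropout
            continue
        keep = rng.random(len(feats)) >= p_loc  # per-spatial-location dropout
        kept.append(feats[keep])
    return kept
```

At inference, masking out everything except a single layer and location would then condition the language model on just that feature, yielding a localized description.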
An algorithm for energy-efficient bluetooth scatternet formation and maintenance
We discuss an energy-efficient, distributed Bluetooth Scatternet Formation algorithm based on Device and Link characteristics (SF-DeviL). SF-DeviL forms multihop scatternets with tree topologies and increases the battery lifetimes of devices by taking device types, battery levels and received signal strengths into account. SF-DeviL dynamically reconfigures the topology as battery levels deplete, and simulations show that network lifetime is increased by at least 32% compared to the LMS algorithm [1].
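The abstract does not give SF-DeviL's exact weighting, but the idea of ranking candidate parents by device type, battery level, and received signal strength can be illustrated with a hypothetical score. The function name, weights, and normalizations below are mine, not the paper's:

```python
def parent_score(device_class, battery_level, rssi_dbm,
                 w_class=0.4, w_battery=0.4, w_signal=0.2):
    """Hypothetical suitability score for choosing a scatternet parent.

    device_class: 0..2, higher = more capable (e.g. mains-powered device).
    battery_level: 0.0..1.0 remaining charge.
    rssi_dbm: received signal strength, roughly -100 (weak) to -40 (strong).
    """
    class_term = device_class / 2.0
    signal_term = (rssi_dbm + 100.0) / 60.0  # normalize to roughly 0..1
    return (w_class * class_term
            + w_battery * battery_level
            + w_signal * signal_term)
```

A joining device would attach to the neighbour with the highest score, biasing tree construction toward well-powered, well-connected parents.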
The Devil is in the Tails: Fine-grained Classification in the Wild
The world is long-tailed. What does this mean for computer vision and visual
recognition? The main two implications are (1) the number of categories we need
to consider in applications can be very large, and (2) the number of training
examples for most categories can be very small. Current visual recognition
algorithms have achieved excellent classification accuracy. However, they
require many training examples to reach peak performance, which suggests that
long-tailed distributions will not be dealt with well. We analyze this question
in the context of eBird, a large fine-grained classification dataset, and a
state-of-the-art deep network classification algorithm. We find that (a) peak
classification performance on well-represented categories is excellent, (b)
given enough data, classification performance suffers only minimally from an
increase in the number of classes, (c) classification performance decays
precipitously as the number of training examples decreases, (d) surprisingly,
transfer learning is virtually absent in current methods. Our findings suggest
that our community should come to grips with the question of long tails.
The Devil is in the Decoder: Classification, Regression and GANs
Many machine vision applications, such as semantic segmentation and depth
prediction, require predictions for every pixel of the input image. Models for
such problems usually consist of encoders which decrease spatial resolution
while learning a high-dimensional representation, followed by decoders that
recover the original input resolution and produce low-dimensional
predictions. While encoders have been studied rigorously, relatively few
studies address the decoder side. This paper presents an extensive comparison
of a variety of decoders across pixel-wise tasks ranging from
classification and regression to synthesis. Our contributions are: (1) Decoders
matter: we observe significant variance in results between different types of
decoders on various problems. (2) We introduce new residual-like connections
for decoders. (3) We introduce a novel decoder: bilinear additive upsampling.
(4) We explore prediction artifacts.
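Contribution (3), bilinear additive upsampling, pairs spatial upsampling with an additive channel reduction. The sketch below uses nearest-neighbour repetition as a dependency-free stand-in for bilinear interpolation, so treat it as an illustration of the channel-summing idea rather than the paper's exact layer:

```python
import numpy as np

def bilinear_additive_upsample(x, factor=2, group=4):
    """Upsample an (H, W, C) map spatially by `factor`, then sum every
    `group` consecutive channels, so channel count drops to C // group
    while the total activation within each group is preserved."""
    h, w, c = x.shape
    assert c % group == 0, "channels must divide evenly into groups"
    up = x.repeat(factor, axis=0).repeat(factor, axis=1)  # (h*f, w*f, c)
    up = up.reshape(h * factor, w * factor, c // group, group)
    return up.sum(axis=-1)
```

Because the operation is parameter-free and shape-compatible with the encoder features, it can also serve as the skip path in a residual-like decoder connection.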
Field Measurements of Terrestrial and Martian Dust Devils
Surface-based measurements of terrestrial and martian dust devils/convective vortices provided from mobile and stationary platforms are discussed. Imaging of terrestrial dust devils has quantified their rotational and vertical wind speeds, translation speeds, dimensions, dust load, and frequency of occurrence. Imaging of martian dust devils has provided translation speeds and constraints on dimensions, but only limited constraints on vertical motion within a vortex. The longer mission durations on Mars afforded by long operating robotic landers and rovers have provided statistical quantification of vortex occurrence (time-of-sol, and recently seasonal) that has until recently not been a primary outcome of more temporally limited terrestrial dust devil measurement campaigns. Terrestrial measurement campaigns have included a more extensive range of measured vortex parameters (pressure, wind, morphology, etc.) than have martian opportunities, with electric field and direct measure of dust abundance not yet obtained on Mars. No martian robotic mission has yet provided contemporaneous high frequency wind and pressure measurements. Comparison of measured terrestrial and martian dust devil characteristics suggests that martian dust devils are larger and possess faster maximum rotational wind speeds, that the absolute magnitude of the pressure deficit within a terrestrial dust devil is an order of magnitude greater than a martian dust devil, and that the time-of-day variation in vortex frequency is similar. Recent terrestrial investigations have demonstrated the presence of diagnostic dust devil signals within seismic and infrasound measurements; an upcoming Mars robotic mission will obtain similar measurement types
Spartan Daily September 22, 2010
Volume 135, Issue 13
https://scholarworks.sjsu.edu/spartandaily/1176/thumbnail.jp
The Devil of Face Recognition is in the Noise
The growing scale of face recognition datasets empowers us to train strong
convolutional networks for face recognition. While a variety of architectures
and loss functions have been devised, we still have a limited understanding of
the source and consequence of label noise inherent in existing datasets. We
make the following contributions: 1) We contribute cleaned subsets of popular
face databases, i.e., MegaFace and MS-Celeb-1M datasets, and build a new
large-scale noise-controlled IMDb-Face dataset. 2) With the original datasets
and cleaned subsets, we profile and analyze label noise properties of MegaFace
and MS-Celeb-1M. We show that a few orders of magnitude more samples are needed to achieve
the same accuracy yielded by a clean subset. 3) We study the association
between different types of noise, i.e., label flips and outliers, with the
accuracy of face recognition models. 4) We investigate ways to improve data
cleanliness, including a comprehensive user study on the influence of data
labeling strategies on annotation accuracy. The IMDb-Face dataset has been
released on https://github.com/fwang91/IMDb-Face
Comment: accepted to ECCV'1
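The effect of label flips studied in contribution 3) is typically measured by injecting synthetic flips at a controlled rate and retraining. A minimal sketch of such an injection step, with details assumed since the abstract does not give the exact protocol:

```python
import numpy as np

def flip_labels(labels, num_classes, flip_rate, seed=0):
    """Return a copy of `labels` with `flip_rate` of entries reassigned
    to a different class, simulating label-flip noise."""
    rng = np.random.default_rng(seed)
    noisy = labels.copy()
    n_flip = int(round(flip_rate * len(labels)))
    idx = rng.choice(len(labels), size=n_flip, replace=False)
    # an offset in [1, num_classes) guarantees the new label differs
    offsets = rng.integers(1, num_classes, size=n_flip)
    noisy[idx] = (noisy[idx] + offsets) % num_classes
    return noisy
```

Sweeping `flip_rate` and plotting model accuracy against it is one way to quantify how much noise a given training pipeline tolerates.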