6,344 research outputs found
RoomNet: End-to-End Room Layout Estimation
This paper focuses on the task of room layout estimation from a monocular RGB
image. Prior works break the problem into two sub-tasks: semantic segmentation
of floor, walls, ceiling to produce layout hypotheses, followed by an iterative
optimization step to rank these hypotheses. In contrast, we adopt a more direct
formulation of this problem as one of estimating an ordered set of room layout
keypoints. The room layout and the corresponding segmentation is completely
specified given the locations of these ordered keypoints. We predict the
locations of the room layout keypoints using RoomNet, an end-to-end trainable
encoder-decoder network. On the challenging benchmark datasets Hedau and LSUN,
we achieve state-of-the-art performance along with 200x to 600x speedup
compared to the most recent work. Additionally, we present optional extensions
to the RoomNet architecture such as including recurrent computations and memory
units to refine the keypoint locations under the same parametric capacity.
Comment: accepted at ICCV 201
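As a minimal illustration of the keypoint-based formulation, the sketch below recovers ordered keypoint coordinates from per-keypoint heatmaps, the kind of output an encoder-decoder keypoint network such as RoomNet produces. The function name and array shapes are assumptions for illustration, not the paper's API.

```python
import numpy as np

def heatmaps_to_keypoints(heatmaps):
    """heatmaps: array of shape (K, H, W), one channel per ordered keypoint.
    Returns a (K, 2) integer array of (row, col) peak locations."""
    k, h, w = heatmaps.shape
    # take the argmax over each flattened channel, then unravel to 2D coords
    flat = heatmaps.reshape(k, -1).argmax(axis=1)
    return np.stack([flat // w, flat % w], axis=1)
```

Given these ordered keypoints, the layout polygon (and hence the room segmentation) follows directly, which is the point of the paper's direct formulation.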
Tracking shocked dust: state estimation for a complex plasma during a shock wave
We consider a two-dimensional complex (dusty) plasma crystal excited by an
electrostatically-induced shock wave. Dust particle kinematics in such a system
are usually determined using particle tracking velocimetry. In this work we
present a particle tracking algorithm which determines the dust particle
kinematics with significantly higher accuracy than particle tracking
velocimetry. The algorithm uses multiple extended Kalman filters to estimate
the particle states and an interacting multiple model to assign probabilities
to the different filters. This enables the determination of relevant physical
properties of the dust, such as kinetic energy and kinetic temperature, with
high precision. We use a Hugoniot shock-jump relation to calculate a
pressure-volume diagram from the shocked dust kinematics. Calculation of the
full pressure-volume diagram was possible with our tracking algorithm, but not
with particle tracking velocimetry.
Comment: 10 pages, 8 figures, accepted for publication in Physics of Plasmas
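The state-estimation idea can be illustrated with a minimal linear Kalman filter for a single particle under a constant-velocity model. This is a simplified sketch: the paper's method uses extended Kalman filters combined with an interacting multiple model, which this example does not reproduce, and the noise parameters are illustrative assumptions.

```python
import numpy as np

def kalman_track(measurements, dt=1.0, q=1e-3, r=1e-2):
    """Track 1D position measurements; returns per-step (position, velocity)."""
    F = np.array([[1.0, dt], [0.0, 1.0]])   # constant-velocity transition
    H = np.array([[1.0, 0.0]])              # we observe position only
    Q = q * np.eye(2)                       # process noise covariance
    R = np.array([[r]])                     # measurement noise covariance
    x = np.array([measurements[0], 0.0])    # initial state (pos, vel)
    P = np.eye(2)                           # initial state covariance
    states = []
    for z in measurements:
        # predict
        x = F @ x
        P = F @ P @ F.T + Q
        # update with the new measurement
        y = z - H @ x                       # innovation
        S = H @ P @ H.T + R                 # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
        x = x + (K @ y).ravel()
        P = (np.eye(2) - K @ H) @ P
        states.append(x.copy())
    return np.array(states)
```

Because the filter carries a velocity state, kinetic quantities such as per-particle kinetic energy can be read off the smoothed state estimates rather than from noisy finite differences, which is what gives the state-estimation approach its precision advantage over plain velocimetry.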
Machine learning in acoustics: theory and applications
Acoustic data provide scientific and engineering insights in fields ranging
from biology and communications to ocean and Earth science. We survey the
recent advances and transformative potential of machine learning (ML),
including deep learning, in the field of acoustics. ML is a broad family of
techniques, which are often based in statistics, for automatically detecting
and utilizing patterns in data. Relative to conventional acoustics and signal
processing, ML is data-driven. Given sufficient training data, ML can discover
complex relationships between features and desired labels or actions, or
between features themselves. With large volumes of training data, ML can
discover models describing complex acoustic phenomena such as human speech and
reverberation. ML in acoustics is rapidly developing with compelling results
and significant future promise. We first introduce ML, then highlight ML
developments in four acoustics research areas: source localization in speech
processing, source localization in ocean acoustics, bioacoustics, and
environmental sounds in everyday scenes.
Comment: Published with free access in Journal of the Acoustical Society of
America, 27 Nov. 201
Machine learning based hyperspectral image analysis: A survey
Hyperspectral sensors enable the study of the chemical properties of scene
materials remotely for the purpose of identification, detection, and chemical
composition analysis of objects in the environment. Hence, hyperspectral images
captured from earth observing satellites and aircraft have been increasingly
important in agriculture, environmental monitoring, urban planning, mining, and
defense. Machine learning algorithms, owing to their outstanding predictive
power, have become a key tool for modern hyperspectral image analysis.
Therefore, a solid understanding of machine learning techniques has become
essential for remote sensing researchers and practitioners. This paper reviews
and compares recent machine learning-based hyperspectral image analysis methods
published in the literature. We organize the methods by the image analysis task and by the type
of machine learning algorithm, and present a two-way mapping between the image
analysis tasks and the types of machine learning algorithms that can be applied
to them. The paper is comprehensive in coverage of both hyperspectral image
analysis tasks and machine learning algorithms. The image analysis tasks
considered are land cover classification, target detection, unmixing, and
physical parameter estimation. The machine learning algorithms covered are
Gaussian models, linear regression, logistic regression, support vector
machines, Gaussian mixture models, latent linear models, sparse linear models,
ensemble learning, directed graphical models,
undirected graphical models, clustering, Gaussian processes, Dirichlet
processes, and deep learning. We also discuss the open challenges in the field
of hyperspectral image analysis and explore possible future directions.
Audio Surveillance: a Systematic Review
Although surveillance systems are becoming increasingly ubiquitous in our
living environment, automated surveillance, currently based on the video sensory
modality and machine intelligence, most of the time lacks the robustness and
reliability required in several real applications. To tackle this issue, audio
sensory devices have been taken into account, either alone or in combination with
video, giving birth, in the last decade, to a considerable amount of research.
In this paper, audio-based automated surveillance methods are organized into a
comprehensive survey: a general taxonomy, inspired by the more widespread video
surveillance field, is proposed in order to systematically describe the methods
covering background subtraction, event classification, object tracking and
situation analysis. For each of these tasks, all the significant works are
reviewed, detailing their pros and cons and the context for which they have
been proposed. Moreover, a specific section is devoted to audio features,
discussing their expressiveness and their employment in the above described
tasks. Differently from other surveys on audio processing and analysis, the
present one is specifically targeted to automated surveillance, highlighting
the target applications of each described method and providing the reader with
tables and schemes useful to retrieve the most suited algorithms for a specific
requirement.
Spatio-temporal Video Parsing for Abnormality Detection
Abnormality detection in video poses particular challenges due to the
infinite size of the class of all irregular objects and behaviors. Thus no (or
by far not enough) abnormal training samples are available and we need to find
abnormalities in test data without actually knowing what they are.
Nevertheless, the prevailing concept of the field is to directly search for
individual abnormal local patches or image regions independently of one another. To
address this problem, we propose a method for joint detection of abnormalities
in videos by spatio-temporal video parsing. The goal of video parsing is to
find a set of indispensable normal spatio-temporal object hypotheses that
jointly explain all the foreground of a video, while, at the same time, being
supported by normal training samples. Consequently, we avoid a direct detection
of abnormalities and discover them indirectly as those hypotheses which are
needed for covering the foreground without finding an explanation for
themselves by normal samples. Abnormalities are localized by MAP inference in a
graphical model, which we solve efficiently by formulating it as a convex
optimization problem. We experimentally evaluate our approach on several
challenging benchmark sets, improving over the state-of-the-art on all standard
benchmarks both in terms of abnormality classification and localization.
Comment: 15 pages, 12 figures, 3 tables
2CoBel : An Efficient Belief Function Extension for Two-dimensional Continuous Spaces
This paper introduces an innovative approach for handling 2D compound
hypotheses within the Belief Function Theory framework. We propose a
polygon-based generic representation which relies on polygon clipping
operators. This approach lets the computational cost scale with the precision
of the representation, independently of the cardinality of the discernment
frame. For the BBA combination and decision making, we propose
efficient algorithms which rely on hashes for fast lookup, and on a topological
ordering of the focal elements within a directed acyclic graph encoding their
interconnections. Additionally, an implementation of the functionalities
proposed in this paper is provided as an open source library. Experimental
results on a pedestrian localization problem are reported. The experiments show
that the solution is accurate and that it fully benefits from the scalability
of the 2D search space granularity provided by our representation.
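As an illustration of the polygon-clipping operators such a representation relies on, here is a textbook Sutherland-Hodgman intersection of a polygon with a convex clipping region. This is a generic sketch of the operation, not code from the 2CoBel library.

```python
def clip_polygon(subject, clipper):
    """Sutherland-Hodgman clipping of polygon `subject` against a convex
    `clipper` whose vertices are listed in counter-clockwise order."""
    def inside(p, a, b):
        # p lies on or to the left of the directed edge a -> b
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def intersection(p, q, a, b):
        # intersection of segment p-q with the infinite line through a and b
        dx1, dy1 = q[0] - p[0], q[1] - p[1]
        dx2, dy2 = b[0] - a[0], b[1] - a[1]
        t = ((a[0] - p[0]) * dy2 - (a[1] - p[1]) * dx2) / (dx1 * dy2 - dy1 * dx2)
        return (p[0] + t * dx1, p[1] + t * dy1)

    output = list(subject)
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        polygon, output = output, []
        for j in range(len(polygon)):
            prev, cur = polygon[j - 1], polygon[j]
            if inside(cur, a, b):
                if not inside(prev, a, b):
                    output.append(intersection(prev, cur, a, b))
                output.append(cur)
            elif inside(prev, a, b):
                output.append(intersection(prev, cur, a, b))
        if not output:
            break
    return output
```

Intersections of focal elements reduce to calls like this, which is why the cost tracks polygon complexity (representation precision) rather than the number of cells in a discretized frame.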
On Testing Machine Learning Programs
Nowadays, we are witnessing a wide adoption of Machine learning (ML) models
in many safety-critical systems, thanks to recent breakthroughs in deep
learning and reinforcement learning. Many people are now interacting with
systems based on ML every day, e.g., voice recognition systems used by virtual
personal assistants like Amazon Alexa or Google Home. As the field of ML
continues to grow, we are likely to witness transformative advances in a wide
range of areas, from finance, energy, to health and transportation. Given this
growing importance of ML-based systems in our daily life, it is becoming
critically important to ensure their reliability. Recently, software researchers
have started adapting concepts from the software testing domain (e.g., code
coverage, mutation testing, or property-based testing) to help ML engineers
detect and correct faults in ML programs. This paper reviews existing
testing practices for ML programs. First, we identify and explain challenges
that should be addressed when testing ML programs. Next, we report existing
solutions found in the literature for testing ML programs. Finally, we identify
gaps in the literature related to the testing of ML programs and make
recommendations of future research directions for the scientific community. We
hope that this comprehensive review of software testing practices will help ML
engineers identify the right approach to improve the reliability of their
ML-based systems. We also hope that the research community will act on our
proposed research directions to advance the state of the art of testing for ML
programs.
Comment: This manuscript is part of a submission to the Journal of Systems and
Software
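The property-based testing idea mentioned in the abstract above can be illustrated on a toy preprocessing component: instead of checking fixed input/output pairs, we assert properties that must hold for all inputs. The scaler below is a hypothetical stand-in for an ML pipeline component, not code from the paper.

```python
import random

def min_max_scale(xs):
    """Rescale a list of numbers to [0, 1]; constant inputs map to 0.0."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0 for _ in xs]
    return [(x - lo) / (hi - lo) for x in xs]

def check_properties(trials=200):
    rng = random.Random(0)
    for _ in range(trials):
        xs = [rng.uniform(-1e3, 1e3) for _ in range(rng.randint(2, 50))]
        ys = min_max_scale(xs)
        # property 1: every output lies in [0, 1]
        assert all(0.0 <= y <= 1.0 for y in ys)
        # property 2: scaling preserves the ordering of the inputs
        order = sorted(range(len(xs)), key=xs.__getitem__)
        assert all(ys[order[i]] <= ys[order[i + 1]] for i in range(len(order) - 1))
    return True
```

Dedicated frameworks generate and shrink such random inputs automatically; the loop here just makes the idea concrete in a self-contained way.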
Joining Sound Event Detection and Localization Through Spatial Segregation
Identification and localization of sounds are both integral parts of
computational auditory scene analysis. Although each can be solved separately,
the goal of forming coherent auditory objects and achieving a comprehensive
spatial scene understanding suggests pursuing a joint solution of the two
problems. This work presents an approach that robustly binds localization with
the detection of sound events in a binaural robotic system. Both tasks are
joined through the use of spatial stream segregation which produces
probabilistic time-frequency masks for individual sources attributable to
separate locations, enabling segregated sound event detection operating on
these streams. We use simulations of a comprehensive suite of test scenes with
multiple co-occurring sound sources, and propose performance measures for
systematic investigation of the impact of scene complexity on this segregated
detection of sound types. Analyzing the effect of spatial scene arrangement, we
show how a robot could facilitate high performance through optimal head
rotation. Furthermore, we investigate the performance of segregated detection
given possible localization error as well as error in the estimation of number
of active sources. Our analysis demonstrates that the proposed approach is an
effective method to obtain joint sound event location and type information
under a wide range of conditions.
Comment: Accepted for publication in IEEE/ACM Transactions on Audio, Speech,
and Language Processing
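The mask-based segregation step can be sketched as follows: probabilistic time-frequency masks, one per location, split a mixture spectrogram into per-source streams that a sound event detector can then process independently. Shapes and names here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def segregate(mixture, masks):
    """mixture: (F, T) magnitude spectrogram; masks: (S, F, T) probabilistic
    masks that sum to 1 over sources at each time-frequency bin.
    Returns (S, F, T) per-source spectrogram streams."""
    # broadcast the mixture over the source axis and weight by each mask
    return masks * mixture[None, :, :]
```

Because the masks are probabilistic and sum to one per bin, the streams partition the mixture energy, so detection scores on each stream can be attributed to a spatial location.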
High throughput quantitative metallography for complex microstructures using deep learning: A case study in ultrahigh carbon steel
We apply a deep convolutional neural network segmentation model to enable
novel automated microstructure segmentation applications for complex
microstructures typically evaluated manually and subjectively. We explore two
microstructure segmentation tasks in an openly-available ultrahigh carbon steel
microstructure dataset: segmenting cementite particles in the spheroidized
matrix, and segmenting larger fields of view featuring grain boundary carbide,
spheroidized particle matrix, particle-free grain boundary denuded zone, and
Widmanst\"atten cementite. We also demonstrate how to combine these data-driven
microstructure segmentation models to obtain empirical cementite particle size
and denuded zone width distributions from more complex micrographs containing
multiple microconstituents. The full annotated dataset is available on
materialsdata.nist.gov (https://materialsdata.nist.gov/handle/11256/964).
Comment: Updated with minor revisions reflecting the review process at
Microscopy and Microanalysis. Full supplementary materials will be available
at https://holmgroup.github.io/publications
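A minimal sketch of the downstream quantitative step: measuring a particle size distribution from a binary segmentation mask (as a trained segmentation model might produce) via 4-connected component labeling. The function name and pixel scale are illustrative assumptions, not the authors' code.

```python
import numpy as np

def particle_areas(mask, pixel_area=1.0):
    """mask: 2D array of 0/1 particle pixels; returns a list of particle areas."""
    mask = np.asarray(mask, dtype=bool)
    seen = np.zeros_like(mask, dtype=bool)
    h, w = mask.shape
    areas = []
    for i in range(h):
        for j in range(w):
            if mask[i, j] and not seen[i, j]:
                # flood-fill one 4-connected particle and count its pixels
                stack, size = [(i, j)], 0
                seen[i, j] = True
                while stack:
                    y, x = stack.pop()
                    size += 1
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            stack.append((ny, nx))
                areas.append(size * pixel_area)
    return areas
```

Histogramming the returned areas gives the kind of empirical particle size distribution the abstract describes extracting from segmented micrographs.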