Multi-encoder attention-based architectures for sound recognition with partial visual assistance
Large-scale sound recognition data sets typically consist of acoustic
recordings obtained from multimedia libraries. As a consequence, modalities
other than audio can often be exploited to improve the outputs of models
designed for associated tasks. Frequently, however, not all contents are
available for all samples of such a collection: For example, the original
material may have been removed from the source platform at some point, and
therefore, non-auditory features can no longer be acquired.
We demonstrate that a multi-encoder framework can be employed to deal with
this issue by applying this method to attention-based deep learning systems,
which are currently part of the state of the art in the domain of sound
recognition. More specifically, we show that the proposed model extension can
successfully be utilized to incorporate partially available visual information
into the operational procedures of such networks, which normally only use
auditory features during training and inference. Experimentally, we verify that
the considered approach leads to improved predictions in a number of evaluation
scenarios pertaining to audio tagging and sound event detection. Additionally,
we scrutinize some properties and limitations of the presented technique.
Comment: Submitted to EURASIP Journal on Audio, Speech, and Music Processing
Sound-based transportation mode recognition with smartphones
Smartphone-based identification of the user's mode of transportation is important for context-aware services. We investigate the feasibility of recognizing the 8 most common modes of locomotion and transportation from the sound recorded by a smartphone carried by the user. We propose a convolutional neural network based recognition pipeline, which operates on the short-time Fourier transform (STFT) spectrogram of the sound in the log domain. Experiments with the Sussex-Huawei locomotion-transportation (SHL) dataset on 366 hours of data show promising results: the proposed pipeline can recognize the activities Still, Walk, Run, Bike, Car, Bus, Train and Subway with a global accuracy of 86.6%, which is 23% higher than classical machine learning pipelines. It is shown that sound is particularly useful for distinguishing between various vehicle activities (e.g. Car vs Bus, Train vs Subway). This discriminability is complementary to the widely used motion sensors, which are poor at distinguishing between rail and road transport.
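The abstract names a log-domain STFT spectrogram as the network input but gives no parameters. The following is a minimal sketch of that front end only, in pure Python with a naive DFT. The frame length, hop size, Hann window, `eps` offset, and the synthetic test tone are all assumptions made for illustration, not values from the paper, and the function names are hypothetical.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n (window choice is an assumption)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def log_stft(signal, frame_len=64, hop=32, eps=1e-10):
    """Return one row per frame; each row holds log-magnitudes for
    frequency bins 0 .. frame_len // 2 (naive DFT, for clarity not speed)."""
    window = hann(frame_len)
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + i] * window[i] for i in range(frame_len)]
        row = []
        for k in range(n_bins):
            acc = sum(
                seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            # eps avoids log(0) for silent bins
            row.append(math.log(abs(acc) + eps))
        frames.append(row)
    return frames


# Hypothetical input: a pure tone completing 8 cycles per 64 samples,
# so its energy should concentrate in DFT bin 8 of every frame.
tone = [math.sin(2.0 * math.pi * 8 * t / 64) for t in range(256)]
spec = log_stft(tone)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
```

In a real pipeline the resulting time-frequency matrix would be fed to the classifier as an image-like input. An FFT routine from a numerical library would replace the naive DFT loop used here.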
Exclusion Limits on the WIMP-Nucleon Cross-Section from the First Run of the Cryogenic Dark Matter Search in the Soudan Underground Lab
The Cryogenic Dark Matter Search (CDMS-II) employs low-temperature Ge and Si
detectors to seek Weakly Interacting Massive Particles (WIMPs) via their
elastic scattering interactions with nuclei. Simultaneous measurements of both
ionization and phonon energy provide discrimination against interactions of
background particles. For recoil energies above 10 keV, events due to
background photons are rejected with >99.99% efficiency. Electromagnetic events
very near the detector surface can mimic nuclear recoils because of reduced
charge collection, but these surface events are rejected with >96% efficiency
by using additional information from the phonon pulse shape. Efficient use of
active and passive shielding, combined with the 2090 m.w.e. overburden at
the experimental site in the Soudan mine, makes the background from neutrons
negligible for this first exposure. All cuts are determined in a blind manner
from in situ calibrations with external radioactive sources without any prior
knowledge of the event distribution in the signal region. Resulting
efficiencies are known to ~10%. A single event with a recoil of 64 keV passes
all of the cuts and is consistent with the expected misidentification rate of
surface-electron recoils. Under the assumptions for a standard dark matter
halo, these data exclude previously unexplored parameter space for both
spin-independent and spin-dependent WIMP-nucleon elastic scattering. The
resulting limit on the spin-independent WIMP-nucleon elastic-scattering
cross-section has a minimum of 4x10^-43 cm^2 at a WIMP mass of 60 GeV/c^2. The
minimum of the limit for the spin-dependent WIMP-neutron elastic-scattering
cross-section is 2x10^-37 cm^2 at a WIMP mass of 50 GeV/c^2.
Comment: 37 pages, 42 figures
A novel double-hybrid learning method for modal frequency-based damage assessment of bridge structures under different environmental variation patterns
Monitoring of modal frequencies under an unsupervised learning framework is a practical strategy for damage assessment of civil structures, especially bridges. However, the key challenge is related to high sensitivity of modal frequencies to environmental and/or operational changes that may lead to economic and safety losses. The other challenge pertains to different environmental and/or operational variation patterns in modal frequencies due to differences in structural types, materials, and applications, measurement periods in terms of short and/or long monitoring programs, geographical locations of structures, weather conditions, and influences of single or multiple environmental and/or operational factors, which may cause barriers to employing state-of-the-art unsupervised learning approaches. To cope with these issues, this paper proposes a novel double-hybrid learning technique in an unsupervised manner. It contains two stages of data partitioning and anomaly detection, both of which comprise two hybrid algorithms. For the first stage, an improved hybrid clustering method based on a coupling of shared nearest neighbor searching and density peaks clustering is proposed to prepare local information for anomaly detection with the focus on mitigating environmental and/or operational effects. For the second stage, this paper proposes an innovative non-parametric hybrid anomaly detector based on local outlier factor. In both stages, the number of nearest neighbors is the key hyperparameter that is automatically determined by leveraging a self-adaptive neighbor searching algorithm. Modal frequencies of two full-scale bridges are utilized to validate the proposed technique with several comparisons. Results indicate that this technique is able to successfully eliminate different environmental and/or operational variations and correctly detect damage.
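The second stage above builds on the local outlier factor (LOF). As a rough illustration of that underlying score only, here is a minimal pure-Python sketch of the standard LOF with a fixed number of neighbors `k`. It is not the paper's hybrid detector and omits the clustering stage and the self-adaptive neighbor search. The sample "modal frequency" pairs are made up for the example, and the helper names are hypothetical.

```python
import math


def lof_scores(points, k=3):
    """Standard local outlier factor for each point (fixed k, Euclidean
    distance). Assumes no duplicate points, so densities stay finite."""
    n = len(points)
    # Indices of the k nearest neighbors of each point, excluding itself
    neighbors = []
    k_dist = []
    for i in range(n):
        ranked = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )[:k]
        neighbors.append([j for _, j in ranked])
        k_dist.append(ranked[-1][0])  # distance to the k-th neighbor

    # Local reachability density: inverse of the mean reachability distance
    lrd = []
    for i in range(n):
        reach = [
            max(k_dist[j], math.dist(points[i], points[j]))
            for j in neighbors[i]
        ]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean ratio of the neighbors' density to the point's own density
    return [
        sum(lrd[j] for j in neighbors[i]) / (k * lrd[i]) for i in range(n)
    ]


# Hypothetical data: pairs of modal frequencies in Hz. The last sample is
# deliberately shifted to mimic a damage-like frequency drop.
samples = [
    (2.00, 5.00), (2.01, 5.02), (1.99, 4.98),
    (2.02, 5.01), (1.98, 5.03), (2.03, 4.97),
    (1.60, 4.20),
]
scores = lof_scores(samples, k=3)
most_anomalous = max(range(len(scores)), key=lambda i: scores[i])
```

Scores near 1 indicate a point about as dense as its neighbors, while scores well above 1 flag candidates for anomalies. In this toy data the shifted last sample receives by far the largest score.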