MM-Pyramid: Multimodal Pyramid Attentional Network for Audio-Visual Event Localization and Video Parsing
Recognizing and localizing events in videos is a fundamental task for video
understanding. Since events may occur in auditory and visual modalities,
multimodal detailed perception is essential for complete scene comprehension.
Most previous works attempted to analyze videos from a holistic perspective.
However, they do not consider semantic information at multiple scales, which
makes it difficult for the model to localize events of different lengths. In this
paper, we present a Multimodal Pyramid Attentional Network
(MM-Pyramid) for event localization. Specifically, we first propose
the attentive feature pyramid module. This module captures temporal pyramid
features via several stacked pyramid units, each composed of a
fixed-size attention block and a dilated convolution block. We also design an
adaptive semantic fusion module, which leverages a unit-level attention block
and a selective fusion block to integrate pyramid features interactively.
Extensive experiments on audio-visual event localization and weakly-supervised
audio-visual video parsing tasks verify the effectiveness of our approach.
Comment: ACM MM 202
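The abstract above mentions stacked units that each contain a dilated convolution block. As a minimal sketch of why stacking dilated convolutions yields features at several temporal scales, the snippet below computes the receptive field after each unit. The kernel size of 3 and the doubling dilation rates are assumptions for illustration, not values taken from the paper.

```python
def receptive_field(kernel_size, dilations):
    """Temporal receptive field of stacked stride-1, 1-D dilated convolutions."""
    field = 1
    for dilation in dilations:
        # Each layer widens the field by (kernel_size - 1) * dilation steps.
        field += (kernel_size - 1) * dilation
    return field


# Hypothetical configuration: three stacked units, kernel size 3,
# dilation doubling from one unit to the next.
dilations = [1, 2, 4]
fields = [receptive_field(3, dilations[: i + 1]) for i in range(len(dilations))]
print(fields)  # [3, 7, 15]
```

Under these assumptions, deeper units cover longer time spans, which is one way a pyramid of units can respond to events of different durations.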
Electric-field-induced strong enhancement of electroluminescence in multilayer molybdenum disulfide.
The layered transition metal dichalcogenides have attracted considerable interest for their unique electronic and optical properties. While the monolayer MoS2 exhibits a direct bandgap, the multilayer MoS2 is an indirect bandgap semiconductor and generally optically inactive. Here we report electric-field-induced strong electroluminescence in multilayer MoS2. We show that GaN-Al2O3-MoS2 and GaN-Al2O3-MoS2-Al2O3-graphene vertical heterojunctions can be created with excellent rectification behaviour. Electroluminescence studies demonstrate prominent direct bandgap excitonic emission in multilayer MoS2 over the entire vertical junction area. Importantly, the electroluminescence efficiency observed in multilayer MoS2 is comparable to or higher than that in monolayers. This strong electroluminescence can be attributed to electric-field-induced carrier redistribution from the lowest energy points (indirect bandgap) to higher energy points (direct bandgap) in k-space. The electric-field-induced electroluminescence is general for other layered materials including WSe2 and can open up a new pathway towards transition metal dichalcogenide-based optoelectronic devices
Solving the comfort-retrofit conundrum through post-occupancy evaluation and multi-objective optimisation
Developing appropriate building retrofit strategies is a challenging task. This case study presents a multi-criteria decision-support method that suggests optimal solutions and alternative design references with a range of diversity at the early exploration stage in building retrofit. It employs a practical two-step procedure to identify critical comfort and energy issues and to generate optimised design options with multi-objective optimisation based on a genetic algorithm. The first step is based on a post-occupancy evaluation, which cross-references benchmarking and correlation and integrates them with non-linear satisfaction theory to extract critical comfort factors. The second step parameterises the outputs of the first step as objectives for building simulation. The case study is a typical post-war highly glazed open-plan office in London. The post-occupancy evaluation result identifies direct sunlight glare, indoor temperature, and noise from other occupants as critical comfort factors. The simulation and optimisation extract the optimal retrofit strategies by analysing 480 generated Pareto fronts. The proposed method provides retrofit solutions with a criteria-based filtering method and considers the trade-off between the energy and comfort objectives. The method can be transformed into a design-supporting tool to identify the key comfort factors for built environment optimisation and create sustainability in building retrofit.
Practical application: This study suggested that statistical analysis could be integrated with parametric design tools and multi-objective optimisation. It directly links users’ subjective opinions to the final design solutions, suggesting a new method for data-driven generative design. As a quantitative process, the proposed framework could be automated with a program, reducing the human effort in the optimisation process and reducing the reliance on human experience in defining and analysing the design question. It might also avoid human mistakes, e.g. overlooking some critical factors. During the multi-objective optimisation process, large numbers of design options are generated, and many of them are optimised at the Pareto front. Exploring these options could be a less human effort-intensive process than designing completely new options, especially in the early design exploration phase. Overall, this might be a potential direction for future study in generative design, which could greatly reduce the technical obstacles to sustainable design for high building performance.
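The trade-off between energy and comfort objectives described above rests on the notion of a Pareto front. The following is a minimal sketch of Pareto filtering for two minimised objectives. The candidate values and the names `dominates` and `pareto_front` are hypothetical and are not drawn from the study, which uses a genetic algorithm to generate its options.

```python
def dominates(a, b):
    """True if design a is no worse than b on every objective
    and strictly better on at least one (all objectives minimised)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(designs):
    """Keep only the designs that no other design dominates."""
    return [
        d for d in designs
        if not any(dominates(other, d) for other in designs if other is not d)
    ]


# Hypothetical candidates: (annual energy use, discomfort score), both minimised.
candidates = [
    (120.0, 0.30),
    (100.0, 0.45),
    (140.0, 0.20),
    (130.0, 0.35),  # dominated by (120.0, 0.30)
    (100.0, 0.50),  # dominated by (100.0, 0.45)
]
front = pareto_front(candidates)
print(front)  # [(120.0, 0.3), (100.0, 0.45), (140.0, 0.2)]
```

A criteria-based filter of the kind the abstract mentions could then be applied to `front`, for example by discarding options above a chosen discomfort threshold.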
Large area growth and electrical properties of p-type WSe2 atomic layers.
Transition metal dichalcogenides represent a unique class of two-dimensional layered materials that can be exfoliated into single or few atomic layers. Tungsten diselenide (WSe2) is one typical example with p-type semiconductor characteristics. Bulk WSe2 has an indirect band gap (∼1.2 eV), which transitions into a direct band gap (∼1.65 eV) in monolayers. Monolayer WSe2, therefore, is of considerable interest as a new electronic material for functional electronics and optoelectronics. However, the controllable synthesis of large-area WSe2 atomic layers remains a challenge. Studies of WSe2 are largely limited by the relatively small lateral size of exfoliated flakes and poor yield, which has significantly restricted large-scale applications of WSe2 atomic layers. Here, we report a systematic study of a chemical vapor deposition approach for large-area growth of atomically thin WSe2 film with lateral dimensions up to ∼1 cm². Microphotoluminescence mapping indicates distinct layer-dependent efficiency. The monolayer area exhibits much stronger light emission than bilayer or multilayer areas, consistent with the expected transition to a direct band gap in the monolayer limit. Transmission electron microscopy studies demonstrate excellent crystalline quality of the atomically thin WSe2. Electrical transport studies further show that the p-type WSe2 field-effect transistors exhibit excellent electronic characteristics, with effective hole carrier mobility up to 100 cm² V⁻¹ s⁻¹ for monolayer and up to 350 cm² V⁻¹ s⁻¹ for few-layer materials at room temperature, comparable to or well above previously reported mobility values for synthetic WSe2 and comparable to the best exfoliated materials.
Modality-Aware Contrastive Instance Learning with Self-Distillation for Weakly-Supervised Audio-Visual Violence Detection
Weakly-supervised audio-visual violence detection aims to distinguish
snippets containing multimodal violence events with video-level labels. Many
prior works perform audio-visual integration and interaction in an early or
intermediate manner, yet overlook modality heterogeneity in the
weakly-supervised setting. In this paper, we analyze the modality asynchrony
and undifferentiated instances phenomena of the multiple instance learning
(MIL) procedure, and further investigate its negative impact on
weakly-supervised audio-visual learning. To address these issues, we propose a
modality-aware contrastive instance learning with self-distillation (MACIL-SD)
strategy. Specifically, we leverage a lightweight two-stream network to
generate audio and visual bags, in which unimodal background, violent, and
normal instances are clustered into semi-bags in an unsupervised way. Then
audio and visual violent semi-bag representations are assembled as positive
pairs, and violent semi-bags are combined with background and normal instances
in the opposite modality as contrastive negative pairs. Furthermore, a
self-distillation module is applied to transfer unimodal visual knowledge to
the audio-visual model, which alleviates noise and closes the semantic gap
between unimodal and multimodal features. Experiments show that our framework
outperforms previous methods with lower complexity on the large-scale
XD-Violence dataset. Results also demonstrate that our proposed approach can be
used as plug-in modules to enhance other networks. Code is available at
https://github.com/JustinYuu/MACIL_SD.
Comment: ACM MM 202
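The abstract above pairs audio and visual violent semi-bag representations as positives and contrasts them with background and normal instances from the opposite modality. As a rough illustration of that idea, the sketch below uses a generic InfoNCE-style loss. The abstract does not state the paper's actual loss, so this formulation, the temperature value, and the toy vectors are all assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE-style loss: low when the anchor is closer to the
    positive than to every negative. The temperature is an assumed value."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy embeddings (hypothetical): an audio "violent" semi-bag as anchor,
# a visual "violent" semi-bag as positive, visual background/normal as negatives.
audio_violent = [1.0, 0.0]
visual_violent = [0.9, 0.1]
visual_background = [0.0, 1.0]
visual_normal = [-1.0, 0.0]

aligned = info_nce(audio_violent, visual_violent, [visual_background, visual_normal])
misaligned = info_nce(audio_violent, visual_background, [visual_violent, visual_normal])
print(aligned < misaligned)  # True
```

The loss is smaller when the cross-modal violent pair is treated as the positive, which is the behaviour a contrastive objective of this kind is meant to reward.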
State of charge estimation for lithium-ion battery based on an intelligent adaptive unscented Kalman filter