    MM-Pyramid: Multimodal Pyramid Attentional Network for Audio-Visual Event Localization and Video Parsing

    Recognizing and localizing events in videos is a fundamental task for video understanding. Since events may occur in auditory and visual modalities, detailed multimodal perception is essential for complete scene comprehension. Most previous works attempted to analyze videos from a holistic perspective. However, they do not consider semantic information at multiple scales, which makes it difficult for the model to localize events of different lengths. In this paper, we present a Multimodal Pyramid Attentional Network (MM-Pyramid) for event localization. Specifically, we first propose the attentive feature pyramid module. This module captures temporal pyramid features via several stacked pyramid units, each composed of a fixed-size attention block and a dilated convolution block. We also design an adaptive semantic fusion module, which leverages a unit-level attention block and a selective fusion block to integrate pyramid features interactively. Extensive experiments on audio-visual event localization and weakly-supervised audio-visual video parsing tasks verify the effectiveness of our approach. Comment: ACM MM 202

    Electric-field-induced strong enhancement of electroluminescence in multilayer molybdenum disulfide.

    The layered transition metal dichalcogenides have attracted considerable interest for their unique electronic and optical properties. While monolayer MoS2 exhibits a direct bandgap, multilayer MoS2 is an indirect bandgap semiconductor and generally optically inactive. Here we report electric-field-induced strong electroluminescence in multilayer MoS2. We show that GaN-Al2O3-MoS2 and GaN-Al2O3-MoS2-Al2O3-graphene vertical heterojunctions can be created with excellent rectification behaviour. Electroluminescence studies demonstrate prominent direct bandgap excitonic emission in multilayer MoS2 over the entire vertical junction area. Importantly, the electroluminescence efficiency observed in multilayer MoS2 is comparable to or higher than that in monolayers. This strong electroluminescence can be attributed to electric-field-induced carrier redistribution from the lowest energy points (indirect bandgap) to higher energy points (direct bandgap) in k-space. The electric-field-induced electroluminescence is general for other layered materials including WSe2 and can open up a new pathway towards transition metal dichalcogenide-based optoelectronic devices.

    Solving the comfort-retrofit conundrum through post-occupancy evaluation and multi-objective optimisation

    Developing appropriate building retrofit strategies is a challenging task. This case study presents a multi-criteria decision-supporting method that suggests optimal solutions and a diverse range of alternative design references at the early exploration stage of building retrofit. It employs a practical two-step procedure to identify critical comfort and energy issues and to generate optimised design options through multi-objective optimisation based on a genetic algorithm. The first step is a post-occupancy evaluation, which cross-references benchmarking and correlation and integrates them with non-linear satisfaction theory to extract critical comfort factors. The second step parameterises these outputs as objectives for building simulation. The case study is a typical post-war, highly glazed, open-plan office in London. The post-occupancy evaluation identifies direct sunlight glare, indoor temperature, and noise from other occupants as critical comfort factors. The simulation and optimisation extract the optimal retrofit strategies by analysing 480 generated Pareto fronts. The proposed method provides retrofit solutions with a criteria-based filtering method and considers the trade-off between the energy and comfort objectives. The method can be transformed into a design-supporting tool to identify the key comfort factors for built environment optimisation and to improve sustainability in building retrofit. Practical application: This study suggests that statistical analysis could be integrated with parametric design tools and multi-objective optimisation. It directly links users’ subjective opinions to the final design solutions, suggesting a new method for data-driven generative design. As a quantitative process, the proposed framework could be automated with a program, reducing the human effort in the optimisation process and the reliance on human experience in defining and analysing the design question. It might also avoid human mistakes, e.g. overlooking some critical factors. During the multi-objective optimisation process, large numbers of design options are generated, and many of them are optimised at the Pareto front. Exploring these options could require less human effort than designing completely new options, especially in the early design exploration phase. Overall, this might be a potential direction for future study in generative design, which could greatly reduce the technical obstacles to sustainable design for high building performance.

    Large area growth and electrical properties of p-type WSe2 atomic layers.

    Transition metal dichalcogenides represent a unique class of two-dimensional layered materials that can be exfoliated into single or few atomic layers. Tungsten diselenide (WSe2) is one typical example with p-type semiconductor characteristics. Bulk WSe2 has an indirect band gap (∼ 1.2 eV), which transitions into a direct band gap (∼ 1.65 eV) in monolayers. Monolayer WSe2, therefore, is of considerable interest as a new electronic material for functional electronics and optoelectronics. However, the controllable synthesis of large-area WSe2 atomic layers remains a challenge. Studies on WSe2 are largely limited by the relatively small lateral size of exfoliated flakes and poor yield, which has significantly restricted large-scale applications of WSe2 atomic layers. Here, we report a systematic study of a chemical vapor deposition approach for large-area growth of atomically thin WSe2 film with lateral dimensions up to ∼ 1 cm². Microphotoluminescence mapping indicates distinct layer-dependent efficiency. The monolayer area exhibits much stronger light emission than bilayers or multilayers, consistent with the expected transition to a direct band gap in the monolayer limit. Transmission electron microscopy studies demonstrate excellent crystalline quality of the atomically thin WSe2. Electrical transport studies further show that the p-type WSe2 field-effect transistors exhibit excellent electronic characteristics, with effective hole carrier mobility up to 100 cm² V⁻¹ s⁻¹ for monolayer and up to 350 cm² V⁻¹ s⁻¹ for few-layer materials at room temperature, comparable to or well above previously reported mobility values for synthetic WSe2 and comparable to those of the best exfoliated materials.

    Modality-Aware Contrastive Instance Learning with Self-Distillation for Weakly-Supervised Audio-Visual Violence Detection

    Weakly-supervised audio-visual violence detection aims to distinguish snippets containing multimodal violence events using video-level labels. Many prior works perform audio-visual integration and interaction in an early or intermediate manner, yet overlook modality heterogeneity in the weakly-supervised setting. In this paper, we analyze the modality asynchrony and undifferentiated instances phenomena of the multiple instance learning (MIL) procedure, and further investigate their negative impact on weakly-supervised audio-visual learning. To address these issues, we propose a modality-aware contrastive instance learning with self-distillation (MACIL-SD) strategy. Specifically, we leverage a lightweight two-stream network to generate audio and visual bags, in which unimodal background, violent, and normal instances are clustered into semi-bags in an unsupervised way. Then audio and visual violent semi-bag representations are assembled as positive pairs, and violent semi-bags are combined with background and normal instances in the opposite modality as contrastive negative pairs. Furthermore, a self-distillation module is applied to transfer unimodal visual knowledge to the audio-visual model, which alleviates noise and closes the semantic gap between unimodal and multimodal features. Experiments show that our framework outperforms previous methods with lower complexity on the large-scale XD-Violence dataset. Results also demonstrate that our proposed approach can be used as a plug-in module to enhance other networks. Code is available at https://github.com/JustinYuu/MACIL_SD. Comment: ACM MM 202