196 research outputs found

    MVControl: Adding Conditional Control to Multi-view Diffusion for Controllable Text-to-3D Generation

    We introduce MVControl, a novel neural network architecture that enhances existing pre-trained multi-view 2D diffusion models by incorporating additional input conditions, e.g. edge maps. Our approach enables the generation of controllable multi-view images and view-consistent 3D content. To achieve controllable multi-view image generation, we use MVDream as our base model and train a new neural network module as an additional plugin for end-to-end task-specific condition learning. To precisely control the shapes and views of generated images, we propose a new conditioning mechanism that predicts an embedding encapsulating the input spatial and view conditions, which is then injected into the network globally. Once MVControl is trained, 3D content can be generated by optimization with a score distillation sampling (SDS) loss, for which we propose a hybrid diffusion prior. The hybrid prior relies on a pre-trained Stable-Diffusion network and our trained MVControl for additional guidance. Extensive experiments demonstrate that our method achieves robust generalization and enables the controllable generation of high-quality 3D content. Code available at https://github.com/WU-CVGL/MVControl/. Comment: Project page: https://lizhiqi49.github.io/MVControl

    BALF: Simple and Efficient Blur Aware Local Feature Detector

    Local feature detection is a key ingredient of many image processing and computer vision applications, such as visual odometry and localization. Most existing algorithms focus on feature detection from a sharp image. Their performance therefore degrades once the image is blurred, which can easily happen under low-light conditions. To address this issue, we propose a simple, efficient, and effective keypoint detection method that accurately localizes the salient keypoints in a blurred image. Our method takes advantage of a novel multi-layer perceptron (MLP) based architecture that significantly improves the detection repeatability for a blurred image. The network is also lightweight and able to run in real time, which enables its deployment in time-constrained applications. Extensive experimental results demonstrate that our detector improves the detection repeatability on blurred images while keeping performance comparable to existing state-of-the-art detectors on sharp images.

    Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments

    In real-world applications, users often require both translations and transcriptions of speech to enhance their comprehension, particularly in streaming scenarios where incremental generation is necessary. This paper introduces a streaming Transformer-Transducer that jointly generates automatic speech recognition (ASR) and speech translation (ST) outputs using a single decoder. To produce ASR and ST content effectively with minimal latency, we propose a joint token-level serialized output training method that interleaves source and target words by leveraging an off-the-shelf textual aligner. Experiments in monolingual (it-en) and multilingual ({de,es,it}-en) settings demonstrate that our approach achieves the best quality-latency balance. With an average ASR latency of 1s and ST latency of 1.3s, our model shows no degradation or even improves output quality compared to separate ASR and ST models, yielding an average improvement of 1.1 WER and 0.4 BLEU in the multilingual case.
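The interleaving of source and target words described in this abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `interleave`, the `(source_index, target_index)` alignment format, and the rule "emit a target word once every source word it is aligned to has been emitted" are assumptions made purely for illustration.

```python
def interleave(src_words, tgt_words, alignment):
    """Serialize ASR (source) and ST (target) words into one sequence.

    alignment is a collection of (src_idx, tgt_idx) pairs, as a textual
    aligner might produce (hypothetical format). A target word is emitted
    only after the last source word it is aligned to, and target order is
    kept monotonic so the translation stays readable.
    """
    # Latest source position each target word depends on.
    last_src = {}
    for s, t in alignment:
        last_src[t] = max(last_src.get(t, -1), s)

    # Running maximum keeps target words in their original order;
    # unaligned target words inherit the requirement of their predecessor.
    required = []
    running = -1
    for t in range(len(tgt_words)):
        running = max(running, last_src.get(t, running))
        required.append(running)

    out = []
    t_ptr = 0
    for s, word in enumerate(src_words):
        out.append(("asr", word))
        while t_ptr < len(tgt_words) and required[t_ptr] <= s:
            out.append(("st", tgt_words[t_ptr]))
            t_ptr += 1

    # Flush any target words that remain once the source is exhausted.
    out.extend(("st", w) for w in tgt_words[t_ptr:])
    return out


if __name__ == "__main__":
    serialized = interleave(
        ["il", "gatto", "dorme"],
        ["the", "cat", "sleeps"],
        {(0, 0), (1, 1), (2, 2)},
    )
    print(serialized)
```

With a monotonic alignment the output alternates source and target words; when the alignment reorders words, target words wait until the source words they depend on have been emitted.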

    Leveraging Timestamp Information for Serialized Joint Streaming Recognition and Translation

    The growing need for instant spoken language transcription and translation is driven by increased global communication and cross-lingual interactions. This has made offering translations in multiple languages essential for user applications. Traditional approaches to automatic speech recognition (ASR) and speech translation (ST) have often relied on separate systems, leading to inefficient use of computational resources and increased synchronization complexity in real time. In this paper, we propose a streaming Transformer-Transducer (T-T) model able to jointly produce many-to-one and one-to-many transcription and translation using a single decoder. We introduce a novel method for joint token-level serialized output training based on timestamp information to effectively produce ASR and ST outputs in the streaming setting. Experiments on {it,es,de}→en demonstrate the effectiveness of our approach, enabling the generation of one-to-many joint outputs with a single decoder for the first time. Comment: © 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

    Structural and spectral dynamics of single-crystalline Ruddlesden-Popper phase halide perovskite blue light-emitting diodes.

    Achieving perovskite-based blue light-emitting diodes (LEDs) with high color purity is still challenging. Here, we report the successful synthesis of a series of blue-emissive two-dimensional Ruddlesden-Popper phase single crystals and demonstrate blue-emitting LEDs with high color purity based on them. Although this approach successfully achieves a series of bandgap emissions based on the different layer thicknesses, it still suffers from a conventional temperature-induced device degradation mechanism during high-voltage operation. To understand the underlying mechanism, we further elucidate temperature-induced device degradation by investigating the crystal structural and spectral evolution dynamics via in situ temperature-dependent single-crystal x-ray diffraction, photoluminescence (PL) characterization, and density functional theory calculation. As temperature increases, the PL peak becomes asymmetrically broadened with a marked intensity decay, owing to [PbBr₆]⁴⁻ octahedra tilting and organic chain disordering, which results in a bandgap decrease. This study indicates that careful heat management under LED operation is a key factor in maintaining sharp and intense emission.

    Distributed topology identification algorithm of distribution network based on neighboring interaction

    Intelligent distributed control and protection is a promising route towards flexible and safe operation of distribution networks with widespread access of distributed energy resources. A fundamental premise of distributed decision-making is that each smart terminal can identify the topological structure of the feeder and track its changes. This paper proposes a distributed topology identification algorithm with high fault tolerance based on peer-to-peer communication. The smart terminal units (STUs) installed on the nodes can dynamically track and identify the network topology through local measurement and information exchange with neighboring STUs. The proposed algorithm combines local measurement mutual check with contralateral connectivity predictive correction, and significantly improves the tolerance of measurement errors in topology identification. Test examples are presented to verify the effectiveness of the method.

    Assessment of multi-source observation merged 1 km-grid precipitation product during the disastrous rainstorms in Guangdong

    This paper aims to assess the latest 1 km-grid Analysis Real Time (ART_1 km) precipitation product developed by the National Meteorological Information Center of China Meteorological Administration (CMA), which can provide great support for disastrous weather monitoring and warning, intelligent grid forecasting, and weather services. Observed precipitation data from independent stations (including non-uploaded regional meteorological stations and hydrometric stations) that were not integrated into the ART_1 km precipitation product, as well as precipitation classification inspection, are used to assess the quality of this product during twenty disastrous rainstorm cases from May to August of 2019-2022 in Guangdong. The results show that the ART_1 km precipitation product successfully reproduces the precipitation location, strength, and trends in these cases, with the best performance in the Pearl River Delta, the east of eastern Guangdong, and the north of northern Guangdong. The stronger the precipitation, the greater the correlation as well as the root mean square error (RMSE) and mean error (ME) between the ART_1 km precipitation and the observed precipitation. When the hourly precipitation is not classified, about 60% of these independent stations present a correlation coefficient ≥ 0.8, more than 90% of the stations present an RMSE within the range of [1.0, 5.0) mm, and more than 60% of the stations present an ME within ±0.1 mm. When the hourly precipitation is < 5 mm, most of the stations have a correlation coefficient < 0.5, an RMSE within the range of [1.0, 5.0) mm, and an ME within [0.0, 0.5] mm. When the hourly precipitation is ≥ 20 mm, 42%-56% of the stations have a correlation coefficient ≥ 0.5, and most of the stations have an RMSE ≥ 10 mm and an ME < 0 mm. When the hourly precipitation is ≥ 50 mm, most of the stations have an ME < -10 mm. Overall, ART_1 km precipitation is usually underestimated at the independent stations, and integrating observations from more sites into producing ART_1 km precipitation would help improve the quality of the product.
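The three station-level scores named in this abstract (correlation, RMSE, ME) can be computed as in the following minimal sketch. The function names and the sample values are hypothetical, and the sign convention ME = product minus observation is an assumption, chosen because it matches the abstract's reading of a negative ME as underestimation.

```python
import math


def mean_error(product, observed):
    """Mean of (product - observed); negative values mean underestimation."""
    diffs = [p - o for p, o in zip(product, observed)]
    return sum(diffs) / len(diffs)


def rmse(product, observed):
    """Root mean square error between product and observation."""
    squared = [(p - o) ** 2 for p, o in zip(product, observed)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(product, observed):
    """Pearson correlation coefficient between product and observation."""
    n = len(product)
    mean_p = sum(product) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(product, observed))
    var_p = sum((p - mean_p) ** 2 for p in product)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)


if __name__ == "__main__":
    # Made-up hourly precipitation (mm) at one station, for illustration only.
    observed = [0.0, 2.0, 10.0, 25.0]
    product = [0.0, 1.0, 8.0, 20.0]
    print("ME:", mean_error(product, observed))
    print("RMSE:", rmse(product, observed))
    print("r:", pearson_r(product, observed))
```

In this made-up sample the product tracks the observations closely (high correlation) while still underestimating them (negative ME), which is the pattern the abstract reports for heavy rainfall.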