196 research outputs found
MVControl: Adding Conditional Control to Multi-view Diffusion for Controllable Text-to-3D Generation
We introduce MVControl, a novel neural network architecture that enhances
existing pre-trained multi-view 2D diffusion models by incorporating additional
input conditions, e.g. edge maps. Our approach enables the generation of
controllable multi-view images and view-consistent 3D content. To achieve
controllable multi-view image generation, we leverage MVDream as our base
model, and train a new neural network module as an additional plugin for
end-to-end task-specific condition learning. To precisely control the shapes
and views of generated images, we propose a new conditioning
mechanism that predicts an embedding encapsulating the input spatial and view
conditions, which is then injected into the network globally. Once MVControl is
trained, optimization based on a score distillation sampling (SDS) loss can be
performed to generate 3D content. For this step, we propose a hybrid diffusion
prior that combines a pre-trained Stable Diffusion network with our trained
MVControl for additional guidance. Extensive experiments
demonstrate that our method achieves robust generalization and enables the
controllable generation of high-quality 3D content. Code is available at
https://github.com/WU-CVGL/MVControl/.
Comment: Project page: https://lizhiqi49.github.io/MVControl
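For reference, the SDS gradient is sketched below in the form commonly used since DreamFusion. The abstract does not say how the Stable Diffusion and MVControl terms are combined, so the weighted sum with weight $\lambda$ in the second line is purely an illustrative assumption, not the paper's formulation.

```latex
% x = g(\theta): image rendered from the 3D representation with parameters \theta
% x_t = \alpha_t x + \sigma_t \epsilon: noised render at timestep t; y: conditioning input
\begin{align}
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  &= \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,\bigl(\hat{\epsilon}_\phi(x_t; y, t) - \epsilon\bigr)\,
     \frac{\partial x}{\partial \theta} \right] \\
% Hypothetical hybrid prior: weighted sum of two SDS terms (\lambda is an assumption)
\nabla_\theta \mathcal{L}_{\mathrm{hybrid}}
  &= \nabla_\theta \mathcal{L}_{\mathrm{SDS}}^{\mathrm{SD}}
   + \lambda\, \nabla_\theta \mathcal{L}_{\mathrm{SDS}}^{\mathrm{MVControl}}
\end{align}
```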
BALF: Simple and Efficient Blur Aware Local Feature Detector
Local feature detection is a key ingredient of many image processing and
computer vision applications, such as visual odometry and localization. Most
existing algorithms focus on detecting features in sharp images, so their
performance degrades once the image is blurred, which can easily happen under
low-light conditions. To address this issue, we propose a simple, efficient,
and effective keypoint detection method that can accurately localize salient
keypoints in blurred images. Our method takes advantage of a novel multi-layer
perceptron (MLP) based architecture that significantly improves detection
repeatability on blurred images. The network is also lightweight and runs in
real time, which enables its deployment in time-constrained applications.
Extensive experimental results demonstrate that our detector improves detection
repeatability on blurred images while keeping performance comparable to
existing state-of-the-art detectors on sharp images.
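Detection repeatability, the metric the abstract reports, is commonly computed by warping keypoints from one image into another with a known ground-truth homography and counting how many land near a detected keypoint. The minimal sketch below uses only the standard library; the function names and the 3-pixel threshold are assumptions, and it is simplified (one direction only, no cropping to the shared field of view), so it is not the paper's evaluation protocol.

```python
import math


def project(h, pt):
    """Apply a 3x3 homography h (nested lists) to a 2-D point (x, y)."""
    x, y = pt
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (u / w, v / w)


def repeatability(kps_a, kps_b, h_ab, eps=3.0):
    """Fraction of keypoints from image A that, once warped into image B by the
    ground-truth homography h_ab, lie within eps pixels of some keypoint in B.
    The eps default is an illustrative assumption."""
    if not kps_a or not kps_b:
        return 0.0
    hits = 0
    for p in kps_a:
        q = project(h_ab, p)
        if min(math.dist(q, r) for r in kps_b) <= eps:
            hits += 1
    return hits / len(kps_a)
```

For a sharp/blurred pair of the same view, `h_ab` is simply the identity matrix, and the score measures how many sharp-image detections the detector finds again after blurring.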
Synthesis of Silver Nanowires with Reduced Diameters Using Benzoin-Derived Radicals to Make Transparent Conductors with High Transparency and Low Haze.
Reducing the diameter of silver nanowires has proven to be an effective way to improve their optoelectronic performance by lessening light attenuation. State-of-the-art silver nanowires are typically around 20 nm in diameter. Herein we report a modified polyol synthesis of silver nanowires with average diameters as small as 13 nm and aspect ratios of up to 3000. The synthesis relies on benzoin-derived radicals in the polyol approach and does not require high-pressure conditions. The strong reducing power of the radicals allows silver precursors to be reduced at relatively low temperatures, where efficient surface passivation restrains the lateral growth of the nanowires. Transparent conductors made from the as-prepared 13 nm silver nanowires exhibit a sheet resistance of 28 Ω sq⁻¹ at a transmittance of 95% with a haze factor of ∼1.2%, comparable to commercial indium tin oxide (ITO).
Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments
In real-world applications, users often require both translations and
transcriptions of speech to enhance their comprehension, particularly in
streaming scenarios where incremental generation is necessary. This paper
introduces a streaming Transformer-Transducer that jointly generates automatic
speech recognition (ASR) and speech translation (ST) outputs using a single
decoder. To produce ASR and ST content effectively with minimal latency, we
propose a joint token-level serialized output training method that interleaves
source and target words by leveraging an off-the-shelf textual aligner.
Experiments in monolingual (it-en) and multilingual ({de,es,it}-en) settings
demonstrate that our approach achieves the best quality-latency balance. With
an average ASR latency of 1 s and ST latency of 1.3 s, our model matches or
even improves the output quality of separate ASR and ST models, yielding an
average improvement of 1.1 WER and 0.4 BLEU points in the multilingual case.
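The interleaving idea can be illustrated with a minimal Python sketch. It assumes the aligner output is a list of (source_index, target_index) word pairs and that each target word is emitted as soon as every source word aligned to it has been emitted, with target order preserved. The function name, the `<asr>`/`<st>` tags, and this emission rule are hypothetical illustrations, not the paper's exact procedure.

```python
def interleave_by_alignment(src, tgt, alignment, asr_tag="<asr>", st_tag="<st>"):
    """Serialize source (ASR) and target (ST) words into one token stream.

    alignment: list of (source_index, target_index) pairs from a word aligner.
    Target order is preserved; a target word is emitted once every source word
    aligned to it has been emitted (unaligned target words have no source
    dependency and only wait for the preceding target words).
    """
    # Latest source position each target word depends on (-1 if unaligned).
    last_src = [-1] * len(tgt)
    for s, t in alignment:
        last_src[t] = max(last_src[t], s)

    out = []
    j = 0  # index of the next target word to emit
    for i, word in enumerate(src):
        out.append((asr_tag, word))
        # Emit pending target words whose aligned source words are all available.
        while j < len(tgt) and last_src[j] <= i:
            out.append((st_tag, tgt[j]))
            j += 1
    # Flush any target words still pending after the last source word.
    out.extend((st_tag, w) for w in tgt[j:])
    return out
```

With `src = ["il", "gatto", "nero"]`, `tgt = ["the", "black", "cat"]` and alignment `[(0, 0), (2, 1), (1, 2)]`, "black" has to wait for "nero", so the stream becomes il, the, gatto, nero, black, cat. Preserving target order keeps the translation fluent at the cost of some extra latency on reordered words.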
Leveraging Timestamp Information for Serialized Joint Streaming Recognition and Translation
The growing need for instant spoken language transcription and translation is
driven by increased global communication and cross-lingual interactions. This
has made offering translations in multiple languages essential for user
applications. Traditional approaches to automatic speech recognition (ASR) and
speech translation (ST) have often relied on separate systems, leading to
inefficiencies in computational resources, and increased synchronization
complexity in real time. In this paper, we propose a streaming
Transformer-Transducer (T-T) model able to jointly produce many-to-one and
one-to-many transcription and translation using a single decoder. We introduce
a novel method for joint token-level serialized output training based on
timestamp information to effectively produce ASR and ST outputs in the
streaming setting. Experiments on {it,es,de}→en demonstrate the effectiveness of our
approach, enabling the generation of one-to-many joint outputs with a single
decoder for the first time.
Comment: © 2024 IEEE. Personal use of this material is permitted.
Permission from IEEE must be obtained for all other uses, in any current or
future media, including reprinting/republishing this material for advertising
or promotional purposes, creating new collective works, for resale or
redistribution to servers or lists, or reuse of any copyrighted component of
this work in other works.
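If every word in every output stream already carries a timestamp, timestamp-based serialization reduces to a stable merge by time. The sketch below assumes that translation words have been given timestamps beforehand (for example, inherited from their aligned source words; this is an assumption, since the abstract does not say how they are obtained). The function name and stream tags are hypothetical, and this is not the paper's implementation.

```python
def serialize_by_time(streams):
    """Merge several timestamped word streams into one serialized sequence.

    streams: dict mapping a stream tag (e.g. "<asr>") to a list of
    (timestamp_seconds, word) pairs, each list already in time order.
    Tokens are ordered by timestamp; ties go to the stream listed first in
    the dict, and words within one stream keep their original order.
    """
    rank = {tag: k for k, tag in enumerate(streams)}
    tokens = [
        (ts, rank[tag], tag, word)
        for tag, words in streams.items()
        for ts, word in words
    ]
    # list.sort is stable, so tokens with equal (timestamp, rank) keep input order.
    tokens.sort(key=lambda tok: (tok[0], tok[1]))
    return [(tag, word) for _, _, tag, word in tokens]
```

A one-to-many setup would, under this sketch, just add more entries to `streams` (a transcript plus one stream per translation), each with its own tag, while a single decoder is trained on the merged sequence.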
Structural and spectral dynamics of single-crystalline Ruddlesden-Popper phase halide perovskite blue light-emitting diodes.
Achieving high-color-purity blue light-emitting diodes (LEDs) based on perovskites is still challenging. Here, we report the successful synthesis of a series of blue-emissive two-dimensional Ruddlesden-Popper phase single crystals and demonstrate high-color-purity blue-emitting LEDs based on them. Although this approach achieves a series of bandgap emissions determined by the different layer thicknesses, the devices still suffer from conventional temperature-induced degradation during high-voltage operation. To understand the underlying mechanism, we investigate the evolution of the crystal structure and spectra via in situ temperature-dependent single-crystal X-ray diffraction, photoluminescence (PL) characterization, and density functional theory calculations. As temperature increases, the PL peak becomes asymmetrically broadened and its intensity decays markedly, owing to [PbBr₆]⁴⁻ octahedral tilting and disordering of the organic chains, which decrease the bandgap. This study indicates that careful heat management during LED operation is key to maintaining sharp and intense emission.
Distributed topology identification algorithm of distribution network based on neighboring interaction
Intelligent distributed control and protection is a promising route toward flexible and safe operation of distribution networks with widespread integration of distributed energy resources. A fundamental premise of distributed decision-making is that each smart terminal can identify the topological structure of the feeder and track its changes. This paper proposes a distributed topology identification algorithm with high fault tolerance based on peer-to-peer communication. The smart terminal units (STUs) installed at the nodes dynamically track and identify the network topology through local measurements and information exchange with neighboring STUs. The proposed algorithm combines mutual checking of local measurements with contralateral connectivity predictive correction, which significantly improves tolerance to measurement errors in topology identification. Test examples are presented to verify the effectiveness of the method.
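The neighbor-interaction principle (each STU knows only its own links and learns the rest by exchanging information with adjacent STUs) can be illustrated with plain synchronous flooding. This is a minimal sketch under the assumption of error-free, symmetric local measurements; it deliberately omits the paper's measurement mutual check and predictive correction, and the function name and data layout are hypothetical.

```python
def identify_topology(adjacency, max_rounds=None):
    """Sketch of neighbor-to-neighbor topology discovery (plain flooding).

    adjacency: dict mapping each node (STU) to the set of neighbors it detects
    locally; links are assumed to be reported symmetrically and without error.
    Each STU starts with only its own incident edges and, in every synchronous
    round, merges the edge sets its neighbors held in the previous round.
    Returns a dict mapping each node to the set of edges it knows.
    """
    known = {n: {frozenset((n, m)) for m in nbrs} for n, nbrs in adjacency.items()}
    # n - 1 rounds always suffice, because no shortest path is longer than that.
    rounds = len(adjacency) if max_rounds is None else max_rounds
    for _ in range(rounds):
        new = {
            n: known[n].union(*(known[m] for m in adjacency[n]))
            for n in adjacency
        }
        if new == known:  # no STU learned anything new: converged
            break
        known = new
    return known
```

On a radial feeder, every STU holds the full edge set after a number of rounds equal to the feeder's diameter. When a switch opens, re-running the exchange from fresh local measurements lets the STUs track the change.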
Assessment of multi-source observation merged 1 km-grid precipitation product during the disastrous rainstorms in Guangdong
This paper aims to assess the latest 1 km-grid Analysis Real Time (ART_1 km) precipitation product developed by the National Meteorological Information Center of the China Meteorological Administration (CMA), which can strongly support disastrous-weather monitoring and warning, intelligent grid forecasting, and weather services. Precipitation observed at independent stations (including non-uploaded regional meteorological stations and hydrometric stations) that were not integrated into the ART_1 km product, together with a precipitation-classification inspection, is used to assess the quality of the product during twenty disastrous rainstorm cases from May to August of 2019-2022 in Guangdong. The results show that the ART_1 km product successfully reproduces the location, intensity, and trends of precipitation in these cases, performing best in the Pearl River Delta, the east of eastern Guangdong, and the north of northern Guangdong. The stronger the precipitation, the greater the correlation, as well as the root mean square error (RMSE) and mean error (ME), between the ART_1 km precipitation and the observed precipitation. When hourly precipitation is not classified, about 60% of the independent stations show a correlation coefficient ≥ 0.8, more than 90% show an RMSE in the range [1.0, 5.0) mm, and more than 60% show an ME within ±0.1 mm. When hourly precipitation is < 5 mm, most stations have a correlation coefficient < 0.5, an RMSE in the range [1.0, 5.0) mm, and an ME within [0.0, 0.5] mm. When hourly precipitation is ≥ 20 mm, 42%–56% of the stations have a correlation coefficient ≥ 0.5, and most stations have an RMSE ≥ 10 mm and an ME < 0 mm; when hourly precipitation is ≥ 50 mm, most stations even have an ME < −10 mm.
Overall, ART_1 km precipitation is usually underestimated at the independent stations, and integrating observations from more sites into the ART_1 km product helps improve its quality.
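The three station-level scores in this assessment can be sketched as follows, using the standard definitions: Pearson correlation coefficient, RMSE, and ME taken as the mean of (product − observation), so that a negative ME indicates underestimation. The abstract does not give its formulas, so these definitions are assumptions, and so is classifying hours by the observed gauge value; the function names are hypothetical.

```python
import math


def station_scores(product, observed):
    """Hourly gridded-product values vs. gauge values at one station.

    Returns (r, rmse, me); me is mean(product - observed), so me < 0 means
    the product underestimates (assumed sign convention)."""
    n = len(product)
    if n != len(observed) or n < 2:
        raise ValueError("need two equal-length series with at least 2 values")
    mp = sum(product) / n
    mo = sum(observed) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(product, observed))
    sp = math.sqrt(sum((p - mp) ** 2 for p in product))
    so = math.sqrt(sum((o - mo) ** 2 for o in observed))
    # Correlation is undefined when either series is constant.
    r = cov / (sp * so) if sp > 0 and so > 0 else float("nan")
    rmse = math.sqrt(sum((p - o) ** 2 for p, o in zip(product, observed)) / n)
    me = sum(p - o for p, o in zip(product, observed)) / n
    return r, rmse, me


def subset_by_observed(product, observed, lower, upper=math.inf):
    """Keep only hours whose observed precipitation lies in [lower, upper) mm."""
    keep = [(p, o) for p, o in zip(product, observed) if lower <= o < upper]
    return [p for p, _ in keep], [o for _, o in keep]
```

For example, scoring only the hours with observed precipitation ≥ 20 mm would be `station_scores(*subset_by_observed(product, observed, 20))`.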
- …