168 research outputs found

    SpatialCodec: Neural Spatial Speech Coding

    In this work, we address the challenge of encoding speech captured by a microphone array using deep learning techniques, with the aim of preserving and accurately reconstructing the crucial spatial cues embedded in multi-channel recordings. We propose a neural spatial audio coding framework that achieves a high compression ratio by combining a single-channel neural sub-band codec with SpatialCodec. Our approach encompasses two phases: (i) a neural sub-band codec encodes the reference channel at low bit rates, and (ii) a SpatialCodec captures relative spatial information for accurate multi-channel reconstruction at the decoder end. In addition, we propose novel evaluation metrics to assess spatial cue preservation: (i) spatial similarity, which calculates cosine similarity on a spatially intuitive beamspace, and (ii) beamformed audio quality. Our system shows superior spatial performance compared with high-bitrate baselines and a black-box neural architecture. Demos are available at https://xzwy.github.io/SpatialCodecDemo. Code and models are available at https://github.com/XZWY/SpatialCodec.
    Comment: Paper in submission
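
The spatial similarity metric named in the abstract above can be illustrated with a minimal sketch. The abstract only states that cosine similarity is computed on a beamspace; the beamformer weights, the use of beam powers as features, and the function names `beam_powers` and `cosine_similarity` are assumptions made for illustration, not the authors' implementation.

```python
import math

def beam_powers(frame, steering):
    """Return |w^H x|^2 for each beam, given one multi-channel frequency bin.

    frame: list of complex channel values (one per microphone).
    steering: list of weight vectors, one per look direction (hypothetical).
    """
    return [
        abs(sum(complex(w).conjugate() * x for w, x in zip(weights, frame))) ** 2
        for weights in steering
    ]

def cosine_similarity(a, b):
    """Cosine similarity between two real-valued feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0

# Hypothetical two-microphone example with three look directions.
steering = [[0.5, 0.5], [0.5, 0.5j], [0.5, -0.5]]
original = [1.0 + 0.0j, 0.8 + 0.2j]
# A reconstruction that only differs by gain keeps its spatial pattern.
reconstructed = [2.0 * x for x in original]

similarity = cosine_similarity(
    beam_powers(original, steering), beam_powers(reconstructed, steering)
)
print(f"spatial similarity: {similarity:.3f}")
```

Because cosine similarity ignores overall scale, a reconstruction that changes only the gain scores close to 1, while one that shifts energy between look directions scores lower.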

    Chaos-assisted two-octave-spanning microcombs

    Since its invention, the optical frequency comb has revolutionized a broad range of subjects from metrology to spectroscopy. The recent development of microresonator-based frequency combs (microcombs) provides a unique pathway to create frequency comb systems on a chip. Indeed, microcomb-based spectroscopy, ranging, optical synthesizers, telecommunications and astronomical calibrations have been reported recently. Critical to many of the integrated comb systems is broad coverage of the comb spectra. Here, microcombs spanning more than two octaves (450 nm to 2,008 nm) are demonstrated through χ^(2) and χ^(3) nonlinearities in a deformed silica microcavity. The deformation lifts the circular symmetry and creates chaotic tunneling channels that enable broadband collection of intracavity emission with a single waveguide. Our demonstration introduces a new degree of freedom, cavity deformation, to microcomb studies, and our microcomb spectral range is useful for applications in optical clocks, astronomical calibration and biological imaging.

    ImageNetVC: Zero-Shot Visual Commonsense Evaluation on 1000 ImageNet Categories

    Recently, Pretrained Language Models (PLMs) have been serving as general-purpose interfaces, posing a significant demand for comprehensive visual knowledge. However, it remains unclear how well current PLMs and their visually augmented counterparts (VaLMs) can master visual commonsense knowledge. To investigate this, we propose ImageNetVC, a fine-grained, human-annotated dataset specifically designed for zero-shot visual commonsense evaluation across 1,000 ImageNet categories. Utilizing ImageNetVC, we delve into the fundamental visual commonsense knowledge of both unimodal PLMs and VaLMs, uncovering the scaling law and the influence of the backbone model on VaLMs. Furthermore, we investigate the factors affecting the visual commonsense knowledge of large-scale models, providing insights into the development of language models enriched with visual commonsense knowledge. Our code and dataset are available at https://github.com/hemingkx/ImageNetVC

    Unifying Robustness and Fidelity: A Comprehensive Study of Pretrained Generative Methods for Speech Enhancement in Adverse Conditions

    Enhancing speech signal quality in adverse acoustic environments is a persistent challenge in speech processing. Existing deep-learning-based enhancement methods often struggle to effectively remove background noise and reverberation in real-world scenarios, hampering listening experiences. To address these challenges, we propose a novel approach that uses pre-trained generative methods to resynthesize clean, anechoic speech from degraded inputs. This study leverages pre-trained vocoder or codec models to synthesize high-quality speech while enhancing robustness in challenging scenarios. Generative methods effectively handle information loss in speech signals, resulting in regenerated speech that has improved fidelity and reduced artifacts. By harnessing the capabilities of pre-trained models, we achieve faithful reproduction of the original speech in adverse conditions. Experimental evaluations on both simulated datasets and realistic samples demonstrate the effectiveness and robustness of our proposed methods. In particular, by leveraging a codec, we achieve superior subjective scores for both simulated and realistic recordings. The generated speech exhibits enhanced audio quality and reduced background noise and reverberation. Our findings highlight the potential of pre-trained generative techniques in speech processing, particularly in scenarios where traditional methods falter. Demos are available at https://whmrtm.github.io/SoundResynthesis.
    Comment: Paper in submission

    Bi-Drop: Enhancing Fine-tuning Generalization via Synchronous sub-net Estimation and Optimization

    Pretrained language models have achieved remarkable success in natural language understanding. However, fine-tuning pretrained models on limited training data tends to overfit and thus diminish performance. This paper presents Bi-Drop, a fine-tuning strategy that selectively updates model parameters using gradients from various sub-nets dynamically generated by dropout. The sub-net estimation of Bi-Drop is performed in an in-batch manner, so it overcomes the hysteresis in sub-net updating that affects previous methods relying on asynchronous sub-net estimation. Bi-Drop also needs only one mini-batch to estimate the sub-net, so it makes better use of the training data. Experiments on the GLUE benchmark demonstrate that Bi-Drop consistently outperforms previous fine-tuning methods. Furthermore, empirical results show that Bi-Drop exhibits excellent generalization ability and robustness for domain transfer, data imbalance, and low-resource scenarios.
    Comment: EMNLP 2023 Findings. Camera-ready version. Co-first authors with equal contribution

    Knowledge-based planning in robotic intracranial stereotactic radiosurgery treatments

    PURPOSE: To develop a knowledge-based planning (KBP) model that predicts dosimetric indices and facilitates planning in CyberKnife intracranial stereotactic radiosurgery/radiotherapy (SRS/SRT). METHODS: Forty CyberKnife SRS/SRT plans were retrospectively used to build a linear KBP model that correlated the equivalent radius of the PTV (req_PTV) with the equivalent radius of the volume that receives a given percentage of the prescription dose (req_Vi, where Vi = V10%, V20% ... V120%). To evaluate the model's predictability, a fourfold cross-validation was performed for dosimetric indices such as gradient measure (GM) and brain V50%. The accuracy of the prediction was quantified by the mean and the standard deviation of the difference between planned and predicted values (i.e., ΔGM = GMpred - GMclin and fractional ΔV50% = (V50%pred - V50%clin)/V50%clin) and by a coefficient of determination, R^2. Then, the KBP model was incorporated into the planning for another 22 clinical cases. The training plans and the KBP test plans were compared in terms of the new conformity index (nCI) as well as planning efficiency. RESULTS: Our KBP model showed desirable predictability. For the 40 training plans, the average prediction error from cross-validation was only 0.36 ± 0.06 mm for ΔGM and 0.12 ± 0.08 for ΔV50%. The R^2 for the linear fit between req_PTV and req_Vi was 0.985 ± 0.019 for isodose volumes ranging from V10% to V120%; in particular, R^2 = 0.995 for V50% and R^2 = 0.997 for V100%. Compared to the training plans, the nCI of our KBP test plans improved from 1.31 ± 0.15 to 1.15 ± 0.08 (P < 0.0001). The efficient automatic generation of the optimization constraints using our model required little or no planner intervention. CONCLUSION: We demonstrated a linear KBP model based on PTV volumes that accurately predicts CyberKnife SRS/SRT planning dosimetric indices and greatly helps achieve superior plan quality and planning efficiency.
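
The quantities in the abstract above (equivalent radius, a linear fit with its R^2, and the fractional prediction error) can be sketched as follows. This is a minimal illustration and not the authors' code: the sphere-equivalent definition of the radius is an assumption, the function names are hypothetical, and the volumes are made-up numbers used only to exercise the functions.

```python
import math

def equivalent_radius(volume):
    """Radius of a sphere with the given volume (assumed definition of r_eq)."""
    return (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)

def fit_line(x, y):
    """Ordinary least-squares fit y ~ slope * x + intercept; also returns R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot

def fractional_error(predicted, clinical):
    """Fractional difference (pred - clin) / clin, as used for V50% in the abstract."""
    return (predicted - clinical) / clinical

# Made-up PTV and V50% volumes in cm^3, for illustration only.
ptv_volumes = [0.5, 1.0, 2.0, 4.0, 8.0]
v50_volumes = [2.1, 4.0, 7.9, 15.5, 30.2]

req_ptv = [equivalent_radius(v) for v in ptv_volumes]
req_v50 = [equivalent_radius(v) for v in v50_volumes]
slope, intercept, r_squared = fit_line(req_ptv, req_v50)
print(f"req_V50 ~ {slope:.3f} * req_PTV + {intercept:.3f}, R^2 = {r_squared:.4f}")
```

Fitting radii rather than volumes keeps the relationship close to linear, which is the form of model the abstract describes; one such line would be fitted per isodose level.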

    Topography and structural diversity regulate ecosystem multifunctionality in a subtropical evergreen broad-leaved forest

    Forest functionality is generally considered a byproduct of forest diversity. Perhaps unsurprisingly, many researchers associate increasing multifunctionality with increasing diversity. Diversity, however, is an often-overused word that may describe a host of features, including the diversity of species, functional traits, and structure. Furthermore, variable environmental features (such as topography) influence the interaction between forest plants and their function. Incorporating complex topography (like that associated with tropical and subtropical forests) into estimates of forest functionality is challenging and highly uncertain. In this paper, we applied structural equation models (SEMs) to disentangle the relative importance of topography and different components of what might be considered “plant diversity” to forest multifunctionality, using repeated censuses of a 20-ha subtropical forest plot. We found that multifunctionality was influenced more by structural diversity than by either species composition or functional trait diversity. In our SEM approach, we observed that variations in topography could account for about 30% of the variation in multifunctionality. Furthermore, variations in topography could indirectly influence forest multifunctionality by changing species composition, functional trait diversity, and structural diversity. Our work highlights the importance of topography and forest structure in regulating subtropical forest multifunctionality on the local scale. This suggests future subtropical forest management should focus on regulating forest structure. Namely, our results suggest land managers must take topography (and the complex interaction between topography and plant diversity) into account in order to build robust and multifunctional forests.

    Universal isocontours for dissipative Kerr solitons

    Dissipative Kerr solitons can be generated within an existence region defined on a space of normalized pumping power versus cavity-pump detuning frequency. The contours of constant soliton power and constant pulse width in this region are studied through measurement and simulation. Such isocontours impart structure to the existence region and improve understanding of soliton locking and stabilization methods. As part of the study, dimensionless, closed-form expressions for soliton power and pulse width are developed (including Raman contributions). They provide isocontours in close agreement with those from the full simulation, and, as universal expressions, can simplify the estimation of soliton properties across a wide range of systems