79 research outputs found

    Bridging the Granularity Gap for Acoustic Modeling

    While the Transformer has become the de facto standard for speech, modeling on fine-grained frame-level features remains an open challenge, both for capturing long-distance dependencies and for distributing the attention weights. We propose Progressive Down-Sampling (PDS), which gradually compresses the acoustic features into coarser-grained units that carry more complete semantic information, akin to text-level representations. In addition, we develop a representation fusion method to alleviate the information loss that inevitably occurs under high compression. In this way, we compress the acoustic features to 1/32 of their initial length while achieving better or comparable performance on the speech recognition task. As a bonus, PDS yields inference speedups ranging from 1.20× to 1.47×. By reducing the modeling burden, we also achieve competitive results when training on the more challenging speech translation task.
    Comment: ACL 2023 Findings
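    The length arithmetic behind compressing to 1/32 can be sketched as follows. This is a minimal sketch, assuming five stages that each halve the sequence (2^5 = 32) and using plain average pooling as a stand-in for the paper's down-sampling layers, whose design the abstract does not give. The function name `progressive_downsample` and the stage factors are hypothetical, and the representation fusion step is not modeled.

```python
import math


def progressive_downsample(frames, factors):
    """Shorten a sequence of frame vectors stage by stage.

    Hypothetical stand-in for PDS: each stage averages every `factor`
    consecutive frames into one coarser unit (a trailing partial window
    is averaged over the frames it contains). Returns the sequence
    after every stage, starting with the input.
    """
    stages = [frames]
    current = frames
    for factor in factors:
        pooled = []
        for start in range(0, len(current), factor):
            window = current[start:start + factor]
            dim = len(window[0])
            pooled.append([sum(vec[d] for vec in window) / len(window)
                           for d in range(dim)])
        current = pooled
        stages.append(current)
    return stages


# Assumed configuration: five 2x stages, i.e. an overall rate of 32.
FACTORS = [2, 2, 2, 2, 2]
print("overall compression:", math.prod(FACTORS))

frames = [[float(t), float(t)] for t in range(320)]  # 320 toy 2-d frames
for i, stage in enumerate(progressive_downsample(frames, FACTORS)):
    print(f"stage {i}: {len(stage)} units")
```

    Reaching the final rate in several small steps, rather than one 32× jump, is what makes the scheme progressive: each stage operates on units that earlier stages have already made coarser.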

    A Foundation Model for General Moving Object Segmentation in Medical Images

    Medical image segmentation aims to delineate the anatomical or pathological structures of interest, playing a crucial role in clinical diagnosis. A substantial amount of high-quality annotated data is crucial for constructing high-precision deep segmentation models. However, medical annotation is highly cumbersome and time-consuming, especially for medical videos or 3D volumes, due to the huge labeling space and poor inter-frame consistency. Recently, a fundamental task named Moving Object Segmentation (MOS) has made significant advances in natural images. Its objective is to delineate moving objects from the background within image sequences while requiring only minimal annotation. In this paper, we propose the first foundation model, named iMOS, for MOS in medical images. Extensive experiments on a large multi-modal medical dataset validate the effectiveness of the proposed iMOS. Specifically, with only a small number of annotated images in a sequence, iMOS achieves satisfactory tracking and segmentation of moving objects throughout the entire sequence in both directions. We hope that iMOS can help accelerate annotation by experts and boost the development of medical foundation models.
    Comment: 5 pages, 7 figures, 3 tables

    CTC-based Non-autoregressive Speech Translation

    Combining end-to-end speech translation (ST) with non-autoregressive (NAR) generation is promising in language and speech processing because of its advantages of less error propagation and low latency. In this paper, we investigate the potential of connectionist temporal classification (CTC) for non-autoregressive speech translation (NAST). In particular, we develop a model consisting of two encoders that are guided by CTC to predict the source and target texts, respectively. Introducing CTC into NAST on both the source and target sides poses clear challenges: 1) conditionally independent generation partly breaks the interdependency among tokens, and 2) the monotonic alignment assumption of standard CTC does not hold in translation tasks. In response, we develop a prediction-aware encoding approach and a cross-layer attention approach to address these issues. We also use curriculum learning to improve training convergence. Experiments on the MuST-C ST benchmarks show that our NAST model achieves an average BLEU score of 29.5 with a speed-up of 5.67×, which is comparable to its autoregressive counterpart and even outperforms the previous best result by 0.9 BLEU points.
    Comment: ACL 2023 Main Conference
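    For readers new to CTC, the standard rule that maps a frame-level label path to an output sequence (merge consecutive repeats, then drop blanks) can be sketched as follows, together with greedy decoding. This is generic CTC, not the paper's prediction-aware encoding or cross-layer attention; the blank symbol `<b>`, the function names, and the toy scores are illustrative assumptions. Picking each frame's label independently mirrors the conditional independence the abstract mentions, and the strictly left-to-right collapse shows where the monotonic alignment assumption comes from.

```python
BLANK = "<b>"  # illustrative blank symbol


def ctc_collapse(path, blank=BLANK):
    """Map a frame-level CTC path to an output sequence:
    merge consecutive repeated labels, then remove blanks."""
    output = []
    previous = None
    for label in path:
        if label != previous and label != blank:
            output.append(label)
        previous = label
    return output


def greedy_ctc_decode(frame_scores, vocab, blank=BLANK):
    """Choose the best label at each frame independently, then collapse."""
    path = [vocab[max(range(len(scores)), key=scores.__getitem__)]
            for scores in frame_scores]
    return ctc_collapse(path, blank)


vocab = [BLANK, "a", "b"]
scores = [
    [0.1, 0.8, 0.1],    # a
    [0.2, 0.7, 0.1],    # a (repeat, merged)
    [0.9, 0.05, 0.05],  # blank (separates the two a's)
    [0.1, 0.6, 0.3],    # a
    [0.1, 0.2, 0.7],    # b
]
print(greedy_ctc_decode(scores, vocab))  # ['a', 'a', 'b']
```

    The blank is what allows genuinely repeated tokens: without the third frame, the path would collapse to `['a', 'b']`.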

    OnUVS: Online Feature Decoupling Framework for High-Fidelity Ultrasound Video Synthesis

    Ultrasound (US) imaging is indispensable in clinical practice. To diagnose certain diseases, sonographers must observe the corresponding dynamic anatomic structures to gather comprehensive information. However, the limited availability of specific US video cases makes it difficult to teach the identification of these diseases, which may reduce the detection rate of such cases. Synthesizing US videos may be a promising solution to this issue. Nevertheless, it is challenging to accurately animate the intricate motion of dynamic anatomic structures while preserving image fidelity. To address this, we present a novel online feature-decoupling framework called OnUVS for high-fidelity US video synthesis. Our highlights can be summarized in four aspects. First, we introduced anatomic information into keypoint learning through a weakly-supervised training strategy, which better preserves anatomical integrity and motion while minimizing the labeling burden. Second, to better preserve the integrity and textural information of US images, we implemented a dual decoder that decouples the content and textural features in the generator. Third, we adopted a multiple-feature discriminator to extract a comprehensive range of visual cues, thereby enhancing the sharpness and fine details of the generated videos. Fourth, we constrained the motion trajectories of keypoints during online learning to enhance the fluidity of the generated videos. Our validation and user studies on in-house echocardiographic and pelvic floor US videos showed that OnUVS synthesizes US videos with high fidelity.
    Comment: 14 pages, 13 figures and 6 tables

    An investigation in the correlation between Ayurvedic body-constitution and food-taste preference

    The Effect of a Moving Boundary on the Shear Strength of Granular Materials in a Direct Shear Test

    The boundary state significantly influences soil shear strength. Therefore, it is necessary to overcome the limitations of existing laboratory test instruments and determine the differences in the shear properties of granular materials to ensure the economic feasibility and mechanical integrity of engineering structures. In this study, the core formula for the direct shear test was derived from a static equilibrium analysis of the shear box, the external force on the specimen, and the internal force on the shear surface. Three loading methods were then developed according to how the upper and lower boxes are displaced relative to each other: the upper box moving shear loading method (UM), the lower box moving shear loading method (LM), and the bidirectional moving shear loading method (BM). Finally, by manipulating the moving boundary, the discrete element method (DEM) was employed to simulate the shear test of granular materials. Among the three loading methods, the peak shear stresses ranked UM > BM > LM, whereas the post-peak stress uniformity of the samples ranked LM > BM > UM. A shear strength conversion formula was then proposed. The findings of this study advance the shear mechanics theory of granular materials in direct shear testing and can serve as a scientific basis for the design and manufacture of shear equipment.
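    Peak shear stresses from direct shear tests are commonly summarized by the Mohr–Coulomb criterion τ = c + σ·tan φ, where τ is the peak shear stress on the shear plane (shear force divided by shear area), σ the normal stress, c the cohesion, and φ the friction angle. The sketch below fits c and φ by ordinary least squares. It is a generic textbook procedure, not the paper's core formula or its shear strength conversion formula, neither of which the abstract states; the function names and the synthetic data (generated from c = 10 kPa, φ = 30°) are hypothetical.

```python
import math


def shear_stress_kpa(shear_force_kn, shear_area_m2):
    """Nominal shear stress on the shear plane, in kPa (kN / m^2)."""
    return shear_force_kn / shear_area_m2


def fit_mohr_coulomb(normal_stresses, peak_shear_stresses):
    """Least-squares fit of tau = c + sigma * tan(phi).

    Returns (cohesion c in the stress unit of the input,
    friction angle phi in degrees)."""
    n = len(normal_stresses)
    if n < 2 or n != len(peak_shear_stresses):
        raise ValueError("need at least two paired (sigma, tau) values")
    mean_s = sum(normal_stresses) / n
    mean_t = sum(peak_shear_stresses) / n
    sxx = sum((s - mean_s) ** 2 for s in normal_stresses)
    if sxx == 0:
        raise ValueError("normal stresses must not all be equal")
    sxy = sum((s - mean_s) * (t - mean_t)
              for s, t in zip(normal_stresses, peak_shear_stresses))
    tan_phi = sxy / sxx                    # slope of the failure envelope
    cohesion = mean_t - tan_phi * mean_s   # intercept at sigma = 0
    return cohesion, math.degrees(math.atan(tan_phi))


# Synthetic peak stresses (kPa) generated from c = 10 kPa, phi = 30 degrees.
sigma = [100.0, 200.0, 300.0, 400.0]
tau = [10.0 + s * math.tan(math.radians(30.0)) for s in sigma]
c, phi = fit_mohr_coulomb(sigma, tau)
print(f"c = {c:.2f} kPa, phi = {phi:.2f} deg")  # c = 10.00 kPa, phi = 30.00 deg
```

    Fitting the UM, LM, and BM results separately would be one way to express how the loading boundary shifts the apparent strength parameters.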

    Research on Carbon Emission Quota of Railway in China from the Perspective of Equity and Efficiency

    Under a constraint on total carbon emissions, carbon emission quotas are allocated among 18 railway bureaus in China to support the realization of the carbon emission reduction targets of China’s railway transportation industry. This paper proposes a carbon emission quota model for China’s railway industry from the perspective of equity and efficiency and, as a novel contribution, studies the allocation of carbon emission quotas among railway administrations. An econometric model is constructed to analyze the impact of various influencing factors on carbon emissions from China’s railway operations, and the total carbon emissions of China’s railway operations from 2021 to 2030 are predicted by scenario analysis. From the perspective of equity and efficiency, the entropy method is applied to weight the principles of historical responsibility, egalitarianism, and efficiency, yielding the initial allocation of carbon emission quotas for the operator’s 18 regional railway bureau groups; the ZSG-DEA model is then used to obtain the optimal allocation. The results show that railway passenger turnover, freight turnover, vehicle structure, and per capita GDP promote railway carbon emissions, whereas the proportion of clean energy inhibits them. There is a gap between the allocation results under any single principle and the comprehensive allocation results; combining the principles can more effectively promote the development of the railway industry. From the perspective of equity and efficiency, the carbon emission quotas of the 18 railway bureau groups in China are high in the east and low in the west. Among them, the Shanghai railway bureau obtains the largest quota, while the Qinghai–Tibet railway bureau obtains the smallest. The results provide a reference for railway bureaus to coordinate emission reduction and for building a carbon emission market for railway transport.
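    The entropy weight method assigns larger weights to indicators whose values vary more across the units being compared: with p_ij = x_ij / Σ_i x_ij, the entropy of indicator j is e_j = −(1/ln m) Σ_i p_ij ln p_ij, and its weight is w_j = (1 − e_j) / Σ_k (1 − e_k). A minimal sketch of this textbook version follows, assuming a matrix of positive values with one row per railway bureau and one column per allocation principle (for example, each bureau's quota share under historical responsibility, egalitarianism, and efficiency). The paper's exact normalization, its data, and the subsequent ZSG-DEA optimization are not reproduced; the function names and numbers are hypothetical.

```python
import math


def entropy_weights(matrix):
    """Entropy weight method for an m x n matrix of positive values
    (rows = units, e.g. bureaus; columns = indicators or principles).
    Returns n weights that sum to 1."""
    m = len(matrix)
    if m < 2:
        raise ValueError("need at least two rows")
    n = len(matrix[0])
    k = 1.0 / math.log(m)
    divergences = []
    for j in range(n):
        column = [row[j] for row in matrix]
        total = sum(column)
        proportions = [x / total for x in column]
        # Convention: p * ln(p) is taken as 0 when p == 0.
        entropy = -k * sum(p * math.log(p) for p in proportions if p > 0)
        divergences.append(1.0 - entropy)  # degree of diversification
    total_div = sum(divergences)
    if total_div <= 1e-12:  # every column is uniform: fall back to equal weights
        return [1.0 / n] * n
    return [d / total_div for d in divergences]


def combined_shares(share_matrix, weights):
    """Weighted combination of each unit's shares under the principles."""
    return [sum(w * s for w, s in zip(weights, row)) for row in share_matrix]


# Hypothetical shares of three bureaus under three principles (columns sum to 1).
shares = [
    [0.50, 0.34, 0.60],
    [0.30, 0.33, 0.25],
    [0.20, 0.33, 0.15],
]
w = entropy_weights(shares)
print([round(x, 3) for x in w])
print([round(x, 3) for x in combined_shares(shares, w)])
```

    The nearly uniform middle column receives almost no weight, which is the intended behavior: a principle that barely distinguishes the bureaus carries little information for the allocation.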