Bridging the Granularity Gap for Acoustic Modeling
While Transformer has become the de-facto standard for speech, modeling upon
the fine-grained frame-level features remains an open challenge of capturing
long-distance dependencies and distributing the attention weights. We propose
\textit{Progressive Down-Sampling} (PDS) which gradually compresses the
acoustic features into coarser-grained units containing more complete semantic
information, like text-level representation. In addition, we develop a
representation fusion method to alleviate information loss that occurs
inevitably during high compression. In this way, we compress the acoustic
features into 1/32 of the initial length while achieving better or comparable
performance on the speech recognition task. As a bonus, it yields
inference speedups ranging from 1.20$\times$ to 1.47$\times$. By reducing the
modeling burden, we also achieve competitive results when training on the more
challenging speech translation task.
Comment: ACL 2023 Findings
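The length arithmetic behind the 1/32 compression can be sketched as follows. This is only an illustration, not the authors' PDS: it assumes five stages that each halve the sequence by averaging adjacent frames, whereas the abstract does not state the per-stage operator or the number of stages. The names `downsample_once` and `progressive_downsample` are hypothetical.

```python
def downsample_once(frames):
    """Halve the sequence length by averaging adjacent frame pairs.

    Assumption: mean pooling stands in for whatever learned
    down-sampling layer the actual model uses.
    """
    pooled = []
    for i in range(0, len(frames) - 1, 2):
        a, b = frames[i], frames[i + 1]
        pooled.append([(x + y) / 2 for x, y in zip(a, b)])
    if len(frames) % 2:  # keep a trailing unpaired frame unchanged
        pooled.append(list(frames[-1]))
    return pooled


def progressive_downsample(frames, num_stages=5):
    """Apply the halving step repeatedly; 5 stages give 2**5 = 32x compression."""
    lengths = [len(frames)]
    for _ in range(num_stages):
        frames = downsample_once(frames)
        lengths.append(len(frames))
    return frames, lengths


# Hypothetical input: 320 frames with 2 features each.
frames = [[float(t), float(-t)] for t in range(320)]
compressed, lengths = progressive_downsample(frames)
print(lengths)  # sequence length after each stage
```

Compressing in several small steps rather than one large one is the "progressive" part: each intermediate length is available to later layers, which is where a fusion step like the one the abstract mentions could draw its inputs.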
A Foundation Model for General Moving Object Segmentation in Medical Images
Medical image segmentation aims to delineate the anatomical or pathological
structures of interest, playing a crucial role in clinical diagnosis. A
substantial amount of high-quality annotated data is crucial for constructing
high-precision deep segmentation models. However, medical annotation is highly
cumbersome and time-consuming, especially for medical videos or 3D volumes, due
to the huge labeling space and poor inter-frame consistency. Recently, a
fundamental task named Moving Object Segmentation (MOS) has made significant
advancements in natural images. Its objective is to delineate moving objects
from the background within image sequences, requiring only minimal annotations.
In this paper, we propose the first foundation model, named iMOS, for MOS in
medical images. Extensive experiments on a large multi-modal medical dataset
validate the effectiveness of the proposed iMOS. Specifically, with the
annotation of only a small number of images in the sequence, iMOS can achieve
satisfactory tracking and segmentation performance of moving objects throughout
the entire sequence in bi-directions. We hope that the proposed iMOS can help
accelerate the annotation speed of experts, and boost the development of
medical foundation models.
Comment: 5 pages, 7 figures, 3 tables
CTC-based Non-autoregressive Speech Translation
Combining end-to-end speech translation (ST) and non-autoregressive (NAR)
generation is promising in language and speech processing for their advantages
of less error propagation and low latency. In this paper, we investigate the
potential of connectionist temporal classification (CTC) for non-autoregressive
speech translation (NAST). In particular, we develop a model consisting of two
encoders that are guided by CTC to predict the source and target texts,
respectively. Introducing CTC into NAST on both language sides has obvious
challenges: 1) the conditional independent generation somewhat breaks the
interdependency among tokens, and 2) the monotonic alignment assumption in
standard CTC does not hold in translation tasks. In response, we develop a
prediction-aware encoding approach and a cross-layer attention approach to
address these issues. We also use curriculum learning to improve convergence of
training. Experiments on the MuST-C ST benchmarks show that our NAST model
achieves an average BLEU score of 29.5 with a speed-up of 5.67$\times$, which
is comparable to the autoregressive counterpart and even outperforms the
previous best result by 0.9 BLEU points.
Comment: ACL 2023 Main Conference
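For readers unfamiliar with CTC, a minimal sketch of standard greedy CTC decoding shows both properties the abstract names: each frame is predicted independently, and the output follows the input order monotonically. This is generic CTC, not the paper's NAST model; the function names and the toy scores are hypothetical.

```python
def ctc_collapse(path, blank=0):
    """Standard CTC collapse: merge consecutive repeats, then drop blanks."""
    out = []
    prev = None
    for tok in path:
        if tok != prev and tok != blank:
            out.append(tok)
        prev = tok
    return out


def greedy_ctc_decode(frame_scores, blank=0):
    """Pick the best token per frame independently, then collapse.

    The per-frame argmax is where conditional independence shows up:
    no frame's choice depends on the tokens chosen for other frames.
    """
    path = [max(range(len(s)), key=s.__getitem__) for s in frame_scores]
    return ctc_collapse(path, blank)


# Hypothetical scores for 5 frames over a vocabulary {0: blank, 1, 2}.
scores = [
    [0.1, 0.8, 0.1],  # token 1
    [0.2, 0.7, 0.1],  # token 1 (repeat, merged)
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.7, 0.2],  # token 1 (new token after a blank)
    [0.1, 0.2, 0.7],  # token 2
]
print(greedy_ctc_decode(scores))
```

Because tokens are emitted in frame order, this decoding cannot reorder words, which is why the monotonic alignment assumption is a problem for translation.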
OnUVS: Online Feature Decoupling Framework for High-Fidelity Ultrasound Video Synthesis
Ultrasound (US) imaging is indispensable in clinical practice. To diagnose
certain diseases, sonographers must observe corresponding dynamic anatomic
structures to gather comprehensive information. However, the limited
availability of specific US video cases causes teaching difficulties in
identifying corresponding diseases, which potentially impacts the detection
rate of such cases. The synthesis of US videos may represent a promising
solution to this issue. Nevertheless, it is challenging to accurately animate
the intricate motion of dynamic anatomic structures while preserving image
fidelity. To address this, we present a novel online feature-decoupling
framework called OnUVS for high-fidelity US video synthesis. Our highlights can
be summarized by four aspects. First, we introduced anatomic information into
keypoint learning through a weakly-supervised training strategy, resulting in
improved preservation of anatomical integrity and motion while minimizing the
labeling burden. Second, to better preserve the integrity and textural
information of US images, we implemented a dual-decoder that decouples the
content and textural features in the generator. Third, we adopted a
multiple-feature discriminator to extract a comprehensive range of visual cues,
thereby enhancing the sharpness and fine details of the generated videos.
Fourth, we constrained the motion trajectories of keypoints during online
learning to enhance the fluidity of generated videos. Our validation and user
studies on in-house echocardiographic and pelvic floor US videos showed that
OnUVS synthesizes US videos with high fidelity.
Comment: 14 pages, 13 figures and 6 tables
An investigation in the correlation between Ayurvedic body-constitution and food-taste preference
The Effect of a Moving Boundary on the Shear Strength of Granular Materials in a Direct Shear Test
The boundary state significantly influences the soil shear strength. Therefore, it is necessary to overcome the limitations of existing indoor test instruments and determine the differences in the shear properties of granular materials to ensure the economic feasibility and mechanical integrity of engineering structures. In this study, the core formula for the direct shear test was derived from the static balancing analysis of the shear box, the external force on the specimen, and the internal force on the shear surface. Three loading methods were then developed based on the staggered state of the upper and lower boxes: the upper box moving shear loading method (UM), the lower box moving shear loading method (LM), and the bidirectional moving shear loading method (BM). Finally, by manipulating the motion boundary, the discrete element method (DEM) was employed to simulate the shear test of granular materials. Among the three loading methods, the order of the peak shear stresses was as follows: UM > BM > LM. Moreover, the order of the sample post-peak stress uniformities was as follows: LM > BM > UM. A shear strength conversion formula was then proposed. The findings of this study promote the advancement of the shear mechanics theory of granular materials in direct shear testing and can serve as a scientific basis for the design and manufacture of shear equipment.
Research on Carbon Emission Quota of Railway in China from the Perspective of Equity and Efficiency
Under the constraint of total carbon emissions, allocating carbon emission quotas among the 18 railway bureaus in China is conducive to realizing the carbon emission reduction targets of China’s railway transportation industry. This paper proposes a carbon emission quota model for China’s railway industry from the perspective of equity and efficiency and studies the allocation of carbon emission quotas among railway administrations. It constructs an econometric model to analyze the impact of various influencing factors on the carbon emissions of China’s railway operations and predicts the total carbon emissions of China’s railway operations from 2021 to 2030 using scenario analysis. From the perspective of equity and efficiency, the entropy method is applied to weight the principles of historical responsibility, egalitarianism, and efficiency, yielding the initial allocation of carbon emission quotas among the operator’s 18 regional railway bureau groups; the ZSG-DEA model is then used to obtain the optimal allocation. The results show that railway passenger turnover, freight turnover, vehicle structure, and per capita GDP have a promoting effect on railway carbon emissions, while the proportion of clean energy has an inhibitory effect. There is a gap between the allocation results under any single principle and the comprehensive allocation results; combining the principles can more effectively promote the development of the railway industry. From the perspective of equity and efficiency, the carbon emission quotas of the 18 railway bureau groups in China are high in the east and low in the west. Among them, the Shanghai railway bureau obtains the largest carbon emission quota, while the Qinghai–Tibet railway bureau obtains the smallest. The research results provide a reference for the railway bureaus to coordinate emission reduction and for the construction of a carbon emission market for railway transport.
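The entropy method mentioned in the abstract can be sketched in its common textbook form: criteria whose values vary more across units carry more information and receive larger weights. This is a minimal sketch, not the paper's exact procedure. It assumes non-negative, already-normalized indicator values; the function name `entropy_weights` and the sample numbers are hypothetical and not taken from the study.

```python
import math


def entropy_weights(matrix):
    """Entropy weight method (textbook form).

    matrix[i][j] is the non-negative score of unit i on criterion j.
    Assumes at least two units and at least one criterion whose
    values differ across units.
    """
    n = len(matrix)
    m = len(matrix[0])
    k = 1.0 / math.log(n)  # scales entropy into [0, 1]
    divergences = []
    for j in range(m):
        column = [row[j] for row in matrix]
        total = sum(column)
        entropy = 0.0
        for value in column:
            p = value / total
            if p > 0:  # treat 0 * log(0) as 0
                entropy -= k * p * math.log(p)
        # Clamp to guard against tiny negative values from rounding.
        divergences.append(max(0.0, 1.0 - entropy))
    norm = sum(divergences)
    return [d / norm for d in divergences]


# Hypothetical scores for 3 units on 3 criteria (e.g. historical
# responsibility, egalitarianism, efficiency); not data from the paper.
sample = [
    [1.0, 1.0, 5.0],
    [1.0, 2.0, 1.0],
    [1.0, 3.0, 1.0],
]
weights = entropy_weights(sample)
print(weights)
```

In the sample, the first criterion is identical for every unit, so it cannot distinguish between them and its weight is effectively zero.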