61 research outputs found

    Semantic-Aware Local-Global Vision Transformer

    Vision Transformers have achieved remarkable progress, and Swin Transformer in particular has demonstrated the tremendous potential of Transformers for vision tasks. It surmounts the key challenge of high computational complexity by performing local self-attention within shifted windows. In this work we propose the Semantic-Aware Local-Global Vision Transformer (SALG) to investigate two potential improvements over Swin Transformer. First, unlike Swin Transformer, which partitions the image uniformly into regular windows of equal size for local self-attention, SALG performs unsupervised semantic segmentation to explore the underlying semantic priors in the image. As a result, each segmented region can correspond to a semantically meaningful part of the image, potentially leading to more effective features within each segmented region. Second, instead of performing only local self-attention within local windows as Swin Transformer does, SALG performs both 1) local intra-region self-attention for learning fine-grained features within each region and 2) global inter-region feature propagation for modeling global dependencies among all regions. Consequently, our model obtains a global view when learning features for each token, which is the essential advantage of the Transformer. Owing to the explicit modeling of semantic priors and the proposed local-global modeling mechanism, SALG is particularly advantageous for small-scale models, whose modeling capacity is not sufficient to learn semantics implicitly. Extensive experiments across various vision tasks demonstrate the merit of our model over other vision Transformers, especially in small-scale modeling scenarios.

    Delay-penalized CTC implemented based on Finite State Transducer

    Connectionist Temporal Classification (CTC) suffers from a latency problem when applied to streaming models. We argue that in the CTC lattice, alignments that can access more future context are preferred during training, which leads to higher symbol delay. In this work we propose delay-penalized CTC, which augments CTC with latency penalty regularization. We devise a flexible and efficient implementation based on the differentiable Finite State Transducer (FST). Specifically, by attaching a binary attribute to the CTC topology, we can locate the frames that first emit non-blank tokens on the resulting CTC lattice and add the frame offsets to the log-probabilities. Experimental results demonstrate the effectiveness of the proposed delay-penalized CTC, which is able to balance the delay-accuracy trade-off. Furthermore, combining it with the delay-penalized transducer enables the CTC model to achieve better performance and lower latency. Our work is open-sourced and publicly available at https://github.com/k2-fsa/k2. Comment: Accepted in INTERSPEECH 202

    Delay-penalized transducer for low-latency streaming ASR

    In streaming automatic speech recognition (ASR), it is desirable to reduce latency as much as possible with minimal impact on recognition accuracy. Although a few existing methods achieve this goal, they are difficult to implement because they depend on external alignments. In this paper, we propose a simple way to penalize symbol delay in the transducer model, so that we can balance the trade-off between symbol delay and accuracy for streaming models without external alignments. Specifically, our method adds a small constant times (T/2 - t), where T is the number of frames and t is the current frame, to all the non-blank log-probabilities (after normalization) that are fed into the two-dimensional transducer recursion. For both streaming Conformer models and unidirectional long short-term memory (LSTM) models, experimental results show that the method significantly reduces symbol delay with an acceptable degradation in accuracy. Our method achieves a delay-accuracy trade-off similar to that of the previously published FastEmit, but we believe it is preferable because it has a better justification: it is equivalent to penalizing the average symbol delay. Our work is open-sourced and publicly available (https://github.com/k2-fsa/k2). Comment: Submitted to 2023 IEEE International Conference on Acoustics, Speech and Signal Processing
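The penalty described in this abstract can be sketched in a few lines. The following is a minimal illustration rather than the k2 implementation: the function name `apply_delay_penalty`, the nested-list layout `log_probs[t][u][v]`, the 0-based frame index, and the value of `penalty_scale` are all assumptions made for the example.

```python
import math


def apply_delay_penalty(log_probs, blank_id=0, penalty_scale=0.01):
    """Return a copy of log_probs with penalty_scale * (T/2 - t) added
    to every non-blank entry.

    log_probs[t][u][v] is assumed to hold the normalized log-probability
    of symbol v at frame t (0-based, an assumption) and label position u.
    """
    num_frames = len(log_probs)
    penalized = []
    for t, frame in enumerate(log_probs):
        # Positive for early frames, negative for late ones, so early
        # emission of non-blank symbols is rewarded.
        offset = penalty_scale * (num_frames / 2 - t)
        penalized.append([
            [lp if v == blank_id else lp + offset for v, lp in enumerate(row)]
            for row in frame
        ])
    return penalized


# Toy input: 4 frames, 2 label positions, 3 symbols (index 0 is blank).
T, U, V = 4, 2, 3
uniform = math.log(1.0 / V)
log_probs = [[[uniform] * V for _ in range(U)] for _ in range(T)]
penalized = apply_delay_penalty(log_probs, blank_id=0, penalty_scale=0.1)
```

In this sketch the blank entries are left untouched, and the adjusted values would then be passed to the usual transducer recursion in place of the original log-probabilities.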

    PromptASR for contextualized ASR with controllable style

    Prompts are crucial to large language models because they provide context information such as topic or logical relationships. Inspired by this, we propose PromptASR, a framework that integrates prompts into end-to-end automatic speech recognition (E2E ASR) systems to achieve contextualized ASR with a controllable transcription style. Specifically, a dedicated text encoder encodes the text prompts, and the encodings are injected into the speech encoder by cross-attending the features from the two modalities. When using the ground-truth text from preceding utterances as the content prompt, the proposed system achieves 21.9% and 6.8% relative word error rate reductions on a book reading dataset and an in-house dataset, respectively, compared to a baseline ASR system. The system can also take word-level biasing lists as prompts to improve recognition accuracy on rare words. An additional style prompt can be given to the text encoder to guide the ASR system to output different styles of transcriptions. The code is available at icefall. Comment: Submitted to ICASSP202

    Corrosion Fatigue Behavior and Damage Mechanism of the Bridge Cable Structures

    The long-term performance and corrosion fatigue damage status of cable structures in cable-stayed bridges, suspension bridges, and suspender arch bridges were investigated and analyzed under service environments. Artificial accelerated corrosion fatigue tests were carried out on galvanized parallel steel wire under coupled loading and environmental conditions. The damage mechanisms of galvanized parallel steel wire under corrosion, stress corrosion, and corrosion fatigue were investigated, and the change laws of the mechanical properties of the cable were studied. Based on image gray analysis, an evaluation method was proposed for the technical status of the damaged cable. Furthermore, combined with the cable damage evolution model, a service life prediction method and an assessment technology for cables based on damage safety were established.

    Fast and parallel decoding for transducer

    The transducer architecture is becoming increasingly popular in speech recognition because it is naturally streaming and highly accurate. One drawback of the transducer is that it is difficult to decode in a fast and parallel way, because the number of symbols that can be emitted per time step is unconstrained. In this work, we introduce a constrained version of the transducer loss to learn strictly monotonic alignments between the sequences; we also improve the standard greedy search and beam search algorithms by limiting the number of symbols that can be emitted per time step in transducer decoding, making it more efficient to decode in parallel with batches. Furthermore, we propose a finite state automaton (FSA)-based parallel beam search algorithm that can run efficiently with graphs on GPU. Experimental results show that we achieve a slight word error rate (WER) improvement as well as a significant speedup in decoding. Our work is open-sourced and publicly available (https://github.com/k2-fsa/icefall). Comment: Submitted to 2023 IEEE International Conference on Acoustics, Speech and Signal Processing
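The idea of limiting the number of symbols emitted per time step can be illustrated with a toy greedy search. This is a hedged sketch under stated assumptions, not the icefall decoder: `greedy_search`, `toy_joiner`, the integer "frames", and the parameter name `max_symbols_per_frame` are hypothetical and exist only for this example.

```python
def greedy_search(encoder_frames, joiner, blank_id=0, max_symbols_per_frame=1):
    """Greedy transducer decoding with a cap on symbols emitted per frame.

    joiner(frame, hyp) is a hypothetical callable that returns one score
    per vocabulary entry given the current frame and the partial hypothesis.
    """
    hyp = []
    for frame in encoder_frames:
        emitted = 0
        while emitted < max_symbols_per_frame:
            scores = joiner(frame, hyp)
            best = max(range(len(scores)), key=scores.__getitem__)
            if best == blank_id:
                break  # blank: advance to the next frame
            hyp.append(best)
            emitted += 1
    return hyp


def toy_joiner(frame, hyp):
    """Toy stand-in for a joiner network: each frame is just the id of the
    symbol it prefers, and that symbol always gets the top score."""
    scores = [0.0, 0.0, 0.0]
    scores[frame] = 1.0
    return scores


frames = [1, 0, 2, 2]  # preferred symbol per frame; 0 is blank
one_per_frame = greedy_search(frames, toy_joiner, max_symbols_per_frame=1)
two_per_frame = greedy_search(frames, toy_joiner, max_symbols_per_frame=2)
```

Because the toy joiner never switches to blank on a non-blank frame, an uncapped loop would not terminate here; the cap bounds the work per frame, which is what makes the loop easy to run in lockstep across a batch.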

    Predicting Multi-Codebook Vector Quantization Indexes for Knowledge Distillation

    Knowledge distillation (KD) is a common approach to improving model performance in automatic speech recognition (ASR), where a student model is trained to imitate the output behaviour of a teacher model. However, traditional KD methods suffer from a teacher label storage issue, especially when the training corpora are large. Although on-the-fly teacher label generation tackles this issue, training is significantly slower because the teacher model has to be evaluated for every batch. In this paper, we reformulate the generation of teacher labels as a codec problem. We propose a novel Multi-codebook Vector Quantization (MVQ) approach that compresses teacher embeddings to codebook indexes (CI). Based on this, a KD training framework (MVQ-KD) is proposed in which a student model predicts the CI generated from the embeddings of a self-supervised pre-trained teacher model. Experiments on LibriSpeech clean-100 hour show that the MVQ-KD framework achieves performance comparable to traditional KD methods (l1, l2) while requiring 256 times less storage. When the full LibriSpeech dataset is used, the MVQ-KD framework yields 13.8% and 8.2% relative word error rate reductions (WERRs) for a non-streaming transducer on test-clean and test-other, and 4.0% and 4.9% for a streaming transducer. The implementation of this work has been released as part of the open-source project icefall. Comment: Submitted to ICASSP 202
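To make the storage argument concrete, the sketch below encodes an embedding as one index per codebook and computes the resulting compression ratio. It is a simplified stand-in and not the MVQ method itself: it uses nearest-centroid lookup on fixed sub-vectors (product-quantization style) instead of a learned quantizer, and the embedding size, float width, and codebook settings are illustrative assumptions that do not come from the abstract.

```python
def encode(embedding, codebooks):
    """Map an embedding to one index per codebook.

    Simplification: the embedding is split into equal chunks and each
    chunk is matched to its nearest centroid by squared distance.
    """
    chunk = len(embedding) // len(codebooks)
    indexes = []
    for i, codebook in enumerate(codebooks):
        sub = embedding[i * chunk:(i + 1) * chunk]
        dists = [sum((a - b) ** 2 for a, b in zip(sub, c)) for c in codebook]
        indexes.append(dists.index(min(dists)))
    return indexes


def compression_ratio(dim, bytes_per_float, num_codebooks, codebook_size):
    """Bytes for the raw float embedding divided by bytes for its indexes."""
    bits_per_index = (codebook_size - 1).bit_length()
    stored_bytes = num_codebooks * bits_per_index / 8
    return dim * bytes_per_float / stored_bytes


# Toy codebooks: 2 codebooks, each with 2 centroids over 2-dim chunks.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [1.0, 1.0]],
]
indexes = encode([0.9, 1.1, 0.1, -0.2], codebooks)

# Assumed sizes for illustration only: a 1024-dim float32 embedding
# stored as 16 indexes into codebooks of 256 entries each.
ratio = compression_ratio(dim=1024, bytes_per_float=4,
                          num_codebooks=16, codebook_size=256)
```

Under these assumed sizes each frame shrinks from 4096 bytes to 16 bytes, which matches the order of saving the abstract reports; the student is then trained to predict the stored indexes rather than regress the full embedding.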

    Soil Bacterial Function Associated With Stylo (Legume) and Bahiagrass (Grass) Is Affected More Strongly by Soil Chemical Property Than by Bacterial Community Composition

    Soil microbes are drivers of nutrient cycling, and microbial function is affected by community composition and soil chemical properties. Legumes and grasses are ubiquitous in many ecosystems; however, their differential effects on microbial function are less well understood. Here we constructed compartmented rhizoboxes planted with stylo (Stylosanthes guianensis, legume) or bahiagrass (Paspalum notatum, grass) to compare their influences on bacterial function and to investigate the determinants of bacterial function. Soils in the root compartment and in the near (0–5 mm from the root compartment) or far (10–15 mm from the root compartment) rhizosphere were sampled. Soil chemical properties, bacterial community composition, and bacterial function were characterized. Results indicate that plant species and distance significantly affected bacterial function. The activities of beta-xylosidase, nitrate reductase, and phosphomonoesterase were higher in stylo soil than in bahiagrass soil, whereas leucine-aminopeptidase activity and nosZ abundance showed the opposite pattern. A rhizosphere effect was evident for the activities of beta-glucosidase, beta-xylosidase, and chitinase, and for the abundances of AOB-amoA, nirS, and nosZ. Statistical analysis revealed that soil chemical properties were significantly associated with bacterial function, with a higher coefficient than bacterial community composition. These data suggest that stylo and bahiagrass differentially affect bacterial function, which is affected more strongly by soil chemical properties than by community composition.

    Plant Species-Dependent Effects of Liming and Plant Residue Incorporation on Soil Bacterial Community and Activity in an Acidic Orchard Soil

    Both liming and plant residue incorporation are widely used practices for the amelioration of acidic soils; however, the difference in their effects is still not fully understood, especially regarding the microbial community. In this study, we took acidic soils from a subtropical orchard as target soils and implemented liming and plant residue incorporation with a leguminous and a gramineous cover crop as test plants. After six months of growth, soil pH, total organic carbon (TOC), dissolved organic carbon (DOC), and nutrient contents were determined, soil enzymes involved in C, N, and P cycling were assayed, and microbial communities were analyzed using Polymerase Chain Reaction-Denaturing Gradient Gel Electrophoresis (PCR-DGGE). Results showed that liming was more effective in elevating soil pH, while plant residue incorporation exerted a more comprehensive influence, not only on soil pH but also on soil enzyme activity and the microbial community. PCR-DGGE analysis revealed that liming changed the microbial community structure more strongly than plant residue incorporation, while plant residue incorporation altered the microbial community composition much more than liming. The growth responses of the test plants to liming and plant residue incorporation depended on plant species, indicating the need to select an appropriate practice for a particular crop. A further, detailed investigation into the microbial community composition and the respective functions using a metagenomic approach is also suggested.

    Test Study of the Bridge Cable Corrosion Protection Mechanism Based on Impressed Current Cathodic Protection

    The cable system is an important load-bearing element of a bridge with stay cables or slings and a matter of major concern for the safety of the bridge structure. Bridge cables are vulnerable to corrosion induced by leakage and soaking during their service life. To solve this problem, and based on the idea of proactive control by means of impressed current cathodic protection (ICCP) of bridge cables, this study designs and develops an ICCP system device for bridge cable protection. An accelerated corrosion test was conducted to test the ICCP system on the steel wires inside the cables and on the cables under acid rain conditions. The corrosion protection behavior of ICCP was analyzed to reveal the corrosion protection mechanism of bridge cable ICCP. The results show that in the cable ICCP system, the impressed current generated by a more negative voltage may improve the efficiency of corrosion protection, but an excessively negative voltage may cause hydrogen embrittlement of the cable steel wire due to overprotection. Based on overall consideration, the rational range was set to −1.13 V to −1.15 V. Within this range, the cable is subject to the joint protection of ICCP and sacrificial anode cathodic protection (SACP). Corrosion products can delay the development of cable corrosion to a certain degree; the SACP protection efficiency of the galvanized coat decreases gradually as corrosion develops, while the cable ICCP protection efficiency increases gradually. The ICCP for cable corrosion protection thus transitions from joint protection using both a sacrificial anode and impressed current to protection mainly by impressed current.