259 research outputs found

    Genome-Wide Association Study for Plant Height and Grain Yield in Rice under Contrasting Moisture Regimes

    Drought is one of the most critical environmental stresses affecting both growth and yield potential in rice. Drought resistance is a complex quantitative trait regulated by numerous small-effect loci and hundreds of genes controlling various morphological and physiological responses to drought. In this study, 270 rice landraces and cultivars were evaluated for drought resistance by measuring changes in plant height and grain yield under contrasting water regimes, and the underlying genetic architecture was then dissected by genome-wide association study (GWAS). Population structure was controlled by including the top two eigenvectors as covariates and incorporating a kinship matrix in the GWAS model. Eighteen, five, and six associated loci were identified for plant height, grain yield per plant, and the drought-resistance coefficient, respectively. Nine known functional genes were identified, including five for plant height (OsGA2ox3, OsGH3-2, sd-1, OsGNA1 and OsSAP11/OsDOG), two for grain yield per plant (OsCYP51G3 and OsRRMh) and two for the drought-resistance coefficient (OsPYL2 and OsGA2ox9), indicating that the results are reliable. A previous study reported that OsGNA1 regulates root development; this study shows that it additionally controls both plant height and root length. Moreover, OsRLK5 is a new drought-resistance candidate gene discovered in this study: OsRLK5 mutants showed faster water-loss rates in detached leaves, and the gene plays an important role in the positive regulation of yield-related traits under drought conditions. We furthermore discovered several new loci contributing to the three investigated traits (plant height, grain yield, and drought resistance). These associated loci and genes significantly improve our knowledge of the genetic control of these traits in rice. In addition, many of the drought-resistant cultivars screened in this study can be used as parental genotypes to improve the drought resistance of rice by molecular breeding.
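The drought-resistance coefficient used as a GWAS trait is commonly computed as the ratio of a trait's value under drought stress to its value under well-watered control conditions. The abstract does not give the exact formula, so the sketch below uses this common definition as an assumption:

```python
def drought_resistance_coefficient(value_drought, value_control):
    """Ratio of a trait measured under drought to the same trait under
    well-watered conditions. This is a common definition; the exact
    formula used in the study is an assumption here."""
    if value_control == 0:
        raise ValueError("control value must be non-zero")
    return value_drought / value_control

# Example: grain yield per plant of 12.5 g under drought vs 25.0 g control
drc = drought_resistance_coefficient(12.5, 25.0)  # 0.5
```

A coefficient near 1 indicates little yield loss under drought; values near 0 indicate strong susceptibility.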

    Enhanced Semantic Segmentation Pipeline for WeatherProof Dataset Challenge

    This report describes the winning solution to the WeatherProof Dataset Challenge (CVPR 2024 UG2+ Track 3). Details of the challenge are available at https://cvpr2024ug2challenge.github.io/track3.html. We propose an enhanced semantic segmentation pipeline for this challenge. First, we improve the semantic segmentation models: we use a backbone pretrained with Depth Anything to improve the UperNet and SETR-MLA models, and add language guidance based on both weather and category information to the InternImage model. Second, we introduce a new dataset, WeatherProofExtra, with a wider viewing angle, and employ data augmentation methods including adverse weather simulation and super-resolution. Finally, effective training strategies and an ensemble method further improve the final performance. Our solution ranked 1st on the final leaderboard. Code will be available at https://github.com/KaneiGi/WeatherProofChallenge
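The ensemble step can be illustrated by averaging per-pixel class probabilities from several models and taking the argmax class per pixel. This is a generic ensembling sketch, not the authors' exact weighting scheme, which the abstract does not specify:

```python
def ensemble_segmentation(prob_maps):
    """Average per-pixel class probabilities from several models and take
    the argmax class per pixel. prob_maps: list of H x W x C nested lists
    (one per model). A generic illustration, not the challenge solution's
    exact ensemble method."""
    n = len(prob_maps)
    h, w, c = len(prob_maps[0]), len(prob_maps[0][0]), len(prob_maps[0][0][0])
    labels = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            avg = [sum(m[i][j][k] for m in prob_maps) / n for k in range(c)]
            labels[i][j] = max(range(c), key=lambda k: avg[k])
    return labels

# Two models on a 1x2 image with 3 classes
m1 = [[[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]]
m2 = [[[0.2, 0.6, 0.2], [0.2, 0.7, 0.1]]]
print(ensemble_segmentation([m1, m2]))  # [[0, 1]]
```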

    Delay-penalized transducer for low-latency streaming ASR

    In streaming automatic speech recognition (ASR), it is desirable to reduce latency as much as possible while having minimal impact on recognition accuracy. Although a few existing methods are able to achieve this goal, they are difficult to implement because they depend on external alignments. In this paper, we propose a simple way to penalize symbol delay in the transducer model, so that we can balance the trade-off between symbol delay and accuracy for streaming models without external alignments. Specifically, our method adds a small constant times (T/2 - t), where T is the number of frames and t is the current frame, to all the non-blank log-probabilities (after normalization) that are fed into the two-dimensional transducer recursion. For both streaming Conformer models and unidirectional long short-term memory (LSTM) models, experimental results show that our method significantly reduces symbol delay with an acceptable performance degradation. It achieves a delay-accuracy trade-off similar to the previously published FastEmit, but we believe it is preferable because it has a better justification: it is equivalent to penalizing the average symbol delay. Our work is open-sourced and publicly available (https://github.com/k2-fsa/k2).
    Comment: Submitted to the 2023 IEEE International Conference on Acoustics, Speech and Signal Processing
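The penalty described above is simple to state in code: every non-blank log-probability at frame t receives an additive term lam * (T/2 - t), so emitting early (small t) is rewarded and emitting late is penalized. A minimal sketch on a plain T x V grid of log-probabilities (the real implementation operates inside the k2 transducer recursion):

```python
def apply_delay_penalty(log_probs, blank=0, lam=0.01):
    """Add lam * (T/2 - t) to every non-blank log-probability at frame t,
    as described in the abstract. log_probs: T x V nested list of
    normalized log-probabilities; lam is the small constant that balances
    delay against accuracy."""
    T = len(log_probs)
    out = []
    for t, frame in enumerate(log_probs):
        penalty = lam * (T / 2 - t)
        out.append([lp if v == blank else lp + penalty
                    for v, lp in enumerate(frame)])
    return out

# Two frames, vocabulary {blank, token}: the non-blank score at frame 0
# is boosted relative to frame 1, encouraging earlier emission.
penalized = apply_delay_penalty([[-1.0, -1.0], [-1.0, -1.0]], lam=0.1)
```

Note that the penalty sums to zero across frames for a fixed number of emissions, so it reshapes *when* symbols are emitted rather than changing the overall probability mass.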

    Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context

    In this paper, we introduce Libriheavy, a large-scale ASR corpus consisting of 50,000 hours of read English speech derived from LibriVox. To the best of our knowledge, Libriheavy is the largest freely available corpus of speech with supervisions. Unlike other open-source datasets that provide only normalized transcriptions, Libriheavy contains richer information such as punctuation, casing and text context, which brings more flexibility for system building. Specifically, we propose a general and efficient pipeline to locate, align and segment the audios in the previously published Librilight against their corresponding texts. Like Librilight, Libriheavy has three training subsets, small, medium and large, of 500 h, 5,000 h and 50,000 h respectively. We also extract dev and test evaluation sets from the aligned audios and guarantee that there is no overlap of speakers or books with the training sets. Baseline systems are built on the popular CTC-Attention and transducer models. Additionally, we open-source our dataset creation pipeline, which can also be used for other audio alignment tasks.
    Comment: Submitted to ICASSP 202

    Delay-penalized CTC implemented based on Finite State Transducer

    Connectionist Temporal Classification (CTC) suffers from a latency problem when applied to streaming models. We argue that in the CTC lattice, alignments that can access more future context are preferred during training, leading to higher symbol delay. In this work we propose delay-penalized CTC, which is augmented with a latency-penalty regularization. We devise a flexible and efficient implementation based on the differentiable Finite State Transducer (FST). Specifically, by attaching a binary attribute to the CTC topology, we can locate the frames that first emit non-blank tokens on the resulting CTC lattice and add the frame offsets to the log-probabilities. Experimental results demonstrate the effectiveness of the proposed delay-penalized CTC, which is able to balance the delay-accuracy trade-off. Furthermore, combining it with the delay-penalized transducer enables the CTC model to achieve better performance and lower latency. Our work is open-sourced and publicly available at https://github.com/k2-fsa/k2.
    Comment: Accepted at INTERSPEECH 202
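The key step, locating the frame at which each non-blank token is first emitted along a CTC alignment, can be sketched for a single alignment path. The actual implementation propagates a binary attribute through FST-based lattices in k2; this is only a simplified per-path illustration:

```python
def first_emission_frames(alignment, blank=0):
    """Given a frame-level CTC alignment (one symbol per frame, with
    blanks and repeats), return the frame index at which each output
    token is first emitted. Under CTC rules, a repeated symbol with no
    intervening blank is the same token, so only transitions into a new
    non-blank symbol count. Simplified stand-in for the lattice-level
    attribute propagation in the paper."""
    frames, prev = [], blank
    for t, sym in enumerate(alignment):
        if sym != blank and sym != prev:
            frames.append(t)
        prev = sym
    return frames

# alignment: blank, A, A, blank, B -> A first emitted at frame 1, B at 4
print(first_emission_frames([0, 3, 3, 0, 5]))  # [1, 4]
```

These first-emission frame indices are what the penalty offsets are derived from: the later a token's first emission, the larger the penalty applied to that path's log-probability.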

    PromptASR for contextualized ASR with controllable style

    Prompts are crucial to large language models, as they provide context information such as topic or logical relationships. Inspired by this, we propose PromptASR, a framework that integrates prompts into end-to-end automatic speech recognition (E2E ASR) systems to achieve contextualized ASR with a controllable transcription style. Specifically, a dedicated text encoder encodes the text prompts, and the encodings are injected into the speech encoder by cross-attending over the features from the two modalities. When using the ground-truth text of preceding utterances as a content prompt, the proposed system achieves 21.9% and 6.8% relative word error rate reductions on a book-reading dataset and an in-house dataset, respectively, compared to a baseline ASR system. The system can also take word-level biasing lists as prompts to improve recognition accuracy on rare words. An additional style prompt can be given to the text encoder to guide the ASR system to output different styles of transcriptions. The code is available in icefall.
    Comment: Submitted to ICASSP 202
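The injection mechanism can be illustrated with single-head dot-product cross-attention: speech frames act as queries over the text-prompt encodings (keys/values), and the attended context is added back to the speech features. This is a minimal sketch without the learned projections and multiple heads a real implementation would use:

```python
import math

def cross_attend(speech_feats, prompt_encs):
    """Single-head dot-product cross-attention: each speech frame (query)
    attends to the text-prompt encodings (keys/values), and the attended
    context is added residually to the speech features. Illustrative
    only; real models use learned projections and multi-head attention."""
    d = len(speech_feats[0])
    out = []
    for q in speech_feats:
        # scaled dot-product scores against every prompt encoding
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in prompt_encs]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]  # stable softmax
        z = sum(exps)
        weights = [e / z for e in exps]
        ctx = [sum(w * v[i] for w, v in zip(weights, prompt_encs))
               for i in range(d)]
        out.append([qi + ci for qi, ci in zip(q, ctx)])  # residual add
    return out
```

With a single prompt vector the softmax weight is 1, so the output is simply the speech frame plus the prompt encoding: `cross_attend([[1.0, 0.0]], [[0.5, 0.5]])` returns `[[1.5, 0.5]]`.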

    Zipformer: A faster and better encoder for automatic speech recognition

    The Conformer has become the most popular encoder model for automatic speech recognition (ASR). It adds convolution modules to a Transformer to learn both local and global dependencies. In this work we describe a faster, more memory-efficient, and better-performing transformer called Zipformer. Modeling changes include: 1) a U-Net-like encoder structure where the middle stacks operate at lower frame rates; 2) a reorganized block structure with more modules, within which attention weights are re-used for efficiency; 3) a modified form of LayerNorm, called BiasNorm, that allows the model to retain some length information; 4) new activation functions, SwooshR and SwooshL, that work better than Swish. We also propose a new optimizer, called ScaledAdam, which scales the update by each tensor's current scale to keep the relative change about the same, and also explicitly learns the parameter scale. It achieves faster convergence and better performance than Adam. Extensive experiments on the LibriSpeech, Aishell-1, and WenetSpeech datasets demonstrate the effectiveness of Zipformer over other state-of-the-art ASR models. Our code is publicly available at https://github.com/k2-fsa/icefall.
    Comment: Published as a conference paper at ICLR 202
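The core idea of ScaledAdam, scaling each update by the parameter tensor's current RMS so that the *relative* change per step stays roughly constant across tensors of different magnitudes, can be sketched as follows. This is greatly simplified: it omits Adam's moment estimates and the learned parameter scale that the real optimizer maintains:

```python
import math

def scaled_update(params, grads, lr=0.05, eps=1e-8):
    """Scale a sign-of-gradient style update by the parameter tensor's
    current RMS, so larger-magnitude tensors receive proportionally
    larger absolute updates (constant relative change). Greatly
    simplified: real ScaledAdam also keeps Adam's first/second moments
    and explicitly learns the parameter scale."""
    rms = math.sqrt(sum(p * p for p in params) / len(params)) + eps
    return [p - lr * rms * (1 if g > 0 else -1 if g < 0 else 0)
            for p, g in zip(params, grads)]
```

For a tensor with RMS 3.5 and lr 0.05, each step moves every element by about 0.175 in absolute terms, i.e. roughly 5% of the tensor's typical magnitude; a tensor with RMS 0.01 moves by about 0.0005 per step. Plain Adam, by contrast, applies the same absolute step size regardless of parameter scale.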