42 research outputs found

    Connecting Speech Encoder and Large Language Model for ASR

    The impressive capability and versatility of large language models (LLMs) have aroused increasing attention in automatic speech recognition (ASR), with several pioneering studies attempting to build integrated ASR models by connecting a speech encoder with an LLM. This paper presents a comparative study of three commonly used structures as connectors, including fully connected layers, multi-head cross-attention, and Q-Former. Speech encoders from the Whisper model series as well as LLMs from the Vicuna model series with different model sizes were studied. Experiments were performed on the commonly used LibriSpeech, Common Voice, and GigaSpeech datasets, where the LLMs with Q-Formers demonstrated consistent and considerable word error rate (WER) reductions over LLMs with other connector structures. Q-Former-based LLMs can generalise well to out-of-domain datasets, where 12% relative WER reductions over the Whisper baseline ASR model were achieved on the Eval2000 test set without using any in-domain training data from Switchboard. Moreover, a novel segment-level Q-Former is proposed to enable LLMs to recognise speech segments with a duration exceeding the limitation of the encoders, which results in 17% relative WER reductions over other connector structures on 90-second-long speech data

    Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models

    Audio-visual large language models (LLM) have drawn significant attention, yet the fine-grained combination of both input streams is rather under-explored, which is challenging but necessary for LLMs to understand general video inputs. To this end, a fine-grained audio-visual joint representation (FAVOR) learning framework for multimodal LLMs is proposed in this paper, which extends a text-based LLM to simultaneously perceive speech and audio events in the audio input stream and images or videos in the visual input stream, at the frame level. To fuse the audio and visual feature streams into joint representations and to align the joint space with the LLM input embedding space, we propose a causal Q-Former structure with a causal attention module to enhance the capture of causal relations of the audio-visual frames across time. An audio-visual evaluation benchmark (AVEB) is also proposed which comprises six representative single-modal tasks with five cross-modal tasks reflecting audio-visual co-reasoning abilities. While achieving competitive single-modal performance on audio, speech and image tasks in AVEB, FAVOR achieved over 20% accuracy improvements on the video question-answering task when fine-grained information or temporal causal reasoning is required. FAVOR, in addition, demonstrated remarkable video comprehension and reasoning abilities on tasks that are unprecedented by other multimodal LLMs. An interactive demo of FAVOR is available at https://github.com/BriansIDP/AudioVisualLLM.git, and the training code and model checkpoints will be released soon

    Improving tree survival prediction with forecast combination and disaggregation.

    Abstract: The tree mortality model plays an important role in simulating stand dynamic processes. Past work has shown that the disaggregation method was successful in improving tree survival prediction. This method was used in this study to forecast tree survival probability of Chinese pine (Pinus tabulaeformis Carrière) in Beijing. Outputs from the tree survival model were adjusted from either the stand-level model prediction or the combined estimator from the forecast combination method. Our results show that the disaggregation approach improved the performance of tree survival models. We also showed that stand-level prediction played a crucial role in refining outputs from a tree survival model, especially when it is a very simple model. Because the forecast combination method produced better stand-level prediction, we prefer the use of this method in conjunction with the disaggregation approach, even though the performance gain in using the forecast combination method shown for this data set was modest.

    Reliability and Validity of Simplified Chinese Version of Roland-Morris Questionnaire in Evaluating Rural and Urban Patients with Low Back Pain

    OBJECTIVE: The causes of low back pain in China and Western countries are extremely different. We attempted to analyze the risk factors of low back pain in urban and rural patients under the dual economy with the simplified Chinese version of the Roland-Morris disability questionnaire (SC-RMDQ) to demonstrate that the SC-RMDQ can evaluate patients with low back pain arising from different causes. METHODS: The Roland-Morris disability questionnaire was translated into the SC-RMDQ according to international guidelines for questionnaire adaptation. The causes of low back pain in 187 outpatients and inpatients (99 urban patients and 88 rural patients) were analyzed. All patients completed the SC-RMDQ, the simplified Chinese Oswestry disability index (SCODI), and a visual analogue scale (VAS). Reliability was tested using reproducibility (intraclass correlation coefficient, ICC) and internal consistency (Cronbach's alpha). Validity was tested using Pearson correlation analysis. RESULTS: The leading causes of low back pain were sedentariness (38.4%) and vibration (18.1%) in urban patients and waist bending (48.9%) and spraining (25%) in rural patients. Although the causes of low back pain in the two groups were completely different, the SC-RMDQ had high internal consistency (Cronbach's α of 0.874 in urban patients and 0.883 in rural patients) and good reproducibility (ICC of 0.952 in urban patients and 0.949 in rural patients, P<0.01). The SC-RMDQ also correlated significantly with the SCODI and the VAS in rural areas (SC-RMDQ vs SCODI: r = 0.841; SC-RMDQ vs VAS: r = 0.685, P<0.01) and in urban areas (SC-RMDQ vs SCODI: r = 0.818, P<0.01; SC-RMDQ vs VAS: r = 0.666, P<0.01). CONCLUSIONS: Although the causes of low back pain are completely different in rural and urban patients, the SC-RMDQ has good reliability and validity and is a reliable clinical method for evaluating disability in rural and urban patients.

    Public involvement in setting a national research agenda

    (A) Graphical map of the BLAST results showing nucleotide identity between the A. fasciata mitogenome and 15 related species listed in Table 1 (http://www.plosone.org/article/info:doi/10.1371/journal.pone.0136297#pone.0136297.t001), as generated by the CGView comparison tool (CCT). CCT arranges BLAST results in an order where the sequence that is most similar to the reference (A. fasciata) is placed closer to the outer edge of the map. The rings labelled 1 to 17 indicate BLAST results of the A. fasciata mitogenome against A. chrysaetos, N. nipalensis, N. alboniger, S. cheela, A. monachus, B. lagopus, B. buteo, B. buteo burmanicus, A. soloensis, A. virgatus, A. gentilis, A. nisus, P. haliaetus, S. serpentarius, C. aura, P. badius, and S. leptogrammica, respectively. (B) Nucleotide-based phylogenetic tree of 16 Accipitriformes species, with two Strigiformes birds as outgroups. This analysis is based on 13 PCGs. Both ML and Bayesian analyses produced identical tree topologies. The ML bootstrap and Bayesian posterior probability values for each node are indicated.

    Robust Noise Suppression Technique for a LADAR System via Eigenvalue-Based Adaptive Filtering

    No full text
    The laser detection and ranging system (LADAR) is widely used in various fields that require 3D measurement, detection, and modeling. In order to improve the system stability and ranging accuracy, it is necessary to obtain the complete waveform of pulses that contain target information. Due to the inevitable noise, there are distinct deviations between the actual and expected waveforms, so noise suppression is essential. To achieve the best effect, the filters' parameters that are usually set as empirical values should be adaptively adjusted according to the different noise levels. Therefore, we propose a novel noise suppression method for the LADAR system via eigenvalue-based adaptive filtering. Firstly, an efficient noise level estimation method is developed. The distributions of the eigenvalues of the sample covariance matrix are analyzed statistically after one-dimensional echo data are transformed into matrix format. Based on the boundedness and asymptotic properties of the noise eigenvalue spectrum, an estimation method for noise variances in high dimensional settings is proposed. Secondly, based on the estimated noise level, an adaptive guided filtering algorithm is designed within the gradient domain. The optimized parameters of the guided filtering are set according to an estimated noise level. Through simulation analysis and testing experiments on echo waves, it is proven that our algorithm can suppress the noise reliably and has advantages over the existing relevant methods.

    Evolution of plant Ash1 SET genes: structural divergence and functional differentiation

    No full text
    Plant Ash1 SET proteins are involved in H3K36 methylation, and play a key role in plant reproductive development. Genes encoding Ash1 SET proteins constitute a multigene family in which the copy number varies among plant species and functional divergence appears to have occurred repeatedly. To investigate the evolutionary history and functional differentiation of the Ash1 SET gene family, we made a comprehensive evolutionary analysis of this gene family from eleven major representatives of green plants. A novel deep sister relationship grouping previously resolved II-1 and II-2 orthologous groups was identified. The absence of AWS domain in the group II-2 suggests that the independent losses of AWS domain have occurred during evolution. A diversity of gene structures in plant Ash1 SET gene family have been presented since the divergence of Physcomitrella patens (moss) from the other land plants. A small proportion of codons in SET domain regions were detected to be under positive selection along the branches ancestral to land plant and angiosperms, which may have allowed changes of substrate specificity among different evolutionary groups while maintaining the primary function of SET domains. Our predictive subcellular localization and comparative anatomical meta-expression analyses can assort with the structural divergences of Ash1 SET proteins

    FEM-Validated Optimal Design of Laminate Process Parameters Based on Improved Genetic Algorithm

    No full text
    In tape placement process, the laying angle and laying sequence of laminates have proven their significant effects on the mechanical properties of carbon fibre reinforced composite material, specifically, laminates. In order to optimise these process parameters, an optimisation algorithm is developed based on the principles of genetic algorithms for improving the precision of traditional genetic algorithms and resolving the premature phenomenon in the optimisation process. Taking multi-layer symmetrically laid carbon fibre laminates as the research object, this algorithm adopts binary coding to conduct the optimisation of process parameters and mechanical analysis with the laying angle as the design variable and the strength ratio R as the response variable. A case study was conducted and its results were validated by the finite element analyses. The results show that the stresses before and after optimisation are 116.0 MPa and 100.9 MPa, respectively, with a decrease of strength ratio by 13.02%. The results comparison indicates that, in the iterative process, the search range is reduced by determining the code and location of important genes, thereby reducing the computational workload by 21.03% in terms of time consumed. Through multiple calculations, it validates that “gene mutation” is an indispensable part of the genetic algorithm in the iterative process.

    Differences in serum cytokine levels distinguish between clinically non‐invasive lung adenocarcinoma and invasive lung adenocarcinoma: A cross‐sectional study

    No full text
    Background: Lung cancer incidence and mortality remain high and are now the leading cause of cancer‐related death. Lung adenocarcinoma (LUAD) is one of the main histological subtypes of lung cancer. Previous studies have shown the role of inflammation in the development of lung cancer, but the relationship between cytokines and LUAD is still unclear. To further differentiate and explore the association of cytokines with the risk of non‐invasive and invasive LUAD, we studied and assessed serum cytokine levels in patients with two types of LUAD. Methods: A cohort study of 90 non‐invasive LUAD and 90 invasive LUAD was retrospectively included, and the clinical characteristics were recorded in detail. The differences in the levels of 12 serum cytokines (IFN‐α, IFN‐γ, IL‐10, IL‐12P70, IL‐17A, IL‐1β, IL‐2, IL‐4, IL‐5, IL‐6, IL‐8, and TNF‐α) between the two groups of patients with LUAD were analyzed and evaluated. And we evaluated the clinical value of cytokine differential diagnosis of invasive LUAD based on receiver operating characteristics (ROC) curves. Results: The mean age of the patients was 56.6 years, and the proportions of males and females were 38.9% and 61.1%, respectively. IFN‐α, IL‐1β, IL‐2, IL‐6, TNF‐α, IL‐4, and IL‐8 were significantly increased in patients with invasive LUAD compared with the non‐invasive LUAD group. Further research found that smoking is an important factor, with changes in the four cytokines IL‐1β, IL‐6, IL‐8, and TNF‐α being significantly higher in the smoking group of patients with invasive LUAD. It can be seen from the area under the curve that IL‐1β and IL‐2 have a significant differential diagnosis. Conclusions: We observed differences in preoperative serum cytokine levels between patients with invasive and non‐invasive LUAD, which may serve as potential serum biomarkers for clinical differential diagnosis and disease progression assessment.