42 research outputs found
Connecting Speech Encoder and Large Language Model for ASR
The impressive capability and versatility of large language models (LLMs)
have aroused increasing attention in automatic speech recognition (ASR), with
several pioneering studies attempting to build integrated ASR models by
connecting a speech encoder with an LLM. This paper presents a comparative
study of three commonly used structures as connectors, including fully
connected layers, multi-head cross-attention, and Q-Former. Speech encoders
from the Whisper model series as well as LLMs from the Vicuna model series with
different model sizes were studied. Experiments were performed on the commonly
used LibriSpeech, Common Voice, and GigaSpeech datasets, where the LLMs with
Q-Formers demonstrated consistent and considerable word error rate (WER)
reductions over LLMs with other connector structures. Q-Former-based LLMs can
generalise well to out-of-domain datasets, where 12% relative WER reductions
over the Whisper baseline ASR model were achieved on the Eval2000 test set
without using any in-domain training data from Switchboard. Moreover, a novel
segment-level Q-Former is proposed to enable LLMs to recognise speech segments
with a duration exceeding the limitation of the encoders, which results in 17%
relative WER reductions over other connector structures on 90-second-long
speech data.
Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models
Audio-visual large language models (LLMs) have drawn significant attention,
yet the fine-grained combination of both input streams is rather
under-explored, which is challenging but necessary for LLMs to understand
general video inputs. To this end, a fine-grained audio-visual joint
representation (FAVOR) learning framework for multimodal LLMs is proposed in
this paper, which extends a text-based LLM to simultaneously perceive speech
and audio events in the audio input stream and images or videos in the visual
input stream, at the frame level. To fuse the audio and visual feature streams
into joint representations and to align the joint space with the LLM input
embedding space, we propose a causal Q-Former structure with a causal attention
module to enhance the capture of causal relations of the audio-visual frames
across time. An audio-visual evaluation benchmark (AVEB) is also proposed which
comprises six representative single-modal tasks with five cross-modal tasks
reflecting audio-visual co-reasoning abilities. While achieving competitive
single-modal performance on audio, speech and image tasks in AVEB, FAVOR
achieved over 20% accuracy improvements on the video question-answering task
when fine-grained information or temporal causal reasoning is required. FAVOR
also demonstrated remarkable video comprehension and reasoning abilities on
tasks that other multimodal LLMs had not previously handled. An
interactive demo of FAVOR is available at
https://github.com/BriansIDP/AudioVisualLLM.git, and the training code and
model checkpoints will be released soon.
Improving tree survival prediction with forecast combination and disaggregation.
Abstract: The tree mortality model plays an important role in simulating stand dynamic processes. Past work has shown that the disaggregation method was successful in improving tree survival prediction. This method was used in this study to forecast the tree survival probability of Chinese pine (Pinus tabulaeformis Carrière) in Beijing. Outputs from the tree survival model were adjusted from either the stand-level model prediction or the combined estimator from the forecast combination method. Our results show that the disaggregation approach improved the performance of tree survival models. We also showed that stand-level prediction played a crucial role in refining outputs from a tree survival model, especially when it is a very simple model. Because the forecast combination method produced better stand-level prediction, we prefer the use of this method in conjunction with the disaggregation approach, even though the performance gain in using the forecast combination method shown for this data set was modest.
Reliability and Validity of Simplified Chinese Version of Roland-Morris Questionnaire in Evaluating Rural and Urban Patients with Low Back Pain
OBJECTIVE: The causes of low back pain in China and Western countries are extremely different. We attempted to analyze the risk factors of low back pain in urban and rural patients under the dual economy with the simplified Chinese version of the Roland-Morris disability questionnaire (SC-RMDQ) to demonstrate that the SC-RMDQ could evaluate patients with low back pain arising from different causes. METHODS: The Roland-Morris disability questionnaire was translated into the SC-RMDQ according to international guidelines for questionnaire adaptation. In this study, the causes of low back pain of 187 outpatients and inpatients (99 urban patients and 88 rural patients) were analyzed. All patients completed the SC-RMDQ, the simplified Chinese Oswestry disability index (SCODI) and a visual analogue scale (VAS). Reliability was tested using reproducibility (intraclass correlation coefficient, ICC) and internal consistency (Cronbach's alpha). Validity was tested using Pearson correlation analysis. RESULTS: The leading causes of low back pain were sedentariness (38.4%) and vibration (18.1%) in urban patients and waist bending (48.9%) and spraining (25%) in rural patients. Although the causes of low back pain in the two groups were completely different, the SC-RMDQ had high internal consistency (Cronbach's α value of 0.874 in urban patients and 0.883 in rural patients) and good reproducibility (ICC value of 0.952 in urban patients and 0.949 in rural patients, P<0.01). The SC-RMDQ also showed significant correlation with the SCODI and the VAS in rural areas (SC-RMDQ-SCODI: r = 0.841; SC-RMDQ-VAS: r = 0.685, P<0.01) and in urban areas (SC-RMDQ-SCODI: r = 0.818, P<0.01; SC-RMDQ-VAS: r = 0.666, P<0.01).
CONCLUSIONS: Although the causes of low back pain are completely different in rural and urban patients, the SC-RMDQ has good reliability and validity and is a reliable clinical method for evaluating disability in rural and urban patients.
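The internal-consistency statistic reported above, Cronbach's alpha, can be illustrated with a minimal sketch. The response matrix below is hypothetical (it is not data from the study), and the function name `cronbach_alpha` is an assumption made for illustration; the formula itself is the standard one, k/(k-1) * (1 - sum of item variances / variance of total scores).

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(scores[0])                      # number of questionnaire items
    items = list(zip(*scores))              # transpose: one tuple per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)

# Hypothetical yes/no (1/0) answers: 5 respondents x 3 items.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [0, 0, 0],
    [1, 0, 1],
    [0, 0, 0],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

Values closer to 1 indicate that the items vary together, which is how the reported values of 0.874 and 0.883 are read.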
Public involvement in setting a national research agenda
(A) Graphical map of the BLAST results showing nucleotide identity between the A. fasciata mitogenome and 15 related species listed in Table 1 (http://www.plosone.org/article/info:doi/10.1371/journal.pone.0136297#pone.0136297.t001), as generated by the CGView comparison tool (CCT). CCT arranges the BLAST results so that the sequence most similar to the reference (A. fasciata) is placed closest to the outer edge of the map. The rings labelled 1 to 17 indicate BLAST results of the A. fasciata mitogenome against A. chrysaetos, N. nipalensis, N. alboniger, S. cheela, A. monachus, B. lagopus, B. buteo, B. buteo burmanicus, A. soloensis, A. virgatus, A. gentilis, A. nisus, P. haliaetus, S. serpentarius, C. aura, P. badius, and S. leptogrammica, respectively. (B) Nucleotide-based phylogenetic tree of 16 Accipitriformes species, with two Strigiformes birds as outgroups. This analysis is based on 13 PCGs. Both ML and Bayesian analyses produced identical tree topologies. The ML bootstrap and Bayesian posterior probability values for each node are indicated.
Robust Noise Suppression Technique for a LADAR System via Eigenvalue-Based Adaptive Filtering
The laser detection and ranging system (LADAR) is widely used in various fields that require 3D measurement, detection, and modeling. In order to improve the system stability and ranging accuracy, it is necessary to obtain the complete waveform of pulses that contain target information. Due to the inevitable noise, there are distinct deviations between the actual and expected waveforms, so noise suppression is essential. To achieve the best effect, the filters' parameters that are usually set as empirical values should be adaptively adjusted according to the different noise levels. Therefore, we propose a novel noise suppression method for the LADAR system via eigenvalue-based adaptive filtering. Firstly, an efficient noise level estimation method is developed. The distributions of the eigenvalues of the sample covariance matrix are analyzed statistically after one-dimensional echo data are transformed into matrix format. Based on the boundedness and asymptotic properties of the noise eigenvalue spectrum, an estimation method for noise variances in high dimensional settings is proposed. Secondly, based on the estimated noise level, an adaptive guided filtering algorithm is designed within the gradient domain. The optimized parameters of the guided filtering are set according to an estimated noise level. Through simulation analysis and testing experiments on echo waves, it is proven that our algorithm can suppress the noise reliably and has advantages over the existing relevant methods.
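The first step described above (reshaping one-dimensional echo data into a matrix and reading the noise level from the eigenvalues of the sample covariance matrix) can be sketched roughly as follows. This is not the authors' estimator: the sliding-window layout, the window length, the use of the median eigenvalue as the noise-variance estimate, and the synthetic Gaussian pulse are all assumptions chosen to keep the illustration short.

```python
import numpy as np

def estimate_noise_variance(echo, window=20):
    """Rough noise-variance estimate from covariance eigenvalues (illustrative only)."""
    # Turn the 1-D echo into a matrix whose rows are overlapping windows.
    rows = np.array([echo[i:i + window] for i in range(len(echo) - window + 1)])
    cov = np.cov(rows, rowvar=False)        # window x window sample covariance
    eigvals = np.linalg.eigvalsh(cov)       # real eigenvalues, ascending order
    # Assumption: a smooth pulse occupies only a few large eigenvalues, so the
    # median eigenvalue is dominated by noise.
    return float(np.median(eigvals))

# Synthetic echo: a smooth pulse plus white Gaussian noise with known sigma.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 4000)
pulse = np.exp(-((t - 0.5) ** 2) / (2 * 0.02 ** 2))
sigma = 0.1
echo = pulse + rng.normal(0.0, sigma, t.size)

est = estimate_noise_variance(echo)
print(est, sigma ** 2)
```

In an adaptive scheme of the kind the abstract describes, an estimate like `est` would then set the filter parameters instead of a fixed empirical value.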
Evolution of plant Ash1 SET genes: structural divergence and functional differentiation
Plant Ash1 SET proteins are involved in H3K36 methylation, and play a key role in plant reproductive development. Genes encoding Ash1 SET proteins constitute a multigene family in which the copy number varies among plant species and functional divergence appears to have occurred repeatedly. To investigate the evolutionary history and functional differentiation of the Ash1 SET gene family, we made a comprehensive evolutionary analysis of this gene family from eleven major representatives of green plants. A novel deep sister relationship grouping the previously resolved II-1 and II-2 orthologous groups was identified. The absence of the AWS domain in group II-2 suggests that independent losses of the AWS domain have occurred during evolution. A diversity of gene structures has been present in the plant Ash1 SET gene family since the divergence of Physcomitrella patens (moss) from the other land plants. A small proportion of codons in SET domain regions were detected to be under positive selection along the branches ancestral to land plants and angiosperms, which may have allowed changes of substrate specificity among different evolutionary groups while maintaining the primary function of SET domains. Our predictive subcellular localization and comparative anatomical meta-expression analyses are consistent with the structural divergences of Ash1 SET proteins.
FEM-Validated Optimal Design of Laminate Process Parameters Based on Improved Genetic Algorithm
In the tape placement process, the laying angle and laying sequence of laminates have a significant effect on the mechanical properties of carbon fibre reinforced composite material, specifically, laminates. In order to optimise these process parameters, an optimisation algorithm is developed based on the principles of genetic algorithms, improving the precision of traditional genetic algorithms and resolving the premature phenomenon in the optimisation process. Taking multi-layer symmetrically laid carbon fibre laminates as the research object, this algorithm adopts binary coding to conduct the optimisation of process parameters and mechanical analysis with the laying angle as the design variable and the strength ratio R as the response variable. A case study was conducted and its results were validated by finite element analyses. The results show that the stresses before and after optimisation are 116.0 MPa and 100.9 MPa, respectively, with a decrease of strength ratio by 13.02%. The comparison of results indicates that, in the iterative process, the search range is reduced by determining the code and location of important genes, thereby reducing the computational workload by 21.03% in terms of time consumed. Multiple calculations confirm that "gene mutation" is an indispensable part of the genetic algorithm in the iterative process.
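The binary coding of laying angles and the role of mutation can be illustrated with a small genetic-algorithm sketch. Everything specific here is an assumption made for illustration: the four candidate angles, the two-bits-per-ply encoding, the GA settings, and above all `toy_fitness`, which is a placeholder objective and not the strength ratio R or any mechanical model from the paper.

```python
import random

ANGLES = [0, 45, -45, 90]  # assumed candidate laying angles, two bits per ply

def decode(bits):
    """Map a bit list to half-stack angles, mirrored for a symmetric laminate."""
    half = [ANGLES[2 * bits[i] + bits[i + 1]] for i in range(0, len(bits), 2)]
    return half + half[::-1]

def toy_fitness(bits):
    """Placeholder objective (NOT a strength ratio): counts +/-45 degree plies."""
    return sum(1 for angle in decode(bits) if abs(angle) == 45)

def evolve(n_half_plies=4, pop_size=20, generations=30, p_mut=0.05, seed=0):
    rng = random.Random(seed)
    n_bits = 2 * n_half_plies
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=toy_fitness, reverse=True)
        next_pop = [pop[0][:]]                       # elitism: keep the best
        while len(next_pop) < pop_size:
            a, b = rng.sample(pop[: pop_size // 2], 2)   # parents from top half
            cut = rng.randrange(1, n_bits)               # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation keeps new gene values entering the population.
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=toy_fitness)

best = evolve()
print(decode(best), toy_fitness(best))
```

Setting `p_mut=0` in this sketch leaves the search with only the bit patterns present in the initial population, which is one way to see why the abstract treats mutation as indispensable.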
Differences in serum cytokine levels distinguish between clinically non-invasive lung adenocarcinoma and invasive lung adenocarcinoma: A cross-sectional study
Abstract. Background: Lung cancer incidence and mortality remain high, and lung cancer is now the leading cause of cancer-related death. Lung adenocarcinoma (LUAD) is one of the main histological subtypes of lung cancer. Previous studies have shown the role of inflammation in the development of lung cancer, but the relationship between cytokines and LUAD is still unclear. To further differentiate and explore the association of cytokines with the risk of non-invasive and invasive LUAD, we studied and assessed serum cytokine levels in patients with the two types of LUAD. Methods: A cohort of 90 patients with non-invasive LUAD and 90 with invasive LUAD was retrospectively included, and the clinical characteristics were recorded in detail. The differences in the levels of 12 serum cytokines (IFN-α, IFN-γ, IL-10, IL-12P70, IL-17A, IL-1β, IL-2, IL-4, IL-5, IL-6, IL-8, and TNF-α) between the two groups of patients with LUAD were analyzed and evaluated. We also evaluated the clinical value of cytokines for the differential diagnosis of invasive LUAD based on receiver operating characteristic (ROC) curves. Results: The mean age of the patients was 56.6 years, and the proportions of males and females were 38.9% and 61.1%, respectively. IFN-α, IL-1β, IL-2, IL-6, TNF-α, IL-4, and IL-8 were significantly increased in patients with invasive LUAD compared with the non-invasive LUAD group. Further research found that smoking is an important factor, with the four cytokines IL-1β, IL-6, IL-8, and TNF-α being significantly higher in the smoking group of patients with invasive LUAD. The area under the curve shows that IL-1β and IL-2 have significant value for differential diagnosis. Conclusions: We observed differences in preoperative serum cytokine levels between patients with invasive and non-invasive LUAD, which may serve as potential serum biomarkers for clinical differential diagnosis and disease progression assessment.
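The area under the ROC curve used above to judge differential diagnostic value can be computed directly as the probability that a randomly chosen invasive case has a higher marker level than a randomly chosen non-invasive case. The sketch below uses that pairwise definition; the cytokine values are hypothetical and the function name `roc_auc` is an assumption for illustration, not part of the study.

```python
def roc_auc(positive, negative):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for p in positive:
        for n in negative:
            if p > n:
                wins += 1.0      # positive case ranked above negative case
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(positive) * len(negative))

# Hypothetical serum levels of one cytokine (arbitrary units).
invasive = [5.1, 7.3, 6.0, 8.2]
non_invasive = [4.0, 5.5, 3.2, 4.8]

auc = roc_auc(invasive, non_invasive)
print(auc)
```

An AUC of 0.5 corresponds to a marker that does not separate the groups at all, and 1.0 to perfect separation.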