968 research outputs found

    A Dynamic Graph Interactive Framework with Label-Semantic Injection for Spoken Language Understanding

    Joint models for multi-intent detection and slot filling are gaining traction because they are closer to complicated real-world scenarios. However, existing approaches (1) focus on identifying implicit correlations between utterances and one-hot encoded labels in both tasks while ignoring explicit label characteristics; and (2) directly incorporate multi-intent information for each token, which can lead to incorrect slot predictions because irrelevant intents are introduced. In this paper, we propose a framework termed DGIF, which first leverages the semantic information of labels to give the model additional signals and enriched priors. Then, a multi-grain interactive graph is constructed to model correlations between intents and slots. Specifically, we propose a novel approach to construct the interactive graph based on the injection of label semantics, which can automatically update the graph to better alleviate error propagation. Experimental results show that our framework significantly outperforms existing approaches, obtaining a relative improvement of 13.7% over the previous best model on the MixATIS dataset in overall accuracy. Comment: Submitted to ICASSP 202

    Advancing Visual Grounding with Scene Knowledge: Benchmark and Method

    Visual grounding (VG) aims to establish fine-grained alignment between vision and language. Ideally, it can be a testbed for vision-and-language models to evaluate their understanding of images and texts and their reasoning abilities over the joint space. However, most existing VG datasets are constructed using simple description texts, which do not require sufficient reasoning over the images and texts. This has been demonstrated in a recent study \cite{luo2022goes}, where a simple LSTM-based text encoder without pretraining can achieve state-of-the-art performance on mainstream VG datasets. Therefore, in this paper, we propose a novel benchmark of Scene Knowledge-guided Visual Grounding (SK-VG), where the image content and referring expressions are not sufficient to ground the target objects, forcing models to reason over long-form scene knowledge. To perform this task, we propose two approaches to accept the triple-type input: the former embeds knowledge into the image features before the image-query interaction; the latter leverages linguistic structure to assist in computing the image-text matching. We conduct extensive experiments to analyze the above methods and show that the proposed approaches achieve promising results but still leave room for improvement in both performance and interpretability. The dataset and code are available at https://github.com/zhjohnchan/SK-VG. Comment: Computer Vision and Natural Language Processing. 21 pages, 14 figures. CVPR-202

    Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation

    Parameter Efficient Tuning (PET) has gained attention for reducing the number of tuned parameters while maintaining performance and saving hardware resources, but few studies investigate dense prediction tasks and interaction between modalities. In this paper, we investigate efficient tuning for referring image segmentation. We propose a novel adapter called Bridger to facilitate cross-modal information exchange and inject task-specific information into the pre-trained model. We also design a lightweight decoder for image segmentation. Our approach achieves comparable or superior performance with only 1.61% to 3.38% backbone parameter updates, evaluated on challenging benchmarks. The code is available at https://github.com/kkakkkka/ETRIS. Comment: Computer Vision and Natural Language Processing. 14 pages, 8 figures. ICCV-202

    A Markov Model on China’s Export Cycles: Regime Division and Regime Switching

    Based on Hamilton’s Markov regime-switching model, originally applied to the postwar U.S. business cycle, the paper uses Chinese export data from January 1999 to November 2010 to describe and investigate the dynamic growth path of China’s export cycles. The empirical results show that the growth path of China’s exports can be classified into a long-term expansion regime and a short-term recession regime, which means the growth path may experience a shift of regime. The global economic situation, especially the 1997 Asian financial crisis and the 2008 global financial crisis, and Chinese macroeconomic policy during these periods may explain the regime shifts. The Chinese economy needs to shift from export-oriented growth to greater reliance on firms’ indigenous innovation and on growth pulled by domestic demand; this is not only a result of the 2008 global financial crisis but also reflects the need for continued growth of the Chinese economy in the future. Key words: Export cycles; Markov regime-switching model; Smoothing probabilities
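For readers unfamiliar with regime-switching models, the two-regime idea in this abstract can be illustrated with a minimal sketch of a Hamilton-style filter. This is not the paper's specification: it assumes a simplified Gaussian model in which only the mean and standard deviation switch between regimes (no autoregressive terms), and the function name `hamilton_filter` and all parameter values are hypothetical. It returns filtered probabilities; the smoothed probabilities named in the key words would need an additional backward pass that is not shown.

```python
import numpy as np

def hamilton_filter(y, mu, sigma, P, p0=None):
    """Filtered regime probabilities for a Gaussian mean-switching model.

    y     : observations, shape (T,)
    mu    : regime means, shape (K,)
    sigma : regime standard deviations, shape (K,)
    P     : transition matrix, P[i, j] = Pr(s_t = j | s_{t-1} = i)
    p0    : initial regime distribution (defaults to the stationary one)
    """
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    P = np.asarray(P, dtype=float)
    K = len(mu)
    if p0 is None:
        # Stationary distribution: solve pi = pi P subject to sum(pi) = 1.
        A = np.vstack([P.T - np.eye(K), np.ones(K)])
        b = np.append(np.zeros(K), 1.0)
        p0 = np.linalg.lstsq(A, b, rcond=None)[0]
    filtered = np.empty((len(y), K))
    loglik = 0.0
    prev = np.asarray(p0, dtype=float)
    for t, yt in enumerate(y):
        pred = prev @ P  # Pr(s_t | y_1..y_{t-1})
        dens = np.exp(-0.5 * ((yt - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        joint = pred * dens          # Pr(s_t, y_t | past)
        lik = joint.sum()            # Pr(y_t | past)
        filtered[t] = joint / lik    # Bayes update
        loglik += np.log(lik)
        prev = filtered[t]
    return filtered, loglik

# Hypothetical growth series: an expansion regime around 2.0, a recession around -1.0.
probs, ll = hamilton_filter(
    [2.0, 2.1, 1.9, -1.0, -1.2, 2.0],
    mu=[2.0, -1.0], sigma=[0.5, 0.5],
    P=[[0.95, 0.05], [0.30, 0.70]],
)
```

In practice the regime means, variances, and transition probabilities would be estimated by maximizing the returned log-likelihood rather than fixed by hand as they are here.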

    Visualizing choriocapillaris using swept source optical coherence tomography angiography with various probe beam sizes

    Imaging the choriocapillaris (CC) is a long-standing challenge for commercial OCT angiography (OCTA) systems due to limited transverse resolution. Effects of transverse resolution on the visualization of a CC microvascular network are explored and demonstrated in this paper. We use three probe beams with sizes of ~1.12 mm, ~2.51 mm and ~3.50 mm at the pupil plane, which deliver an estimated transverse resolution at the retina of 17.5 µm, 8.8 µm and 7.0 µm, respectively, to investigate the ability of OCTA to resolve the CC capillary vessels. The complex optical microangiography algorithm is applied to extract blood flow in the CC slab. Mean retinal pigment epithelium (RPE) to CC (RPE-CC) distance, mean CC inter-vascular spacing and the magnitude in the radially-averaged power spectrum are quantified. We demonstrate that a clearer CC lobular capillary network is resolved in the angiograms provided by a larger beam size. The image contrast of the CC angiogram with a large beam size of 3.50 mm is 114% higher than that with a small beam size of 1.12 mm. While the measurements of the mean RPE-CC distance and CC inter-vascular spacing are almost consistent regardless of the beam sizes, they are more reliable and stable with the larger beam size of 3.50 mm. We conclude that the beam size is a key parameter for CC angiography if the purpose of the investigation is to visualize the individual CC capillaries.

    Mitochondrial genomes of two Barklice, Psococerastis albimaculata and Longivalvus hyalospilus (Psocoptera: Psocomorpha): contrasting rates in mitochondrial gene rearrangement between major lineages of Psocodea

    The superorder Psocodea has ∼10,000 described species in two orders: Psocoptera (barklice and booklice) and Phthiraptera (parasitic lice). One booklouse, Liposcelis bostrychophila, and six species of parasitic lice have been sequenced for complete mitochondrial (mt) genomes; these seven species have the most rearranged mt genomes seen in insects. The mt genome of a barklouse, lepidopsocid sp., has also been sequenced and is much less rearranged than those of the booklouse and the parasitic lice. To further understand mt gene rearrangements in the Psocodea, we sequenced the mt genomes of two barklice, Psococerastis albimaculata and Longivalvus hyalospilus, the first representatives from the suborder Psocomorpha, which is the most species-rich suborder of the Psocodea. We found that these two barklice have the least rearranged mt genomes seen in the Psocodea to date: a protein-coding gene (nad3) and five tRNAs (trnN, trnS1, trnE, trnM and trnC) have translocated. Rearrangements of mt genes in these two barklice can be accounted for by two events of tandem duplication followed by random deletions. Phylogenetic analyses of the mt genome sequences support the view that Psocoptera is paraphyletic whereas Phthiraptera is monophyletic. The booklouse, L. bostrychophila (suborder Troctomorpha), is most closely related to the parasitic lice. The barklice (suborders Trogiomorpha and Psocomorpha) are closely related and form a monophyletic group. We conclude that mt gene rearrangement has been substantially faster in the lineage leading to the booklice and the parasitic lice than in the lineage leading to the barklice. Lifestyle change appears to be associated with the contrasting rates in mt gene rearrangements between the two lineages of the Psocodea.
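The "tandem duplication followed by random deletions" mechanism mentioned in this abstract can be illustrated with a toy sketch. It assumes a gene order is a list of unique labels and that, after a contiguous segment is duplicated in tandem, each gene independently loses one of its two copies at random. The function name `tdrl` and the placeholder labels A to F are hypothetical; the sketch does not reproduce the specific rearrangements reported for the two barklice.

```python
import random

def tdrl(order, start, end, rng=None):
    """Apply one tandem-duplication/random-loss event to order[start:end].

    The segment is conceptually duplicated in tandem (seg + seg); for every
    gene in it, one of the two copies is deleted at random. Genes whose first
    copy survives keep their relative order and precede those whose second
    copy survives.
    """
    rng = rng or random.Random()
    segment = order[start:end]
    # Decide per gene which copy survives (True = first copy).
    keep_first = {gene: rng.random() < 0.5 for gene in segment}
    survivors_first = [g for g in segment if keep_first[g]]
    survivors_second = [g for g in segment if not keep_first[g]]
    return order[:start] + survivors_first + survivors_second + order[end:]

# Hypothetical ancestral order; genes B..E take part in the duplication.
ancestral = list("ABCDEF")
derived = tdrl(ancestral, 1, 5, random.Random(0))
```

Each event yields a permutation of the original genes in which only the duplicated segment is reshuffled, which is why a short series of such events can explain the translocation of a handful of genes.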