219 research outputs found

    Beyond Universal Transformer: block reusing with adaptor in Transformer for automatic speech recognition

    Transformer-based models have recently achieved significant results in end-to-end (E2E) automatic speech recognition (ASR), making it possible to deploy E2E ASR systems on smart devices. However, these models still have the disadvantage of requiring a large number of parameters. To overcome this drawback of universal Transformer models for ASR on edge devices, we propose a solution that reuses blocks in Transformer models for small-footprint ASR systems, which meets the objective of accommodating resource limitations without compromising recognition accuracy. Specifically, we design a novel block-reusing strategy for speech Transformer (BRST) to enhance the effectiveness of parameters and propose an adapter module (ADM) that can produce a compact and adaptable model with only a few additional trainable parameters accompanying each reused block. We conducted an experiment with the proposed method on the public AISHELL-1 corpus, and the results show that the proposed approach achieves a character error rate (CER) of 9.3%/6.63% with only 7.6M/8.3M parameters without and with the ADM, respectively. In addition, we provide a deeper analysis to show the effect of the ADM in the general block-reusing method.
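The parameter saving behind block reuse can be illustrated with a rough weight count. The sketch below is not the BRST/ADM architecture from the abstract: the simplified block (attention plus feed-forward, no biases or layer norm), the bottleneck-style adapter, and all sizes are assumptions chosen only for illustration, so the totals do not correspond to the 7.6M/8.3M figures reported above.

```python
def transformer_block_params(d_model: int, d_ff: int) -> int:
    """Weights of one simplified Transformer block (no biases, no layer norm)."""
    attention = 4 * d_model * d_model   # Q, K, V and output projections
    feed_forward = 2 * d_model * d_ff   # two linear layers
    return attention + feed_forward


def adapter_params(d_model: int, bottleneck: int) -> int:
    """Weights of a hypothetical bottleneck adapter (down- and up-projection)."""
    return 2 * d_model * bottleneck


def stack_params(n_layers: int, d_model: int, d_ff: int,
                 reuse: bool = False, bottleneck: int = 0) -> int:
    """Total weights of a stack, with or without sharing a single block."""
    block = transformer_block_params(d_model, d_ff)
    if not reuse:
        return n_layers * block          # every layer owns its weights
    # One shared block, plus one small adapter per reuse of that block.
    return block + n_layers * adapter_params(d_model, bottleneck)


# Illustrative sizes only (assumed, not taken from the paper).
independent = stack_params(12, 256, 2048)
shared = stack_params(12, 256, 2048, reuse=True)
shared_adapted = stack_params(12, 256, 2048, reuse=True, bottleneck=32)
print(independent, shared, shared_adapted)
```

Under these assumed sizes, the per-reuse adapters add only a small fraction of the weights that sharing removes, which is the trade-off the abstract describes.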

    Tensor-based Intrinsic Subspace Representation Learning for Multi-view Clustering

    Multi-view clustering is a hot research topic, and many approaches have been proposed over the past few years. Nevertheless, most existing algorithms merely take the consensus information among different views into consideration for clustering. This may hinder multi-view clustering performance in real-life applications, since different views usually have diverse statistical properties. To address this problem, we propose a novel Tensor-based Intrinsic Subspace Representation Learning (TISRL) method for multi-view clustering. Concretely, a rank-preserving decomposition is first proposed to deal effectively with the diverse statistical information contained in different views. Then, to obtain the intrinsic subspace representation, a low-rank tensor constraint based on the tensor singular value decomposition is also used in our method. The specific information contained in different views is thus fully investigated by the rank-preserving decomposition, and the high-order correlations of multi-view data are mined by the low-rank tensor constraint. The objective function can be optimized by an augmented Lagrangian multiplier based alternating direction minimization algorithm. Experimental results on nine commonly used real-world multi-view datasets illustrate the superiority of TISRL.

    Frame-wise Cross-modal Matching for Video Moment Retrieval

    Video moment retrieval aims to retrieve a moment in a video for a given language query. The challenges of this task include 1) localizing the relevant moment in an untrimmed video, and 2) bridging the semantic gap between the textual query and the video contents. To tackle these problems, early approaches adopt a sliding window or uniform sampling to collect video clips first and then match each clip with the query. These strategies are time-consuming and often lead to unsatisfactory localization accuracy due to the unpredictable length of the ground-truth moment. To avoid these limitations, researchers have recently attempted to predict the relevant moment boundaries directly, without first generating video clips. One mainstream approach is to generate a multimodal feature vector for the target query and video frames (e.g., by concatenation) and then apply regression to that vector for boundary detection. Although this approach has achieved some progress, we argue that such methods do not capture the cross-modal interactions between the query and video frames well. In this paper, we propose an Attentive Cross-modal Relevance Matching (ACRM) model which predicts the temporal boundaries based on interaction modeling. In addition, an attention module is introduced to assign higher weights to query words with richer semantic cues, which are considered more important for finding relevant video contents. Another contribution is an additional predictor that uses the internal frames during model training to improve localization accuracy. Extensive experiments on two datasets, TACoS and Charades-STA, demonstrate the superiority of our method over several state-of-the-art methods. Ablation studies have also been conducted to examine the effectiveness of the different modules in our ACRM model. Comment: 12 pages; accepted by IEEE TM

    Label-free Node Classification on Graphs with Large Language Models (LLMS)

    In recent years, Graph Neural Networks (GNNs) have achieved remarkable advances in node classification. However, they require abundant high-quality labels to ensure promising performance. In contrast, Large Language Models (LLMs) exhibit impressive zero-shot proficiency on text-attributed graphs. Yet, they face challenges in efficiently processing structural data and suffer from high inference costs. In light of these observations, this work introduces LLM-GNN, a pipeline for label-free node classification on graphs with LLMs. It combines the strengths of both GNNs and LLMs while mitigating their limitations. Specifically, LLMs are used to annotate a small portion of nodes, and GNNs are then trained on the LLMs' annotations to make predictions for the remaining large portion of nodes. The implementation of LLM-GNN faces a unique challenge: how can we actively select nodes for LLMs to annotate and consequently enhance the GNN training? How can we leverage LLMs to obtain annotations of high quality, representativeness, and diversity, thereby enhancing GNN performance at lower cost? To tackle this challenge, we develop an annotation quality heuristic and leverage the confidence scores derived from LLMs to advance node selection. Comprehensive experimental results validate the effectiveness of LLM-GNN. In particular, LLM-GNN can achieve an accuracy of 74.9% on the vast-scale products dataset at a cost of less than 1 dollar. Comment: The code will be available soon via https://github.com/CurryTang/LLMGN
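The idea of keeping only trustworthy LLM annotations before training a GNN can be sketched as a simple filter. This is a minimal illustration under assumptions, not the LLM-GNN selection procedure: the function name, the `(label, confidence)` data layout, the threshold, and the toy values are all hypothetical, and the paper's annotation quality heuristic is not reproduced here.

```python
def select_confident_annotations(annotations, budget, min_confidence=0.0):
    """Keep at most `budget` annotated nodes, preferring high confidence.

    annotations: dict mapping node id -> (label, confidence in [0, 1]).
    Returns a dict mapping node id -> label to use as GNN training labels.
    """
    kept = [
        (node, label, confidence)
        for node, (label, confidence) in annotations.items()
        if confidence >= min_confidence
    ]
    # Most confident annotations first.
    kept.sort(key=lambda item: item[2], reverse=True)
    return {node: label for node, label, _ in kept[:budget]}


# Toy LLM outputs (hypothetical values).
annotations = {
    0: ("book", 0.95),
    1: ("toy", 0.40),
    2: ("book", 0.80),
    3: ("toy", 0.99),
}
train_labels = select_confident_annotations(annotations, budget=2, min_confidence=0.5)
print(train_labels)
```

The selected labels would then serve as the training set for a GNN that predicts the remaining, unannotated nodes.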

    A new Gaussian Process based model for non-linear wave loading on vertical cylinders

    We aim to establish a fast and accurate model for predicting nonlinear loading on vertical cylinders such as those typically used for fixed offshore wind turbines. We follow a ‘Stokes-type’ force model and approximate the amplitude of the higher harmonics of force by relating these to the linear force time series raised to the appropriate power through amplitude and phase coefficients. We reanalyse previous experimental data and perform new experiments to expand the parameter space and establish a force coefficients database for engineering applications. A machine learning model is used to interpolate the database and make predictions of the higher-order force coefficients. The machine learning model also provides a cross-validated confidence interval to indicate the prediction uncertainty and reflect model reliability. We further extend the prediction capability to unidirectional random waves with a novel force segmentation method, which localises wave groups from the random background. The new Stokes-Gaussian Process (Stokes-GP) model can provide engineering predictions of nonlinear wave loading on a cylinder for individual wave groups and random seas. These predictions are straightforward to apply and fast to compute, and they account for the important higher-order loading components. This will significantly improve the accuracy of the loading prediction and the ease of application for force predictions.
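One way to read the "linear force raised to the appropriate power through amplitude and phase coefficients" is sketched below. The abstract does not give the formula, so the use of a complex analytic signal, the absence of any normalisation, and the names `amplitude_coeff` and `phase_coeff` are assumptions made for illustration rather than the Stokes-GP formulation.

```python
import cmath
import math


def harmonic_from_linear(z_linear: complex, n: int,
                         amplitude_coeff: float, phase_coeff: float) -> float:
    """Approximate the n-th force harmonic from the linear force.

    z_linear is the complex (analytic) linear force at one instant: its real
    part is the linear force and its modulus is the envelope. Raising it to
    the power n multiplies the phase by n, giving a component at n times the
    linear frequency, which is then scaled and phase-shifted.
    """
    return (amplitude_coeff * z_linear ** n * cmath.exp(1j * phase_coeff)).real


# Monochromatic check with assumed values: a * exp(i * omega * t).
omega = 2 * math.pi * 0.5
a = 2.0
t = 0.3
z = a * cmath.exp(1j * omega * t)
f2 = harmonic_from_linear(z, n=2, amplitude_coeff=0.1, phase_coeff=0.25)
expected = 0.1 * a ** 2 * math.cos(2 * omega * t + 0.25)
print(f2, expected)
```

For a single-frequency input the result is a cosine at twice the linear frequency with amplitude `amplitude_coeff * a**2`, which is the scaling a Stokes-type expansion would suggest.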

    Data Informed Model Test Design With Machine Learning – An Example in Nonlinear Wave Load on a Vertical Cylinder

    Model testing is common in coastal and offshore engineering. The design of such model tests is important so that the maximal information about the underlying physics can be extrapolated from a limited number of test cases. The design of experiments also requires considering previous similar experimental results and the typical sea states of the ocean environment. In this study, we develop a model test design strategy based on Bayesian sampling for a classic problem in ocean engineering: nonlinear wave loading on a vertical cylinder. The new experimental design strategy is achieved through a GP-based surrogate model, which uses the previous experimental data as prior information. The metocean data are further incorporated into the experimental design through a modified acquisition function. We perform a new experiment, which is mainly designed by data-driven methods, including several critical parameters such as the size of the cylinder and all the wave conditions. We examine the performance of this method compared to traditional experimental design based on manual decisions. This method is a step towards a more systematic way of approaching test designs, with marginally better performance in capturing the higher-order force coefficients. The current surrogate model also made several “interpretable” decisions which can be explained with physical insights.

    Demystifying Structural Disparity in Graph Neural Networks: Can One Size Fit All?

    Recent studies on Graph Neural Networks (GNNs) provide both empirical and theoretical evidence supporting their effectiveness in capturing structural patterns on both homophilic and certain heterophilic graphs. Notably, most real-world homophilic and heterophilic graphs are composed of a mixture of nodes in both homophilic and heterophilic structural patterns, exhibiting a structural disparity. However, the analysis of GNN performance with respect to nodes exhibiting different structural patterns, e.g., homophilic nodes in heterophilic graphs, remains rather limited. In the present study, we provide evidence that GNNs on node classification typically perform admirably on homophilic nodes within homophilic graphs and heterophilic nodes within heterophilic graphs while struggling on the opposite node set, exhibiting a performance disparity. We theoretically and empirically identify the effects of GNNs on testing nodes exhibiting distinct structural patterns. We then propose a rigorous, non-i.i.d. PAC-Bayesian generalization bound for GNNs, revealing reasons for the performance disparity, namely the aggregated feature distance and homophily ratio difference between training and testing nodes. Furthermore, we demonstrate the practical implications of our new findings by (1) elucidating the effectiveness of deeper GNNs; and (2) revealing an overlooked distribution shift factor in the graph out-of-distribution problem and proposing a new scenario accordingly. Comment: 54 pages, 24 figure
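The distinction between homophilic and heterophilic nodes can be made concrete with a per-node homophily ratio. The sketch below uses a common definition (the fraction of a node's neighbours that share its label), which is assumed here and may differ in detail from the one used in the paper; the toy graph and labels are hypothetical.

```python
def node_homophily(adjacency, labels):
    """Fraction of each node's neighbours that share the node's label.

    adjacency: dict mapping node -> list of neighbour nodes.
    labels: dict mapping node -> class label.
    Isolated nodes are skipped because the ratio is undefined for them.
    """
    ratios = {}
    for node, neighbours in adjacency.items():
        if not neighbours:
            continue
        same = sum(1 for nb in neighbours if labels[nb] == labels[node])
        ratios[node] = same / len(neighbours)
    return ratios


# Toy undirected graph: nodes 0 and 1 are class "a", nodes 2 and 3 are class "b".
adjacency = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
labels = {0: "a", 1: "a", 2: "b", 3: "b"}
ratios = node_homophily(adjacency, labels)
print(ratios)
```

In this toy graph node 3 is fully homophilic while node 2 is mostly heterophilic, so a single graph can contain both structural patterns, which is the mixture the abstract refers to.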

    An investigation of high-order harmonics in the pressure field around a vertical cylinder in steep wave conditions

    Offshore structures, encompassing foundations for offshore wind turbines, supports for marine renewable energy devices, bridge piers, and floating vessels, are consistently subjected to severe environmental loads. These loads often dictate the design criteria. Understanding the physics and statistics of wave-structure interaction, especially under non-linear loads experienced in extreme conditions, remains a complex and partially unresolved challenge. Notably, secondary load cycles significantly contribute to the ‘ringing’ responses in cylindrical structures, as discussed in previous studies (e.g., Grue et al. (1993), Chaplin et al. (1997)). This paper focuses on analysing loads in focused wave groups, representing short-term extreme wave conditions, on bottom-mounted vertical cylinders relevant to fixed offshore wind turbines. Pressure contour plots over the cylinder’s surface were previously examined by Ghadirian & Bredmose (2020) while studying secondary load cycles. In this research, we adopt the phase-based harmonic separation method for wave forces (Fitzgerald et al. (2014)) to analyse the pressure contour plots. This method effectively isolates harmonic pressure components from the total pressures, enabling a novel exploration of the mechanisms behind secondary load cycles from the perspective of high-order harmonics on the cylinder surface.
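The principle behind phase-based harmonic separation can be shown with a simplified two-phase variant, which is not the cited method itself: a signal is recorded once as generated and once with every component phase-shifted by 180 degrees, and half-differences and half-sums then isolate odd and even harmonics. The synthetic two-harmonic pressure signal and its amplitudes below are assumptions for illustration.

```python
import math


def two_phase_separation(crest_signal, trough_signal):
    """Split paired records into odd- and even-harmonic parts.

    A 180-degree shift flips the sign of odd harmonics and leaves even
    harmonics unchanged, so half-differences keep the odd part and
    half-sums keep the even part.
    """
    odd = [(c - t) / 2 for c, t in zip(crest_signal, trough_signal)]
    even = [(c + t) / 2 for c, t in zip(crest_signal, trough_signal)]
    return odd, even


def synthetic_pressure(t, phase, a1=1.0, a2=0.2, omega=2 * math.pi):
    """Toy signal: a linear harmonic plus a second harmonic (assumed amplitudes)."""
    theta = omega * t + phase
    return a1 * math.cos(theta) + a2 * math.cos(2 * theta)


times = [i / 50 for i in range(50)]
crest = [synthetic_pressure(t, 0.0) for t in times]
trough = [synthetic_pressure(t, math.pi) for t in times]
odd, even = two_phase_separation(crest, trough)
print(odd[0], even[0])
```

Here `odd` recovers the linear component and `even` recovers the second harmonic; using more phase-shifted records allows individual harmonics within each group to be separated further.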