Beyond Universal Transformer: block reusing with adaptor in Transformer for automatic speech recognition
Transformer-based models have recently made significant achievements in the
application of end-to-end (E2E) automatic speech recognition (ASR). It is
possible to deploy the E2E ASR system on smart devices with the help of
Transformer-based models. However, these models still have the disadvantage of
requiring a large number of parameters. To overcome the drawback of
universal Transformer models for the application of ASR on edge devices, we
propose a solution that reuses blocks in Transformer models for
small-footprint ASR systems, accommodating resource limitations
without compromising recognition accuracy.
Specifically, we design a novel block-reusing strategy for speech Transformer
(BRST) to enhance the effectiveness of parameters and propose an adapter module
(ADM) that can produce a compact and adaptable model with only a few additional
trainable parameters accompanying each reusing block. We conducted an
experiment with the proposed method on the public AISHELL-1 corpus, and the
results show that the proposed approach achieves character error rates (CERs)
of 9.3%/6.63% with only 7.6M/8.3M parameters without and with the ADM,
respectively. In addition, we provide a deeper analysis of the effect of the
ADM in the general block-reusing method.
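To illustrate why block reuse shrinks the footprint, the following minimal sketch counts parameters for a stack of independent blocks versus one shared block plus a small adapter per reuse. The layer sizes, the bottleneck form of the adapter, and all function names are illustrative assumptions, not the configuration reported for BRST or the ADM.

```python
def linear_params(d_in: int, d_out: int) -> int:
    """Weights plus biases of one dense layer."""
    return d_in * d_out + d_out


def block_params(d_model: int, d_ff: int) -> int:
    """Rough size of one Transformer block: four attention projections
    plus a two-layer feed-forward network (layer norms ignored)."""
    attention = 4 * linear_params(d_model, d_model)
    feed_forward = linear_params(d_model, d_ff) + linear_params(d_ff, d_model)
    return attention + feed_forward


def adapter_params(d_model: int, d_bottleneck: int) -> int:
    """Hypothetical bottleneck adapter: down-projection then up-projection."""
    return linear_params(d_model, d_bottleneck) + linear_params(d_bottleneck, d_model)


def stack_params(n_layers: int, d_model: int, d_ff: int,
                 reuse: bool = False, d_bottleneck: int = 0) -> int:
    """Total parameters of a stack, with or without block reuse."""
    one_block = block_params(d_model, d_ff)
    if not reuse:
        return n_layers * one_block  # every layer owns its weights
    per_layer_adapter = adapter_params(d_model, d_bottleneck) if d_bottleneck else 0
    # One shared block, plus a small adapter for each time it is reused.
    return one_block + n_layers * per_layer_adapter


# Assumed sizes, chosen only for illustration.
independent = stack_params(12, 256, 2048)
shared = stack_params(12, 256, 2048, reuse=True)
shared_with_adapters = stack_params(12, 256, 2048, reuse=True, d_bottleneck=32)
print(independent, shared, shared_with_adapters)
```

Under these assumed sizes the adapters add only a small fraction of the shared block's parameters, which is the trade-off the abstract describes.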
Tensor-based Intrinsic Subspace Representation Learning for Multi-view Clustering
Multi-view clustering is a hot research topic, and many approaches have been
proposed over the past few years. Nevertheless, most existing algorithms merely take the
consensus information among different views into consideration for clustering.
This may hinder multi-view clustering performance in real-life
applications, since different views usually contain diverse statistical
properties. To address this problem, we propose a novel Tensor-based Intrinsic
Subspace Representation Learning (TISRL) for multi-view clustering in this
paper. Concretely, the rank preserving decomposition is first proposed to
deal effectively with the diverse statistical information contained in different
views. Then, to achieve the intrinsic subspace representation, the
tensor-singular value decomposition based low-rank tensor constraint is also
utilized in our method. It can be seen that specific information contained in
different views is fully investigated by the rank preserving decomposition, and
the high-order correlations of multi-view data are also mined by the low-rank
tensor constraint. The objective function can be optimized by an augmented
Lagrangian multiplier based alternating direction minimization algorithm.
Experimental results on nine commonly used real-world multi-view datasets
illustrate the superiority of TISRL.
Frame-wise Cross-modal Matching for Video Moment Retrieval
Video moment retrieval aims to retrieve a moment in a video for a given
language query. The challenges of this task include 1) localizing the
relevant moment in an untrimmed video, and 2) bridging the
semantic gap between the textual query and the video contents. To tackle these
problems, early approaches adopt a sliding window or uniform sampling to
collect video clips first and then match each clip with the query.
These strategies are time-consuming and often lead to unsatisfactory accuracy in
localization due to the unpredictable length of the golden moment. To avoid these
limitations, researchers have recently attempted to directly predict the relevant
moment boundaries without first generating video clips. One
mainstream approach is to generate a multimodal feature vector for the target
query and video frames (e.g., concatenation) and then use a regression approach
upon the multimodal feature vector for boundary detection. Although some
progress has been achieved by this approach, we argue that these methods have
not adequately captured the cross-modal interactions between the query and video
frames.
In this paper, we propose an Attentive Cross-modal Relevance Matching (ACRM)
model, which predicts the temporal boundaries based on interaction modeling.
In addition, an attention module is introduced to assign higher weights to
query words with richer semantic cues, which are considered to be more
important for finding relevant video contents. Another contribution is that we
propose an additional predictor to utilize the internal frames in the model
training to improve the localization accuracy. Extensive experiments on two
datasets, TACoS and Charades-STA, demonstrate the superiority of our method over
several state-of-the-art methods. Ablation studies have also been conducted to
examine the effectiveness of different modules in our ACRM model.
Comment: 12 pages; accepted by IEEE TM
Label-free Node Classification on Graphs with Large Language Models (LLMs)
In recent years, there have been remarkable advancements in node
classification achieved by Graph Neural Networks (GNNs). However, they
necessitate abundant high-quality labels to ensure promising performance. In
contrast, Large Language Models (LLMs) exhibit impressive zero-shot proficiency
on text-attributed graphs. Yet, they face challenges in efficiently processing
structural data and suffer from high inference costs. In light of these
observations, this work introduces LLM-GNN, a pipeline for label-free node
classification on graphs with LLMs. It amalgamates the strengths of both GNNs and LLMs
while mitigating their limitations. Specifically, LLMs are leveraged to
annotate a small portion of nodes and then GNNs are trained on LLMs'
annotations to make predictions for the remaining large portion of nodes. The
implementation of LLM-GNN faces a unique challenge: how can we actively select
nodes for LLMs to annotate and consequently enhance the GNN training? How can
we leverage LLMs to obtain annotations of high quality, representativeness, and
diversity, thereby enhancing GNN performance with less cost? To tackle this
challenge, we develop an annotation quality heuristic and leverage the
confidence scores derived from LLMs to advance node selection. Comprehensive
experimental results validate the effectiveness of LLM-GNN. In particular,
LLM-GNN can achieve an accuracy of 74.9% on the large-scale dataset products at
a cost of less than 1 dollar.
Comment: The code will be available soon via
https://github.com/CurryTang/LLMGN
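The annotate-then-filter idea in this abstract can be sketched as follows: LLM annotations arrive with confidence scores, and only the most confident ones within a budget are kept as pseudo-labels for GNN training. This is a minimal sketch under stated assumptions; the annotation dictionary, the scores, the budget, and the function name are hypothetical, and the annotation quality heuristic mentioned in the abstract is not modeled here.

```python
from typing import Dict, List, Tuple

# Hypothetical LLM output: node id -> (predicted label, confidence in [0, 1]).
Annotation = Tuple[str, float]


def select_confident(annotations: Dict[int, Annotation], budget: int,
                     min_confidence: float = 0.0) -> Dict[int, str]:
    """Keep up to `budget` annotations, most confident first."""
    ranked: List[Tuple[int, Annotation]] = sorted(
        annotations.items(), key=lambda item: item[1][1], reverse=True)
    kept = [(node, label) for node, (label, conf) in ranked
            if conf >= min_confidence]
    return dict(kept[:budget])


# Made-up annotations for four nodes.
llm_annotations = {
    0: ("electronics", 0.95),
    1: ("books", 0.40),
    2: ("toys", 0.80),
    3: ("books", 0.65),
}
pseudo_labels = select_confident(llm_annotations, budget=2)
print(pseudo_labels)
```

The resulting pseudo-labels would then serve as the training set for a GNN that predicts the remaining nodes.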
A new Gaussian Process based model for non-linear wave loading on vertical cylinders
We aim to establish a fast and accurate model for predicting nonlinear loading on vertical cylinders such as are typically used for fixed offshore wind turbines. We follow a ‘Stokes-type’ force model and approximate the amplitude of the higher harmonics of force by relating these to the linear force time series raised to the appropriate power through amplitude and phase coefficients. We reanalyse previous experimental data and perform new experiments to expand the parameter space and establish a force coefficients database for engineering applications. A machine learning model is used to interpolate the database and make predictions of the higher-order force coefficients. The machine learning model also provides a cross-validated confidence interval to indicate the prediction uncertainty and reflect model reliability. We further extend the prediction capability to unidirectional random waves with a novel force segmentation method, which localises wave groups from the random background. The new Stokes-Gaussian Process (Stokes-GP) model can provide engineering predictions of nonlinear wave loading on a cylinder for individual wave groups and random seas; these predictions are straightforward to apply and fast to compute, and they account for the important higher-order loading components. This will significantly improve the accuracy of the loading prediction and the ease of application for force predictions.
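The ‘Stokes-type’ relation described here, in which the n-th harmonic is built from the linear component raised to the n-th power and adjusted by an amplitude and a phase coefficient, can be sketched for a single regular wave. The complex representation, the coefficient values, and the function name are assumptions made for illustration; they are not the coefficients or the exact formulation of the Stokes-GP model.

```python
import cmath
import math


def harmonic_from_linear(linear_amp: float, phase: float, n: int,
                         amp_coeff: float, phase_coeff: float) -> float:
    """Approximate the n-th harmonic as Re[amp_coeff * exp(i*phase_coeff) * z**n],
    where z = linear_amp * exp(i*phase) is the complex linear force."""
    z = linear_amp * cmath.exp(1j * phase)
    return (amp_coeff * cmath.exp(1j * phase_coeff) * z ** n).real


# Made-up coefficients: a second harmonic scaled by 0.1 with no phase offset.
at_crest = harmonic_from_linear(2.0, 0.0, n=2, amp_coeff=0.1, phase_coeff=0.0)
quarter_period_later = harmonic_from_linear(2.0, math.pi / 2, n=2,
                                            amp_coeff=0.1, phase_coeff=0.0)
print(at_crest, quarter_period_later)
```

For a measured time series the complex linear force would have to be obtained from the data, for example through an analytic-signal construction, which this sketch does not cover.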
Data Informed Model Test Design With Machine Learning – An Example in Nonlinear Wave Load on a Vertical Cylinder
Model testing is common in coastal and offshore engineering. The design of such model tests is important so that maximal information about the underlying physics can be extracted from a limited number of test cases. The design of experiments also requires considering previous similar experimental results and the typical sea states of the ocean environment. In this study, we develop a model test design strategy based on Bayesian sampling for a classic problem in ocean engineering: nonlinear wave loading on a vertical cylinder. The new experimental design strategy is achieved through a GP-based surrogate model, which considers the previous experimental data as the prior information. The metocean data are further incorporated into the experimental design through a modified acquisition function. We perform a new experiment, which is mainly designed by data-driven methods, including several critical parameters such as the size of the cylinder and all the wave conditions. We examine the performance of this method compared with traditional experimental design based on manual decisions. This method is a step towards a more systematic way of approaching test design, with marginally better performance in capturing the higher-order force coefficients. The current surrogate model also makes several “interpretable” decisions which can be explained with physical insights.
Demystifying Structural Disparity in Graph Neural Networks: Can One Size Fit All?
Recent studies on Graph Neural Networks (GNNs) provide both empirical and
theoretical evidence supporting their effectiveness in capturing structural
patterns on both homophilic and certain heterophilic graphs. Notably, most
real-world homophilic and heterophilic graphs are comprised of a mixture of
nodes in both homophilic and heterophilic structural patterns, exhibiting a
structural disparity. However, the analysis of GNN performance with respect to
nodes exhibiting different structural patterns, e.g., homophilic nodes in
heterophilic graphs, remains rather limited. In the present study, we provide
evidence that GNNs on node classification typically
perform admirably on homophilic nodes within homophilic graphs and heterophilic
nodes within heterophilic graphs while struggling on the opposite node set,
exhibiting a performance disparity. We theoretically and empirically identify
effects of GNNs on testing nodes exhibiting distinct structural patterns. We
then propose a rigorous, non-i.i.d. PAC-Bayesian generalization bound for GNNs,
revealing reasons for the performance disparity, namely the aggregated feature
distance and homophily ratio difference between training and testing nodes.
Furthermore, we demonstrate the practical implications of our new findings via
(1) elucidating the effectiveness of deeper GNNs; and (2) revealing an
overlooked distribution shift factor in the graph out-of-distribution problem and
proposing a new scenario accordingly.
Comment: 54 pages, 24 figure
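The distinction between homophilic and heterophilic nodes can be made concrete with a node-level homophily ratio, taken here as the fraction of a node's neighbours that share its label. This is a minimal sketch of that common definition on a made-up toy graph; the paper may use a different variant, and the function name is hypothetical.

```python
from typing import Dict, List


def node_homophily(adjacency: Dict[int, List[int]],
                   labels: Dict[int, str]) -> Dict[int, float]:
    """Fraction of each node's neighbours that carry the same label."""
    ratios: Dict[int, float] = {}
    for node, neighbours in adjacency.items():
        if not neighbours:
            continue  # the ratio is undefined for isolated nodes
        same = sum(labels[n] == labels[node] for n in neighbours)
        ratios[node] = same / len(neighbours)
    return ratios


# Toy undirected graph: nodes 0 and 1 have label "a", nodes 2 and 3 have label "b".
adjacency = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
labels = {0: "a", 1: "a", 2: "b", 3: "b"}
ratios = node_homophily(adjacency, labels)
print(ratios)
```

In this toy graph node 3 is fully homophilic while node 2 is mostly heterophilic, so a single graph can mix both structural patterns.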
An investigation of high-order harmonics in the pressure field around a vertical cylinder in steep wave conditions
Offshore structures, encompassing foundations for offshore wind turbines, supports for marine renewable energy devices, bridge piers, and floating vessels, are consistently subjected to severe environmental loads. These loads often dictate the design criteria. Understanding the physics and statistics of wave-structure interaction, especially under non-linear loads experienced in extreme conditions, remains a complex and partially unresolved challenge. Notably, secondary load cycles significantly contribute to the ‘ringing’ responses in cylindrical structures, as discussed in previous studies (e.g., Grue et al. (1993), Chaplin et al. (1997)). This paper focuses on analysing loads in focused wave groups, representing short-term extreme wave conditions, on bottom-mounted vertical cylinders relevant to fixed offshore wind turbines. Pressure contour plots over the cylinder’s surface were previously examined by Ghadirian & Bredmose (2020) while studying secondary load cycles. In this research, we adopt the phase-based harmonic separation method for wave forces (Fitzgerald et al. (2014)) to analyse the pressure contour plots. This method effectively isolates harmonic pressure components from the total pressures, enabling a novel exploration of the mechanisms behind secondary load cycles from the perspective of high-order harmonics on the cylinder surface.
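The principle behind phase-based harmonic separation can be shown with a simplified two-phase variant: if the same wave group is repeated with a phase shift of π, odd harmonics change sign while even harmonics do not, so half the difference and half the sum of the two signals separate them. This sketch uses a synthetic signal with made-up amplitudes; it is not the full method cited above, which may combine more phase-shifted runs to isolate individual harmonics.

```python
import math


def force(theta: float, phase_shift: float, amps: list) -> float:
    """Synthetic signal: sum of harmonics n = 1..len(amps) of the shifted phase."""
    return sum(a * math.cos(n * (theta + phase_shift))
               for n, a in enumerate(amps, start=1))


amps = [1.0, 0.3, 0.1]  # assumed amplitudes of harmonics 1, 2 and 3
thetas = [2 * math.pi * k / 64 for k in range(64)]

crest = [force(t, 0.0, amps) for t in thetas]       # original run
trough = [force(t, math.pi, amps) for t in thetas]  # run shifted by pi

odd = [(c - t) / 2 for c, t in zip(crest, trough)]   # harmonics 1 and 3
even = [(c + t) / 2 for c, t in zip(crest, trough)]  # harmonic 2
print(max(odd), max(even))
```

Here the even part recovers the second harmonic alone, while the odd part still contains the first and third harmonics together.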