664 research outputs found
Unifying Structure Reasoning and Language Model Pre-training for Complex Reasoning
Recent knowledge enhanced pre-trained language models have shown remarkable
performance on downstream tasks by incorporating structured knowledge from
external sources into language models. However, they usually suffer from a
heterogeneous information alignment problem and a noisy knowledge injection
problem. For complex reasoning, the contexts contain rich knowledge that
typically exists in complex and sparse forms. In order to model structured
knowledge in the context and avoid these two problems, we propose to unify
structure reasoning and language model pre-training. It identifies four types
of elementary knowledge structures from contexts to construct structured
queries, and utilizes the box embedding method to conduct explicit structure
reasoning along queries during language modeling. To fuse textual and
structured semantics, we utilize contextual language representations of
knowledge structures to initialize their box embeddings for structure
reasoning. We conduct experiments on complex language reasoning and knowledge
graph (KG) reasoning tasks. The results show that our model can effectively
enhance the performance of complex reasoning of both language and KG
modalities.
Comment: 10 pages, 4 figures, 6 tables
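As a rough illustration of what reasoning with box embeddings can look like, the sketch below represents each knowledge structure as an axis-aligned box, intersects boxes to answer a conjunctive query, and scores a candidate by its distance to the resulting box. The `Box` class and the function names are hypothetical and chosen for this example; the abstract does not specify the model's actual box parameterization or scoring function.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Box:
    """Axis-aligned box given by its lower and upper corners (illustrative type)."""
    lower: List[float]
    upper: List[float]


def intersect(boxes: Sequence[Box]) -> Box:
    """Intersect boxes: max of the lower corners, min of the upper corners."""
    dims = range(len(boxes[0].lower))
    lower = [max(b.lower[d] for b in boxes) for d in dims]
    upper = [min(b.upper[d] for b in boxes) for d in dims]
    return Box(lower, upper)


def is_empty(box: Box) -> bool:
    """A box is empty when a lower bound exceeds its upper bound in any dimension."""
    return any(lo > hi for lo, hi in zip(box.lower, box.upper))


def distance_to_box(point: Sequence[float], box: Box) -> float:
    """L1 distance from a point to a non-empty box; 0.0 when the point is inside."""
    return sum(max(lo - p, 0.0) + max(p - hi, 0.0)
               for p, lo, hi in zip(point, box.lower, box.upper))


# Two toy 2-D boxes and the box for their conjunction.
a = Box([0.0, 0.0], [2.0, 2.0])
b = Box([1.0, 1.0], [3.0, 3.0])
query = intersect([a, b])
```

Here `distance_to_box([1.5, 1.5], query)` is zero because the point lies inside both boxes, while a point outside the intersection receives a positive distance. The distance is only meaningful when `is_empty(query)` is false.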
An Unsupervised Sampling Approach for Image-Sentence Matching Using Document-Level Structural Information
In this paper, we focus on the problem of unsupervised image-sentence
matching. Existing research explores the use of document-level structural
information to sample positive and negative instances for model training.
Although the approach achieves positive results, it introduces a sampling bias
and fails to distinguish instances with high semantic similarity. To alleviate
the bias, we propose a new sampling strategy to select additional
intra-document image-sentence pairs as positive or negative samples.
Furthermore, to recognize the complex pattern in intra-document samples, we
propose a Transformer based model to capture fine-grained features and
implicitly construct a graph for each document, where concepts in a document
are introduced to bridge the representation learning of images and sentences in
the context of a document. Experimental results show the effectiveness of our
approach to alleviate the bias and learn well-aligned multimodal
representations.
Comment: To be published in AAAI202
Dance Your Latents: Consistent Dance Generation through Spatial-temporal Subspace Attention Guided by Motion Flow
The advancement of generative AI has extended to the realm of Human Dance
Generation, demonstrating superior generative capacities. However, current
methods still exhibit deficiencies in achieving spatiotemporal consistency,
resulting in artifacts like ghosting, flickering, and incoherent motions. In
this paper, we present Dance-Your-Latents, a framework that makes latents dance
coherently following motion flow to generate consistent dance videos. Firstly,
considering that each constituent element moves within a confined space, we
introduce spatial-temporal subspace-attention blocks that decompose the global
space into a combination of regular subspaces and efficiently model the
spatiotemporal consistency within these subspaces. This module enables each
patch to pay attention to adjacent areas, mitigating the excessive dispersion of
long-range attention. Furthermore, observing that the movement of body parts is
guided by pose control, we design motion flow guided subspace align & restore.
This method enables the attention to be computed on the irregular subspace
along the motion flow. Experimental results on the TikTok dataset demonstrate that
our approach significantly enhances spatiotemporal consistency of the generated
videos.
Comment: 10 pages, 5 figures
A Comparative Study on multichannel Speaker-attributed automatic speech recognition in Multi-party Meetings
Speaker-attributed automatic speech recognition (SA-ASR) in multiparty
meeting scenarios is one of the most valuable and challenging ASR tasks. It was
shown that single-channel frame-level diarization with serialized output
training (SC-FD-SOT), single-channel word-level diarization with SOT
(SC-WD-SOT) and joint training of single-channel target-speaker separation and
ASR (SC-TS-ASR) can be exploited to partially solve this problem. SC-FD-SOT
obtains the speaker-attributed transcriptions by aligning the speaker
diarization results with the ASR hypotheses, SC-WD-SOT uses word-level
diarization to get rid of the alignment dependence on timestamps, and SC-TS-ASR
jointly trains target-speaker separation and ASR modules, which achieves the
best performance. In this paper, we propose three corresponding multichannel
(MC) SA-ASR approaches, namely MC-FD-SOT, MC-WD-SOT and MC-TS-ASR. For
different tasks/models, different multichannel data fusion strategies are
considered, including channel-level cross-channel attention for MC-FD-SOT,
frame-level cross-channel attention for MC-WD-SOT and neural beamforming for
MC-TS-ASR. Experimental results on the AliMeeting corpus reveal that our
proposed multichannel SA-ASR models can consistently outperform the
corresponding single-channel counterparts in terms of the speaker-dependent
character error rate (SD-CER).
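To make the metric more concrete, the following minimal sketch computes a character error rate in which each hypothesis is scored against the reference of the same speaker, so text attributed to the wrong speaker counts as errors. This is an assumed simplification: the function names are hypothetical, and the official SD-CER scoring used with the AliMeeting corpus may differ in details such as text normalization or speaker matching.

```python
from typing import Dict, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance counting substitutions, insertions and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def speaker_attributed_cer(refs: Dict[str, str], hyps: Dict[str, str]) -> float:
    """Total character edits over total reference characters, matched per speaker.

    Assumes at least one non-empty reference; hypotheses for speakers that are
    absent from `refs` are ignored in this sketch.
    """
    errors = 0
    total = 0
    for speaker, ref in refs.items():
        hyp = hyps.get(speaker, "")
        errors += edit_distance(ref, hyp)
        total += len(ref)
    return errors / total
```

With references `{"A": "ab", "B": "cd"}` and hypotheses `{"A": "cd", "B": "ab"}`, every character is recognized but assigned to the wrong speaker, and the function returns 1.0.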
An extensive study of blazar broad emission line: Changing-look blazars and Baldwin effect
It is known that the blazar jet emissions are dominated by non-thermal
radiation while the accretion disk jets are normally dominated by thermal
emission. In this work, our aim is to study the connection between the two
types of emission by investigating the correlation between the blazar emission
line intensity property, which embodies the nature of accretion disk, and the
γ-ray flux property, which is representative of jet emission. We
compiled a sample of 656 blazars with available emission line equivalent widths
(EW), the GeV γ-ray flux, and the SED information from the literature.
In this work, we found that 55 previous BCUs are now identified as FSRQs, and found
52 Changing-look blazars based on their EW, 45 of which are newly
confirmed. These Changing-look blazars have a larger accretion ratio than BL Lac objects. In addition, we suggest that the
lower synchrotron peak blazars (LSPs) could be the source of Changing-look
blazars because 90.7% of the Changing-look blazars in this work are confirmed
as LSPs. An anti-correlation between EW and continuum intensity, the
so-called global Baldwin effect (BEff), has been confirmed. We suggest the
steeper global BEff observed for blazars than for radio-quiet active galactic
nuclei (RQ-AGNs) is caused by the inverse Compton scattering of
broad-emission-line photons. This interpretation is further supported by the
positive correlation between the emission line EW and intrinsic inverse
Compton luminosity.
Comment: Accepted to Ap
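The Baldwin effect is usually quantified as the slope of a straight-line fit in log-log space, where a negative slope indicates the anti-correlation. The sketch below fits such a slope by ordinary least squares. The luminosity and equivalent-width values are synthetic and purely illustrative (they are not data from the study), and `fit_line` is a hypothetical helper name.

```python
import math
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic values for illustration only: equivalent widths that follow
# EW proportional to L^(-0.5) over four decades of continuum luminosity.
luminosity = [1e43, 1e44, 1e45, 1e46]
ew = [100.0 * (lum / 1e43) ** -0.5 for lum in luminosity]

slope, intercept = fit_line([math.log10(lum) for lum in luminosity],
                            [math.log10(w) for w in ew])
```

For this synthetic input the fitted slope recovers the exponent used to generate the data, about -0.5. A steeper Baldwin effect corresponds to a more negative slope.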
Remote sensing-based spatiotemporal variation and driving factor assessment of chlorophyll-a concentrations in China’s Pearl River Estuary
Climate change and intensive anthropogenic activities have severely challenged the water quality of China’s Pearl River Estuary (PRE). Further investigations into long-term water quality variation and associated driving mechanisms are therefore necessary to support the sustainable development of the PRE’s Greater Bay Area (GBA). This study used remote sensing retrieval to address long-term spatiotemporal chlorophyll-a (Chl-a) variation characteristics in the PRE and the relationship between Chl-a concentrations and socioeconomic/environmental indicators. Three decades of Landsat satellite images and measured data were collected, and a two-band global algorithm was used to retrieve Chl-a concentration data. Results reveal significant spatiotemporal variability in Chl-a concentrations. The space-averaged Chl-a concentration exhibited a slight downward trend during the past three decades, and the multi-year mean value was 5.20 mg/L. Changes to environmental protection policies in recent years have improved overall PRE water quality. The western section of the PRE had the highest Chl-a concentration (i.e., 5.92 mg/L average) while the eastern section had the lowest (i.e., 3.98 mg/L average). This discrepancy was likely caused by the western section’s more intensive industrial activities, resulting in a higher overall wastewater discharge volume. Affected by climatic conditions, winter Chl-a concentrations were evenly distributed while summer concentrations were significantly higher. Additionally, Chl-a concentrations significantly and positively correlated with total phosphorus (TP), total nitrogen (TN), ammonia nitrogen (NH3-N), and the five-day biochemical oxygen demand (BOD5). Chl-a concentrations also correlated with external factors (i.e., climate and anthropogenic activities). Among these factors, industrial wastewater discharge and the proportion of primary industries in coastal cities significantly and positively correlated with water quality.
This study is intended to help direct water quality improvement management and urban sustainable development in the GBA.
Problematic Internet Use Among Residential College Students During the COVID-19 Lockdown: A Social Network Analysis Approach
A general design method of cam profile based on cubic splines and dynamic model: case study of a gravity-driven tricycle
This paper proposes a general design method for cams based on the kinematics and dynamics of a mechanical system. The cam profile is generated in reverse from the actuator’s trajectory, using the kinematic model of the system. Firstly, the cam design optimisation problem is converted into an optimisation of the execution trajectory, which yields the optimum operation trajectory for the actuator’s requirements. Secondly, the relationship between the cam profile and the actuation trajectory is modelled based on the kinematics and dynamics of the mechanical system. Then, the cam profile is generated by cubic spline interpolation, and the error compensation methods are illustrated through numerical analysis. Finally, the validity of the presented design method is verified through experiments, which demonstrate the reliability of this method.
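The interpolation step can be sketched as follows: given a handful of (cam angle, follower lift) points, a cubic spline yields a smooth profile between them. The code implements a natural cubic spline, which is an assumption made for brevity; the abstract does not state the boundary conditions used, and a closed cam would normally call for periodic ones. The sample angles and lifts are hypothetical.

```python
from bisect import bisect_right
from typing import Callable, Sequence


def natural_cubic_spline(x: Sequence[float], y: Sequence[float]) -> Callable[[float], float]:
    """Return an evaluator for the natural cubic spline through (x[i], y[i]).

    Assumes x is strictly increasing and has at least two points.
    """
    n = len(x) - 1
    h = [x[i + 1] - x[i] for i in range(n)]
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3.0 * ((y[i + 1] - y[i]) / h[i] - (y[i] - y[i - 1]) / h[i - 1])

    # Forward sweep of the tridiagonal system for the second-derivative terms.
    diag = [1.0] * (n + 1)
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):
        diag[i] = 2.0 * (x[i + 1] - x[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / diag[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / diag[i]

    # Back substitution gives the polynomial coefficients of each segment.
    b = [0.0] * n
    c = [0.0] * (n + 1)
    d = [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (y[j + 1] - y[j]) / h[j] - h[j] * (c[j + 1] + 2.0 * c[j]) / 3.0
        d[j] = (c[j + 1] - c[j]) / (3.0 * h[j])

    def evaluate(t: float) -> float:
        # Locate the segment containing t, clamping to the first or last one.
        j = min(max(bisect_right(x, t) - 1, 0), n - 1)
        dt = t - x[j]
        return y[j] + dt * (b[j] + dt * (c[j] + dt * d[j]))

    return evaluate


# Hypothetical design points: cam angle in degrees, follower lift in millimetres.
angles = [0.0, 90.0, 180.0, 270.0, 360.0]
lift = [0.0, 5.0, 10.0, 5.0, 0.0]
cam = natural_cubic_spline(angles, lift)
```

Calling `cam(45.0)` returns the interpolated lift halfway between the first two design points. The spline passes through every design point, which is what allows the profile to be generated in reverse from a prescribed trajectory.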