
    Unifying Structure Reasoning and Language Model Pre-training for Complex Reasoning

    Recent knowledge-enhanced pre-trained language models have shown remarkable performance on downstream tasks by incorporating structured knowledge from external sources into language models. However, they usually suffer from a heterogeneous information alignment problem and a noisy knowledge injection problem. For complex reasoning, the contexts contain rich knowledge that typically exists in complex and sparse forms. To model structured knowledge in the context and avoid these two problems, we propose to unify structure reasoning and language model pre-training. The model identifies four types of elementary knowledge structures from contexts to construct structured queries, and uses the box embedding method to conduct explicit structure reasoning along these queries during language modeling. To fuse textual and structured semantics, we use contextual language representations of knowledge structures to initialize their box embeddings for structure reasoning. We conduct experiments on complex language reasoning and knowledge graph (KG) reasoning tasks. The results show that our model effectively improves complex reasoning performance in both the language and KG modalities. Comment: 10 pages, 4 figures, 6 tables
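The abstract does not give the box-embedding formulation. As a rough illustration only, the sketch below assumes a Query2Box-style geometry: a box is a center plus a non-negative offset, a conjunctive query is answered by intersecting boxes (a hard geometric intersection here, whereas Query2Box learns its intersection operator), and a candidate entity point is scored by its distance to the query box. `Box`, `point_box_distance`, and the weight `alpha` are hypothetical names, not the authors' code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """Axis-aligned box stored as lower and upper corners (one entry per dimension)."""
    lower: List[float]
    upper: List[float]

    @classmethod
    def from_center(cls, center: List[float], offset: List[float]) -> "Box":
        # offset is treated as a non-negative half-width in each dimension
        return cls([c - abs(o) for c, o in zip(center, offset)],
                   [c + abs(o) for c, o in zip(center, offset)])

    def intersect(self, other: "Box") -> "Box":
        # Hard geometric intersection; the result may be empty (lower > upper somewhere)
        lo = [max(a, b) for a, b in zip(self.lower, other.lower)]
        hi = [min(a, b) for a, b in zip(self.upper, other.upper)]
        return Box(lo, hi)

    def is_empty(self) -> bool:
        return any(l > u for l, u in zip(self.lower, self.upper))


def point_box_distance(point: List[float], box: Box, alpha: float = 0.2) -> float:
    """Query2Box-style distance: L1 distance from the point to the box surface
    (zero if inside), plus a down-weighted L1 distance from the box center to
    the point clamped into the box."""
    outside = sum(max(p - u, 0.0) + max(l - p, 0.0)
                  for p, l, u in zip(point, box.lower, box.upper))
    inside = 0.0
    for p, l, u in zip(point, box.lower, box.upper):
        center = (l + u) / 2
        clamped = min(u, max(l, p))  # closest point of the box in this dimension
        inside += abs(center - clamped)
    return outside + alpha * inside
```

For example, intersecting a box centred at (0, 0) with one centred at (1, 1), both with offset 1, gives the unit box [0, 1]²; a point at its center has distance 0, and points outside are penalised mainly by how far they lie beyond the box edges.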

    An Unsupervised Sampling Approach for Image-Sentence Matching Using Document-Level Structural Information

    In this paper, we focus on the problem of unsupervised image-sentence matching. Existing research uses document-level structural information to sample positive and negative instances for model training. Although this approach achieves positive results, it introduces a sampling bias and fails to distinguish instances with high semantic similarity. To alleviate the bias, we propose a new sampling strategy that selects additional intra-document image-sentence pairs as positive or negative samples. Furthermore, to recognize the complex patterns in intra-document samples, we propose a Transformer-based model that captures fine-grained features and implicitly constructs a graph for each document, where concepts in a document are introduced to bridge the representation learning of images and sentences in the context of that document. Experimental results show the effectiveness of our approach in alleviating the bias and learning well-aligned multimodal representations. Comment: To be published in AAAI202

    Dance Your Latents: Consistent Dance Generation through Spatial-temporal Subspace Attention Guided by Motion Flow

    The advancement of generative AI has extended to the realm of human dance generation, demonstrating strong generative capabilities. However, current methods still fall short in spatiotemporal consistency, producing artifacts such as ghosting, flickering, and incoherent motion. In this paper, we present Dance-Your-Latents, a framework that makes latents dance coherently following motion flow to generate consistent dance videos. First, considering that each constituent element moves within a confined space, we introduce spatial-temporal subspace-attention blocks that decompose the global space into a combination of regular subspaces and efficiently model the spatiotemporal consistency within these subspaces. This module lets each patch attend to adjacent areas, mitigating the excessive dispersion of long-range attention. Furthermore, observing that the movement of body parts is guided by pose control, we design motion-flow-guided subspace align & restore, which enables attention to be computed on irregular subspaces along the motion flow. Experimental results on the TikTok dataset demonstrate that our approach significantly enhances the spatiotemporal consistency of the generated videos. Comment: 10 pages, 5 figures
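The subspace-attention blocks are described only at a high level. The sketch below illustrates the underlying idea under simplifying assumptions: a single frame of patch features is split into regular, non-overlapping `window` × `window` subspaces (in the spirit of window-based attention such as the Swin Transformer), and each patch attends only to patches inside its own subspace. The temporal axis, learned query/key/value projections, and the motion-flow-guided align & restore step are all omitted; `window_attention` is a hypothetical name.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax along one axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def window_attention(feats: np.ndarray, window: int) -> np.ndarray:
    """Self-attention restricted to non-overlapping window x window blocks.

    feats: array of shape (H, W, C); H and W must be divisible by `window`.
    The features serve directly as queries, keys and values (no projections).
    """
    H, W, C = feats.shape
    assert H % window == 0 and W % window == 0
    nH, nW = H // window, W // window
    # Partition: (H, W, C) -> (nH, nW, T, C) with T = window * window patches per block
    x = feats.reshape(nH, window, nW, window, C).transpose(0, 2, 1, 3, 4)
    x = x.reshape(nH, nW, window * window, C)
    # Scaled dot-product attention computed independently inside each block
    scores = x @ x.transpose(0, 1, 3, 2) / np.sqrt(C)  # (nH, nW, T, T)
    out = softmax(scores) @ x                          # (nH, nW, T, C)
    # Undo the partition back to (H, W, C)
    out = out.reshape(nH, nW, window, window, C).transpose(0, 2, 1, 3, 4)
    return out.reshape(H, W, C)
```

Because attention never crosses a block border, editing one patch changes outputs only inside its own block, which is the kind of locality the abstract attributes to the module.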

    A Comparative Study on Multichannel Speaker-Attributed Automatic Speech Recognition in Multi-party Meetings

    Speaker-attributed automatic speech recognition (SA-ASR) in multi-party meeting scenarios is one of the most valuable and challenging ASR tasks. It has been shown that single-channel frame-level diarization with serialized output training (SC-FD-SOT), single-channel word-level diarization with SOT (SC-WD-SOT), and joint training of single-channel target-speaker separation and ASR (SC-TS-ASR) can be exploited to partially solve this problem. SC-FD-SOT obtains speaker-attributed transcriptions by aligning the speaker diarization results with the ASR hypotheses; SC-WD-SOT uses word-level diarization to remove the alignment's dependence on timestamps; and SC-TS-ASR jointly trains target-speaker separation and ASR modules and achieves the best performance of the three. In this paper, we propose three corresponding multichannel (MC) SA-ASR approaches, namely MC-FD-SOT, MC-WD-SOT, and MC-TS-ASR. For the different tasks and models, different multichannel data fusion strategies are considered: channel-level cross-channel attention for MC-FD-SOT, frame-level cross-channel attention for MC-WD-SOT, and neural beamforming for MC-TS-ASR. Experimental results on the AliMeeting corpus show that our proposed multichannel SA-ASR models consistently outperform their single-channel counterparts in terms of speaker-dependent character error rate (SD-CER).
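The abstract reports results in speaker-dependent CER without defining it. One common way to score SA-ASR output, assumed here rather than taken from the paper, is to concatenate each speaker's text, try every mapping between reference and hypothesis speakers, and keep the mapping with the fewest character edits, normalised by the total reference length. The function names are hypothetical; the brute-force search over permutations is only practical for the small speaker counts typical of meetings.

```python
from itertools import permutations
from typing import List


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution (or match if equal)
        prev = cur
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    # Plain character error rate for a single reference/hypothesis pair
    return edit_distance(ref, hyp) / len(ref)


def speaker_permutation_cer(refs: List[str], hyps: List[str]) -> float:
    """refs/hyps: one concatenated transcript per speaker. The shorter list is
    padded with empty strings; the speaker mapping with the fewest total edits
    is kept and normalised by the total number of reference characters."""
    n = max(len(refs), len(hyps))
    refs = refs + [""] * (n - len(refs))
    hyps = hyps + [""] * (n - len(hyps))
    total_ref = sum(len(r) for r in refs)
    best = min(sum(edit_distance(r, h) for r, h in zip(refs, perm))
               for perm in permutations(hyps))
    return best / total_ref
```

Taking the minimum over speaker mappings means that a system is not penalised for labelling speakers in a different order than the reference, only for attributing words to the wrong speaker or mis-recognising them.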

    An extensive study of blazar broad emission line: Changing-look blazars and Baldwin effect

    It is known that blazar jet emission is dominated by non-thermal radiation, while accretion disk emission is normally dominated by thermal radiation. In this work, we study the connection between the two types of emission by investigating the correlation between the blazar emission-line intensity, which reflects the nature of the accretion disk, and the γ-ray flux, which is representative of the jet emission. We compiled a sample of 656 blazars with available emission-line equivalent widths (EW), GeV γ-ray fluxes, and SED information from the literature. We found that 55 sources previously classified as BCUs (blazar candidates of uncertain type) are now identified as FSRQs, and we identified 52 changing-look blazars based on their EW, 45 of which are newly confirmed. These changing-look blazars have a larger accretion ratio ($\dot{M}/\dot{M}_{\rm Edd}$) than BL Lac objects. In addition, we suggest that low-synchrotron-peaked blazars (LSPs) could be the source of changing-look blazars, because 90.7% of the changing-look blazars in this work are confirmed as LSPs. The anti-correlation between EW and continuum intensity, the so-called global Baldwin effect (BEff), has been confirmed. We suggest that the steeper global BEff observed for blazars than for radio-quiet active galactic nuclei (RQ-AGNs) is caused by the inverse Compton scattering of broad-emission-line photons. This interpretation is further supported by the positive correlation between the emission-line EW and the intrinsic inverse Compton luminosity. Comment: Accepted to Ap
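The abstract does not state how the Baldwin-effect slopes were fitted. A minimal sketch, assuming an ordinary least-squares fit in log-log space, $\log_{10} EW = a + \beta \log_{10} L$: a global Baldwin effect shows up as $\beta < 0$ (with a negative Pearson $r$), and a "steeper" effect means a more negative $\beta$. Published fits often use estimators that also account for measurement errors and censored data, which this sketch does not; `loglog_fit` is a hypothetical name.

```python
import math
from typing import List, Tuple


def loglog_fit(luminosity: List[float], ew: List[float]) -> Tuple[float, float, float]:
    """Ordinary least-squares fit of log10(EW) = a + beta * log10(L).

    Returns (beta, a, r), where r is the Pearson correlation of the log values.
    A global Baldwin effect corresponds to beta < 0."""
    x = [math.log10(v) for v in luminosity]
    y = [math.log10(v) for v in ew]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    beta = sxy / sxx          # slope of the log-log relation
    a = my - beta * mx        # intercept
    r = sxy / math.sqrt(sxx * syy)
    return beta, a, r
```

For points lying exactly on a power law EW ∝ L^(-0.5), the function returns β ≈ -0.5 and r ≈ -1; comparing β between two samples (for example blazars versus RQ-AGNs) is what "steeper" refers to.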

    Remote sensing-based spatiotemporal variation and driving factor assessment of chlorophyll-a concentrations in China’s Pearl River Estuary

    Climate change and intensive anthropogenic activities have severely challenged the water quality of China’s Pearl River Estuary (PRE). Further investigation of long-term water quality variation and its driving mechanisms is therefore necessary to support the sustainable development of the PRE’s Greater Bay Area (GBA). This study used remote sensing retrieval to characterise long-term spatiotemporal variation in chlorophyll-a (Chl-a) in the PRE and the relationship between Chl-a concentrations and socioeconomic/environmental indicators. Three decades of Landsat satellite images and measured data were collected, and a two-band global algorithm was used to retrieve Chl-a concentrations. Results reveal significant spatiotemporal variability in Chl-a concentrations. The space-averaged Chl-a concentration exhibited a slight downward trend over the past three decades, with a multi-year mean of 5.20 mg/L. Changes to environmental protection policies in recent years have improved overall PRE water quality. The western section of the PRE had the highest Chl-a concentration (5.92 mg/L on average), while the eastern section had the lowest (3.98 mg/L on average). This discrepancy was likely caused by the more intensive industrial activity in the western section, which results in a higher overall wastewater discharge volume. Owing to climatic conditions, winter Chl-a concentrations were evenly distributed, while summer concentrations were significantly higher. Additionally, Chl-a concentrations were significantly and positively correlated with total phosphorus (TP), total nitrogen (TN), ammonia nitrogen (NH3-N), and biochemical oxygen demand (BOD5). Chl-a concentrations were also correlated with external factors (i.e., climate and anthropogenic activities). Among these factors, industrial wastewater discharge and the proportion of primary industries in coastal cities were significantly and positively correlated with water quality.
This study is intended to help guide water quality management and sustainable urban development in the GBA.
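The abstract names a "two-band global algorithm" but not its bands, functional form, or coefficients. The sketch below assumes one common form, a linear model in the ratio of two reflectance bands, Chl-a = a·(R_num / R_den) + b, with a and b calibrated by least squares against matched in situ measurements. The band choice, the linear form, and the function names are all assumptions for illustration, not the study's algorithm.

```python
from typing import List, Tuple


def band_ratio(r_num: float, r_den: float) -> float:
    # Ratio of two surface-reflectance bands; which Landsat bands to use is an assumption
    return r_num / r_den


def calibrate(ratios: List[float], chl_measured: List[float]) -> Tuple[float, float]:
    """Least-squares fit of Chl-a = a * ratio + b against matched in situ samples."""
    n = len(ratios)
    mx = sum(ratios) / n
    my = sum(chl_measured) / n
    sxx = sum((x - mx) ** 2 for x in ratios)
    sxy = sum((x - mx) * (y - my) for x, y in zip(ratios, chl_measured))
    a = sxy / sxx
    b = my - a * mx
    return a, b


def retrieve_chl(r_num: float, r_den: float, a: float, b: float) -> float:
    # Apply the calibrated model to one pixel's two band reflectances
    return a * band_ratio(r_num, r_den) + b
```

Once calibrated on dates with field samples, the same coefficients can be applied pixel by pixel to every image in the archive, which is what makes a multi-decade retrieval from Landsat feasible.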

    A general design method of cam profile based on cubic splines and dynamic model: case study of a gravity-driven tricycle

    This paper proposes a general design method for cams based on the kinematics and dynamics of a mechanical system. Starting from the actuator’s required trajectory, the cam profile is derived inversely from the kinematic model of the system. First, the cam design optimisation problem is converted into an optimisation of the execution trajectory, which yields the optimal operating trajectory for the actuator’s requirements. Second, the relationship between the cam profile and the actuation trajectory is modelled based on the kinematics and dynamics of the mechanical system. Then, the cam profile is generated by cubic spline interpolation, and error compensation methods are illustrated through numerical analysis. Finally, experiments verify the validity and reliability of the proposed design method.
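The abstract names cubic spline interpolation but not its end conditions. The sketch below assumes a natural cubic spline (zero second derivative at both ends) through sampled points, for example follower displacement against cam angle, solved with the Thomas algorithm for the tridiagonal system. A closed cam profile would more naturally use periodic end conditions, and the dynamic model and error compensation are not shown; `natural_cubic_spline` is a hypothetical name.

```python
from bisect import bisect_right
from typing import Callable, List


def natural_cubic_spline(xs: List[float], ys: List[float]) -> Callable[[float], float]:
    """Return S(x) interpolating (xs, ys) with a natural cubic spline.
    xs must be strictly increasing and contain at least two points."""
    n = len(xs) - 1                                  # number of intervals
    h = [xs[i + 1] - xs[i] for i in range(n)]
    # Tridiagonal system for the interior second derivatives M[1..n-1]
    sub = [h[i - 1] for i in range(1, n)]
    diag = [2 * (h[i - 1] + h[i]) for i in range(1, n)]
    sup = [h[i] for i in range(1, n)]
    rhs = [6 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
           for i in range(1, n)]
    # Thomas algorithm: forward elimination ...
    for k in range(1, len(diag)):
        w = sub[k] / diag[k - 1]
        diag[k] -= w * sup[k - 1]
        rhs[k] -= w * rhs[k - 1]
    # ... then back substitution
    interior = [0.0] * len(diag)
    for k in reversed(range(len(diag))):
        nxt = sup[k] * interior[k + 1] if k + 1 < len(diag) else 0.0
        interior[k] = (rhs[k] - nxt) / diag[k]
    M = [0.0] + interior + [0.0]                     # natural end conditions

    def S(x: float) -> float:
        # Locate the interval [xs[i], xs[i+1]] containing x (clamped to the ends)
        i = min(max(bisect_right(xs, x) - 1, 0), n - 1)
        hi, a, b = h[i], xs[i + 1] - x, x - xs[i]
        return ((M[i] * a ** 3 + M[i + 1] * b ** 3) / (6 * hi)
                + (ys[i] / hi - M[i] * hi / 6) * a
                + (ys[i + 1] / hi - M[i + 1] * hi / 6) * b)

    return S
```

The spline passes through every sample and has continuous first and second derivatives, which matters for a cam because discontinuities in the follower's acceleration show up as shocks in the mechanism.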