Towards Better Dynamic Graph Learning: New Architecture and Unified Library
We propose DyGFormer, a new Transformer-based architecture for dynamic graph
learning. DyGFormer is conceptually simple and only needs to learn from nodes'
historical first-hop interactions by: (1) a neighbor co-occurrence encoding
scheme that explores the correlations of the source node and destination node
based on their historical sequences; (2) a patching technique that divides each
sequence into multiple patches and feeds them to a Transformer, allowing the
model to effectively and efficiently benefit from longer histories. We also
introduce DyGLib, a unified library with standard training pipelines,
extensible coding interfaces, and comprehensive evaluation protocols to promote
reproducible, scalable, and credible dynamic graph learning research. By
performing exhaustive experiments on thirteen datasets for dynamic link
prediction and dynamic node classification tasks, we find that DyGFormer
achieves state-of-the-art performance on most of the datasets, demonstrating
its effectiveness in capturing nodes' correlations and long-term temporal
dependencies. Moreover, some results of baselines are inconsistent with
previous reports, which may be caused by their diverse but less rigorous
implementations, showing the importance of DyGLib. All resources used are
publicly available at https://github.com/yule-BUAA/DyGLib.
Comment: Accepted at NeurIPS 202
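The two ingredients named in the abstract can be illustrated with a minimal sketch. It assumes that the co-occurrence encoding starts from counting how often each historical neighbor appears in the source's and the destination's sequences, and that patching means splitting a sequence into fixed-size chunks. The function names, the left-padding choice, and the pad value are hypothetical and not taken from DyGLib.

```python
from collections import Counter


def cooccurrence_counts(src_seq, dst_seq):
    """For every neighbor in each sequence, return a pair
    (occurrences in the source sequence, occurrences in the destination sequence)."""
    src_counts, dst_counts = Counter(src_seq), Counter(dst_seq)
    src_feats = [(src_counts[n], dst_counts[n]) for n in src_seq]
    dst_feats = [(src_counts[n], dst_counts[n]) for n in dst_seq]
    return src_feats, dst_feats


def to_patches(seq, patch_size, pad=0):
    """Split a sequence into consecutive patches of length patch_size.
    The sequence is left-padded (an assumption) so its length is a multiple of patch_size."""
    seq = list(seq)
    remainder = len(seq) % patch_size
    if remainder:
        seq = [pad] * (patch_size - remainder) + seq
    return [seq[i:i + patch_size] for i in range(0, len(seq), patch_size)]


if __name__ == "__main__":
    src, dst = ["a", "b", "a", "c"], ["b", "b", "d"]
    print(cooccurrence_counts(src, dst))
    print(to_patches([1, 2, 3, 4, 5], patch_size=2))
```

In a full model the count pairs would be mapped to learned embeddings and each patch would become one Transformer token, so a history of length L costs only L / patch_size tokens.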
HI 21cm imaging of a nearby Damped Lyman-alpha system
We present Giant Metrewave Radio Telescope (GMRT) HI 21cm emission images of
the z=0.009 damped Lyman-alpha (DLA) absorber towards the QSO HS 1543+5921. The
DLA has been earlier identified as a low surface brightness galaxy SBS 1543+593
at an impact parameter of ~ 400 pc to the QSO line of sight. The extremely low
redshift of the absorber allows us to make spatially resolved images of the
21cm emission; besides the HI mass, this also enables us to determine the
velocity field of the galaxy and, hence, to estimate its dynamical mass.
We obtain a total HI mass of ~ 1.4x10^9 Msun, considerably smaller than the
value of M*(HI) determined from blind 21cm emission surveys. This continues the
trend of low HI mass in all low redshift DLAs for which HI emission
observations have been attempted. We also find that the QSO lies behind a
region of low local HI column density in the foreground galaxy. This is
interesting in view of suggestions that DLA samples are biased against high HI
column density systems. The dynamical mass of the galaxy is found to be Mdyn ~
5x10^9 Msun.
Comment: Accepted for publication in A&
Pretraining Language Models with Text-Attributed Heterogeneous Graphs
In many real-world scenarios (e.g., academic networks, social platforms),
different types of entities are not only associated with texts but also
connected by various relationships, which can be abstracted as Text-Attributed
Heterogeneous Graphs (TAHGs). Current pretraining tasks for Language Models
(LMs) primarily focus on separately learning the textual information of each
entity and overlook the crucial aspect of capturing topological connections
among entities in TAHGs. In this paper, we present a new pretraining framework
for LMs that explicitly considers the topological and heterogeneous information
in TAHGs. Firstly, we define a context graph as neighborhoods of a target node
within specific orders and propose a topology-aware pretraining task to predict
nodes involved in the context graph by jointly optimizing an LM and an
auxiliary heterogeneous graph neural network. Secondly, based on the
observation that some nodes are text-rich while others have little text, we
devise a text augmentation strategy to enrich textless nodes with their
neighbors' texts for handling the imbalance issue. We conduct link prediction
and node classification tasks on three datasets from various domains.
Experimental results demonstrate the superiority of our approach over existing
methods and the rationality of each design. Our code is available at
https://github.com/Hope-Rita/THLM.
Comment: Accepted by EMNLP 2023 Finding
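The text augmentation idea can be sketched as follows, assuming a node counts as "textless" when it has fewer than a threshold number of whitespace-separated tokens and that it borrows the texts of its first few neighbors. The function name, the threshold `min_tokens`, and the cap `max_neighbors` are hypothetical illustrations, not the THLM implementation.

```python
def augment_text(texts, neighbors, min_tokens=3, max_neighbors=2):
    """Return a copy of texts in which text-poor nodes are enriched
    with the texts of up to max_neighbors neighbors."""
    augmented = {}
    for node, text in texts.items():
        if len(text.split()) >= min_tokens:
            # Text-rich node: keep its own text unchanged.
            augmented[node] = text
            continue
        # Text-poor node: append non-empty neighbor texts.
        borrowed = [texts[n] for n in neighbors.get(node, []) if texts.get(n)]
        augmented[node] = " ".join([text] + borrowed[:max_neighbors]).strip()
    return augmented


if __name__ == "__main__":
    texts = {"paper1": "graph neural networks for text", "author1": "", "venue1": "EMNLP"}
    neighbors = {"author1": ["paper1", "venue1"], "venue1": ["paper1"]}
    print(augment_text(texts, neighbors))
```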
The MgII Cross-section of Luminous Red Galaxies
We describe a search for MgII(2796,2803) absorption lines in Sloan Digital
Sky Survey (SDSS) spectra of QSOs whose lines of sight pass within impact
parameters of 200 kpc of galaxies with photometric redshifts of z=0.46-0.6 and
redshift errors Delta z~0.05. The galaxies selected have the same colors and
luminosities as the Luminous Red Galaxy (LRG) population previously selected
from the SDSS. A search for Mg II lines within a redshift interval of +/-0.1 of
a galaxy's photometric redshift shows that absorption by these galaxies is
rare: the covering fraction is ~ 10-15% between 20 and 100 kpc, for Mg II lines
with rest equivalent widths of Wr >= 0.6 Å, falling to zero at larger
separations. There is no evidence that Wr correlates with impact parameter or
galaxy luminosity. Our results are consistent with existing scenarios in which
cool Mg II-absorbing clouds may be absent near LRGs because of the environment
of the galaxies: if LRGs reside in high-mass groups and clusters, either their
halos are too hot to retain or accrete cool gas, or the galaxies themselves -
which have passively-evolving old stellar populations - do not produce the
rates of star formation and outflows of gas necessary to fill their halos with
Mg II absorbing clouds. In the rarer cases where Mg II is detected, however,
the origin of the absorption is less clear. Absorption may arise from the
little cool gas able to reach into cluster halos from the intergalactic medium,
or from the few star-forming and/or AGN-like LRGs that are known to exist.
Comment: Accepted by ApJ; minor correction
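As a minimal sketch of how a covering fraction like the one quoted above is computed, assume each QSO sightline is recorded as a pair of impact parameter (kpc) and rest equivalent width (Å), with `None` for a non-detection. The covering fraction in a bin is then the share of sightlines with an absorber at or above the width threshold. The function name and the sightline values below are made-up toy inputs, not data from the paper.

```python
def covering_fraction(sightlines, b_min, b_max, w_min=0.6):
    """Fraction of sightlines with impact parameter in [b_min, b_max)
    that show an absorber with rest equivalent width >= w_min."""
    in_bin = [w for b, w in sightlines if b_min <= b < b_max]
    if not in_bin:
        return None  # no sightlines probe this bin
    detected = sum(1 for w in in_bin if w is not None and w >= w_min)
    return detected / len(in_bin)


if __name__ == "__main__":
    # Toy sightlines: (impact parameter in kpc, rest equivalent width in Å or None).
    toy = [(30, 0.8), (50, None), (80, 0.3), (90, None), (150, None)]
    print(covering_fraction(toy, 20, 100))
    print(covering_fraction(toy, 100, 200))
```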
Finite type approximations of Gibbs measures on sofic subshifts
Consider a Hölder continuous potential defined on the full shift
$A^{\mathbb{N}}$, where $A$ is a finite alphabet. Let $X \subset A^{\mathbb{N}}$ be a specified
sofic subshift. It is well known that there is a unique Gibbs measure
on $X$ associated to this potential. Besides, there is a natural nested
sequence of subshifts of finite type converging to the sofic subshift
$X$. To this sequence we can associate a sequence of Gibbs measures.
In this paper, we prove that these measures weakly converge
at exponential speed to the Gibbs measure on $X$ (in the classical distance
metrizing the weak topology). We also establish a strong mixing property (ensuring weak
Bernoullicity) of the Gibbs measure on $X$. Finally, we prove that the measure-theoretic
entropy of the approximating measures converges to that of the Gibbs measure on $X$ exponentially fast.
We indicate how to extend our results to more general subshifts and potentials.
We stress that we use basic algebraic tools (contractive properties of iterated
matrices) and symbolic dynamics.
Comment: 18 pages, no figure
Predicting Temporal Sets with Deep Neural Networks
Given a sequence of sets, where each set contains an arbitrary number of
elements, the problem of temporal sets prediction aims to predict the elements
in the subsequent set. In practice, temporal sets prediction is much more
complex than predictive modelling of temporal events and time series, and is
still an open problem. Many existing methods, if adapted for the
problem of temporal sets prediction, usually follow a two-step strategy by
first projecting temporal sets into latent representations and then learning a
predictive model with the latent representations. The two-step approach often
leads to information loss and unsatisfactory prediction performance. In this
paper, we propose an integrated solution based on deep neural networks for
temporal sets prediction. A distinctive aspect of our approach is to learn
element relationships by constructing set-level co-occurrence graphs and then
performing graph convolutions on the dynamic relationship graphs. Moreover, we
design an attention-based module to adaptively learn the temporal dependency of
elements and sets. Finally, we provide a gated updating mechanism to find the
hidden shared patterns in different sequences and fuse both static and dynamic
information to improve the prediction performance. Experiments on real-world
data sets demonstrate that our approach can achieve competitive performances
even with a portion of the training data and can outperform existing methods
with a significant margin.
Comment: 9 pages, 6 figures, Proceedings of the 26th ACM SIGKDD Conference on
Knowledge Discovery and Data Mining (KDD '2020
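The set-level co-occurrence graph mentioned in the abstract can be illustrated with a small sketch. It assumes that two elements are linked whenever they appear in the same set and that the edge weight is the number of sets in which they co-occur, aggregated over one sequence. The paper may build one graph per set or weight edges differently, so the function name and the weighting are assumptions.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(sets):
    """Build an undirected weighted graph from a sequence of sets.
    Keys are sorted element pairs; values count the sets containing both elements."""
    weights = Counter()
    for s in sets:
        for u, v in combinations(sorted(s), 2):
            weights[(u, v)] += 1
    return weights


if __name__ == "__main__":
    baskets = [{"milk", "bread"}, {"milk", "bread", "eggs"}, {"eggs"}]
    for edge, weight in sorted(cooccurrence_graph(baskets).items()):
        print(edge, weight)
```

A graph convolution layer would then propagate element representations along these weighted edges before the attention and gated updating steps described in the abstract.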
Event-based Dynamic Graph Representation Learning for Patent Application Trend Prediction
Accurately predicting which types of patents companies will apply for in
the next period can reveal their development strategies and help
them discover potential partners or competitors in advance. Although important,
this problem has been rarely studied in previous research due to the challenges
in modelling companies' continuously evolving preferences and capturing the
semantic correlations of classification codes. To fill in this gap, we propose
an event-based dynamic graph learning framework for patent application trend
prediction. In particular, our method is founded on memory-based
representations of both companies and patent classification codes. When a new
patent is observed, the representations of the related companies and
classification codes are updated according to the historical memories and the
currently encoded messages. Moreover, a hierarchical message passing mechanism
is provided to capture the semantic proximities of patent classification codes
by updating their representations along the hierarchical taxonomy. Finally, the
patent application trend is predicted by aggregating the representations of the
target company and classification codes from static, dynamic, and hierarchical
perspectives. Experiments on real-world data demonstrate the effectiveness of
our approach under various experimental conditions, and also reveal the
abilities of our method in learning semantics of classification codes and
tracking the technology development trajectories of companies.
Comment: Accepted by the TKDE journa
AcrFinder: genome mining anti-CRISPR operons in prokaryotes and their viruses
Anti-CRISPR (Acr) proteins encoded by (pro)phages/(pro)viruses have a great potential to enable a more controllable genome editing. However, genome mining new Acr proteins is challenging due to the lack of a conserved functional domain and the low sequence similarity among experimentally characterized Acr proteins. We introduce here AcrFinder, a web server (http://bcb.unl.edu/AcrFinder) that combines three well-accepted ideas used by previous experimental studies to pre-screen genomic data for Acr candidates. These ideas include homology search, guilt-by-association (GBA), and CRISPR-Cas self-targeting spacers. Compared to existing bioinformatics tools, AcrFinder has the following unique functions: (i) it is the first online server specifically mining genomes for Acr-Aca operons; (ii) it provides a most comprehensive Acr and Aca (Acr-associated regulator) database (populated by GBA-based Acr and Aca datasets); (iii) it combines homology-based, GBA-based, and self-targeting approaches in one software package; and (iv) it provides a user-friendly web interface to take both nucleotide and protein sequence files as inputs, and output a result page with graphic representation of the genomic contexts of Acr-Aca operons. The leave-one-out cross-validation on experimentally characterized Acr-Aca operons showed that AcrFinder had a 100% recall. AcrFinder will be a valuable web resource to help experimental microbiologists discover new Anti-CRISPRs.