Graph Positional and Structural Encoder
Positional and structural encodings (PSEs) enable better identifiability of
nodes within a graph, since graphs in general lack a canonical node ordering. This
renders PSEs essential tools for empowering modern GNNs, and in particular
graph Transformers. However, designing PSEs that work optimally for a variety
of graph prediction tasks is a challenging and unsolved problem. Here, we
present the graph positional and structural encoder (GPSE), a first-ever
attempt to train a graph encoder that captures rich PSE representations for
augmenting any GNN. GPSE can effectively learn a common latent representation
for multiple PSEs, and is highly transferable. The encoder trained on a
particular graph dataset can be used effectively on datasets drawn from
significantly different distributions and even modalities. We show that across
a wide range of benchmarks, GPSE-enhanced models can significantly improve the
performance in certain tasks, while performing on par with those that employ
explicitly computed PSEs in other cases. Our results pave the way for the
development of large pre-trained models for extracting graph positional and
structural information and highlight their potential as a viable alternative to
explicitly computed PSEs as well as to existing self-supervised pre-training
approaches.
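For readers unfamiliar with explicitly computed PSEs, the sketch below computes one widely used structural encoding, the random-walk encoding (RWSE). For each node i it collects the probabilities of a simple random walk returning to i after 1, ..., K steps, i.e. the diagonal entries of M^k, where M = D^-1 A is the row-normalised transition matrix. This is a minimal pure-Python illustration of the kind of explicit PSE that GPSE is meant to complement or replace. The function names and the dense 0/1 adjacency-matrix input are assumptions of this sketch, not GPSE's code.

```python
def matmul(a, b):
    """Dense product of two square matrices stored as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def random_walk_se(adj, k_steps):
    """Return, for every node i, [(M^1)_ii, ..., (M^K)_ii] with M = D^-1 A.

    `adj` is assumed to be a dense 0/1 adjacency matrix of an undirected graph.
    Isolated nodes get an all-zero row in M and therefore all-zero encodings.
    """
    n = len(adj)
    deg = [sum(row) for row in adj]
    # Row-normalised transition matrix of a simple random walk.
    m = [[adj[i][j] / deg[i] if deg[i] else 0.0 for j in range(n)] for i in range(n)]
    power = m
    enc = [[] for _ in range(n)]
    for _ in range(k_steps):
        for i in range(n):
            enc[i].append(power[i][i])  # return probability after the current number of steps
        power = matmul(power, m)
    return enc


if __name__ == "__main__":
    path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # path graph 0 - 1 - 2
    for node, vec in enumerate(random_walk_se(path, 4)):
        print(node, vec)
```

On the 3-node path, both endpoints receive [0.0, 0.5, 0.0, 0.5] while the middle node receives [0.0, 1.0, 0.0, 1.0], so the encoding separates nodes by structural role even though the graph has no canonical node order.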
Long Range Graph Benchmark
Graph Neural Networks (GNNs) that are based on the message passing (MP)
paradigm generally exchange information between 1-hop neighbors to build node
representations at each layer. In principle, such networks are not able to
capture long-range interactions (LRI) that may be desired or necessary for
learning a given task on graphs. Recently, there has been an increasing
interest in the development of Transformer-based methods for graphs that can
consider full node connectivity beyond the original sparse structure, thus
enabling the modeling of LRI. However, MP-GNNs that simply rely on 1-hop
message passing often fare better in several existing graph benchmarks when
combined with positional feature representations, among other innovations,
hence limiting the perceived utility and ranking of Transformer-like
architectures. Here, we present the Long Range Graph Benchmark (LRGB) with 5
graph learning datasets (PascalVOC-SP, COCO-SP, PCQM-Contact, Peptides-func and
Peptides-struct) that arguably require LRI reasoning to achieve strong
performance in a given task. We benchmark both baseline GNNs and Graph
Transformer networks to verify that the models which capture long-range
dependencies perform significantly better on these tasks. Therefore, these
datasets are suitable for benchmarking and exploration of MP-GNNs and Graph
Transformer architectures that are intended to capture LRI.
Comment: Added reference to Tönshoff et al., 2023 in Sec. 4.1; NeurIPS 2022 Track on D&B; Open-sourced at: https://github.com/vijaydwivedi75/lrg
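The locality argument can be made concrete with a toy experiment: place a signal on one end of a path graph and propagate it with parameter-free sum-aggregation message passing, h_v <- h_v + sum of h_u over the 1-hop neighbours u of v. After L layers a node can only have received information from nodes at most L hops away, so on a 10-node path the far end sees nothing until layer 9. One layer of aggregation over a fully connected graph, used here as a crude stand-in for global attention, reaches every node at once. The update rule and helper names are illustrative assumptions, not LRGB code or any model benchmarked there.

```python
def propagate(adj_list, features, num_layers):
    """Toy message passing: each layer sets h_v <- h_v + sum of h_u over 1-hop neighbours u."""
    h = list(features)
    for _ in range(num_layers):
        h = [h[v] + sum(h[u] for u in adj_list[v]) for v in range(len(adj_list))]
    return h


def path_graph(n):
    """Adjacency lists of the path 0 - 1 - ... - (n-1)."""
    return [[u for u in (v - 1, v + 1) if 0 <= u < n] for v in range(n)]


def complete_graph(n):
    """Adjacency lists in which every node is connected to every other node."""
    return [[u for u in range(n) if u != v] for v in range(n)]


if __name__ == "__main__":
    n = 10
    signal = [1.0] + [0.0] * (n - 1)  # information initially present only at node 0
    for layers in (3, 8, 9):
        reached = propagate(path_graph(n), signal, layers)[n - 1] > 0
        print(f"path graph, {layers} layers: node {n - 1} reached = {reached}")
    print("complete graph, 1 layer:", propagate(complete_graph(n), signal, 1)[n - 1] > 0)
```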
RNA motif search with data-driven element ordering
BACKGROUND: In this paper, we study the problem of RNA motif search in long genomic sequences. This approach uses a combination of sequence and structure constraints to uncover new distant homologs of known functional RNAs. The problem is NP-hard and is traditionally solved by backtracking algorithms.
RESULTS: We have designed a new algorithm for RNA motif search and implemented a new motif search tool, RNArobo. The tool enhances the RNAbob descriptor language, allowing insertions in helices, which enables better characterization of ribozymes and aptamers. A typical RNA motif consists of multiple elements, and the running time of the algorithm is highly dependent on their ordering. By approaching the element ordering problem in a principled way, we demonstrate a more than 100-fold speedup of the search for complex motifs compared to previously published tools.
CONCLUSIONS: We have developed a new method for RNA motif search that allows for a significant speedup of the search for complex motifs that include pseudoknots. Such speed improvements are crucial at a time when the rate of DNA sequencing outpaces growth in computing power. RNArobo is available at http://compbio.fmph.uniba.sk/rnarobo.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1074-x) contains supplementary material, which is available to authorized users.
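Why element ordering matters can be shown with a deliberately simplified model. In the toy sketch below, a motif is a set of elements at fixed offsets inside a window (real RNArobo descriptors allow variable-length elements, insertions in helices, and pseudoknots, none of which are modelled here). Each window is abandoned as soon as one element fails, a simplified stand-in for backtracking, and the elements are ordered by how often each one matches on its own in sample sequence data, most selective first. This "rarest element first" heuristic is an assumption of the sketch, not the ordering algorithm from the paper; the element encoding, the helper names, and the example hairpin motif are all hypothetical.

```python
import random

# Allowed pairs in a helix: Watson-Crick plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def matches(element, seq, start):
    """Test one hypothetical motif element for the window beginning at `start`."""
    if element[0] == "ss":
        # ("ss", offset, pattern): single-stranded element, 'N' matches any base.
        _, off, pattern = element
        segment = seq[start + off:start + off + len(pattern)]
        return len(segment) == len(pattern) and all(
            p == "N" or p == b for p, b in zip(pattern, segment))
    # ("helix", off5, off3, length): the 5' strand pairs antiparallel with the 3' strand.
    _, off5, off3, length = element
    left = seq[start + off5:start + off5 + length]
    right = seq[start + off3:start + off3 + length]
    if len(left) != length or len(right) != length:
        return False
    return all((a, b) in PAIRS for a, b in zip(left, reversed(right)))


def search(motif, seq, window, order):
    """Scan every window, testing elements in `order` and stopping at the first failure.

    Returns the hit positions and the number of element tests performed.
    """
    hits, tests = [], 0
    for start in range(len(seq) - window + 1):
        for k in order:
            tests += 1
            if not matches(motif[k], seq, start):
                break
        else:  # no element failed, so the whole motif matches here
            hits.append(start)
    return hits, tests


def estimate_order(motif, sample, window):
    """Order elements by how often each matches on its own in `sample` (fewest first)."""
    starts = range(len(sample) - window + 1)
    counts = [sum(matches(e, sample, s) for s in starts) for e in motif]
    return sorted(range(len(motif)), key=lambda k: counts[k])


if __name__ == "__main__":
    random.seed(1)
    genome = "".join(random.choice("ACGU") for _ in range(20000))
    # Hypothetical hairpin: 4-bp stem (0-3 with 8-11), GNAA loop (4-7), then a U (12).
    motif = [("ss", 12, "U"), ("helix", 0, 8, 4), ("ss", 4, "GNAA")]
    window = 13
    order = estimate_order(motif, genome[:5000], window)  # estimate on a slice of the data
    print("as written:", search(motif, genome, window, [0, 1, 2])[1], "element tests")
    print("data-driven order", order, ":", search(motif, genome, window, order)[1], "element tests")
```

Because every element must match, the hit list does not depend on the order; only the number of element tests does. Testing the loose single-U element last lets most windows be rejected after a single test.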
Additional file 1 of RNA motif search with data-driven element ordering
Supplementary online material. The file contains supplementary material with additional details on methods, file formats, and experiments. (PDF 617 kB)