Graph Positional and Structural Encoder
Positional and structural encodings (PSE) enable better identifiability of
nodes within a graph, since graphs in general lack a canonical node ordering. This
renders PSEs essential tools for empowering modern GNNs, and in particular
graph Transformers. However, designing PSEs that work optimally for a variety
of graph prediction tasks is a challenging and unsolved problem. Here, we
present the graph positional and structural encoder (GPSE), the first attempt
to train a graph encoder that captures rich PSE representations for
augmenting any GNN. GPSE can effectively learn a common latent representation
for multiple PSEs, and is highly transferable. The encoder trained on a
particular graph dataset can be used effectively on datasets drawn from
significantly different distributions and even modalities. We show that across
a wide range of benchmarks, GPSE-enhanced models can significantly improve the
performance in certain tasks, while performing on par with those that employ
explicitly computed PSEs in other cases. Our results pave the way for the
development of large pre-trained models for extracting graph positional and
structural information and highlight their potential as a viable alternative to
explicitly computed PSEs as well as to existing self-supervised pre-training
approaches.
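For concreteness, one widely used explicitly computed PSE is the random-walk structural encoding (RWSE): for each node i, the probabilities that a simple random walk started at i is back at i after k = 1..K steps, i.e. the diagonal entries of (D^-1 A)^k. The sketch below computes it with NumPy. It illustrates the kind of explicitly computed encoding that GPSE is presented as an alternative to; it is not the GPSE model. It assumes a dense, symmetric adjacency matrix with no isolated nodes, and the function name `rwse` is hypothetical.

```python
import numpy as np


def rwse(adjacency: np.ndarray, num_steps: int) -> np.ndarray:
    """Random-walk structural encoding: return probabilities after 1..num_steps steps.

    Assumes a dense, symmetric adjacency matrix with no isolated nodes.
    Returns an array of shape (num_nodes, num_steps).
    """
    degrees = adjacency.sum(axis=1)
    # Row-normalized transition matrix M = D^-1 A of a simple random walk.
    transition = adjacency / degrees[:, None]
    power = np.eye(adjacency.shape[0])
    features = []
    for _ in range(num_steps):
        power = power @ transition  # M^k
        features.append(np.diag(power))  # P(walk from i is back at i after k steps)
    return np.stack(features, axis=1)


# Usage: a triangle (3-cycle), where every node is structurally identical.
triangle = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
print(rwse(triangle, 3))  # each row is [0, 0.5, 0.25]
```

Because every node of the triangle receives the same encoding, this toy case also shows the limit of a single PSE: it distinguishes nodes only when their local structure differs.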
Long Range Graph Benchmark
Graph Neural Networks (GNNs) that are based on the message passing (MP)
paradigm generally exchange information between 1-hop neighbors to build node
representations at each layer. In principle, such networks are not able to
capture long-range interactions (LRI) that may be desired or necessary for
learning a given task on graphs. Recently, there has been increasing
interest in the development of Transformer-based methods for graphs that can
consider full node connectivity beyond the original sparse structure, thus
enabling the modeling of LRI. However, MP-GNNs that simply rely on 1-hop
message passing often fare better in several existing graph benchmarks when
combined with positional feature representations, among other innovations,
hence limiting the perceived utility and ranking of Transformer-like
architectures. Here, we present the Long Range Graph Benchmark (LRGB) with 5
graph learning datasets (PascalVOC-SP, COCO-SP, PCQM-Contact, Peptides-func and
Peptides-struct) that arguably require LRI reasoning to achieve strong
performance in a given task. We benchmark both baseline GNNs and Graph
Transformer networks to verify that the models which capture long-range
dependencies perform significantly better on these tasks. Therefore, these
datasets are suitable for benchmarking and exploration of MP-GNNs and Graph
Transformer architectures that are intended to capture LRI.
Comment: Added reference to Tönshoff et al., 2023 in Sec. 4.1; NeurIPS 2022
Track on D&B; Open-sourced at: https://github.com/vijaydwivedi75/lrg
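The locality argument can be made concrete. With purely 1-hop message passing, information moves at most one hop per layer, so after L layers a node's representation depends only on its L-hop neighborhood. The NumPy sketch below assumes a weight-free mean-aggregation layer with self-loops (a simplified stand-in for an MP-GNN layer, not any model benchmarked in LRGB) and tracks which nodes of a path graph a one-hot signal has reached. For contrast, `dense_mixing` is a hypothetical stand-in for full attention in which every node attends uniformly to all nodes. All function names are illustrative.

```python
import numpy as np


def path_adjacency(n: int) -> np.ndarray:
    """Adjacency matrix of the path graph 0 - 1 - ... - (n-1)."""
    a = np.zeros((n, n))
    for i in range(n - 1):
        a[i, i + 1] = a[i + 1, i] = 1.0
    return a


def mean_message_passing(a: np.ndarray, h: np.ndarray) -> np.ndarray:
    """One weight-free 1-hop layer: average over each node and its neighbors."""
    a_hat = a + np.eye(a.shape[0])
    a_hat = a_hat / a_hat.sum(axis=1, keepdims=True)
    return a_hat @ h


def dense_mixing(h: np.ndarray) -> np.ndarray:
    """Stand-in for full attention: every node attends uniformly to all nodes."""
    n = h.shape[0]
    return np.full((n, n), 1.0 / n) @ h


def reached(h: np.ndarray) -> list:
    """Indices of nodes whose feature has been touched by the source signal."""
    return np.flatnonzero(h[:, 0] > 0).tolist()


n = 10
a = path_adjacency(n)
h = np.zeros((n, 1))
h[0, 0] = 1.0  # one-hot signal at node 0

h_mp = h
for _ in range(3):
    h_mp = mean_message_passing(a, h_mp)
print(reached(h_mp))             # [0, 1, 2, 3]: only the 3-hop neighborhood
print(reached(dense_mixing(h)))  # all 10 nodes after a single layer
```

On a path of 10 nodes, the message-passing model needs 9 layers before node 9 can see node 0, whereas the dense layer connects them immediately; this is the gap that datasets requiring LRI are meant to expose.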
Signatures of cell death and proliferation in perturbation transcriptomics data: from confounding factor to effective prediction
Transcriptional perturbation signatures are valuable data sources for functional genomics. Linking perturbation signatures to screening data opens the possibility of modeling cellular phenotypes from expression data and identifying efficacious drugs. We linked perturbation transcriptomics data from the LINCS-L1000 project with cell viability information upon genetic (Achilles project) and chemical (CTRP screen) perturbations, yielding more than 90 000 signature-viability pairs. An integrated analysis showed that the cell viability signature is a major factor underlying perturbation signatures. The signature is linked to transcription factors regulating cell death, proliferation and division time. We used the cell viability-signature relationship to predict viability from transcriptomics signatures, and identified and validated compounds that induce cell death in tumor cell lines. We showed that cellular toxicity can lead to unexpected similarity between signatures, confounding mechanism-of-action discovery. Consensus compound signatures predicted cell-specific drug sensitivity, even when the signature was not measured in the same cell line, and outperformed conventional drug-specific features. Our results can help in understanding the mechanisms behind cell death and in removing confounding factors from transcriptomic perturbation screens. To interactively browse our results and predict cell viability in new gene expression samples, we developed CEVIChE (CEll VIability Calculator from gene Expression; https://saezlab.shinyapps.io/ceviche/).
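The abstract does not state which model maps transcriptomic signatures to viability. As a hedged illustration only, a linear ridge regression from a signature vector (one value per gene) to a viability score could be set up as in the NumPy sketch below. The model choice, the regularization strength `alpha`, the function names and the synthetic data are all assumptions, not the authors' method or data.

```python
import numpy as np


def fit_ridge(x: np.ndarray, y: np.ndarray, alpha: float = 1.0):
    """Closed-form ridge regression with an unpenalized intercept.

    x: (n_samples, n_genes) perturbation signatures; y: (n_samples,) viability.
    """
    x_mean, y_mean = x.mean(axis=0), y.mean()
    xc, yc = x - x_mean, y - y_mean  # center so the intercept is not shrunk
    gram = xc.T @ xc + alpha * np.eye(x.shape[1])
    weights = np.linalg.solve(gram, xc.T @ yc)
    intercept = y_mean - x_mean @ weights
    return weights, intercept


def predict(x: np.ndarray, weights: np.ndarray, intercept: float) -> np.ndarray:
    """Predicted viability for each signature (row of x)."""
    return x @ weights + intercept


# Synthetic stand-in data (hypothetical): 500 signatures over 20 genes.
rng = np.random.default_rng(0)
x = rng.normal(size=(500, 20))
true_weights = rng.normal(size=20)
y = x @ true_weights + 0.5 + rng.normal(scale=0.1, size=500)

weights, intercept = fit_ridge(x, y, alpha=1.0)
y_hat = predict(x, weights, intercept)
print(np.corrcoef(y, y_hat)[0, 1])  # near 1.0 on this low-noise toy data
```

Centering before solving keeps the intercept out of the penalty, so only the per-gene weights are shrunk; in a real cross-cell-line setting the fit would be evaluated on held-out cell lines rather than on the training data.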
RNA motif search with data-driven element ordering
BACKGROUND: In this paper, we study the problem of RNA motif search in long genomic sequences. This approach uses a combination of sequence and structure constraints to uncover new distant homologs of known functional RNAs. The problem is NP-hard and is traditionally solved by backtracking algorithms. RESULTS: We have designed a new algorithm for RNA motif search and implemented a new motif search tool, RNArobo. The tool enhances the RNAbob descriptor language, allowing insertions in helices, which enables better characterization of ribozymes and aptamers. A typical RNA motif consists of multiple elements, and the running time of the algorithm is highly dependent on their ordering. By approaching the element ordering problem in a principled way, we demonstrate a more than 100-fold speedup of the search for complex motifs compared to previously published tools. CONCLUSIONS: We have developed a new method for RNA motif search that allows for a significant speedup of the search for complex motifs that include pseudoknots. Such speed improvements are crucial at a time when the rate of DNA sequencing outpaces growth in computing. RNArobo is available at http://compbio.fmph.uniba.sk/rnarobo. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1074-x) contains supplementary material, which is available to authorized users.
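To see why element ordering affects backtracking cost, consider a toy matcher in which each motif element has a precomputed list of candidate start positions and a full match needs the elements to appear in motif order along the sequence. The sketch below counts the candidate placements tried under two search orders: the motif's own left-to-right order and a greedy "fewest candidates first" order. This heuristic is a simplified, hypothetical stand-in for data-driven ordering; it is not RNArobo's algorithm, and real motifs also impose helix pairing, distance and pseudoknot constraints that are omitted here. The function name `count_matches` is illustrative.

```python
def count_matches(candidates: list, order: list) -> tuple:
    """Backtracking search over element placements.

    candidates[i] lists candidate start positions of element i; a full match
    needs strictly increasing positions in element index order. Elements are
    assigned in the given search `order`. Returns (matches_found, placements_tried).
    """
    n = len(candidates)
    pos = [None] * n
    matches, tried = 0, 0

    def consistent(i: int, p: int) -> bool:
        # Check position p of element i against every element already placed.
        for j, q in enumerate(pos):
            if q is None:
                continue
            if (j < i and q >= p) or (j > i and q <= p):
                return False
        return True

    def search(k: int) -> None:
        nonlocal matches, tried
        if k == n:
            matches += 1
            return
        i = order[k]
        for p in candidates[i]:
            tried += 1
            if consistent(i, p):
                pos[i] = p
                search(k + 1)
                pos[i] = None

    search(0)
    return matches, tried


# Toy motif with elements A, B, C (in sequence order). A and B match often;
# C matches only once, at a position that rules out every full match.
candidates = [list(range(0, 50)), list(range(50, 100)), [10]]
in_motif_order = [0, 1, 2]
fewest_first = sorted(range(len(candidates)), key=lambda i: len(candidates[i]))
print(count_matches(candidates, in_motif_order))  # (0, 5050)
print(count_matches(candidates, fewest_first))    # (0, 551)
```

Both orders find the same matches; only the amount of work differs. Placing the rare element C first prunes most partial assignments early, which is the intuition behind ordering elements by how selective they are on the actual data.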
Additional file 1 of RNA motif search with data-driven element ordering
Supplementary online material. The file contains supplementary material with additional details on methods, file formats, and experiments. (PDF, 617 kB)