A normalization technique for next generation sequencing experiments
Next generation sequencing (NGS) is nowadays one of the key technologies in biology. Its cost effectiveness and its capability of finding the smallest variations in the genome make it increasingly popular. For studies aiming at genome assembly, differences in read count statistics do not affect the outcome. However, these differences bias the outcome if the goal is to identify structural DNA characteristics like copy number variations (CNVs). Thus, a normalization step must remove such random read count variations so that read counts from different experiments are comparable. In particular, after normalization the commonly used assumption that read counts in windows along the chromosomes follow a Poisson distribution is better justified. Strong deviations of read counts from the Poisson distribution with the estimated mean indicate CNVs.
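The idea of flagging windows whose read counts deviate strongly from a Poisson model can be sketched as follows. This is only an illustration under stated assumptions: the median-based scaling is a simple stand-in and not the normalization technique of the paper, and the function names, the toy counts, and the threshold `alpha` are hypothetical.

```python
import math
from statistics import median


def normalize_counts(counts, target_median=100.0):
    """Scale window counts so their median equals target_median.

    Assumption: simple median scaling, used here only as a stand-in
    for the normalization step described in the abstract.
    """
    factor = target_median / median(counts)
    return [round(c * factor) for c in counts]


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), by direct summation of the pmf."""
    if k < 0:
        return 0.0
    term = math.exp(-lam)  # pmf at 0; underflows for lam above roughly 700
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return min(1.0, total)


def flag_windows(counts, alpha=1e-3):
    """Return indices of windows whose count is unlikely under Poisson(lam).

    lam is estimated by the median so that a few CNV windows do not
    dominate the estimate (an assumption of this sketch).
    """
    lam = median(counts)
    flagged = []
    for idx, c in enumerate(counts):
        lower = poisson_cdf(c, lam)            # P(X <= c)
        upper = 1.0 - poisson_cdf(c - 1, lam)  # P(X >= c)
        if min(lower, upper) < alpha:
            flagged.append(idx)
    return flagged


# Hypothetical window counts from two experiments with different depth.
sample_a = [100, 98, 103, 97, 101, 205, 99, 102]  # window 5 looks amplified
sample_b = [50, 49, 52, 48, 51, 50, 49, 51]

flags_a = flag_windows(normalize_counts(sample_a))
flags_b = flag_windows(normalize_counts(sample_b))
```

After scaling, both samples share the same median, so their windows can be compared on one scale; only the amplified window in `sample_a` falls far into the Poisson tail.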
Decoding Sequence Classification Models for Acquiring New Biological Insights
Classifying biological sequences is one of the most important tasks in computational biology. In the last decade, support vector machines (SVMs) in combination with sequence kernels have emerged as a de facto standard. These methods are theoretically well-founded, reliable, and provide high-accuracy solutions at low computational cost. However, obtaining a highly accurate classifier is rarely the end of the story in many practical situations. Instead, one often aims to acquire biological knowledge about the principles underlying a given classification task. SVMs with traditional sequence kernels do not offer a straightforward way of accessing this knowledge.

In this contribution, we propose a new approach to analyzing biological sequences on the basis of support vector machines with sequence kernels. We first extract explicit pattern weights from a given SVM. When classifying a sequence, we then compute a prediction profile by distributing the weight of each pattern to the sequence positions that match the pattern. The final profile not only allows assessing the importance of a position, but also determining for which class it is indicative. Since it is unfeasible to analyze profiles of all sequences in a given data set, we advocate using affinity propagation (AP) clustering to narrow down the analysis to a small set of typical sequences.

The proposed approach is applicable to a wide range of biological sequences and a wide selection of sequence kernels. To illustrate our framework, we present the prediction of oligomerization tendencies of coiled coil proteins as a case study.
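The prediction profile described in this abstract, where each pattern's weight is distributed over the sequence positions it matches, might look roughly like the sketch below. It assumes a k-mer (spectrum-style) kernel and an even split of each weight across the k matched positions; both are assumptions of this illustration, and the weights and the example sequence are made up rather than extracted from a trained SVM.

```python
def prediction_profile(sequence, pattern_weights, k):
    """Distribute k-mer weights over the positions each k-mer covers.

    pattern_weights maps k-mers to weights (hypothetical values here;
    in practice they would be extracted from a trained SVM).
    Each matching k-mer contributes weight / k to each of its k positions.
    """
    profile = [0.0] * len(sequence)
    for start in range(len(sequence) - k + 1):
        kmer = sequence[start:start + k]
        share = pattern_weights.get(kmer, 0.0) / k
        for pos in range(start, start + k):
            profile[pos] += share
    return profile


# Hypothetical weights: positive values point to one class, negative to the other.
weights = {"LE": 1.0, "EK": -0.5}
profile = prediction_profile("LEKLE", weights, k=2)
```

With an even split, the profile values sum to the total weight of all matched patterns, so positive positions pull the sequence toward one class and negative positions toward the other.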

Conformal Prediction for Time Series with Modern Hopfield Networks
To quantify uncertainty, conformal prediction methods are attracting
growing interest and have already been successfully applied to
various domains. However, they are difficult to apply to time series as the
autocorrelative structure of time series violates basic assumptions required by
conformal prediction. We propose HopCPT, a novel conformal prediction approach
for time series that not only copes with temporal structures but leverages
them. We show that our approach is theoretically well justified for time series
where temporal dependencies are present. In experiments, we demonstrate that
our new approach outperforms state-of-the-art conformal prediction methods on
multiple real-world time series datasets from four different domains.
Comment: presented at NeurIPS 202
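For context, the standard split conformal procedure that the abstract contrasts with can be sketched as below. This is not HopCPT; it is the textbook baseline, whose coverage guarantee relies on exchangeable data, which is the assumption that autocorrelated time series violate. The function names and the calibration residuals are hypothetical.

```python
import math


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score.

    scores are nonconformity scores (here: absolute residuals) from a
    held-out calibration set. If the required rank exceeds n, the
    interval is unbounded.
    """
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return sorted(scores)[rank - 1]


def predict_interval(point_prediction, q):
    """Symmetric prediction interval around a point prediction."""
    return (point_prediction - q, point_prediction + q)


# Hypothetical absolute residuals |y - y_hat| on a calibration set.
calibration_residuals = [0.2, 0.5, 0.1, 0.9, 0.4, 0.3, 0.7]
q = conformal_quantile(calibration_residuals, alpha=0.25)
low, high = predict_interval(10.0, q)
```

Under exchangeability, intervals built this way cover the true value with probability at least 1 - alpha; with temporal dependence that guarantee no longer holds in general, which motivates methods designed for time series.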
G-Signatures: Global Graph Propagation With Randomized Signatures
Graph neural networks (GNNs) have evolved into one of the most popular deep
learning architectures. However, GNNs suffer from over-smoothing node
information and, therefore, struggle to solve tasks where global graph
properties are relevant. We introduce G-Signatures, a novel graph learning
method that enables global graph propagation via randomized signatures.
G-Signatures use a new graph conversion concept to embed graph structured
information which can be interpreted as paths in latent space. We further
introduce the idea of latent space path mapping. This allows us to iteratively
traverse latent space paths and thus globally process information.
G-Signatures excel at extracting and processing global graph properties, and
effectively scale to large graph problems. Empirically, we confirm the
advantages of G-Signatures at several classification and regression tasks.
Comment: 7 pages (+ appendix); 4 figure