8 research outputs found
Predict, Refine, Synthesize: Self-Guiding Diffusion Models for Probabilistic Time Series Forecasting
Diffusion models have achieved state-of-the-art performance in generative
modeling tasks across various domains. Prior works on time series diffusion
models have primarily focused on developing conditional models tailored to
specific forecasting or imputation tasks. In this work, we explore the
potential of task-agnostic, unconditional diffusion models for several time
series applications. We propose TSDiff, an unconditionally trained diffusion
model for time series. Our proposed self-guidance mechanism enables
conditioning TSDiff for downstream tasks during inference, without requiring
auxiliary networks or altering the training procedure. We demonstrate the
effectiveness of our method on three different time series tasks: forecasting,
refinement, and synthetic data generation. First, we show that TSDiff is
competitive with several task-specific conditional forecasting methods
(predict). Second, we leverage the learned implicit probability density of
TSDiff to iteratively refine the predictions of base forecasters with reduced
computational overhead over reverse diffusion (refine). Notably, the generative
performance of the model remains intact -- downstream forecasters trained on
synthetic samples from TSDiff outperform forecasters that are trained on
samples from other state-of-the-art generative time series models, occasionally
even outperforming models trained on real data (synthesize).
Neural forecasting: Introduction and literature overview
Neural network-based forecasting methods have become ubiquitous in
large-scale industrial forecasting applications in recent years. As the
prevalence of neural network-based solutions among the best entries in the
recent M4 competition shows, the recent popularity of neural forecasting
methods is not limited to industry and has also reached academia. This article
aims at providing an introduction and an overview of some of the advances that
have permitted the resurgence of neural networks in machine learning. Building
on these foundations, the article then gives an overview of the recent
literature on neural networks for forecasting and applications.
Comment: 66 pages, 5 figures
An integrated workflow for crosslinking mass spectrometry
We present a concise workflow to enhance the mass spectrometric detection of crosslinked peptides by introducing sequential digestion and the crosslink identification software xiSEARCH. Sequential digestion enhances peptide detection by selective shortening of long tryptic peptides. We demonstrate our simple 12-fraction protocol for crosslinked multi-protein complexes and cell lysates, quantitative analysis, and high-density crosslinking, without requiring specific crosslinker features. This overall approach reveals dynamic protein-protein interaction sites, which are accessible, have fundamental functional relevance and are therefore ideally suited for the development of small molecule inhibitors.
Nutzung neuer Informationsquellen für die Proteinstrukturvorhersage (Leveraging novel information sources for protein structure prediction)
Three-dimensional protein structures are an invaluable stepping stone towards the understanding of cellular processes. Computational protein structure prediction holds the promise of providing these structural models at low cost and effort. However, the major bottleneck towards effective protein structure prediction is the high dimensionality and vast size of the protein conformational space. These properties of the conformational space make it extremely difficult to locate the native structure through search. Information alleviates this issue by guiding search towards the native protein structure. Thus, information is invaluable in conformational space search. Not surprisingly, state-of-the-art structure prediction methods rely heavily on information. Unlocking novel sources of information should therefore further increase our ability to accurately predict protein structure. This thesis leverages three novel sources of information to advance protein structure prediction. First, we leverage physicochemical information that is encoded in energy functions and predicted structure models. Native contact networks form characteristic patterns to be energetically favorable. This thesis develops a network-based representation to capture these patterns and uses this representation to predict residue-residue contacts. The second source of information is experimental data from high-density cross-linking/mass spectrometry (CLMS) experiments. We integrate this information in an experimental/computational hybrid method for protein structure determination. The third information source is corroborating information. Corroborating information judges the likelihood of the co-occurrence of structural constraints. Nearly all methods provide these constraints in isolation, thereby neglecting any corroborating evidence between them. We develop a network-based analysis method to refine structure constraints with corroborating information.
We demonstrate the value of these information sources in extensive ab initio structure prediction experiments with a customized conformational space search algorithm and a novel structure prediction pipeline. This pipeline reached state-of-the-art contact and ab initio structure prediction performance in the 11th community-wide Critical Assessment of Protein Structure Prediction experiment (CASP11). Using our CLMS-based hybrid method, we reconstruct the domain structures of human serum albumin in solution and in its native environment, human blood serum. This represents a disruptive first step towards a mass spectrometry-driven, ab initio structure determination method that is able to probe protein structure where it really matters: in the proteins' natural environment, which is their very place of action.
German abstract (translated): Knowledge of three-dimensional protein structures is indispensable for understanding cellular processes. Computational methods for protein structure prediction have the potential to generate these structural models with little effort and at low cost. However, the high dimensionality and sheer size of the conformational space are a major obstacle on the way to effective structure prediction. These properties of the search space make it extremely difficult to find the native protein structure by means of search algorithms. Information guides the search for the native structure. Information is therefore indispensable for searching conformational space. Many protein structure prediction methods use a large amount of information. Evidently, tapping new sources of information should massively expand our ability to predict structure accurately. This dissertation demonstrates the use of three novel sources of information in structure prediction. The first source is physicochemical information contained in energy functions and predicted structure models. Native contacts form characteristic networks in order to be energetically favorable. This dissertation develops a network-based representation of these characteristic networks to predict protein contacts. Cross-linking/mass spectrometry (CLMS) data of extremely high density are the second source of information. We integrate this information into an experimental/computational hybrid method for structure determination. The third source is corroborating information, which assesses the likelihood that several structural constraints occur simultaneously. Nearly all methods predict these constraints in isolation and therefore ignore corroborating information. We develop a network analysis method that uses this information to refine constraints. We demonstrate the value of these information sources in extensive ab initio structure prediction experiments with a modified search algorithm and a novel structure prediction system. With this system, accurate contact predictions and ab initio structure predictions were possible in the eleventh Critical Assessment of Protein Structure Prediction experiment. Using our CLMS-based hybrid method, we were able to reconstruct the structure of the domains of human serum albumin. This was possible for isolated human serum albumin and for human serum albumin in blood serum, which is the natural environment of this protein. This is an important first step towards a new CLMS-based structure determination method, one that can gather structural information where it really matters: in the natural environment of proteins, in which they perform their function.
High-dimensional multivariate forecasting with low-rank Gaussian Copula processes
Predicting the dependencies between observations from multiple time series is critical for applications such as anomaly detection, financial risk management, causal analysis, or demand forecasting. However, the computational and numerical difficulties of estimating time-varying and high-dimensional covariance matrices often limit existing methods to at most a few hundred dimensions or require strong assumptions on the dependence between series. We propose to combine an RNN-based time series model with a Gaussian copula process output model with a low-rank covariance structure to reduce the computational complexity and handle non-Gaussian marginal distributions. This drastically reduces the number of parameters and consequently allows the modeling of time-varying correlations of thousands of time series. We show on several real-world datasets that our method provides significant accuracy improvements over state-of-the-art baselines, and we perform an ablation study analyzing the contributions of the different components of our model.
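
The parameter savings from a low-rank covariance structure, and the copula idea of mapping non-Gaussian marginals to Gaussian scores, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The diagonal-plus-low-rank form diag(d) + V V^T, the rank-based empirical CDF, and the function names `param_counts`, `sample_low_rank_gaussian`, and `to_gaussian_scores` are assumptions made for illustration. The abstract only states that the covariance is low-rank and that the output model is a Gaussian copula process.

```python
import random
from statistics import NormalDist


def param_counts(n_series: int, rank: int) -> tuple[int, int]:
    """Free parameters of a full covariance vs. an assumed diag(d) + V V^T form."""
    full = n_series * (n_series + 1) // 2      # symmetric N x N matrix
    low_rank = n_series + n_series * rank      # d has N entries, V has N x r
    return full, low_rank


def sample_low_rank_gaussian(d, V, rng):
    """Draw z ~ N(0, diag(d) + V V^T) without building the N x N matrix."""
    rank = len(V[0])
    shared = [rng.gauss(0.0, 1.0) for _ in range(rank)]  # factors shared by all series
    return [
        (d_i ** 0.5) * rng.gauss(0.0, 1.0) + sum(v * e for v, e in zip(row, shared))
        for d_i, row in zip(d, V)
    ]


def to_gaussian_scores(values):
    """Map one series to standard-normal scores through its empirical CDF.

    Ties are broken by position; rank / (n + 1) keeps the CDF strictly in (0, 1).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for position, index in enumerate(order):
        ranks[index] = position + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf(r / (n + 1)) for r in ranks]


if __name__ == "__main__":
    full, low_rank = param_counts(n_series=1000, rank=10)
    print(f"full covariance: {full} parameters, low-rank: {low_rank} parameters")

    rng = random.Random(0)
    d = [0.5, 0.5, 0.5, 0.5]
    V = [[1.0], [1.0], [-1.0], [0.0]]  # rank-1 factor loadings (illustrative values)
    print(sample_low_rank_gaussian(d, V, rng))
    print(to_gaussian_scores([3.0, 1.0, 2.0]))
```

Under these assumptions, the parameter count grows linearly in the number of series for a fixed rank, rather than quadratically as for a full covariance matrix. This is the kind of reduction the abstract refers to.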