04091 Abstracts Collection -- Data Structures
From 22.02. to 27.02.2004, Dagstuhl Seminar "Data Structures"
was held in the International Conference and Research Center (IBFI),
Schloss Dagstuhl.
During the seminar, several participants presented their current
research, and ongoing work and open problems were discussed.
Abstracts of the presentations given during the seminar
are put together in this paper. The first section
describes the seminar topics and goals in general.
Wrapper algorithms and their performance assessment on high-dimensional molecular data
Prediction problems on high-dimensional molecular data, e.g. the classification of microarray samples into normal and cancer tissues, are complex and ill-posed since the number of variables usually exceeds the number of observations by orders of magnitude. Recent research in the area has put forward a variety of new statistical models to handle these new biological datasets. In practice, however, these models are always applied in combination with preprocessing and variable selection methods as well as model selection, which is mostly performed by cross-validation. Varma and Simon (2006) used the term 'wrapper algorithm' for this integration of preprocessing and model selection into the construction of statistical models. Additionally, they proposed nested cross-validation (NCV) as a way of estimating the prediction error of such algorithms; NCV has since become the gold standard.
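To make the NCV idea concrete, here is a minimal sketch in R (the language used later in the thesis); `fit`, `predict_fn`, and `grid` are illustrative placeholders, not the survHD interface. The point is that every tuning decision is repeated inside each outer training fold, so the outer test folds stay untouched:

```r
## Minimal nested cross-validation sketch (illustrative; not the survHD API).
## The inner loop tunes a hyperparameter; the outer loop estimates the
## prediction error of the whole wrapper, tuning included.
nested_cv <- function(X, y, fit, predict_fn, grid, K_outer = 5, K_inner = 5) {
  n <- nrow(X)
  outer_folds <- sample(rep(seq_len(K_outer), length.out = n))
  errs <- numeric(K_outer)
  for (k in seq_len(K_outer)) {
    train <- which(outer_folds != k)
    test  <- which(outer_folds == k)
    ## inner CV on the training part only: pick the best tuning value
    inner_folds <- sample(rep(seq_len(K_inner), length.out = length(train)))
    inner_err <- sapply(grid, function(g)
      mean(sapply(seq_len(K_inner), function(j) {
        tr <- train[inner_folds != j]
        va <- train[inner_folds == j]
        model <- fit(X[tr, , drop = FALSE], y[tr], g)
        mean(predict_fn(model, X[va, , drop = FALSE]) != y[va])
      })))
    best <- grid[which.min(inner_err)]
    ## refit on the whole training part, score on the untouched test fold
    model <- fit(X[train, , drop = FALSE], y[train], best)
    errs[k] <- mean(predict_fn(model, X[test, , drop = FALSE]) != y[test])
  }
  mean(errs)   # estimate of the wrapper's prediction error
}
```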
In the first part, this thesis provides further theoretical and empirical justification for the use of NCV in the context of wrapper algorithms. Moreover, a computationally less intensive alternative to NCV is proposed, which can be motivated within a decision-theoretic framework. The new method can be interpreted as a smoothed variant of NCV and, in contrast to NCV, guarantees intuitive bounds for the estimate of the prediction error.
The second part focuses on the ranking of wrapper algorithms. Cross-study validation is proposed as an alternative to repeating separate within-study validations when several similar prediction problems are available. The concept is demonstrated using six different wrapper algorithms for survival prediction from censored data on a selection of eight breast cancer datasets. Additionally, a parametric bootstrap approach for simulating realistic data from such related prediction problems is described and subsequently applied to illustrate the concept of cross-study validation for the ranking of wrapper algorithms.
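One simple way to picture cross-study validation is a loop that trains each wrapper on one study and scores it on every other study, then ranks the wrappers by their average cross-study score. A hedged sketch in R; `studies`, `wrappers`, and `score` are illustrative placeholders, not the thesis's implementation:

```r
## Leave-one-study-out sketch of cross-study validation (illustrative).
## `studies` is a list of data sets, `wrappers` a named list of fit
## functions, and `score` returns a performance value on a new study.
cross_study_validation <- function(studies, wrappers, score) {
  res <- sapply(names(wrappers), function(w) {
    scores <- c()
    for (i in seq_along(studies)) {
      model <- wrappers[[w]](studies[[i]])      # train on study i
      for (j in seq_along(studies)) {
        if (j == i) next                        # evaluate only across studies
        scores <- c(scores, score(model, studies[[j]]))
      }
    }
    mean(scores)
  })
  sort(res, decreasing = TRUE)   # ranking of the wrapper algorithms
}
```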
Finally, the last part addresses computational aspects of the analyses and simulations performed in the thesis. Both the preprocessing before the analysis and the evaluation of the prediction models require substantial computing resources. Parallel computing approaches are illustrated on cluster, cloud, and high-performance computing resources using the R programming language. The use of heterogeneous hardware and the processing of large datasets are covered, as well as the implementation of the R package survHD for the analysis and evaluation of high-dimensional wrapper algorithms for survival prediction from censored data.
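As an illustration of the parallelisation pattern described above, the following sketch uses base R's parallel package; `evaluate_fold`, `X`, and `y` are assumed placeholders rather than survHD functions:

```r
## Sketch of the cluster-parallel pattern with base R's 'parallel' package.
## evaluate_fold() stands for one resampling iteration; X and y are its data.
library(parallel)

cl <- makeCluster(detectCores() - 1)              # one worker per spare core
clusterExport(cl, c("evaluate_fold", "X", "y"))   # ship code and data to workers
fold_errors <- parLapply(cl, seq_len(100),
                         function(k) evaluate_fold(k, X, y))
stopCluster(cl)

cv_error <- mean(unlist(fold_errors))
## on Unix machines, mclapply() offers a fork-based one-line alternative
```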
Investigating Latent State and Uncertainty Representations in Reinforcement Learning
Learning latent-space representations of high-dimensional world states has been at the core of the recent rapid growth in reinforcement learning (RL). At the same time, RL algorithms have suffered from ignored uncertainties in the predicted estimates of model-free or model-based methods. In our work, we investigate both of these aspects independently. First, we study the explainability of policies learned over latent representations. In particular, we focus on control policies represented as recurrent neural networks (RNNs), which are difficult to explain, understand, and analyze due to their use of continuous-valued memory vectors and observation features. We introduce a new technique, Quantized Bottleneck Insertion, to learn finite representations of these vectors and features. This lets us create a finite-state machine representation of the policies, which we show improves their interpretability. Second, we study a model-based reinforcement learning approach for continuous action spaces based on tree-based planning over learned latent dynamics. By including look-ahead search during decision-time planning, we demonstrate improved sample efficiency and performance on a majority of challenging continuous-control benchmarks compared to state-of-the-art methods. Third, we study policy evaluation over offline historical data and highlight the need to couple confidence values with the estimated policy evaluations in order to capture uncertainties. Towards this, we created a benchmark for studying confidence estimation by offline reinforcement learning (ORL) methods. The benchmark is derived by adding sets of policy comparison queries to datasets from ORL and comes with a set of evaluation metrics. In addition, we present an empirical evaluation of a class of model-based baselines over our benchmark. These baselines learn ensembles of dynamics models, which are used in various ways to produce simulations for answering queries with confidence values. While our results suggest advantages for certain baseline variations, there appears to be significant room for improvement in future work.
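The quantize-then-tabulate idea behind the finite-state machine view can be illustrated in a few lines of R: snap the continuous memory vectors to a coarse grid, treat each distinct quantized vector as a discrete state, and read the machine off the observed transitions. This is only a toy sketch of the idea, not the paper's Quantized Bottleneck Insertion, which learns the quantization end-to-end:

```r
## Toy extraction of a finite-state machine from an RNN trajectory (sketch).
## `memory` is a T x d matrix of recurrent state vectors from one rollout.
extract_fsm <- function(memory, levels = 3) {
  ## snap every coordinate of the memory vectors to a small grid in [-1, 1]
  q <- round((memory + 1) / 2 * (levels - 1)) / (levels - 1) * 2 - 1
  keys <- apply(q, 1, paste, collapse = ",")   # one string per quantized state
  ids  <- match(keys, unique(keys))            # discrete state indices
  ## transition counts between consecutive quantized states form the machine
  table(from = head(ids, -1), to = tail(ids, -1))
}

set.seed(1)
memory <- matrix(tanh(rnorm(200)), nrow = 50)  # fake 50-step, 4-dim RNN memory
extract_fsm(memory)
```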
A Novel Control Engineering Approach to Designing and Optimizing Adaptive Sequential Behavioral Interventions
Control engineering offers a systematic and efficient approach to optimizing the effectiveness of individually tailored treatment and prevention policies, also known as adaptive or "just-in-time" behavioral interventions. These types of interventions represent promising strategies for addressing many significant public health concerns. This dissertation explores the development of decision algorithms for adaptive sequential behavioral interventions using dynamical systems modeling, control engineering principles, and formal optimization methods. A novel gestational weight gain (GWG) intervention involving multiple intervention components and featuring a pre-defined, clinically relevant set of sequence rules serves as an excellent example of a sequential behavioral intervention; it is examined in detail in this research.
A comprehensive dynamical systems model of the GWG behavioral intervention is developed, which demonstrates how to integrate a mechanistic energy balance model with dynamical formulations of behavioral models such as the Theory of Planned Behavior and self-regulation. Self-regulation is further improved with different advanced controller formulations. These model-based controller approaches give the user significant flexibility in describing a participant's self-regulatory behavior through the tuning of adjustable controller parameters. The dynamic simulation model demonstrates proof of concept for how self-regulation and adaptive interventions influence GWG, for how intra-individual and inter-individual variability play a critical role in determining intervention outcomes, and for the evaluation of decision rules.
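As a toy illustration of coupling an energy balance model with self-regulation viewed as a controller, the sketch below (in R, with invented coefficients; not the dissertation's model) regulates a daily caloric surplus with a proportional controller tracking a weekly weight-gain goal:

```r
## Toy closed-loop simulation: a mechanistic energy balance plus
## self-regulation modeled as a proportional controller. All coefficients
## are invented for illustration; this is not the dissertation's model.
simulate_gwg <- function(weeks = 30, goal_gain = 0.4, Kp = 500) {
  rho <- 7700                    # kcal per kg of body-mass change
  weight <- numeric(weeks)
  weight[1] <- 65                # starting weight in kg
  base_surplus <- 450            # unregulated daily surplus in kcal
  for (t in 2:weeks) {
    gain_rate <- weight[t - 1] - weight[max(t - 2, 1)]  # last week's gain (kg)
    ## self-regulation: adjust intake in proportion to the gain error
    surplus <- base_surplus + Kp * (goal_gain - gain_rate)
    weight[t] <- weight[t - 1] + 7 * surplus / rho      # weekly mass update
  }
  weight
}
plot(simulate_gwg(), type = "l", xlab = "week", ylab = "weight (kg)")
```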
Furthermore, a novel intervention decision paradigm using a Hybrid Model Predictive Control (HMPC) framework is developed to generate sequential decision policies in closed loop. Clinical considerations are systematically taken into account through a user-specified dosage sequence table corresponding to the sequence rules, constraints enforcing the adjustment of one input at a time, and a switching time strategy accounting for the difference in frequency between intervention decision points and sampling intervals. Simulation studies illustrate the potential usefulness of the intervention framework.
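A drastically simplified stand-in for such a decision rule, ignoring the receding horizon and mixed-integer machinery of HMPC, is a one-step lookahead over the dosages admitted by the sequence table; all names and numbers below are illustrative:

```r
## One-step-lookahead stand-in for the HMPC decision rule (illustrative).
## The table encodes which dosage levels may follow the current one,
## mimicking a clinician-specified dosage sequence rule.
dosage_table <- list(`1` = c(1, 2), `2` = c(1, 2, 3), `3` = c(2, 3))

choose_dosage <- function(current_dose, state, target, predict_gain) {
  candidates <- dosage_table[[as.character(current_dose)]]
  ## simulate each admissible dosage, keep the one closest to the target
  predicted <- sapply(candidates, function(d) predict_gain(state, d))
  candidates[which.min(abs(predicted - target))]
}

## toy prediction model: a higher dosage curbs the weekly gain more strongly
predict_gain <- function(state, dose) state$surplus / 1100 - 0.1 * dose
choose_dosage(1, state = list(surplus = 600), target = 0.4, predict_gain)
```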
The final part of the dissertation presents a model scheduling strategy relying on gain scheduling to address nonlinearities in the model, and introduces a cascade filter design for dual-rate control systems to address scenarios with variable sampling rates. These extensions are important for addressing real-life scenarios in the GWG intervention.
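Gain scheduling itself can be sketched in a few lines: instead of a single controller gain, the gain is interpolated between values tuned at different operating points (illustrative numbers, reusing the Kp of the toy model above):

```r
## Gain scheduling in miniature: the controller gain Kp is interpolated
## across operating points instead of being fixed (illustrative values).
schedule <- data.frame(weight = c(60, 75, 90), Kp = c(650, 500, 380))
scheduled_Kp <- function(w) {
  approx(schedule$weight, schedule$Kp, xout = w, rule = 2)$y
}
scheduled_Kp(82)   # gain blended between the 75 kg and 90 kg designs
```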