Residual and forecast methods in time series models with covariates
We are dealing with time series measured on an arbitrary scale, e.g. a categorical or ordinal scale, recorded together with time-varying covariates. The conditional expectations are modelled by a regression model, whose parameters are estimated via a likelihood or quasi-likelihood approach. Our main concern is diagnostic methods and forecasting procedures for such time series models. Diagnostics are based on (partial) residual measures as well as on (partial) residual variables; l-step predictors are obtained via an approximation formula for conditional expectations. The various methods proposed are illustrated with two different data sets.
Semiparametric Point Process and Time Series Models for Series of Events
We are dealing with series of events occurring at random times tau_n and carrying further quantitative information xi_n. Examples are sequences of extrasystoles in ECG records. We present two approaches for analyzing such (typically long) sequences (tau_n, xi_n), n = 1, 2, ... . (i) A point process model is based on an intensity of the form alpha(t) * b_t(theta), t >= 0, with b_t a stochastic intensity of the self-exciting type. (ii) A time series approach is based on a transitional GLM. The conditional expectation of the waiting time sigma_{n+1} = tau_{n+1} - tau_n is set to be v(tau_n) * h(eta_n(theta)), with h a response function and eta_n a regression term. The deterministic functions alpha and v, respectively, describe the long-term trend of the process.
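The transitional model for the waiting times can be sketched in a short simulation. This is an illustrative sketch only: the exponential response function h, the decaying trend v, the log-linear regression term eta_n, and the exponential waiting-time noise are our assumptions, not the specification in the work summarized above.

```python
import math
import random

def h(eta):
    """Response function: exponential link (illustrative choice)."""
    return math.exp(eta)

def v(t):
    """Deterministic long-term trend (illustrative choice)."""
    return 1.0 + 0.5 * math.exp(-t / 100.0)

def simulate(theta, n_events, seed=1):
    """Simulate event times tau_n with E[sigma_{n+1} | past] = v(tau_n) * h(eta_n)."""
    rng = random.Random(seed)
    tau, sigma = 0.0, 1.0            # current event time, last waiting time
    times = [tau]
    for _ in range(n_events):
        eta = theta[0] + theta[1] * math.log(sigma)  # regression term eta_n
        mean_wait = v(tau) * h(eta)                  # conditional expectation
        sigma = rng.expovariate(1.0 / mean_wait)     # draw sigma_{n+1} > 0
        tau += sigma
        times.append(tau)
    return times

events = simulate(theta=(0.1, 0.3), n_events=50)
```

The simulated sequence is strictly increasing by construction, since every waiting time drawn is positive.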
Semiparametric Estimation in Regression Models for Point Processes based on One Realization
We are dealing with regression models for point processes having a multiplicative intensity process of the form alpha(t) * b_t. The deterministic function alpha describes the long-term trend of the process. The stochastic process b accounts for the short-term random variations and depends on a finite-dimensional parameter. The semiparametric estimation procedure is based on a single observation over a long time interval. We use penalized estimation functions to estimate the trend alpha, while the likelihood approach to point processes is employed for the parametric part of the problem. Our methods are applied to earthquake data as well as to 24-hour ECG records.
Regression Analysis for Forest Inventory Data with Time and Space Dependencies
In this paper the data of a forest health inventory are analysed. Since 1983 the degree of defoliation (damage), together with various explanatory variables (covariates) concerning stand, site, soil and weather, has been recorded by the second of the two authors in the forest district Rothenbuch (Spessart, Bavaria). The focus is on the space and time dependencies of the data. The mutual relationship between space-time functions on the one side and the set of covariates on the other is worked out. To this end we employ generalized linear models (GLMs) for ordinal response variables, together with semiparametric estimation approaches and appropriate residual methods. It turns out that (i) the contribution of the space-time functions is quantitatively comparable with that of the set of covariates, (ii) the data contain much more temporally and spatially sequential structure than smooth space-time structure, and (iii) a fine analysis of the individual sites in the area can be carried out with respect to the predictive power of the covariates.
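A standard GLM for an ordinal response such as a damage grade is the cumulative (proportional-odds) logit model; the sketch below computes the category probabilities for one observation. The thresholds and coefficients are made-up illustration values, and the abstract does not state which ordinal link the authors actually use.

```python
import math

def cum_logit_probs(x, thresholds, beta):
    """P(Y = k), k = 0..K, under P(Y <= k) = logistic(theta_k - beta'x).
    thresholds must be increasing; illustrative sketch, not the authors' code."""
    lin = sum(b * xi for b, xi in zip(beta, x))          # linear predictor beta'x
    cdf = [1.0 / (1.0 + math.exp(-(t - lin))) for t in thresholds] + [1.0]
    # category probabilities are successive differences of the cumulative probs
    return [cdf[0]] + [cdf[k] - cdf[k - 1] for k in range(1, len(cdf))]

# hypothetical covariate vector and parameters for a 4-level damage grade
p = cum_logit_probs(x=[0.5, 1.2], thresholds=[-1.0, 0.5, 2.0], beta=[0.8, -0.3])
```

The probabilities telescope to one by construction, and increasing thresholds guarantee that every category probability is positive.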
Asymptotic behaviour of estimation equations with functional nuisance or working parameter
We are concerned with the asymptotic theory of semiparametric estimation equations, i.e. estimation equations that have a parametric component of interest and a functional (nonparametric) nuisance component. We give sufficient conditions for the existence and the asymptotic normality of a consistent estimation-equation estimator for the parameter of interest. These conditions concern the asymptotic distribution of the estimation function and of its derivative, as well as the effect of the functional nuisance part in the estimation equation. In order to treat the nonparametric component we introduce a general differential calculus and a general mean value theorem. For the nonparametric part of the estimation equation we distinguish two cases: the situation of a (classical) nuisance parameter and the case of a so-called working parameter. As a special case we obtain regularity conditions for estimation equations with a finite-dimensional nuisance or working parameter. As an example we present the semiparametric linear regression model.
Semi-parametric Inference for Regression Models Based on Marked Point Processes
We study marked point processes (MPPs) with an arbitrary mark space. First we develop some statistically relevant topics in the theory of MPPs admitting an intensity kernel, namely martingale results, central limit theorems (both for the number of objects under observation and for the time tending to infinity), the decomposition into a local characteristic, and a likelihood approach. Then we present semi-parametric statistical inference in a class of Aalen (1975)-type multiplicative regression models for MPPs, using partial likelihood methods. Furthermore, considering a particular case, we study purely parametric M-estimators.
Statistical Analysis of the Influence of Cardiac Arrhythmias on Mortality Risk
Cardiac arrhythmias are an extremely threatening condition and can lead to sudden cardiac death. In the Federal Republic of Germany, about 100,000 patients die each year from cardiac arrest, which in 65-80% of cases is caused by an arrhythmia (Trappe et al., 1996). It has been known for more than 20 years that the extent of arrhythmias substantially influences the risk of sudden cardiac death (Moss et al., 1979).
Identifying patients with an elevated mortality risk is therefore of considerable interest and has still not been solved satisfactorily. The choice of the appropriate therapy depends on this question. For patients with an elevated mortality risk, implantation of a defibrillator is currently the only effective therapy. Drug treatment with so-called antiarrhythmics was long the therapy of choice, until at the end of the 1980s a study from the USA demonstrated an increased mortality risk for some of these drugs (CAST study, 1989).
Since then, research has concentrated on two points: the development of new drugs and the identification of particularly endangered patients. The only non-invasive method currently available for recording the frequency of arrhythmias is the 24-hour Holter ECG. At present, only the extent of the arrhythmias, i.e. the frequency of so-called ventricular extrasystoles (VES), is used to divide patients into risk groups. This factor, however, is not informative enough. It is therefore natural to make better use of the information on the arrhythmias and, above all, to describe their complexity more accurately. To this end, all intervals between two successive heartbeats, the so-called RR intervals, are extracted from the 24-hour Holter ECG. At an average of one heartbeat per second, about 90,000 such intervals accumulate over 24 hours. This amount of data poses a particular challenge for the analysis methods.
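The order of magnitude quoted above is easy to check: at roughly one beat per second, 24 hours give 24 * 60 * 60 = 86,400 beats, hence on the order of 90,000 RR intervals. A minimal sketch of extracting RR intervals from recorded beat times (our illustration, not the original analysis code):

```python
def rr_intervals(beat_times):
    """Successive differences of beat times (in seconds) = RR intervals."""
    return [t2 - t1 for t1, t2 in zip(beat_times, beat_times[1:])]

# idealized recording: exactly one beat per second over 24 hours
beats_per_day = 24 * 60 * 60                  # 86,400 beats
rr = rr_intervals(list(range(beats_per_day)))
```

With one beat per second this yields 86,399 intervals of one second each; real Holter data of course vary around that count.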
In a first approach, methods from the field of nonlinear dynamics were applied (Schmidt et al., 1996). It is known that, besides the arrhythmias, the variability of the RR intervals also influences the risk. With the approaches based on nonlinear dynamics, two parameters were derived from the data of a 24-hour Holter ECG (alpha_VES and alpha_sin). The first parameter describes the complexity, the second stands for the variability.
The present work applies statistical methods from the fields of curve estimation, logistic regression and Cox regression in order to identify particularly endangered patients. Data from 60 patients were available for this analysis. The aim of this study is, in particular, to replace the elaborate method of determining alpha_sin and alpha_VES by a new one that is conceptually and numerically simpler, that, in contrast to the established method, can be carried out fully algorithmically, and that, with appropriate further development, could in part be performed online.
New resampling method for evaluating stability of clusters
<p>Abstract</p> <p>Background</p> <p>Hierarchical clustering is a widely applied tool in the analysis of microarray gene expression data. The assessment of cluster stability is a major challenge in clustering procedures. Statistical methods are required to distinguish between real and random clusters. Several methods for assessing cluster stability have been published, including resampling methods such as the bootstrap.</p> <p>We propose a new resampling method based on continuous weights to assess the stability of clusters in hierarchical clustering. While in bootstrapping approximately one third of the original items are lost, continuous weights avoid zero elements and instead allow non-integer diagonal elements, which leads to retention of the full dimensionality of the space, i.e. each variable of the original data set is represented in the resampling sample.</p> <p>Results</p> <p>Comparison of continuous weights and bootstrapping using real datasets and simulation studies reveals the advantage of continuous weights, especially when the dataset has only few observations, few differentially expressed genes, and the fold change of the differentially expressed genes is low.</p> <p>Conclusion</p> <p>We recommend the use of continuous weights in small as well as in large datasets, because according to our results they produce at least the same results as conventional bootstrapping and in some cases surpass it.</p>
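The contrast between integer bootstrap counts and strictly positive continuous weights can be sketched as follows. Drawing the weights as normalized exponentials (the Bayesian-bootstrap scheme) is our illustrative assumption; the weighting scheme proposed in the paper may differ in detail.

```python
import random

def bootstrap_counts(n, rng):
    """Ordinary bootstrap: n draws with replacement -> integer counts.
    On average a fraction 1/e (about 37%) of the items receive count 0."""
    counts = [0] * n
    for _ in range(n):
        counts[rng.randrange(n)] += 1
    return counts

def continuous_weights(n, rng):
    """Continuous resampling weights: strictly positive for every item.
    Normalized exponentials, rescaled to sum to n like bootstrap counts."""
    w = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(w)
    return [n * wi / total for wi in w]

rng = random.Random(0)
n = 1000
counts = bootstrap_counts(n, rng)
weights = continuous_weights(n, rng)

# fraction of items dropped entirely by the ordinary bootstrap
lost = sum(c == 0 for c in counts) / n
```

In the continuous-weight scheme no item is ever dropped, so every gene of the original data set contributes to each resampled clustering, which is exactly the property the abstract emphasizes.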