1,144 research outputs found

    Multiplicative local linear hazard estimation and best one-sided cross-validation

    This paper develops detailed mathematical statistical theory for a new class of cross-validation techniques for local linear kernel hazards and their multiplicative bias corrections. The new class of cross-validation combines principles of local information and recent advances in indirect cross-validation. A few applications of cross-validating multiplicative kernel hazard estimation do exist in the literature. However, detailed mathematical statistical theory and small-sample performance are introduced in this paper and then extended to our new class of best one-sided cross-validation. Best one-sided cross-validation turns out to have excellent performance in its practical illustrations, in its small-sample performance and in its mathematical statistical theoretical performance.

    Capturing the Zero: A New Class of Zero-Augmented Distributions and Multiplicative Error Processes

    We propose a novel approach to model serially dependent positive-valued variables which realize a non-trivial proportion of zero outcomes. This is a typical phenomenon in financial time series observed at high frequencies, such as cumulated trading volumes or the time between potentially simultaneously occurring market events. We introduce a flexible point-mass mixture distribution and develop a semiparametric specification test explicitly tailored for such distributions. Moreover, we propose a new type of multiplicative error model (MEM) based on a zero-augmented distribution, which incorporates an autoregressive binary choice component and thus captures the (potentially different) dynamics of both zero occurrences and of strictly positive realizations. Applying the proposed model to high-frequency cumulated trading volumes of liquid NYSE stocks, we show that the model captures both the dynamic and distributional properties of the data very well and is able to correctly predict future distributions.
    Keywords: high-frequency data, point-mass mixture, multiplicative error model, excess zeros, semiparametric specification test, market microstructure
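As a rough illustration of the point-mass mixture idea in this abstract, the sketch below puts probability mass `pi_zero` on exactly zero and spreads the remainder over the positive half-line. The exponential law for the positive part, and the names `za_cdf`, `za_mean` and `za_sample`, are assumptions made for illustration only; the paper's own distribution is more flexible and is not reproduced here.

```python
import math
import random


def za_cdf(x, pi_zero, mean_pos):
    """CDF of a zero-augmented variable: mass pi_zero at 0, plus an
    exponential law with mean mean_pos on (0, inf) (illustrative choice)."""
    if x < 0:
        return 0.0
    return pi_zero + (1.0 - pi_zero) * (1.0 - math.exp(-x / mean_pos))


def za_mean(pi_zero, mean_pos):
    """Unconditional mean: zeros contribute nothing, positives are down-weighted."""
    return (1.0 - pi_zero) * mean_pos


def za_sample(n, pi_zero, mean_pos, seed=0):
    """Draw n values: a zero with probability pi_zero, otherwise a positive draw."""
    rng = random.Random(seed)
    return [
        0.0 if rng.random() < pi_zero else rng.expovariate(1.0 / mean_pos)
        for _ in range(n)
    ]
```

The CDF jumps by `pi_zero` at the origin, which is the feature a purely continuous positive-valued distribution cannot represent. The sketch is static: it does not include the autoregressive binary choice component that the abstract uses to make the zero probability time-varying.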

    Advanced statistical methods for prognostic biomarkers and disease incidence models

    Due to their prognostic value, biomarkers can support physicians in making the appropriate choice of therapy for a patient. In this thesis, several advanced statistical methods and machine learning algorithms were considered and applied to projects in collaboration with departments of the University Hospital Augsburg. A machine learning algorithm capturing hidden structures in binary immunohistologically stained images of colon cancer was developed to identify patients with a high risk of occurrence of distant metastases. Further, generalized linear models were used to estimate the probability of the need for a permanent shunt in patients after an aneurysmal subarachnoid hemorrhage. Patients with oligometastatic colon cancer were stratified by a score developed using approaches from survival analysis to investigate which groups might benefit from surgical removal of metastases with prolonged overall survival. Another important point is the selection of suitable statistical models dependent on the structure of the data. In the context of the association of a COVID-19 infection with lymphocyte subsets, we found that linear regression may be suitable only after a transformation of the response variable. In addition, modeling the course of daily reported new COVID-19 cases is a relevant task and requires suitable statistical models. We compared non-seasonal and seasonal ARIMA models and examined the performance of different log-linear autoregressive Poisson models. To add more structure and enable theoretical prognosis for the further course depending on nonpharmaceutical interventions, we fitted a Bayesian SEIR model with several change points and related the estimated change points to the distribution of variants of the virus.
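To make the idea of an SEIR model with change points concrete, here is a minimal sketch of the forward simulation only: a deterministic, daily-step SEIR recursion in which the transmission rate switches at given days. It is not the Bayesian fitting procedure of the thesis. The function name `simulate_seir`, the parameter names and all numerical values are assumptions chosen for illustration.

```python
def simulate_seir(days, population, beta_by_segment, change_points,
                  sigma=0.2, gamma=0.1, initial_infectious=10.0):
    """Daily-step SEIR recursion with a piecewise-constant transmission rate.

    beta_by_segment must hold len(change_points) + 1 rates; the rate in use
    switches on each day listed in change_points (all values hypothetical).
    Returns the daily number of new infectious cases and the final state.
    """
    s = population - initial_infectious
    e, i, r = 0.0, initial_infectious, 0.0
    new_cases = []
    for day in range(days):
        # Index of the current segment = number of change points already passed.
        segment = sum(1 for cp in change_points if day >= cp)
        beta = beta_by_segment[segment]
        infections = beta * s * i / population  # S -> E
        onsets = sigma * e                      # E -> I
        recoveries = gamma * i                  # I -> R
        s -= infections
        e += infections - onsets
        i += onsets - recoveries
        r += recoveries
        new_cases.append(onsets)
    return new_cases, (s, e, i, r)
```

For example, `simulate_seir(120, 1_000_000, [0.4, 0.1], [30])` lowers the transmission rate on day 30, mimicking the effect of a nonpharmaceutical intervention. In a Bayesian fit, the rates and change-point locations would be given priors and estimated from the reported case counts rather than fixed by hand.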

    Estimation for the Prediction of Point Processes with Many Covariates

    Estimation of the intensity of a point process is considered within a nonparametric framework. The intensity measure is unknown and depends on covariates, possibly many more than the observed number of jumps. Only a single trajectory of the counting process is observed. Interest lies in estimating the intensity conditional on the covariates. The impact of the covariates is modelled by an additive model where each component can be written as a linear combination of possibly unknown functions. The focus is on prediction as opposed to variable screening. Conditions are imposed on the coefficients of this linear combination in order to control the estimation error. The rates of convergence are optimal when the number of active covariates is large. As an application, the intensity of the buy and sell trades of the New Zealand dollar futures is estimated and a test for forecast evaluation is presented. A simulation is included to provide some finite-sample intuition on the model and asymptotic properties.