Multiplicative local linear hazard estimation and best one-sided cross-validation
This paper develops detailed mathematical and statistical theory for a new class of cross-validation techniques for local linear kernel hazard estimators and their multiplicative bias corrections. The new class combines principles of local information with recent advances in indirect cross-validation. A few applications of cross-validating multiplicative kernel hazard estimators exist in the literature; this paper adds detailed theory and small-sample performance results, and upgrades both to the new class of best one-sided cross-validation. Best one-sided cross-validation turns out to perform excellently in the practical illustrations, in small samples and in its asymptotic theory.
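The following minimal sketch illustrates the two ingredients the abstract combines: an exposure-weighted local linear smoother of raw occurrence/exposure hazard rates and a cross-validated bandwidth search. It is an illustration under simplifying assumptions (gridded occurrences and exposures, a symmetric Epanechnikov kernel, a plain least-squares CV criterion), not the paper's estimator; best one-sided cross-validation would replace the symmetric kernel in the criterion by a one-sided one and rescale the selected bandwidth. All names are illustrative.

```python
import numpy as np

def local_linear_hazard(t_grid, occ, expo, t_eval, b):
    """Exposure-weighted local linear smoothing of raw hazards occ/expo."""
    raw = occ / np.maximum(expo, 1e-12)
    est = np.empty(len(t_eval))
    for i, t in enumerate(t_eval):
        x = t_grid - t
        u = x / b
        k = np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)  # Epanechnikov
        w = k * expo
        s0, s1, s2 = w.sum(), (w * x).sum(), (w * x**2).sum()
        t0, t1 = (w * raw).sum(), (w * x * raw).sum()
        denom = s0 * s2 - s1**2
        # local linear intercept; fall back to local constant if degenerate
        est[i] = ((s2 * t0 - s1 * t1) / denom if denom > 0
                  else (t0 / s0 if s0 > 0 else np.nan))
    return est

def cv_bandwidth(t_grid, occ, expo, bandwidths):
    """Leave-one-grid-point-out least-squares CV (a simplification)."""
    raw = occ / np.maximum(expo, 1e-12)
    scores = []
    for b in bandwidths:
        err = 0.0
        for j in range(len(t_grid)):
            keep = np.arange(len(t_grid)) != j
            fit = local_linear_hazard(t_grid[keep], occ[keep], expo[keep],
                                      t_grid[j:j + 1], b)[0]
            if np.isfinite(fit):
                err += expo[j] * (raw[j] - fit) ** 2
        scores.append(err)
    return bandwidths[int(np.argmin(scores))]
```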
A comparison of in-sample forecasting methods
In-sample forecasting is a recent continuous modification of well-known forecasting methods based on aggregated data. These aggregated methods are known as age-cohort methods in demography, economics, epidemiology and sociology, and as chain ladder in non-life insurance. Data are organised in a two-way table with age and cohort as indices, but without measures of exposure. It has recently been established that such structured forecasting methods based on aggregated data can be interpreted as structured histogram estimators. Continuous in-sample forecasting transfers these classical forecasting models into a modern statistical setting, including smoothing methodology that is more efficient than smoothing via histograms. All in-sample forecasting estimators are collected and their performance is compared via a finite-sample simulation study. All methods are extended via multiplicative bias correction. Asymptotic theory is developed for the histogram-type method of sieves and for the multiplicatively corrected estimators. The multiplicative bias-corrected estimators improve on all other known in-sample forecasters in the simulation study. The density projection approach appears to perform best, with forecasting based on survival densities the runner-up.
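As a concrete illustration of the "structured histogram" view, the sketch below fits the multiplicative age-cohort model N[i, j] ≈ α_i β_j to the observed run-off triangle by alternating proportional fitting (the Poisson maximum likelihood fit, which is known to reproduce chain-ladder forecasts) and forecasts the unobserved lower triangle. Names and the initialisation are illustrative; the paper's continuous estimators replace this histogram-type fit with kernel smoothers.

```python
import numpy as np

def multiplicative_fit(triangle, n_iter=200):
    """Fit N[i, j] ~ alpha[i] * beta[j] on the observed run-off triangle
    (entries with i + j < m) by alternating proportional fitting, then
    forecast the unobserved lower triangle as alpha[i] * beta[j]."""
    m = triangle.shape[0]
    observed = np.add.outer(np.arange(m), np.arange(m)) < m
    alpha = np.ones(m)
    beta = triangle[0].copy() + 1e-12  # illustrative initialisation
    for _ in range(n_iter):
        for i in range(m):
            obs = observed[i]
            alpha[i] = triangle[i, obs].sum() / beta[obs].sum()
        for j in range(m):
            obs = observed[:, j]
            beta[j] = triangle[obs, j].sum() / alpha[obs].sum()
    forecast = np.outer(alpha, beta)
    forecast[observed] = np.nan  # keep only the forecast area
    return alpha, beta, forecast
```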
Bandwidth selection in marker dependent kernel hazard estimation
Practical procedures are developed for the local linear estimation of an unrestricted failure rate when more information than just time is available. This extra information could be a covariate, and this covariate could be a time series. Time-dependent covariates are sometimes called markers, and failure rates are sometimes called hazards, intensities or mortalities. It is shown through simulations and a practical example that the fully local linear estimation procedure exhibits excellent practical performance. Two different bandwidth selection procedures are developed: one is an adaptation of classical cross-validation, the other is indirect cross-validation. The simulation study concludes that classical cross-validation works well on continuous data, while indirect cross-validation performs only marginally better. However, cross-validation breaks down in the practical data application to old-age mortality. Indirect cross-validation is thus shown to be superior when selecting a fully feasible estimation method for marker-dependent hazard estimation.
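The idea behind indirect cross-validation is to select a bandwidth for an auxiliary kernel by cross-validation and then transfer it to the kernel of interest through a known constant. The sketch below shows the density-estimation version of that transfer, which rests on the fact that the AMISE-optimal bandwidth scales with (R(K)/μ₂(K)²)^(1/5); the hazard case is analogous in spirit.

```python
import math

def bandwidth_rescale(b_L, RK, mu2K, RL, mu2L):
    """Transfer a bandwidth selected under kernel L to kernel K via the
    AMISE-optimal ratio ((R(K)/mu2(K)^2) / (R(L)/mu2(L)^2))**(1/5)."""
    return b_L * ((RK / mu2K**2) / (RL / mu2L**2)) ** 0.2

# Example: rescale a Gaussian-kernel bandwidth to the Epanechnikov kernel
# (R = 3/5, mu2 = 1/5 versus R = 1/(2*sqrt(pi)), mu2 = 1); the resulting
# factor is the familiar ~2.214.
b_epanechnikov = bandwidth_rescale(0.1, 3 / 5, 1 / 5,
                                   1 / (2 * math.sqrt(math.pi)), 1.0)
```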
In-Sample Forecasting with Local Linear Survival Densities
In this paper, in-sample forecasting is defined as forecasting a structured density to sets where it is unobserved. The structured density consists of one-dimensional in-sample components that identify the density on such sets. We focus on the multiplicative density structure, which has recently been seen as the underlying structure of non-life insurance forecasts. In non-life insurance the in-sample area is defined as one triangle, and the forecasting area as the triangle that, added to the first triangle, produces a square. Recent approaches estimate two one-dimensional components by projecting an unstructured two-dimensional density estimator onto the space of multiplicatively separable functions. We show that time reversal reduces the problem to two one-dimensional problems, where the one-dimensional data are left-truncated and a one-dimensional survival density estimator is needed. This paper then uses the local linear density smoother with weighted cross-validated and do-validated bandwidth selectors. Full asymptotic theory is provided, with and without time reversal. Finite-sample studies and an application to non-life insurance are included.
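To make the one-dimensional building block concrete: with left-truncated, fully observed data, a survival density can be obtained by smoothing the hazard on the risk set and converting. The sketch below uses the classical Ramlau-Hansen kernel hazard estimator rather than the paper's local linear smoother with weighted cross-validated or do-validated bandwidths, and it assumes no censoring; all names are illustrative.

```python
import numpy as np

def survival_density(entry, exit, t_grid, b):
    """Left-truncated data: Ramlau-Hansen kernel hazard
    alpha(t) = sum_i K_b(t - exit_i) / Y(exit_i), with risk set
    Y(s) = #{i : entry_i <= s <= exit_i}, converted to a density via
    f(t) = alpha(t) * exp(-cumulative hazard). Assumes every exit is
    an observed event (no censoring)."""
    Y = np.array([np.sum((entry <= s) & (exit >= s)) for s in exit])
    def K_b(u):  # Epanechnikov kernel, scaled by the bandwidth
        return np.where(np.abs(u / b) <= 1,
                        0.75 * (1 - (u / b) ** 2), 0.0) / b
    alpha = np.array([np.sum(K_b(t - exit) / Y) for t in t_grid])
    dt = t_grid[1] - t_grid[0]  # uniform grid assumed
    return alpha * np.exp(-np.cumsum(alpha) * dt)
```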
Double one-sided cross-validation of local linear hazards
This paper brings together the theory and practice of local linear kernel hazard estimation. Bandwidth selection is fully analysed, including do-validation, which is shown to have good practical and theoretical properties. Insight is provided into the choice of the weighting function in the local linear minimisation, and it is pointed out that classical weighting sometimes lacks stability. A new semiparametric hazard estimator that transforms the survival data before smoothing is introduced and shown to have good practical properties.
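Sketched in one line, the "double one-sided" idea combines the bandwidths delivered by left-sided and right-sided one-sided cross-validation, each mapped back to the symmetric kernel by a kernel-dependent rescaling constant; the constant C below is a placeholder for that quantity, not a value taken from the paper.

```python
def do_validated_bandwidth(b_left, b_right, C):
    """Do-validation, sketched: average the left- and right-sided
    one-sided CV bandwidths after rescaling each by the kernel-dependent
    constant C (a placeholder here)."""
    return 0.5 * (C * b_left + C * b_right)
```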
Smoothing survival densities in practice
Many nonparametric smoothing procedures consider independent identically distributed stochastic variables. There are also many important nonparametric smoothing applications where the data are more complicated. Survival data, or filtered data, following Aalen's multiplicative hazard model and aggregated versions of this model, are considered. Aalen's model, based on counting process theory, allows multiple left truncations and multiple right censorings to be present in the data. This type of filtering is omnipresent in biostatistical and demographic applications, where people can join a study, leave the study and perhaps join the study again. The estimation methodology is based on a recent class of local linear density estimators. A new stable bandwidth selector is developed for these estimators. A data application to aggregated national mortality data is provided, where immigrations to and emigrations from the country correspond to left truncation and right censoring, respectively. The aggregated mortality data study illustrates that the new practical density estimators provide an important extra element in the visual toolbox for understanding survival data.
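The filtering the abstract describes reduces, for aggregated data, to computing occurrences and central exposures per age from individual entry and exit times. A minimal sketch with illustrative names and one-year age bands: late entry (immigration) is left truncation, and exit without death (emigration) is right censoring.

```python
import numpy as np

def occurrences_exposure(entry, exit, died, ages):
    """Aggregate life histories (entry age, exit age, death indicator)
    into occurrences O[a] and central exposures E[a] per age band
    [a, a + 1)."""
    O = np.zeros(len(ages))
    E = np.zeros(len(ages))
    for en, ex, d in zip(entry, exit, died):
        for k, a in enumerate(ages):
            lo, hi = max(en, a), min(ex, a + 1.0)
            if hi > lo:                      # time spent in the band
                E[k] += hi - lo
            if d and a <= ex < a + 1.0:      # death in the band
                O[k] += 1
    return O, E
```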
In-sample forecasting: structured models and reserving
In most developed countries, the insurance sector accounts for around eight percent of GDP. In Europe alone, insurers' liabilities are estimated at around €900 billion. Every insurance company regularly estimates its liabilities and reports them, in conjunction with statements about capital and assets, to the regulators. The liabilities determine the insurer's solvency as well as its pricing and investment strategy. The new EU directive, Solvency II, which came into effect at the beginning of 2016, states that those liabilities should be estimated under "realistic assumptions" using "relevant actuarial and statistical methods". However, modern statistics has not yet found its way into the reserving departments of today's insurance companies. This thesis attempts to contribute to the connection between the world of mathematical statistics and reserving practice in general insurance. In particular, it is shown that today's reserving practice can be understood as a non-parametric estimation approach in a structured model setting. The forecast of future claims is done without the use of exposure information, i.e., without knowledge of the number of underwritten policies. New statistical estimation techniques and their properties are derived, built from this motivating application.
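For reference, the reserving practice the thesis connects to is essentially the classical chain ladder, which completes a cumulative run-off triangle with development factors and needs no exposure information. A textbook sketch, with illustrative names:

```python
import numpy as np

def chain_ladder(cum):
    """Classical chain ladder on a cumulative run-off triangle `cum`
    (NaN below the anti-diagonal): estimate development factors column
    by column and fill in the unobserved lower triangle."""
    m = cum.shape[0]
    full = cum.copy()
    for j in range(m - 1):
        rows = ~np.isnan(cum[:, j + 1])                  # fully observed pairs
        f = cum[rows, j + 1].sum() / cum[rows, j].sum()  # development factor
        for i in range(m):
            if np.isnan(full[i, j + 1]):
                full[i, j + 1] = full[i, j] * f
    return full
```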
Capturing the Zero: A New Class of Zero-Augmented Distributions and Multiplicative Error Processes
We propose a novel approach to model serially dependent positive-valued variables that realize a non-trivial proportion of zero outcomes. This is a typical phenomenon in financial time series observed at high frequencies, such as cumulated trading volumes or the time between potentially simultaneously occurring market events. We introduce a flexible point-mass mixture distribution and develop a semiparametric specification test explicitly tailored for such distributions. Moreover, we propose a new type of multiplicative error model (MEM) based on a zero-augmented distribution, which incorporates an autoregressive binary choice component and thus captures the (potentially different) dynamics of both zero occurrences and strictly positive realizations. Applying the proposed model to high-frequency cumulated trading volumes of liquid NYSE stocks, we show that the model captures both the dynamic and distributional properties of the data very well and is able to correctly predict future distributions.
Keywords: high-frequency data, point-mass mixture, multiplicative error model, excess zeros, semiparametric specification test, market microstructure
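A minimal simulation sketch of such a zero-augmented MEM: an autoregressive logit drives the zero probability, while the standard MEM recursion μ_t = ω + αx_{t-1} + βμ_{t-1} drives the strictly positive part. The parameterisation below (g0, g1, a mean-one Gamma error) is illustrative, not the paper's exact specification.

```python
import numpy as np

def simulate_zamem(n, omega=0.05, alpha=0.1, beta=0.85,
                   g0=-1.0, g1=2.0, shape=2.0, seed=0):
    """Simulate a zero-augmented MEM: an autoregressive logit on the
    lagged zero indicator drives P(x_t = 0); positive realizations follow
    mu_t = omega + alpha*x_{t-1} + beta*mu_{t-1} times a mean-one error."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    mu = omega / (1 - alpha - beta)  # start at the unconditional mean
    z_prev = 0.0
    for t in range(n):
        p_zero = 1.0 / (1.0 + np.exp(-(g0 + g1 * z_prev)))  # P(x_t = 0)
        if rng.random() < p_zero:
            x[t], z_prev = 0.0, 1.0
        else:
            eps = rng.gamma(shape, 1.0 / shape)  # mean-one Gamma error
            x[t], z_prev = mu * eps, 0.0
        mu = omega + alpha * x[t] + beta * mu  # MEM update for t + 1
    return x
```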
Advanced statistical methods for prognostic biomarkers and disease incidence models
Due to their prognostic value, biomarkers can support physicians in choosing the appropriate therapy for a patient. In this thesis, several advanced statistical methods and machine learning algorithms were considered and applied to projects in collaboration with departments of the University Hospital Augsburg. A machine learning algorithm capturing hidden structures in binary immunohistologically stained images of colon cancer was developed to identify patients with a high risk of developing distant metastases. Further, generalized linear models were used to estimate the probability of the need for a permanent shunt in patients after an aneurysmal subarachnoid hemorrhage. Patients with oligometastatic colon cancer were stratified by a score developed using approaches from survival analysis to investigate which groups might benefit from surgical removal of metastases through prolonged overall survival.
Another important point is the selection of suitable statistical models depending on the structure of the data. We found that linear regression may only be suitable after a transformation of the response variable in the context of the association of a COVID-19 infection with lymphocyte subsets. In addition, modelling the course of daily reported new COVID-19 cases is a relevant task and requires suitable statistical models. We compared non-seasonal and seasonal ARIMA models and examined the performance of different log-linear autoregressive Poisson models. To add more structure and enable theoretical prognoses for the further course depending on non-pharmaceutical interventions, we fitted a Bayesian SEIR model with several change points and set the identified change points in context with the distribution of virus variants.
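To illustrate the change-point idea in the epidemic model: a deterministic discrete-time SEIR with a single break in the transmission rate, as a sketch of the mechanism the thesis fits in a Bayesian framework with several change points; all parameter values below are illustrative.

```python
import numpy as np

def seir_changepoint(n_days, N=1e6, beta=(0.5, 0.2), t_change=30,
                     sigma=1 / 5.5, gamma=1 / 5.0, E0=100.0):
    """Deterministic discrete-time SEIR with one change point in the
    transmission rate; returns the daily incidence (E -> I flow)."""
    S, E, I, R = N - E0, E0, 0.0, 0.0
    incidence = []
    for t in range(n_days):
        b = beta[0] if t < t_change else beta[1]
        new_exposed = b * S * I / N
        new_infectious = sigma * E
        new_recovered = gamma * I
        S -= new_exposed
        E += new_exposed - new_infectious
        I += new_infectious - new_recovered
        R += new_recovered
        incidence.append(new_infectious)
    return np.array(incidence)
```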
Estimation for the Prediction of Point Processes with Many Covariates
Estimation of the intensity of a point process is considered within a nonparametric framework. The intensity measure is unknown and depends on covariates, possibly many more than the observed number of jumps. Only a single trajectory of the counting process is observed. Interest lies in estimating the intensity conditional on the covariates. The impact of the covariates is modelled by an additive model where each component can be written as a linear combination of possibly unknown functions. The focus is on prediction as opposed to variable screening. Conditions are imposed on the coefficients of this linear combination in order to control the estimation error. The rates of convergence are optimal when the number of active covariates is large. As an application, the intensity of the buy and sell trades of the New Zealand dollar futures is estimated and a test for forecast evaluation is presented. A simulation is included to provide some finite-sample intuition on the model and asymptotic properties.
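A stripped-down sketch of the estimation idea: bin the single observed trajectory into counts, posit a log-linear intensity in the covariates, and control the many coefficients with an l1 penalty fitted by proximal gradient descent. The paper's additive model with unknown component functions is richer; everything below (names, the plain log-linear form, the tuning values) is illustrative.

```python
import numpy as np

def lasso_poisson_intensity(X, y, lam=0.1, lr=1e-3, n_iter=5000):
    """l1-penalised Poisson regression for binned counts y with covariate
    matrix X: intensity lambda_t = exp(X[t] @ theta), fitted by proximal
    gradient (gradient step on the negative log-likelihood, then
    soft-thresholding)."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (np.exp(X @ theta) - y) / len(y)  # Poisson NLL gradient
        theta -= lr * grad
        theta = np.sign(theta) * np.maximum(np.abs(theta) - lr * lam, 0.0)
    return theta
```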
- âŠ