Optimal Subsampling Designs Under Measurement Constraints
We consider the problem of optimal subsample selection in settings where observing, or utilising, the full dataset for statistical analysis is practically infeasible. This may be due to, e.g., computational, economic, or even ethical cost constraints. As a result, statistical analyses must be restricted to a subset of the data, and choosing this subset in a manner that captures as much information as possible is essential. In this thesis we present a theory and framework for optimal design in general subsampling problems. The methodology is applicable to a wide range of settings and inference problems, including regression modelling, parametric density estimation, and finite population inference. We discuss the use of auxiliary information and sequential optimal design for the implementation of optimal subsampling methods in practice and study the asymptotic properties of the resulting estimators. The proposed methods are illustrated and evaluated on three problem areas: subsample selection for optimal prediction in active machine learning (Paper I), optimal control sampling in the analysis of safety-critical events in naturalistic driving studies (Paper II), and optimal subsampling in a scenario-generation context for virtual safety assessment of an advanced driver assistance system (Paper III). In Paper IV we present a unified theory that encompasses and generalises the methods of Papers I–III and introduce a class of expected-distance-minimising designs with good theoretical and practical properties. In Papers I–III we demonstrate a sample-size reduction of 10–50% with the proposed methods compared to simple random sampling and traditional importance sampling methods, for the same level of performance. We also propose a novel class of invariant linear optimality criteria, which in Paper IV are shown to reach 90–99% D-efficiency with 90–95% lower computational demand.
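As a hedged illustration of the general idea (not the expected-distance-minimising designs of Paper IV), the sketch below compares simple random subsampling with subsampling proportional to statistical leverage in a linear regression, reweighting the subsample by inverse inclusion probabilities; all data, sizes, and the leverage-based criterion are assumptions made for the example.

    import numpy as np

    rng = np.random.default_rng(0)

    # Simulated full dataset (illustrative only): linear model y = X beta + noise
    n, p = 100_000, 5
    X = rng.normal(size=(n, p))
    beta = np.arange(1, p + 1, dtype=float)
    y = X @ beta + rng.normal(size=n)

    def weighted_ls(Xs, ys, w):
        """Inverse-probability-weighted least squares on a subsample."""
        return np.linalg.solve(Xs.T @ (Xs * w[:, None]), Xs.T @ (ys * w))

    m = 1_000  # subsample size

    # Baseline: simple random sampling with uniform weights
    idx_srs = rng.choice(n, size=m, replace=False)
    beta_srs = weighted_ls(X[idx_srs], y[idx_srs], np.ones(m))

    # Alternative: sampling proportional to statistical leverage, reweighted by 1/(m * p_i)
    lev = np.einsum("ij,ij->i", X @ np.linalg.inv(X.T @ X), X)   # leverage scores
    p_i = lev / lev.sum()
    idx_lev = rng.choice(n, size=m, replace=True, p=p_i)
    beta_lev = weighted_ls(X[idx_lev], y[idx_lev], 1.0 / (m * p_i[idx_lev]))

    print("SRS estimation error:      ", np.linalg.norm(beta_srs - beta))
    print("Leverage-based error:      ", np.linalg.norm(beta_lev - beta))

On well-conditioned problems the informed subsample typically recovers the regression coefficients with a smaller error than simple random sampling of the same size, which is the effect the thesis quantifies in far greater generality.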
Quantile-based smooth transition value at risk estimation
Value at Risk models are concerned with the estimation of conditional quantiles of a time series. Formally, these quantities are a function of conditional volatility and the respective quantile of the innovation distribution. The former is often subject to asymmetric dynamic behaviour, e.g. with respect to past shocks. In this paper we propose a model in which conditional quantiles follow a generalised autoregressive process governed by two parameter regimes whose weights are determined by a smooth transition function. We develop a two-step estimation procedure based on a sieve estimator, approximating conditional volatility using composite quantile regression, which is then used in the generalised autoregressive conditional quantile estimation. We show that the estimator is consistent and asymptotically normal and complement the results with a simulation study. In our empirical application we consider daily returns of the German equity index (DAX) and the USD/GBP exchange rate. While only the latter follows a two-regime model, we find that our model performs well in terms of out-of-sample prediction in both cases.
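As a rough sketch of the modelling idea only (the paper's concrete specification and its sieve/composite-quantile estimation step may differ), the recursion below mixes two conditional-quantile regimes through a logistic smooth transition weight driven by the lagged return; parameter values and variable names are illustrative assumptions.

    import numpy as np

    def smooth_transition_var(returns, theta1, theta2, gamma, c, q0):
        """
        Illustrative two-regime conditional quantile (VaR) recursion whose regime
        weights come from a logistic smooth transition in the lagged return.
        Each theta = (intercept, coefficient on |r_{t-1}|, coefficient on q_{t-1}).
        """
        q = np.empty_like(returns)
        q_prev = q0
        lagged = np.concatenate(([0.0], returns[:-1]))
        for t, r_prev in enumerate(lagged):
            w = 1.0 / (1.0 + np.exp(-gamma * (r_prev - c)))    # transition weight in [0, 1]
            regime1 = theta1[0] + theta1[1] * abs(r_prev) + theta1[2] * q_prev
            regime2 = theta2[0] + theta2[1] * abs(r_prev) + theta2[2] * q_prev
            q_prev = w * regime1 + (1.0 - w) * regime2
            q[t] = q_prev
        return q

    # Toy usage on simulated heavy-tailed returns; parameter values are purely illustrative
    rng = np.random.default_rng(1)
    r = 0.01 * rng.standard_t(df=5, size=500)
    var_path = smooth_transition_var(
        r, theta1=(-0.002, -0.3, 0.9), theta2=(-0.001, -0.1, 0.95),
        gamma=50.0, c=0.0, q0=-0.02,
    )
    print("average one-step VaR estimate:", var_path.mean())

The transition parameter gamma controls how abruptly the process switches between regimes; as gamma grows the model approaches a threshold specification, while small values give a smooth blend.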
Statistical Inference in Quantile Regression Models
The main purpose of this dissertation is to collect different innovative statistical methods in quantile regression. The contributions can be summarized as follows:
-- A new method to construct prediction intervals, involving median regression and bootstrapping of the prediction error, is proposed (a minimal illustration follows this list).
-- A plug-in bandwidth selector for nonparametric quantile regression is proposed, based on nonparametric estimates of the curvature of the quantile regression function and the integrated sparsity.
-- Two lack-of-fit tests for quantile regression models are presented. The first test is based on the cumulative sum of residuals with respect to one-dimensional linear projections of the covariates, in order to deal with high-dimensional covariates. The second test interprets the residuals from the quantile model fit as responses of a logistic regression; a likelihood ratio test in the logistic regression is then used to check the quantile model.
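A minimal sketch of the first contribution's flavour, assuming a case-resampling bootstrap of the prediction error around a median (0.5-quantile) regression fitted with statsmodels; the dissertation's exact construction may differ, and the data and design point are simulated for the example.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)

    # Toy data: median regression with a case-resampling bootstrap of the prediction error
    n = 400
    x = rng.uniform(0, 10, size=n)
    y = 1.0 + 0.5 * x + rng.standard_t(df=3, size=n)
    X = sm.add_constant(x)

    fit = sm.QuantReg(y, X).fit(q=0.5)
    resid = y - fit.predict(X)
    x_new = np.array([[1.0, 5.0]])               # new design point (intercept, x)
    point = fit.predict(x_new)[0]                # median prediction at x_new

    B = 300
    pred_err = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, size=n)                          # resample cases
        fit_b = sm.QuantReg(y[idx], X[idx]).fit(q=0.5)
        eps_star = rng.choice(resid)                              # resampled residual
        pred_err[b] = eps_star - (fit_b.predict(x_new)[0] - point)

    lo, hi = point + np.quantile(pred_err, [0.05, 0.95])
    print(f"median prediction {point:.2f}, 90% prediction interval ({lo:.2f}, {hi:.2f})")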
Untangling hotel industry’s inefficiency: An SFA approach applied to a renowned Portuguese hotel chain
The present paper explores the technical efficiency of four hotels from the Teixeira Duarte Group, a renowned Portuguese hotel chain. An efficiency ranking of these four hotel units located in Portugal is established using Stochastic Frontier Analysis. This methodology makes it possible to discriminate between measurement error and systematic inefficiency in the estimation process, enabling investigation of the main causes of inefficiency. Several suggestions for efficiency improvement are made for each hotel studied.
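For readers unfamiliar with SFA, the following is a minimal sketch of a normal/half-normal stochastic production frontier estimated by maximum likelihood (the classical Aigner–Lovell–Schmidt specification), in which the composed error separates symmetric noise v from one-sided inefficiency u. The data, variable names, and functional form are simulated assumptions, not the hotel data analysed in the paper.

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import norm

    rng = np.random.default_rng(6)

    # Simulated data (hypothetical): log output vs two log inputs
    n = 300
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    beta_true = np.array([1.0, 0.6, 0.3])
    v = rng.normal(scale=0.2, size=n)              # symmetric noise / measurement error
    u = np.abs(rng.normal(scale=0.3, size=n))      # one-sided technical inefficiency
    y = X @ beta_true + v - u

    def neg_loglik(params):
        """Normal/half-normal production-frontier log-likelihood."""
        beta, ln_su, ln_sv = params[:3], params[3], params[4]
        su, sv = np.exp(ln_su), np.exp(ln_sv)
        sigma = np.sqrt(su**2 + sv**2)
        lam = su / sv
        eps = y - X @ beta
        ll = (np.log(2) - np.log(sigma) + norm.logpdf(eps / sigma)
              + norm.logcdf(-eps * lam / sigma))
        return -ll.sum()

    beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    s0 = np.log((y - X @ beta_ols).std())
    res = minimize(neg_loglik, np.concatenate([beta_ols, [s0, s0]]), method="BFGS")
    print("beta:", res.x[:3], " sigma_u:", np.exp(res.x[3]), " sigma_v:", np.exp(res.x[4]))

The estimated sigma_u and sigma_v quantify, respectively, the inefficiency and noise components, which is exactly the decomposition the paper exploits to rank the hotel units.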
Pricing weekly motor insurance drivers’ with behavioral and contextual telematics data
Telematics boxes integrated into vehicles are instrumental in capturing driving data encompassing behavioral and contextual information, including speed, distance travelled by road type, and time of day. These data can be combined with drivers' individual attributes and the accident occurrences reported to their respective insurance providers. Our study analyzes a substantial sample of 19,214 individual drivers over a span of 55 weeks, covering a cumulative distance of 181.4 million kilometers driven. Utilizing this dataset, we develop predictive models for weekly accident frequency. As anticipated from prior research with yearly data, our findings affirm that behavioral traits, such as instances of excessive speed, and contextual data pertaining to road type and time of day significantly aid ratemaking design. The predictive models enable the creation of driving scores and personalized warnings, presenting a potential to enhance traffic safety by alerting drivers to perilous conditions. Our discussion delves into the construction of multiplicative scores derived from Poisson regression, contrasting them with additive scores resulting from a linear probability model approach, which offer greater communicability. Furthermore, we demonstrate that the inclusion of lagged behavioral and contextual factors not only enhances prediction accuracy but also lays the foundation for a diverse range of usage-based insurance schemes with weekly payments.
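A hedged sketch of the multiplicative-score construction: a Poisson regression for weekly claim counts with a log-distance exposure offset, whose exponentiated coefficients act as multiplicative rating factors. The variable names and simulated data are illustrative, not the telematics dataset used in the study.

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(3)

    # Hypothetical weekly telematics exposure data (all names and values are illustrative)
    n = 5_000
    df = pd.DataFrame({
        "km_week": rng.gamma(shape=2.0, scale=150.0, size=n),   # weekly distance driven
        "pct_speeding": rng.beta(1.5, 20.0, size=n),            # share of km over the limit
        "pct_night": rng.beta(2.0, 10.0, size=n),               # share of km at night
        "pct_urban": rng.beta(3.0, 5.0, size=n),                # share of km on urban roads
    })
    lam = np.exp(-8.0 + 1.8 * df.pct_speeding + 0.9 * df.pct_night
                 + 0.5 * df.pct_urban + np.log(df.km_week))
    df["claims"] = rng.poisson(lam)

    # Poisson regression with a log-distance offset: exp(coef) are multiplicative relativities
    model = smf.glm("claims ~ pct_speeding + pct_night + pct_urban", data=df,
                    family=sm.families.Poisson(), offset=np.log(df["km_week"])).fit()
    print(np.exp(model.params))   # multiplicative score factors per unit of each covariate

Because the linear predictor is exponentiated, each covariate scales the expected weekly claim frequency multiplicatively, whereas a linear probability model would add fixed increments to the score, which is the communicability trade-off the paper discusses.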
On the predictability of U.S. stock market using machine learning and deep learning techniques
Conventional market theories are increasingly considered an inconsistent approach in modern financial analysis. This thesis focuses on the application of sophisticated machine learning and deep learning techniques to the statistical predictability and economic significance of the stock market, benchmarked against the conventional efficient market hypothesis and econometric models. The thesis comprises five chapters and three publishable papers, and each chapter is developed to solve specific identifiable problems. Chapter one gives the general introduction to the thesis: the statement of the research problems identified in the relevant literature, the objectives of the study, and its significance. Chapter two applies a range of machine learning techniques to forecast the direction of the U.S. stock market; notable techniques such as regularization, discriminant analysis, classification trees, Bayesian classifiers and neural networks are employed. The empirical findings reveal that the discriminant analysis classifiers, classification trees, Bayesian classifiers and penalized binary probit models significantly outperform the binary probit models both statistically and economically, proving significant alternatives for portfolio managers. Chapter three focuses on the application of regression training (RT) techniques to forecast the U.S. equity premium. The RT models demonstrate significant evidence of equity premium predictability, both statistically and economically, relative to the benchmark historical average, delivering significant utility gains. Chapter four investigates the statistical predictive power and economic significance of financial stock market data using deep learning techniques. Chapter five gives the summary and conclusion and presents areas of further research. The techniques prove robust both statistically and economically when forecasting the equity premium out-of-sample using a recursive window method. Overall, the deep learning techniques produce the best results in this thesis; they seek to provide meaningful economic information on mean-variance portfolio investment for investors who are timing the market to earn future gains at minimal risk.
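As a small illustration of the recursive-window evaluation used for the benchmark comparison (not of the thesis's RT or deep-learning models themselves), the sketch below computes an out-of-sample R² of a predictive regression against the historical-average forecast on simulated data; the predictor, sample sizes, and coefficients are assumptions made for the example.

    import numpy as np

    rng = np.random.default_rng(4)

    # Simulated monthly equity-premium series with one weak lagged predictor (toy data)
    T = 600
    x = rng.normal(size=T)
    y = 0.004 + 0.02 * np.concatenate(([0.0], x[:-1])) + rng.normal(scale=0.04, size=T)

    start = 240                                    # initial estimation window
    pred_model = np.empty(T - start)
    pred_hist = np.empty(T - start)
    for t in range(start, T):
        # Recursive (expanding) window: estimate on data up to t-1, forecast period t
        X_tr = np.column_stack([np.ones(t - 1), x[: t - 1]])
        beta = np.linalg.lstsq(X_tr, y[1:t], rcond=None)[0]
        pred_model[t - start] = beta[0] + beta[1] * x[t - 1]
        pred_hist[t - start] = y[:t].mean()        # historical-average benchmark

    actual = y[start:]
    r2_oos = 1.0 - np.sum((actual - pred_model) ** 2) / np.sum((actual - pred_hist) ** 2)
    print(f"out-of-sample R^2 vs the historical average: {r2_oos:.3f}")

A positive out-of-sample R² means the model forecasts beat the historical average, which is the sense in which the thesis reports statistical and economic predictability gains.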
DATA-DRIVEN ANALYTICAL MODELS FOR IDENTIFICATION AND PREDICTION OF OPPORTUNITIES AND THREATS
During the lifecycle of mega engineering projects such as energy facilities, infrastructure projects, or data centers, the executives in charge should take into account the potential opportunities and threats that could affect the execution of such projects. These opportunities and threats can arise from different domains, including, for example, the geopolitical, economic, or financial, and can have an impact on different entities such as countries, cities, or companies. The goal of this research is to provide a new approach to identifying and predicting opportunities and threats using large and diverse data sets and ensemble Long Short-Term Memory (LSTM) neural network models to inform domain-specific foresights. In addition to predicting the opportunities and threats, this research proposes new techniques to help decision-makers with deduction and reasoning. The proposed models and results provide structured output to inform the executive decision-making process concerning large engineering projects (LEPs). The research proposes techniques that provide not only reliable time-series predictions but also uncertainty quantification to help make more informed decisions. The proposed ensemble framework consists of the following components: first, processed domain knowledge is used to extract a set of entity-domain features; second, structured learning based on Dynamic Time Warping (DTW), to learn similarity between sequences, and Hierarchical Clustering Analysis (HCA) is used to determine which features are relevant for a given prediction problem; and finally, an automated decision based on the input and the structured learning from the DTW-HCA step is used to build a training data set that is fed into a deep LSTM neural network for time-series predictions. A set of deeper ensemble programs is also proposed, such as Monte Carlo simulations and time-label assignment, to offer a controlled setting for assessing the impact of external shocks and a temporal alert system, respectively. The developed model can be used to inform decision-makers about the set of opportunities and threats that their entities and assets face as a result of being engaged in an LEP, accounting for epistemic uncertainty.
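A minimal sketch of the DTW-HCA similarity step, assuming simple univariate stand-in series: pairwise dynamic-time-warping distances feed an average-linkage hierarchical clustering that groups related sequences. The subsequent LSTM stage and the actual entity-domain features are not shown, and all series here are simulated placeholders.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import squareform

    def dtw_distance(a, b):
        """Classic O(len(a) * len(b)) dynamic-time-warping distance between two 1-D series."""
        n, m = len(a), len(b)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = abs(a[i - 1] - b[j - 1])
                D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        return D[n, m]

    rng = np.random.default_rng(5)

    # Toy stand-ins for entity-domain indicator sequences
    series = [np.cumsum(rng.normal(size=100)) for _ in range(8)]

    # Pairwise DTW distances, condensed for scipy, then average-linkage hierarchical clustering
    k = len(series)
    dist = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            dist[i, j] = dist[j, i] = dtw_distance(series[i], series[j])

    labels = fcluster(linkage(squareform(dist), method="average"), t=3, criterion="maxclust")
    print("cluster label per series:", labels)

In the framework described above, the resulting cluster structure would guide which feature sequences are treated as relevant for a given prediction problem before training the deep LSTM.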