
    Intraday forecasts of a volatility index: Functional time series methods with dynamic updating

    As a forward-looking measure of future equity market volatility, the VIX index has gained immense popularity in recent years to become a key measure of risk for market analysts and academics. We consider discrete reported intraday VIX tick values as realisations of a collection of curves observed sequentially on equally spaced and dense grids over time, and utilise functional data analysis techniques to produce one-day-ahead forecasts of these curves. The proposed method facilitates the investigation of dynamic changes in the index over very short time intervals, as showcased using 15-second high-frequency VIX index values. With the help of dynamic updating techniques, our point and interval forecasts are shown to enjoy improved accuracy over conventional time series models. (Comment: 29 pages, 5 figures. To appear in Annals of Operations Research.)
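    Below is a minimal sketch of the functional time series idea described above, assuming each trading day's ticks are stored as one row of a dense matrix: ordinary PCA on the discretised curves stands in for functional principal component analysis, and an AR(1) on the component scores produces the one-day-ahead curve. The data, number of components and score dynamics are placeholder assumptions; the paper's actual estimators and dynamic-updating rules are not reproduced here.

```python
# Sketch: one-day-ahead forecast of an intraday curve via discretised
# functional PCA and AR(1) dynamics on the principal component scores.
# `curves` is a hypothetical (n_days, n_gridpoints) array of VIX ticks.
import numpy as np

rng = np.random.default_rng(0)
n_days, n_grid = 250, 120
t = np.linspace(0, 1, n_grid)
curves = (15 + 3 * np.sin(2 * np.pi * t) * rng.normal(1.0, 0.2, (n_days, 1))
          + rng.normal(0, 0.1, (n_days, n_grid)))    # synthetic stand-in data

mean_curve = curves.mean(axis=0)
centred = curves - mean_curve

# Discretised FPCA: right singular vectors are the empirical eigenfunctions.
_, _, Vt = np.linalg.svd(centred, full_matrices=False)
K = 3
scores = centred @ Vt[:K].T                          # (n_days, K) score series

# Fit an AR(1) to each score series and step it one day ahead.
next_scores = np.empty(K)
for k in range(K):
    x = scores[:, k]
    phi = np.dot(x[1:], x[:-1]) / np.dot(x[:-1], x[:-1])
    next_scores[k] = phi * x[-1]

forecast = mean_curve + next_scores @ Vt[:K]         # one-day-ahead curve
print(forecast[:5])
```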

    Robust procedures in chemometrics


    Low-Default Portfolio/One-Class Classification: A Literature Review

    Consider a bank which wishes to decide whether a credit applicant will obtain credit or not. The bank has to assess whether the applicant will be able to redeem the credit. This is done by estimating the probability that the applicant will default prior to the maturity of the credit. To estimate this probability of default it is first necessary to identify criteria which separate the good from the bad creditors, such as loan amount and age, or factors concerning the income of the applicant. The question then arises of how a bank identifies a sufficient number of selective criteria that possess the necessary discriminatory power. As a solution, many traditional binary classification methods have been proposed, with varying degrees of success. However, a particular problem with credit scoring is that defaults are only observed for a small subsample of applicants: an imbalance exists between the ratio of non-defaulters to defaulters. This has an adverse effect on the aforementioned binary classification methods. Recently, one-class classification approaches have been proposed to address the imbalance problem. The purpose of this literature review is threefold: (i) to present the reader with an overview of credit scoring; (ii) to review existing binary classification approaches; and (iii) to introduce and examine one-class classification approaches.
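    As a rough illustration of the one-class idea, the sketch below trains only on the majority class (non-defaulters) and flags atypical applications as outliers, using scikit-learn's OneClassSVM. The features, values and threshold are hypothetical placeholders, not any particular method from the reviewed literature.

```python
# Sketch: one-class credit scoring -- train only on non-defaulters and
# flag applicants whose feature vectors look anomalous.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)
# Hypothetical features: loan amount, age, income (non-defaulters only).
X_good = rng.normal([10_000, 40, 50_000], [3_000, 10, 15_000], (500, 3))

scaler = StandardScaler().fit(X_good)
clf = OneClassSVM(kernel="rbf", nu=0.05).fit(scaler.transform(X_good))

X_new = np.array([[9_500, 38, 52_000],       # resembles the training mass
                  [60_000, 21, 12_000]])     # atypical application
print(clf.predict(scaler.transform(X_new)))  # +1 = inlier, -1 = outlier
```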

    Recent advances in directional statistics

    Mainstream statistical methodology is generally applicable to data observed in Euclidean space. There are, however, numerous contexts of considerable scientific interest in which the natural supports for the data under consideration are Riemannian manifolds like the unit circle, torus, sphere and their extensions. Typically, such data can be represented using one or more directions, and directional statistics is the branch of statistics that deals with their analysis. In this paper we provide a review of the many recent developments in the field since the publication of Mardia and Jupp (1999), still the most comprehensive text on directional statistics. Many of those developments have been stimulated by interesting applications in fields as diverse as astronomy, medicine, genetics, neurology, aeronautics, acoustics, image analysis, text mining, environmetrics, and machine learning. We begin by considering developments for the exploratory analysis of directional data before progressing to distributional models, general approaches to inference, hypothesis testing, regression, nonparametric curve estimation, methods for dimension reduction, classification and clustering, and the modelling of time series, spatial and spatio-temporal data. An overview of currently available software for analysing directional data is also provided, and potential future developments discussed. (Comment: 61 pages.)
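    To make the circular case concrete, the sketch below computes two basic directional-statistics quantities for angles on the unit circle: the circular mean direction via the mean resultant vector, and a standard closed-form approximation (Banerjee et al., 2005) to the von Mises concentration parameter. This is a generic illustration, not a method from the review.

```python
import numpy as np

# Angles clustered around 0 degrees; the naive arithmetic mean (~179 deg)
# would be badly wrong, which is exactly what directional methods avoid.
theta = np.deg2rad([350.0, 5.0, 15.0, 340.0, 10.0, 355.0])

C, S = np.cos(theta).mean(), np.sin(theta).mean()
mean_dir = np.arctan2(S, C)          # circular mean direction (radians)
R = np.hypot(C, S)                   # mean resultant length in [0, 1]

# Closed-form approximation to the von Mises concentration MLE
# (Banerjee et al., 2005), for the circle (dimension 2).
kappa = R * (2.0 - R**2) / (1.0 - R**2)

print("mean direction (deg):", round(np.rad2deg(mean_dir), 1))
print("resultant length R:", round(R, 3), " kappa:", round(kappa, 1))
```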

    Towards Probabilistic and Partially-Supervised Structural Health Monitoring

    One of the most significant challenges for signal processing in data-based structural health monitoring (SHM) is a lack of comprehensive data; in particular, recorded labels to describe what each of the measured signals represents. For example, consider an offshore wind turbine, monitored by an SHM strategy. It is infeasible to artificially damage such a high-value asset to collect signals that might relate to the damaged structure in situ; additionally, signals that correspond to abnormal wave-loading, or unusually low temperatures, could take several years to be recorded. Regular inspections of the turbine in operation, to describe (and label) what the measured data represent, would also prove impracticable -- conventionally, it is only possible to check various components (such as the turbine blades) following manual inspection; this involves travelling to a remote, offshore location, which is a high-cost procedure. Therefore, the collection of labelled data is generally limited by some expense incurred when investigating the signals; this might include direct costs, or loss of income due to down-time. Conventionally, incomplete label information forces a dependence on unsupervised machine learning, limiting SHM strategies to damage (i.e. novelty) detection. However, while comprehensive and fully labelled data can be rare, it is often possible to provide labels for a limited subset of data, given a label budget. In this scenario, partially-supervised machine learning becomes relevant. The associated algorithms offer an alternative approach to monitoring measured data, as they can utilise both labelled and unlabelled signals within a unifying training scheme. In consequence, this work introduces (and adapts) partially-supervised algorithms for SHM; specifically, semi-supervised and active learning methods. Through applications to experimental data, semi-supervised learning is shown to utilise information in the unlabelled signals, alongside a limited set of labelled data, to further update a predictive model. On the other hand, active learning improves the predictive performance by querying specific signals to investigate, which are assumed to be the most informative. Both discriminative and generative methods are investigated, leading towards a novel, probabilistic framework to classify, investigate, and label signals for online SHM. The findings indicate that, through partially-supervised learning, the cost associated with labelling data can be managed, as the information in a selected subset of labelled signals can be combined with larger sets of unlabelled data -- increasing the potential scope and predictive performance of data-driven SHM.
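    The sketch below illustrates one of the strategies named above, active learning by uncertainty sampling: starting from a small labelled set, the model repeatedly queries the unlabelled signal it is least certain about, spending a fixed label budget. The data, classifier and query rule are generic placeholders, not the probabilistic framework developed in the thesis.

```python
# Sketch: active learning with uncertainty (margin) sampling.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-1, 1, (200, 2)), rng.normal(1, 1, (200, 2))])
y = np.repeat([0, 1], 200)                 # oracle labels (synthetic)

labelled = [0, 1, 2, 200, 201, 202]        # small initial labelled set
budget = 20                                # how many labels we can afford

for _ in range(budget):
    clf = LogisticRegression().fit(X[labelled], y[labelled])
    pool = np.setdiff1d(np.arange(len(X)), labelled)
    proba = clf.predict_proba(X[pool])
    margin = np.abs(proba[:, 1] - 0.5)     # small margin = most uncertain
    query = int(pool[np.argmin(margin)])   # ask the oracle for this label
    labelled.append(query)

clf = LogisticRegression().fit(X[labelled], y[labelled])
print("accuracy after querying:", clf.score(X, y))
```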

    Untangling hotel industry’s inefficiency: An SFA approach applied to a renowned Portuguese hotel chain

    The present paper explores the technical efficiency of four hotels from the Teixeira Duarte Group, a renowned Portuguese hotel chain. An efficiency ranking of these four hotel units located in Portugal is established using Stochastic Frontier Analysis. This methodology makes it possible to discriminate between measurement error and systematic inefficiency in the estimation process, enabling investigation of the main causes of inefficiency. Several suggestions for efficiency improvement are offered for each hotel studied.
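    For reference, a standard composed-error stochastic frontier specification of the kind used in such studies (Aigner, Lovell and Schmidt, 1977) is sketched below: the symmetric noise term captures measurement error, while the one-sided term captures systematic inefficiency. The paper's exact functional form and inputs are not given here, so this is a generic sketch rather than the authors' model.

```latex
% Composed-error stochastic frontier model (generic sketch):
% v_i is symmetric noise (measurement error), u_i >= 0 is inefficiency.
\ln y_i = \mathbf{x}_i^{\top}\boldsymbol{\beta} + v_i - u_i,
\qquad v_i \sim \mathcal{N}(0,\sigma_v^2),
\qquad u_i \sim \mathcal{N}^{+}(0,\sigma_u^2),
\qquad \mathrm{TE}_i = \exp(-u_i).
```

    Here y_i is the output of hotel i, x_i its inputs, and TE_i in (0, 1] its technical efficiency score, from which an efficiency ranking can be formed.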

    Data-driven modelling, forecasting and uncertainty analysis of disaggregated demands and wind farm power outputs

    Correct analysis of modern power supply systems requires evaluating much wider ranges of uncertainties, introduced by the implementation of new technologies on both the supply and demand sides. On the supply side, these uncertainties are due to the increased contributions of renewable generation sources (e.g., wind and PV), whose stochastic output variations are difficult to predict and control, as well as to significant changes in system operating conditions, coming from the implementation of various control and balancing actions, increased automation and switching functionalities, and frequent network reconfiguration. On the demand side, these uncertainties are due to the installation of new types of loads, featuring strong spatio-temporal variations of demand (e.g., EV charging), as well as to the deployment of different demand-side management schemes.

    Modern power supply systems are also characterised by much higher availability of measurements and recordings, coming from a number of recently deployed advanced monitoring, data acquisition and control systems, and providing valuable information on system operating and loading conditions, the state and status of network components, and details of various system events, transients and disturbances. Although the processing of large amounts of measured data brings its own challenges (e.g., data quality, performance, and incorporation of domain knowledge), these data open new opportunities for a more accurate and comprehensive evaluation of overall system performance, which, however, require new data-driven analytical approaches and modelling tools.

    This PhD research is aimed at developing and evaluating novel and improved data-driven methodologies for modelling renewable generation and demand in general, and for assessing the corresponding uncertainties and forecasting in particular. The research and methods developed in this thesis use actual field measurements from several onshore and offshore wind farms, as well as measured active and reactive power demands of individual low voltage (LV) households, up to demands at medium voltage (MV) substation level. The models are specifically built to be implemented for power system analysis and are actually used by a number of researchers and PhD students in Edinburgh and elsewhere (e.g., in collaborations with colleagues from Italy and Croatia), which is discussed and illustrated in the thesis through selected study cases taken from these joint research efforts.

    After a literature review and a discussion of basic concepts and definitions, the first part of the thesis presents data-driven analysis, modelling, uncertainty evaluation and forecasting of (predominantly residential) demands and load profiles at LV and MV levels. The analysis includes both aggregation and disaggregation of measured demands, where the latter is considered in the context of identifying demand-manageable loads (e.g., heating). For that purpose, periodical changes in demand (e.g., half-daily, daily, weekly, seasonal and annual) are represented with Fourier/frequency components and correlated with the corresponding explanatory meteorological variables (e.g., temperature, solar irradiance), allowing selection of the combination of components maximising the positive or negative correlations as an additional predictor variable.
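    Below is a minimal sketch of that Fourier step, assuming a half-hourly demand series and a synthetic temperature series: an FFT isolates the components at whole numbers of cycles per day, and the reconstructed periodic component is then correlated with the meteorological variable. The series, sampling rate and frequency selection here are placeholder assumptions, not the thesis's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 48 * 365                                  # one year of half-hourly data
t = np.arange(n)
daily = np.cos(2 * np.pi * t / 48)            # underlying daily cycle
demand = 2.0 + 0.8 * daily + 0.1 * rng.normal(size=n)    # MW, synthetic
temp = 10.0 - 8.0 * daily + rng.normal(size=n)           # degC, synthetic

# Keep only Fourier components at whole numbers of cycles per day.
F = np.fft.rfft(demand - demand.mean())
freqs = np.fft.rfftfreq(n, d=1 / 48)          # frequency axis in cycles/day
mask = (freqs > 0) & np.isclose(freqs % 1, 0)
F[~mask] = 0.0
daily_component = np.fft.irfft(F, n)

# Correlate the extracted periodic component with the weather variable.
r = np.corrcoef(daily_component, temp)[0, 1]
print("correlation with temperature:", round(r, 3))
```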
    A convolutional neural network (CNN) and a bidirectional long short-term memory (BiLSTM) network are then used to represent dependencies among multiple dimensions and to output the estimated disaggregated time series of specific load types (with Bayesian optimisation applied to select appropriate CNN-BiLSTM hyperparameters). In terms of load forecasting, both tree-based and neural network-based models are analysed and compared for day-ahead and week-ahead forecasting of demands at MV substation level, which are also correlated with meteorological data. Importantly, the presented load forecasting methodologies allow, for the first time, forecasting of both total/aggregate demands and the corresponding disaggregated demands of specific load types.

    In terms of the supply-side analysis, the thesis presents data-driven evaluation, modelling, uncertainty evaluation and forecasting of wind-based electricity generation systems. The available measurements from both individual wind turbines (WTs) and whole wind farms (WFs) are used to formulate simple yet accurate operational models of WTs and WFs. First, the available measurements are preprocessed to remove outliers, as the WT/WF models obtained otherwise may be biased, or even inaccurate. A novel simulation-based approach, building on a procedure recommended in a standard, is presented for processing all outliers due to the applied averaging window (typically 10 minutes) and WT hysteresis effects (around the cut-in and cut-out wind speeds). Afterwards, the importance of distinguishing between WT-level and WF-level analysis is discussed, and a new six-parameter power curve model is introduced for accurate modelling of both the cut-in and cut-out regions and for taking into account the operating regimes of a WF (WTs in normal/curtailed operation, or outage/fault).

    The modelling framework in the thesis starts with deterministic models (e.g., CNN-BiLSTM and power curve models) and is then extended to include probabilistic models, building on Bayesian inference and Copula theory. In that context, the thesis presents a set of innovative data-driven WT and WF probabilistic models, which can accurately model cross-correlations between the WT/WF power output (Pout), wind speed (WS), air density (AD) and wind direction (WD). Vine Copula and Gaussian mixture Copula models (GMCM) are combined, for the first time, to evaluate the uncertainty of Pout values, conditioning on the other explanatory variables (which may be either deterministic or also uncertain). In terms of probabilistic wind energy forecasting, a Bayesian CNN-BiLSTM model is used to analyse and efficiently handle the high dimensionality of both the input meteorological variables (WS, AD and WD) and the additional uncertainties due to WF operating regimes. The presented results demonstrate that the developed Vine-GMCM and operational WF models can accurately integrate and effectively correlate all propagated uncertainties, ultimately resulting in much higher confidence levels of the forecasted WF power outputs than in the existing literature.
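    To make the power curve modelling concrete, the sketch below fits a six-parameter parametric power curve to synthetic turbine data with scipy. The generalised-logistic form with smooth ramp-up (cut-in) and drop-off (cut-out) transitions is a hypothetical stand-in; the thesis's actual six-parameter model and its treatment of operating regimes are not reproduced here.

```python
import numpy as np
from scipy.optimize import curve_fit

def power_curve(v, p_max, k1, v1, k2, v2, g):
    """Hypothetical six-parameter curve: smooth cut-in and cut-out regions."""
    ramp_up = 1.0 / (1.0 + np.exp(-k1 * (v - v1)))   # cut-in transition
    cut_out = 1.0 / (1.0 + np.exp(k2 * (v - v2)))    # cut-out transition
    return p_max * ramp_up**g * cut_out

rng = np.random.default_rng(4)
v = rng.uniform(0, 30, 500)                          # wind speeds (m/s)
p = power_curve(v, 2.0, 0.8, 9.0, 5.0, 25.0, 1.0)    # synthetic "SCADA" data
p += rng.normal(0, 0.05, v.size)                     # measurement noise

p0 = [1.5, 1.0, 10.0, 3.0, 24.0, 1.0]                # initial guess
popt, _ = curve_fit(power_curve, v, p, p0=p0, maxfev=20000)
print(dict(zip(["p_max", "k1", "v1", "k2", "v2", "g"], popt.round(2))))
```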

    Increasing the robustness of autonomous systems to hardware degradation using machine learning

    Autonomous systems perform predetermined tasks (missions) with minimum supervision. In most applications, the state of the world changes with time. Sensors are employed to measure part or all of the world's state. However, sensors often fail during operation, feeding decision-making with wrong information about the world. Moreover, hardware degradation may alter the dynamic behaviour, and subsequently the capabilities, of an autonomous system, rendering the original mission infeasible. This thesis applies machine learning to yield powerful and robust tools that can facilitate autonomy in modern systems.

    Incremental kernel regression is used for dynamic modelling. Algorithms of this sort are easy to train and are highly adaptive; adaptivity allows for model adjustments whenever the environment of operation changes. Bayesian reasoning provides a rigorous framework for addressing uncertainty, and, using Bayesian networks, complex queries regarding hardware degradation can be answered. Specifically, adaptive modelling is combined with Bayesian reasoning to yield recursive estimation algorithms that are robust to sensor failures. Two solutions are presented, extending existing recursive estimation algorithms from the robotics literature. The algorithms are deployed on an underwater vehicle and their performance is assessed in real-world experiments; a comparison against standard filters is also provided. Next, the previous algorithms are extended to consider sensor and actuator failures jointly. An algorithm that can detect thruster failures in an Autonomous Underwater Vehicle has been developed; the algorithm also adapts the dynamic model online to compensate for the detected fault. The performance of this algorithm was likewise tested in a real-world application.

    One step further than hardware fault detection, prognostics predict how much longer a particular hardware component can operate normally. Ubiquitous sensors in modern systems render data-driven prognostics a viable solution. However, training is based on skewed datasets, where samples from the faulty region of operation are much fewer than those from the healthy region of operation. This thesis presents a prognostic algorithm that tackles the problem of imbalanced (skewed) datasets.
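    The sketch below shows one standard way of making recursive estimation robust to sensor faults, in the spirit of the work described above: each measurement is gated with a chi-square test on the Kalman innovation, and the update is skipped when the test fails. This is a generic one-dimensional illustration with synthetic faults, not the thesis's algorithms.

```python
import numpy as np

rng = np.random.default_rng(5)
true_state = 1.0
q, r = 1e-4, 0.04            # process and measurement noise variances
x, P = 0.0, 1.0              # initial state estimate and variance
gate = 9.0                   # chi-square gate, ~3-sigma for 1 dof

for k in range(200):
    z = true_state + rng.normal(0.0, np.sqrt(r))
    if k % 25 == 0:
        z += 5.0             # inject an occasional sensor fault

    P = P + q                # predict (static state model x_k = x_{k-1})
    innov = z - x            # innovation
    S = P + r                # innovation variance
    if innov**2 / S < gate:  # gating test: is the measurement plausible?
        K = P / S
        x += K * innov       # update only with plausible measurements
        P *= (1 - K)
    # else: reject the measurement as a suspected sensor fault

print("estimate:", round(x, 3))
```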