42 research outputs found

    Statistics, econometrics, data, analysis

    These lecture notes are for Economics PhD students at the Corvinus University of Budapest, but can equally be used by any graduate student interested in modern econometrics and its relationship to general statistics. The text is divided into five main sections. The first introduces some general concepts of theoretical statistics, including Bayesian ideas. Many of these ideas appear in econometrics textbooks, but some of them are ominously missing. Basic (philosophical) questions of statistics are usually not treated in those textbooks, though some appreciation of them should be useful for any practicing econometrician. The next two sections cover material that can be found in most (non-time series) econometric textbooks. Here I stress the difference between two approaches: the data description style of classical regression analysis and the econometric approach centered on causal estimation. The following section introduces statistical learning, an area little known to most economists at present. My conviction is that knowledge of it will become more and more crucial in the future. The final section is devoted to time series analysis, mostly dealing with traditional time domain approaches, but making a foray, unusual for econometric texts, into the frequency domain and wavelet methods. Again, I believe that the latter will be important in the future, and the former is a stepping stone to the latter. The book is a textbook and as such a compendium. It does not contain new material, at most a few examples or cases. It is based mostly on other textbooks, but selected from a rather wide range. At the end of each section the texts I used most extensively are listed; in all areas these should be consulted by anyone who wants a deeper understanding of the issues involved. The present text aims at wide coverage rather than in-depth coverage.

    Model-based Behavioural Tracking and Scale Invariant Features in Omnidirectional Matching

    Two classical but crucial and unsolved problems in Computer Vision are treated in this thesis: tracking and matching. The first part of the thesis deals with tracking, studying two of its main difficulties: object representation model drift and total occlusions. The second part considers the problem of point matching between omnidirectional images and between omnidirectional and planar images. Model drift is a major problem of tracking when the object representation model is updated on-line. In this thesis, we have developed a visual tracking algorithm that simultaneously tracks and builds a model of the tracked object. The model is computed using an incremental PCA algorithm that allows samples to be weighted. Thus, model drift is avoided by weighting samples added to the model according to a measure of confidence in the tracked patch. Furthermore, we have also introduced spatial weights for weighting pixels and increasing tracking accuracy in some regions of the tracked object. Total occlusions are another major problem in visual tracking. Indeed, a total occlusion completely hides the tracked object, making visual information unavailable for tracking. To handle this kind of situation, common in unconstrained scenarios, the Model cOrruption and Total Occlusion Handling (MOTOH) framework is introduced. In this framework, in addition to the model drift avoidance scheme described above, a total occlusion detection procedure is introduced. When a total occlusion is detected, the tracker switches to behavioural-based tracking, where instead of guiding the tracker with visual information, a behavioural model of motion is employed. Finally, a Scale Invariant Feature Transform (SIFT) for omnidirectional images is developed. The proposed algorithm generates two types of local descriptors, Local Spherical Descriptors and Local Planar Descriptors.
With the former, point matching between omnidirectional images can be performed, and with the latter, the same matching process can be carried out between omnidirectional and planar images. Furthermore, a planar to spherical mapping is introduced and an algorithm for its estimation is given. This mapping allows objects to be extracted from an omnidirectional image given their SIFT descriptors in a planar image.

    Bag-of-words representations for computer audition

    Computer audition is omnipresent in everyday life, in applications ranging from personalised virtual agents to health care. From a technical point of view, the goal is to robustly classify the content of an audio signal in terms of a defined set of labels, such as the acoustic scene, a medical diagnosis, or, in the case of speech, what is said or how it is said. Typical approaches employ machine learning (ML), which means that task-specific models are trained by means of examples. Despite recent successes in neural network-based end-to-end learning, which takes the raw audio signal as input, models relying on hand-crafted acoustic features are still superior in some domains, especially for tasks where data is scarce. One major issue, nevertheless, is that a sequence of acoustic low-level descriptors (LLDs) cannot be fed directly into many ML algorithms, as they require a static, fixed-length input. Moreover, even for dynamic classifiers, compressing the information of the LLDs over a temporal block by summarising them can be beneficial. However, the type of instance-level representation has a fundamental impact on the performance of the model. In this thesis, the so-called bag-of-audio-words (BoAW) representation is investigated as an alternative to the standard approach of statistical functionals. BoAW is an unsupervised method of representation learning, inspired by the bag-of-words method in natural language processing, which forms a histogram of the terms present in a document. The toolkit openXBOW is introduced, enabling systematic learning and optimisation of these feature representations, unified across arbitrary modalities of numeric or symbolic descriptors. A number of experiments on BoAW are presented and discussed, focussing on a large number of potential applications and corresponding databases, ranging from emotion recognition in speech to medical diagnosis.
The evaluations include a comparison of different acoustic LLD sets and configurations of the BoAW generation process. The key findings are that BoAW features are a meaningful alternative to statistical functionals, offering certain benefits, while being able to preserve the advantages of functionals, such as data-independence. Furthermore, it is shown that both representations are complementary and their fusion improves the performance of a machine listening system.
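
The bag-of-audio-words idea described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the openXBOW implementation: the codebook is given by hand rather than learned, each frame is assigned to its single nearest codebook vector by Euclidean distance, and the names `assign_word`, `boaw_histogram`, `codebook`, and `frames` are hypothetical.

```python
# Minimal bag-of-audio-words sketch (illustrative; not the openXBOW code).
from collections import Counter
from math import dist


def assign_word(frame, codebook):
    """Return the index of the codebook vector closest to one LLD frame."""
    return min(range(len(codebook)), key=lambda k: dist(frame, codebook[k]))


def boaw_histogram(frames, codebook):
    """Map a variable-length sequence of LLD frames to a fixed-length,
    normalised histogram of audio-word counts.
    Assumes `frames` is non-empty."""
    counts = Counter(assign_word(f, codebook) for f in frames)
    total = len(frames)
    return [counts.get(k, 0) / total for k in range(len(codebook))]


# Hypothetical 2-dimensional codebook with three audio words.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
# Four hypothetical LLD frames from one recording.
frames = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8), (4.7, 5.2)]
hist = boaw_histogram(frames, codebook)
```

Whatever the number of frames, the resulting vector has one entry per codebook word, which is what allows it to be passed to classifiers that require a static, fixed-length input.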

    Non Stationarity and Market Structure Dynamics in Financial Time Series

    This thesis is an investigation of the time changing nature of financial markets. Financial markets are complex systems having an intrinsic structure defined by the interplay of several variables. The technological advancements of the ’digital age’ have exponentially increased the amount of data available to financial researchers and industry professionals over the last decade and, as a consequence, have highlighted the key role of interactions amongst variables. A critical characteristic of the financial system, however, is its time changing nature: the multivariate structure of the system changes and evolves through time. This feature is critically relevant for classical statistical assumptions and has proven challenging to investigate. This thesis is devoted to the investigation of this property, providing evidence of the time changing nature of the system, analysing the implications for traditional asset allocation practices and proposing a novel methodology to identify and predict ‘market states’. First, I analyse how classical model estimations are affected by time and what the consequential effects are on classical portfolio construction techniques. Focusing on elliptical models of daily returns, I present experiments on both in-sample and out-of-sample likelihood of individual observations and show that the system changes significantly through time. Larger estimation windows lead to stable likelihood in the long run, but at the cost of lower likelihood in the short term. A key implication of these findings is that the optimality of fit in finance needs to be defined in terms of the holding period. In this context, I also show that sparse models and information filtering significantly mitigate the effects of non stationarity, avoiding the typical pitfalls of conventional portfolio optimization approaches.
Having assessed and documented the time changing nature of the financial system, I propose a novel methodology to segment financial time series into market states, called ICC (Inverse Covariance Clustering). The ICC methodology makes it possible to study the evolution of the multivariate structure of the system by segmenting the time series based on their correlation structure. In the ICC framework, market states are identified by a reference sparse precision matrix and a vector of expectation values. In the estimation procedure, each multivariate observation is assigned to a market state according to the minimisation of a penalized distance measure (e.g. likelihood, Mahalanobis distance). The procedure is computationally very efficient and can be used with a large number of assets. Furthermore, the ICC methodology makes it possible to control for temporal consistency, making it of high practical relevance for trading systems. I present a set of experiments investigating the features of the discovered clusters and comparing them to those of standard clustering techniques. I show that the ICC methodology is successful at clustering different states of the markets in an unsupervised manner, outperforming baseline standard models. Further, I show that the procedure can be efficiently used to forecast out-of-sample future market states with significant prediction accuracy. Lastly, I test the significance of increasing the number of states used to model equity returns and how this parameter relates to the number of observations and the time consistency of the states. I present experiments to investigate a) the likelihood of the overall model as more states are spanned, and b) the relevance of additional regimes measured by the number of observations clustered. I find that the number of “market states” that optimally define the system increases with the time spanned and the number of observations considered.
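
The assignment step mentioned in this abstract, in which each observation is attached to the state with the smallest distance, can be illustrated as follows. This is a rough sketch under several assumptions, not the ICC procedure itself: the state means and precision matrices are fixed by hand instead of being estimated sparsely, the distance is the plain squared Mahalanobis distance, and the temporal-consistency penalty is omitted. The names `mahalanobis_sq`, `assign_states`, `states`, and `observations` are hypothetical.

```python
# Illustrative state assignment by squared Mahalanobis distance.
# Not the ICC algorithm: no sparse estimation, no temporal-consistency penalty.


def mahalanobis_sq(x, mean, precision):
    """Squared Mahalanobis distance (x - mean)^T P (x - mean),
    where P is the precision (inverse covariance) matrix."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    n = len(d)
    return sum(d[i] * precision[i][j] * d[j] for i in range(n) for j in range(n))


def assign_states(observations, states):
    """Label each observation with the index of its closest state."""
    return [
        min(range(len(states)), key=lambda k: mahalanobis_sq(x, *states[k]))
        for x in observations
    ]


# Two hypothetical market states, each a (mean vector, precision matrix) pair.
states = [
    ((0.0, 0.0), [[1.0, 0.0], [0.0, 1.0]]),       # low-variance state
    ((-2.0, -2.0), [[0.25, 0.0], [0.0, 0.25]]),   # high-variance state
]
# Three hypothetical two-asset return observations.
observations = [(0.1, -0.2), (-2.5, -1.5), (0.3, 0.4)]
labels = assign_states(observations, states)
```

In a full clustering loop, one would alternate this assignment with re-estimation of each state's mean and precision matrix from the observations assigned to it.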

    Machine Learning Approaches for Natural Resource Data

    Real life applications involving efficient management of natural resources are dependent on accurate geographical information. This information is usually obtained by manual on-site data collection, via automatic remote sensing methods, or by a mixture of the two. Natural resource management, besides accurate data collection, also requires detailed analysis of this data, which in the era of data flood can be a cumbersome process. With the rising trend in both computational power and storage capacity, together with falling hardware prices, data-driven decision analysis has an ever greater role. In this thesis, we examine the predictability of terrain trafficability conditions and forest attributes by using a machine learning approach with geographic information system data. Quantitative measures of the prediction performance of terrain conditions using natural resource data sets are given for five distinct research areas located around Finland. Furthermore, the estimation capability of key forest attributes is inspected with a multitude of modeling and feature selection techniques. The research results provide empirical evidence on whether the natural resource data used is sufficiently accurate for practical applications, or whether further refinement of the data is needed. The results are especially important to the forest industry, since even slight improvements to the natural resource data sets utilized in practice can result in large savings in operation time and costs. Model evaluation is also addressed in this thesis by proposing a novel method for estimating the prediction performance of spatial models. Classical measures of model goodness of fit usually rely on the assumption of independently and identically distributed data samples, a characteristic which normally does not hold for spatial data sets.
Spatio-temporal data sets contain an intrinsic property called spatial autocorrelation, which is partly responsible for breaking these assumptions. The proposed cross validation based evaluation method provides model performance estimation where optimistic bias due to spatial autocorrelation is decreased by partitioning the data sets in a suitable way. Keywords: Open natural resource data, machine learning, model evaluation
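
One common way to partition spatial data so that nearby, autocorrelated samples do not end up on both sides of a train/test split is to hold out whole spatial blocks. The sketch below shows that generic idea only; the abstract does not specify the partitioning used in the thesis, so the grid blocking, the function name `spatial_block_folds`, and the `block_size` parameter are all assumptions made for illustration.

```python
# Generic spatial block cross-validation sketch (an assumed partitioning scheme,
# not necessarily the method proposed in the thesis).
from collections import defaultdict


def spatial_block_folds(coords, block_size):
    """Group sample indices by square grid cell and return one
    (train_indices, test_indices) pair per cell, holding that cell out."""
    blocks = defaultdict(list)
    for idx, (x, y) in enumerate(coords):
        cell = (int(x // block_size), int(y // block_size))
        blocks[cell].append(idx)

    folds = []
    for cell in sorted(blocks):
        test = blocks[cell]
        held_out = set(test)
        train = [i for i in range(len(coords)) if i not in held_out]
        folds.append((train, test))
    return folds


# Five hypothetical sample locations (arbitrary map units).
coords = [(0.5, 0.5), (0.7, 0.2), (3.1, 0.4), (3.3, 0.9), (0.2, 3.8)]
folds = spatial_block_folds(coords, block_size=2.0)
```

Samples just across a block boundary can still be close to the held-out block, so a stricter variant would also drop training points within some buffer distance of the test samples.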

    Development of a Multi-Hour Ahead Wind Power Forecasting System

    Wind energy, as a renewable and green energy source with substantial value that is vital for sustainable human development, is gaining more and more attention around the world. The variability of wind implies that wind power is random, intermittent, and volatile. In order to overcome the unfavourable factors brought by wind power and enhance the reliable, stable, and secure operation of electrical grids that incorporate wind power systems, a multi-hour ahead wind power forecasting system consisting of an optimal combination of statistical, physical, and artificial intelligence (AI) models for real wind farm applications was proposed in this research. Apart from a direct persistence model that was able to produce wind power forecasts directly, an indirect persistence model, an autoregressive integrated moving average (ARIMA) model, and a Weather Research and Forecasting (WRF) model were used to provide wind speed forecasts which, in turn, could be converted to wind power forecasts by using a power curve model. A technique for order of preference by similarity to ideal solution (TOPSIS) scheme was applied to construct a novel 5-in-1 (ensemble) WRF model for wind speed and wind power forecasting. An adaptive neuro-fuzzy inference system (ANFIS) model was employed to determine the power curve model, and another ANFIS model was utilised to build a wind speed correction model exclusively for correcting the wind speed forecasts provided by the 5-in-1 (ensemble) WRF model.
Based on a set of 24-day historical wind speed and wind power measurements acquired from an operational wind turbine in a real wind farm located in North China, the proposed multi-hour ahead wind power forecasting system comprises the following components over various forecast time horizons: the direct and indirect persistence models for 30-minute ahead forecasting, the ARIMA model for 1-hour ahead forecasting, and the WRF-TOPSIS model (with corrections obtained from the ANFIS-based wind speed correction model) for 1.5-hour to 24-hour ahead forecasting (with a 30-minute temporal resolution). The primary contribution of this research is the novel WRF-TOPSIS model strategy used to select and combine the best-performing WRF models from a vast ensemble of possible models. The results demonstrated that the proposed multi-hour ahead wind power forecasting system has excellent predictive performance and is of practical relevance.
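
To make the TOPSIS step more concrete, here is a small sketch of how candidate models could be ranked by the standard TOPSIS closeness score. It is an illustration only: the abstract does not state which criteria or weights were used, so the two error criteria, the equal weights, the three candidate models, and the function name `topsis` are assumptions.

```python
# Standard TOPSIS ranking sketch with hypothetical criteria and data.
from math import sqrt


def topsis(matrix, weights, benefit):
    """Return a closeness score in [0, 1] for each alternative (row).
    benefit[j] is True if larger values of criterion j are better.
    Assumes no all-zero column and at least two distinct alternatives."""
    n_crit = len(weights)
    # Vector-normalise each criterion column, then apply the weights.
    norms = [sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    # Ideal and anti-ideal points, criterion by criterion.
    cols = list(zip(*v))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for row in v:
        d_plus = sqrt(sum((a - b) ** 2 for a, b in zip(row, ideal)))
        d_minus = sqrt(sum((a - b) ** 2 for a, b in zip(row, anti)))
        scores.append(d_minus / (d_plus + d_minus))
    return scores


# Hypothetical forecast errors (RMSE, MAE) for three candidate models.
errors = [[1.0, 0.8], [2.0, 1.5], [1.5, 0.9]]
scores = topsis(errors, weights=[0.5, 0.5], benefit=[False, False])
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

Both criteria are errors, so they are treated as cost criteria; the model that is best on every criterion receives a score of 1 and the one that is worst on every criterion receives 0.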