
    Algorithms for Estimating Trends in Global Temperature Volatility

    Trends in terrestrial temperature variability are perhaps more relevant for species viability than trends in mean temperature. In this paper, we develop methodology for estimating such trends using multi-resolution climate data from polar orbiting weather satellites. We derive two novel algorithms for computation that are tailored for dense, gridded observations over both space and time. We evaluate our methods with a simulation that mimics the features of these data and on a large, publicly available, global temperature dataset with the eventual goal of tracking trends in cloud reflectance temperature variability. Comment: Published in AAAI-1

    Production, efficiency and managerial selection of workers into peer networks

    This dissertation develops empirical models that account for worker interactions, managerial selectivity, and technical inefficiency in the production process. The first chapter, entitled "Stochastic Frontier Models with Network Selectivity", develops a model where workers produce output through peer-effect networks, while managerial selectivity of workers affects worker inefficiency. The intuition behind this model is that managers may consider optimal combinations of workers to produce the best results, and this selectivity in the worker network may affect worker productivity. The second chapter, entitled "Network Competition and Team Chemistry in the NBA", models simultaneous interactions between multiple networks where agents cooperate with peers within their own networks but compete with non-peers from other networks. This chapter presents the first econometric model to consider multiple peer networks where workers are engaged in simultaneous competition around a single outcome variable. Lastly, the third chapter, entitled "Adaptive LASSO for Stochastic Frontier Models with Many Efficient Firms", develops a procedure to select a subset of maximally efficient firms in the sample of interest. In this model, firm inefficiency is measured as a distance from an estimated optimal production level, and I apply the LASSO (Least Absolute Shrinkage and Selection Operator, Tibshirani, 1996) to identify a subset of firms whose inefficiencies are estimated as exactly zero. This methodology can be applied to any classification problem where the interest is in identifying a subset of the best (or worst) individuals among a large number of candidates.
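
To illustrate how an L1 penalty can set some inefficiency estimates to exactly zero, here is a minimal sketch. It is not the dissertation's estimator: it assumes preliminary non-negative inefficiency estimates are already available, treats each firm separately, and uses the common adaptive weight 1 / estimate^gamma. The function name, the `lam` and `gamma` parameters, and the sample numbers are all hypothetical.

```python
def adaptive_soft_threshold(u_hat, lam, gamma=1.0):
    """Per-firm solution of: min_u 0.5*(u - u_hat_i)**2 + lam*w_i*u, with u >= 0.

    The adaptive weight w_i = 1 / u_hat_i**gamma (an assumption of this sketch)
    penalizes small preliminary estimates more heavily, so they are the ones
    that get shrunk to exactly zero.
    """
    shrunk = []
    for u in u_hat:
        if u <= 0:
            # A non-positive preliminary estimate is already at the boundary.
            shrunk.append(0.0)
            continue
        weight = 1.0 / (u ** gamma)
        # Closed-form minimizer: shift by the penalty, then clip at zero.
        shrunk.append(max(u - lam * weight, 0.0))
    return shrunk


# Hypothetical preliminary inefficiency estimates for four firms.
prelim = [0.05, 0.10, 0.80, 1.50]
shrunk = adaptive_soft_threshold(prelim, lam=0.02)

# Firms whose penalized inefficiency is exactly zero are flagged as efficient.
efficient = [i for i, u in enumerate(shrunk) if u == 0.0]
```

In this toy setting the two firms with the smallest preliminary estimates are flagged as efficient, while the larger estimates are only slightly reduced. A stochastic frontier model would estimate the frontier and the inefficiencies jointly, which this sketch does not attempt.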

    Computational Methods for the Analysis of Complex Data

    This PhD dissertation bridges the disciplines of Operations Research and Statistics to develop novel computational methods for the extraction of knowledge from complex data. In this research, complex data stands for datasets with many instances and/or variables, with different types of variables, with dependence structures among the variables, collected from different sources (heterogeneous), possibly with non-identical population class sizes, with different misclassification costs, or characterized by extreme instances (heavy-tailed data), among others. Recently, the complexity of the raw data, together with new requests posed by practitioners (interpretable models, cost-sensitive models or models which are efficient in terms of running times), has posed a challenge from a scientific perspective. The main contributions of this PhD dissertation are encompassed in three different research frameworks: Regression, Classification and Bayesian inference. Concerning the first, we consider linear regression models, where a continuous outcome variable is to be predicted by a set of features. On the one hand, seeking interpretable solutions in heterogeneous datasets, we propose a novel version of the Lasso in which the performance of the method on groups of interest is controlled. On the other hand, we use mathematical optimization tools to propose a sparse linear regression model (that is, a model whose solution only depends on a subset of predictors) specifically designed for datasets with categorical and hierarchical features. Regarding the task of Classification, in this PhD dissertation we have explored in depth the Naïve Bayes classifier. This method has been adapted to obtain a sparse solution and has also been modified to deal with cost-sensitive datasets. For both problems, novel strategies for reducing high running times are presented. Finally, the last contribution of this dissertation concerns Bayesian inference methods.
    In particular, in the setting of heavy-tailed data, we consider a semi-parametric Bayesian approach to estimate the Elliptical distribution. The structure of this dissertation is as follows. Chapter 1 contains the theoretical background needed to develop the following chapters. In particular, two main research areas are reviewed: sparse and cost-sensitive statistical learning and Bayesian Statistics. Chapter 2 proposes a Lasso-based method in which quadratic performance constraints that bound the prediction errors for the individuals of interest are added to Lasso-based objective functions. This constrained sparse regression model is defined by a nonlinear optimization problem. Specifically, it has a direct application in heterogeneous samples where data are collected from distinct sources, as is standard in many biomedical contexts. Chapter 3 studies linear regression models built on categorical predictor variables that have a hierarchical structure. The model is flexible in the sense that the user decides the level of detail in the information used to build it, taking into account data privacy considerations. To trade off the accuracy of the linear regression model and its complexity, a Mixed Integer Convex Quadratic Problem with Linear Constraints is solved. In Chapter 4, a sparse version of the Naïve Bayes classifier, which is characterized by the following three properties, is proposed. First, the selection of the subset of variables is done in terms of the correlation structure of the predictor variables. Second, such selection can be based on different performance measures. Third, performance constraints on groups of higher interest can be included. This smart search integrates the flexibility in terms of performance for classification, yielding competitive running times.
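
As a rough illustration of the Chapter 2 idea (a Lasso objective with quadratic constraints bounding the prediction error on groups of interest), one plausible formulation is sketched below. The notation is assumed rather than taken from the dissertation: $y$ and $X$ are the response and design matrix, $\lambda$ is the sparsity penalty, $(y_g, X_g)$ are the $n_g$ observations in group $g$, and $f_g$ is a hypothetical user-chosen error threshold for that group.

```latex
\min_{\beta}\ \frac{1}{n}\lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1
\quad \text{subject to} \quad
\frac{1}{n_g}\lVert y_g - X_g\beta \rVert_2^2 \le f_g,
\qquad g = 1, \dots, G .
```

With no constraints ($G = 0$) this reduces to the standard Lasso; each added constraint caps the mean squared prediction error on one group, which is what makes the problem nonlinear in its constraints.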
    The approach introduced in Chapter 2 is also explored in Chapter 5 for improving the performance of the Naïve Bayes classifier in the classes of most interest to the user. Unlike the traditional version of the classifier, which is a two-step classifier (estimation first and classification next), the novel approach integrates both stages. The method is formulated via an optimization problem where the likelihood function is maximized with constraints on the classification rates for the groups of interest. When dealing with datasets with special characteristics (for example, heavy tails in contexts such as Economics and Finance), Bayesian statistical techniques have shown their potential in the literature. In Chapter 6, Elliptical distributions, which are generalizations of the multivariate normal distribution to both longer tails and elliptical contours, are examined, and Bayesian methods to perform semi-parametric inference for them are used. Finally, Chapter 7 closes the thesis with general conclusions and future lines of research.
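
The two-step structure of the traditional Naïve Bayes classifier (estimation first, classification next) can be sketched as follows. This is only the textbook baseline, not the constrained method of Chapter 5, and it assumes Gaussian features, which the abstract does not specify. The function names and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Step 1 (estimation): class priors and per-feature mean and variance."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)

    model = {}
    for label, rows in rows_by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small floor on the variance avoids division by zero.
        variances = [
            max(sum((v - m) ** 2 for v in col) / n, 1e-9)
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Step 2 (classification): return the class with the largest log posterior."""
    def log_posterior(params):
        prior, means, variances = params
        score = math.log(prior)
        for v, m, s2 in zip(row, means, variances):
            # Log density of a univariate normal, summed over features
            # (the "naive" conditional independence assumption).
            score += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        return score

    return max(model, key=lambda label: log_posterior(model[label]))


# Hypothetical toy data: two well-separated classes with two features each.
X = [[1.0, 2.1], [1.2, 1.9], [0.9, 2.0], [3.0, 0.5], [3.2, 0.4], [2.9, 0.6]]
y = ["a", "a", "a", "b", "b", "b"]
model = fit_gaussian_nb(X, y)
```

Because the parameters are fixed before any classification takes place, this baseline offers no direct way to enforce a target classification rate on a particular class, which is the gap an integrated formulation is meant to address.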