1,178 research outputs found

    Bounding Optimality Gap in Stochastic Optimization via Bagging: Statistical Efficiency and Stability

    We study a statistical method to estimate the optimal value, and the optimality gap of a given solution, for stochastic optimization as an assessment of solution quality. Our approach is based on bootstrap aggregating, or bagging, applied to resampled sample average approximation (SAA). We show how this approach leads to valid statistical confidence bounds for non-smooth optimization. We also demonstrate its statistical efficiency and stability, which are especially desirable in limited-data situations, and compare these properties with those of some existing methods. We present a theory that views SAA as a kernel in an infinite-order symmetric statistic, which can be approximated via bagging. We substantiate our theoretical findings with numerical results.
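
    A minimal numerical sketch of the bagging idea, assuming a toy newsvendor objective and a brute-force SAA solver as stand-ins (neither is from the paper); the interval reported below reflects only the spread across bags, not the paper's infinite-order symmetric statistic theory.

```python
import numpy as np

def saa_optimal_value(sample, candidates):
    """Solve the SAA problem by brute force over a candidate grid:
    min_x mean(loss(x, xi)) over the resample xi. The newsvendor loss
    with unit over/under costs is an illustrative stand-in objective."""
    over, under = 1.0, 2.0
    costs = [np.mean(over * np.maximum(x - sample, 0)
                     + under * np.maximum(sample - x, 0))
             for x in candidates]
    return min(costs)

def bagged_saa_estimate(data, n_bags=200, k=None, seed=0):
    """Bagging resampled SAA: average optimal values over bootstrap
    resamples of size k; the half-width below is an illustrative
    normal-approximation band for the bagged average only."""
    rng = np.random.default_rng(seed)
    k = k or len(data) // 2
    candidates = np.quantile(data, np.linspace(0, 1, 101))
    vals = np.array([saa_optimal_value(rng.choice(data, size=k, replace=True),
                                       candidates)
                     for _ in range(n_bags)])
    center = vals.mean()
    half_width = 1.96 * vals.std(ddof=1) / np.sqrt(n_bags)
    return center, (center - half_width, center + half_width)

demands = np.random.default_rng(1).exponential(10.0, size=80)  # limited data
est, band = bagged_saa_estimate(demands)
print(f"bagged SAA estimate of optimal value: {est:.3f}, band ~ {band}")
```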

    Learning and Management for Internet-of-Things: Accounting for Adaptivity and Scalability

    Internet-of-Things (IoT) envisions an intelligent infrastructure of networked smart devices offering task-specific monitoring and control services. The unique features of IoT include extreme heterogeneity, a massive number of devices, and unpredictable dynamics, partially due to human interaction. These call for foundational innovations in network design and management. Ideally, the network should allow efficient adaptation to changing environments and low-cost implementation scalable to a massive number of devices, subject to stringent latency constraints. To this end, the overarching goal of this paper is to outline a unified framework for online learning and management policies in IoT through joint advances in communication, networking, learning, and optimization. From the network architecture vantage point, the unified framework leverages a promising fog architecture that gives smart devices proximity access to cloud functionalities at the network edge, along the cloud-to-things continuum. From the algorithmic perspective, key innovations target online approaches adaptive to different degrees of nonstationarity in IoT dynamics, and their scalable model-free implementation under the limited feedback that motivates blind or bandit approaches. The proposed framework aspires to offer a stepping stone toward systematic designs and analysis of task-specific learning and management schemes for IoT, along with a host of new research directions to build on.
    Comment: Submitted on June 15 to the Proceedings of the IEEE Special Issue on Adaptive and Scalable Communication Networks

    Approaches for Outlier Detection in Sparse High-Dimensional Regression Models

    Modern regression studies often encompass a very large number of potential predictors, possibly larger than the sample size, and sometimes growing with the sample size itself. This increases the chances that a substantial portion of the predictors is redundant, as well as the risk of data contamination. Tackling these problems is of utmost importance to facilitate scientific discoveries, since model estimates are highly sensitive both to the choice of predictors and to the presence of outliers. In this thesis, we contribute to this area by considering the problem of robust model selection in a variety of settings, where outliers may arise both in the response and in the predictors. Our proposals simplify model interpretation, guarantee predictive performance, and allow us to study and control the influence of outlying cases on the fit. First, we consider the co-occurrence of multiple mean-shift and variance-inflation outliers in low-dimensional linear models. We rely on robust estimation techniques to identify outliers of each type, exclude mean-shift outliers, and use restricted maximum likelihood estimation to down-weight and accommodate variance-inflation outliers in the model fit. Second, we extend our setting to high-dimensional linear models. We show that mean-shift and variance-inflation outliers can be modeled as additional fixed and random components, respectively, and evaluated independently. Specifically, we perform feature selection and mean-shift outlier detection through a robust class of nonconcave penalization methods, and variance-inflation outlier detection through penalization of the restricted posterior mode. The resulting approach satisfies a robust oracle property for feature selection in the presence of data contamination (allowing the number of features to increase exponentially with the sample size) and detects truly outlying cases of each type with asymptotic probability one. This provides an optimal trade-off between a high breakdown point and efficiency. Third, focusing on high-dimensional linear models affected by mean-shift outliers, we develop a general framework in which L0-constraints coupled with mixed-integer programming techniques are used to perform simultaneous feature selection and outlier detection with provably optimal guarantees. In particular, we provide necessary and sufficient conditions for a robustly strong oracle property, where again the number of features can increase exponentially with the sample size, and prove optimality for parameter estimation and the resulting breakdown point. Finally, we consider generalized linear models and rely on logistic slippage to perform outlier detection and removal in binary classification. Here we use L0-constraints and mixed-integer conic programming techniques to solve the underlying double combinatorial problem of feature selection and outlier detection, and the framework again allows us to pursue optimality guarantees. For all the proposed approaches, we also provide computationally lean heuristic algorithms, tuning procedures, and diagnostic tools that help guide the analysis. We consider several real-world applications, including the study of the relationships between childhood obesity and the human microbiome, and of the main drivers of honey bee loss. All methods developed and data used, as well as the source code to replicate our analyses, are publicly available.
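
    The mean-shift model y = X beta + gamma + eps, with gamma sparse, lends itself to a compact illustration. The sketch below is a simple alternating heuristic (least squares plus hard thresholding of residuals under an L0-style budget), not the thesis's mixed-integer programs or penalization methods; the planted-outlier example and all names are illustrative.

```python
import numpy as np

def mean_shift_outlier_fit(X, y, n_outliers, n_iter=50):
    """Alternating heuristic for y = X beta + gamma + eps with sparse gamma:
    (1) least squares for beta on the shift-adjusted response,
    (2) hard-threshold residuals, keeping the n_outliers largest as
    mean shifts (an L0-constraint surrogate)."""
    gamma = np.zeros(len(y))
    for _ in range(n_iter):
        beta, *_ = np.linalg.lstsq(X, y - gamma, rcond=None)
        resid = y - X @ beta
        gamma = np.zeros(len(y))
        idx = np.argsort(-np.abs(resid))[:n_outliers]
        gamma[idx] = resid[idx]
    return beta, gamma

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)
y[:5] += 8.0  # plant five mean-shift outliers in the response
beta_hat, gamma_hat = mean_shift_outlier_fit(X, y, n_outliers=5)
print("flagged rows:", np.nonzero(gamma_hat)[0])
```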

    Genomic Clinical Trials and Predictive Medicine by Richard M. Simon

    Peer Reviewed
    http://deepblue.lib.umich.edu/bitstream/2027.42/109589/1/insr12085_3.pd

    Proximal Point Imitation Learning

    This work develops new algorithms with rigorous efficiency guarantees for infinite-horizon imitation learning (IL) with linear function approximation, without restrictive coherence assumptions. We begin with the minimax formulation of the problem and then outline how to leverage classical tools from optimization, in particular the proximal-point method (PPM) and dual smoothing, for online and offline IL, respectively. Thanks to PPM, we avoid the nested policy evaluation and cost updates for online IL that appear in the prior literature. In particular, we do away with conventional alternating updates by optimizing a single convex and smooth objective over both cost and Q-functions. When this objective is solved inexactly, we relate the optimization errors to the suboptimality of the recovered policy. As an added bonus, by re-interpreting PPM as dual smoothing with the expert policy as a center point, we also obtain an offline IL algorithm enjoying theoretical guarantees on the number of required expert trajectories. Finally, we achieve convincing empirical performance for both linear and neural-network function approximation.
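
    A toy sketch of the proximal-point update the paper builds on, applied to a strongly convex quadratic where each prox step reduces to a linear solve; the convex-concave IL objective over cost and Q-functions is replaced here by this stand-in, so the snippet only shows the mechanics of the iteration.

```python
import numpy as np

def proximal_point_quadratic(A, b, eta=1.0, n_iter=100):
    """Proximal-point method on f(x) = 0.5 x'Ax - b'x:
        x_{k+1} = argmin_x f(x) + (1 / (2*eta)) ||x - x_k||^2,
    which for a quadratic is the linear solve (A + I/eta) x = b + x_k/eta."""
    x = np.zeros(len(b))
    M = A + np.eye(len(b)) / eta  # prox subproblem Hessian, fixed across steps
    for _ in range(n_iter):
        x = np.linalg.solve(M, b + x / eta)
    return x

A = np.array([[3.0, 1.0], [1.0, 2.0]])  # SPD, so f is strongly convex
b = np.array([1.0, -1.0])
x_star = np.linalg.solve(A, b)          # exact minimizer for comparison
print(np.allclose(proximal_point_quadratic(A, b), x_star, atol=1e-6))
```

    Each prox step is well conditioned even when A is nearly singular, which is the usual motivation for PPM over plain gradient steps.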

    Problem-driven scenario generation for stochastic programs

    Stochastic programming concerns mathematical programming in the presence of uncertainty. In a stochastic program, uncertain parameters are modeled as random vectors, and one aims to minimize the expectation, or some risk measure, of a loss function. However, stochastic programs become computationally intractable when the underlying uncertain parameters are modeled by continuous random vectors. Scenario generation is the construction of a finite discrete random vector to use within a stochastic program. It can consist of the discretization of a parametric probabilistic model or the direct construction of a discrete distribution. There is typically a trade-off in the number of scenarios used: one must use enough to represent the uncertainty faithfully, but not so many that the resulting problem is computationally intractable. Standard scenario generation methods are distribution-based, that is, they do not take the underlying problem into account when constructing the discrete distribution. In this thesis we promote the idea of problem-based scenario generation: by taking into account the structure of the underlying problem, one may be able to represent uncertainty in a more parsimonious way. The first two papers of this thesis focus on scenario generation for problems that use a tail-risk measure, such as the conditional value-at-risk, focusing in particular on portfolio selection problems. In the final paper we present a constraint-driven approach to scenario generation for simple recourse problems, a class of stochastic programs for minimizing the expected shortfall and surplus of some resources with respect to uncertain demands.
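
    For contrast with the problem-driven idea, a minimal sketch of the distribution-based baseline: Monte Carlo discretization of a parametric model, evaluated through the sample conditional value-at-risk of a fixed portfolio. The two-asset model, weights, and scenario count are illustrative; a problem-driven method would instead target the tail that CVaR actually depends on, typically with far fewer scenarios.

```python
import numpy as np

def cvar(losses, alpha=0.95):
    """Sample conditional value-at-risk: mean of losses at or above
    the alpha-quantile (the worst (1 - alpha) tail)."""
    var = np.quantile(losses, alpha)
    return losses[losses >= var].mean()

# Distribution-based scenario generation: plain Monte Carlo sampling
# from a parametric (here Gaussian) model of asset returns.
rng = np.random.default_rng(0)
returns = rng.multivariate_normal(mean=[0.05, 0.02],
                                  cov=[[0.04, 0.01], [0.01, 0.02]],
                                  size=1000)
weights = np.array([0.6, 0.4])   # a fixed candidate portfolio
losses = -returns @ weights      # loss = negative portfolio return
print(f"CVaR_0.95 over 1000 scenarios: {cvar(losses):.4f}")
```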

    Statistical Methodologies

    Statistical practices have recently been questioned by numerous independent authors, to the extent that a significant fraction of accepted research findings may be called into question. This suggests that statistical methodologies may have drifted too far toward engineering practice, with minimal concern for their foundations, interpretations, assumptions, and limitations, all of which may be jeopardized in the current context. Obscured by overwhelming data sets, advanced processing, and stunning presentations, the basic approach is often intractable to anyone but the analyst. The hierarchical nature of statistical inference, exemplified by Bayesian aggregation of prior and derived knowledge, may also be challenging. Conceptually simplified studies of the kind presented in this book could therefore provide valuable guidance for developing statistical methodologies, and for applying the state of the art with greater confidence.

    Prescriptive Analytics in Electricity Markets

    Decision making is critical for any business to survive in a market environment. Examples of decision-making tasks are inventory management, resource allocation, or portfolio selection. Optimization, understood as the scientific discipline that studies how to solve mathematical programming problems, can help make more efficient decisions in many of these situations. Particularly relevant, because of their frequency and difficulty, are those decisions affected by uncertainty, i.e., in which some of the parameters that precisely determine the optimization problem are unknown when the decision must be made. Fortunately, the development of information technologies has led to an explosion in the availability of data that can be used to assist decisions affected by uncertainty. However, most of the available historical data do not correspond to the unknown parameters of the problem but originate from other related sources. This subset of data, potentially valuable for obtaining better decisions, is called contextual information. This thesis is framed within a new scientific effort that seeks to exploit the potential of data and, in particular, of contextual information in decision making. To this end, we have developed mathematical frameworks and data-driven optimization models that exploit contextual information to make better decisions in problems characterized by the presence of uncertain parameters. Electricity markets are a clear example of a sector in which decision making plays a crucial role in daily activity. Moreover, uncertainty is intrinsic to electricity markets and affects most of the tasks that agents operating in them must carry out. Many of these tasks involve decisions characterized by low risk and addressed periodically; we refer to them as iterative decisions. This thesis applies the aforementioned frameworks for decision making under uncertainty using contextual information to the iterative decision-making tasks faced daily by electricity market agents.
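
    A hedged sketch of one way contextual information can drive a data-driven decision, in the spirit of nearest-neighbor weighted sample-average ideas rather than necessarily the models developed in this thesis: a kNN empirical quantile acting as a newsvendor-style offer quantity. The features, the 0.7 critical ratio, and the synthetic data are all illustrative assumptions.

```python
import numpy as np

def knn_contextual_decision(X_hist, y_hist, x_now, k=20, quantile=0.7):
    """Given historical contexts X_hist (e.g., weather forecasts) and the
    realized uncertain parameters y_hist (e.g., wind production), find the
    k past contexts nearest to today's context x_now and return the
    empirical quantile of their outcomes; under a newsvendor-type cost,
    that quantile (the critical ratio) is the natural offer quantity."""
    dists = np.linalg.norm(X_hist - x_now, axis=1)
    neighbors = np.argsort(dists)[:k]
    return np.quantile(y_hist[neighbors], quantile)

rng = np.random.default_rng(0)
X_hist = rng.normal(size=(500, 2))  # past contextual features
y_hist = 50 + 10 * X_hist[:, 0] + rng.normal(scale=5, size=500)
offer = knn_contextual_decision(X_hist, y_hist, x_now=np.array([0.5, -0.2]))
print(f"contextual offer quantity: {offer:.2f}")
```

    The point of the sketch is the conditioning step: the decision uses only history whose context resembles today's, rather than the unconditional distribution of the uncertain parameter.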