
    Learning Geometric Concepts with Nasty Noise

    We study the efficient learnability of geometric concept classes - specifically, low-degree polynomial threshold functions (PTFs) and intersections of halfspaces - when a fraction of the data is adversarially corrupted. We give the first polynomial-time PAC learning algorithms for these concept classes with dimension-independent error guarantees in the presence of nasty noise under the Gaussian distribution. In the nasty noise model, an omniscient adversary can arbitrarily corrupt a small fraction of both the unlabeled data points and their labels. This model generalizes well-studied noise models, including the malicious noise model and the agnostic (adversarial label noise) model. Prior to our work, the only concept class for which efficient malicious learning algorithms were known was the class of origin-centered halfspaces. Specifically, our robust learning algorithm for low-degree PTFs succeeds under a number of tame distributions -- including the Gaussian distribution and, more generally, any log-concave distribution with (approximately) known low-degree moments. For LTFs under the Gaussian distribution, we give a polynomial-time algorithm that achieves error O(ε), where ε is the noise rate. At the core of our PAC learning results is an efficient algorithm to approximate the low-degree Chow parameters of any bounded function in the presence of nasty noise. To achieve this, we employ an iterative spectral method for outlier detection and removal, inspired by recent work in robust unsupervised learning. Our aforementioned algorithm succeeds for a range of distributions satisfying mild concentration bounds and moment assumptions. The correctness of our robust learning algorithm for intersections of halfspaces makes essential use of a novel robust inverse independence lemma that may be of broader interest.
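
    As a rough illustration of the iterative spectral outlier-removal idea (a minimal sketch under assumed thresholds, not the authors' algorithm), the following Python snippet repeatedly computes the empirical covariance of the retained sample and, whenever the top eigenvalue exceeds what a presumed clean distribution would allow, discards the points with the largest projections onto the top eigendirection.

        import numpy as np

        def spectral_filter(X, expected_var=1.0, slack=1.5, drop_frac=0.01, max_iters=50):
            """Toy iterative spectral filter: remove the points whose projections onto the
            top principal direction are largest whenever the empirical variance in that
            direction exceeds what clean data would allow. Thresholds are illustrative
            assumptions, not the paper's settings."""
            X = np.asarray(X, dtype=float)
            keep = np.ones(len(X), dtype=bool)
            for _ in range(max_iters):
                Y = X[keep]
                mu = Y.mean(axis=0)
                cov = np.cov(Y - mu, rowvar=False)
                eigvals, eigvecs = np.linalg.eigh(cov)
                top_val, top_dir = eigvals[-1], eigvecs[:, -1]
                if top_val <= slack * expected_var:   # spectrum looks consistent with clean data
                    break
                scores = ((Y - mu) @ top_dir) ** 2    # squared projections onto the suspicious direction
                cutoff = np.quantile(scores, 1.0 - drop_frac)
                idx = np.flatnonzero(keep)
                keep[idx[scores >= cutoff]] = False   # discard the most extreme points
            return X[keep]

        # Example: standard Gaussian samples with a small cluster of planted outliers.
        rng = np.random.default_rng(0)
        clean = rng.standard_normal((1000, 10))
        outliers = 8.0 * np.ones((30, 10))
        filtered = spectral_filter(np.vstack([clean, outliers]))
        print(len(filtered))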

    Probabilistic Risk Assessment Procedures Guide for NASA Managers and Practitioners (Second Edition)

    Probabilistic Risk Assessment (PRA) is a comprehensive, structured, and logical analysis method aimed at identifying and assessing risks in complex technological systems for the purpose of cost-effectively improving their safety and performance. NASA's objective is to better understand and effectively manage risk, and thus more effectively ensure mission and programmatic success, and to achieve and maintain high safety standards at NASA. NASA intends to use risk assessment in its programs and projects to support optimal management decision making for the improvement of safety and program performance. In addition to using quantitative/probabilistic risk assessment to improve safety and enhance the safety decision process, NASA has incorporated quantitative risk assessment into its system safety assessment process, which until now has relied primarily on a qualitative representation of risk. Also, NASA has recently adopted the Risk-Informed Decision Making (RIDM) process [1-1] as a valuable supplement to existing deterministic and experience-based engineering methods and tools. Over the years, NASA has been a leader in most of the technologies it has employed in its programs. One would think that PRA should be no exception. In fact, it would be natural for NASA to be a leader in PRA because, as a technology pioneer, NASA uses risk assessment and management implicitly or explicitly on a daily basis. NASA has probabilistic safety requirements (thresholds and goals) for crew transportation system missions to the International Space Station (ISS) [1-2]. NASA intends to have probabilistic requirements for any new human spaceflight transportation system acquisition. Methods for risk and reliability assessment originated in the early 1960s in U.S. aerospace and missile programs; fault tree analysis (FTA) is an example. It would have been a reasonable extrapolation to expect that NASA would also become the world leader in the application of PRA. That was, however, not to happen. Early in the Apollo program, estimates of the probability of a successful roundtrip human mission to the moon yielded disappointingly low (and suspect) values, and NASA was discouraged from performing further quantitative risk analyses until some two decades later, when the methods had become more refined, rigorous, and repeatable. Instead, NASA decided to rely primarily on the Hazard Analysis (HA) and Failure Modes and Effects Analysis (FMEA) methods for system safety assessment.
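
    To make concrete the kind of quantity a PRA produces, here is a minimal fault-tree calculation in Python (a generic textbook-style example with placeholder probabilities, not material from the guide): the top event occurs if either of two redundant subsystems fails, and each subsystem fails only when both of its components fail.

        # Minimal fault tree: TOP = OR(AND(A, B), AND(C, D)), with independent basic events.
        # The component failure probabilities below are illustrative placeholders.
        p = {"A": 1e-3, "B": 2e-3, "C": 5e-4, "D": 1e-3}

        def and_gate(*probs):
            out = 1.0
            for q in probs:
                out *= q                  # all inputs must occur
            return out

        def or_gate(*probs):
            out = 1.0
            for q in probs:
                out *= (1.0 - q)          # probability that none of the inputs occur
            return 1.0 - out

        subsystem_1 = and_gate(p["A"], p["B"])   # both A and B must fail
        subsystem_2 = and_gate(p["C"], p["D"])   # both C and D must fail
        top_event = or_gate(subsystem_1, subsystem_2)
        print(f"P(top event) = {top_event:.3e}")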

    Spatio-Temporal Cluster Detection and Local Moran Statistics of Point Processes

    Moran's index is a statistic that measures spatial dependence, quantifying the degree of dispersion or clustering of point processes and events in some location/area. Recognizing that a single Moran's index may not give a sufficient summary of the spatial autocorrelation measure, a local indicator of spatial association (LISA) has gained popularity. Accordingly, we propose extending LISAs to time after partitioning the area and computing a Moran-type statistic for each subarea. Patterns between the local neighbors are unveiled that would not otherwise be apparent. We consider the measures of Moran statistics while incorporating a time factor under a simulated multilevel Palm distribution, a generalized Poisson phenomenon where the clusters and dependence among the subareas are captured by the rate of increase of the process over time. Event propagation is built under spatial nested sequences over time. The Palm parameters, Moran statistics and convergence criteria are calculated from an explicit algorithm in a Markov chain Monte Carlo simulation setting and further analyzed in two real datasets.
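
    As a rough illustration of the local Moran-type statistic underlying this approach, the following sketch computes Anselin's local Moran's I for event counts aggregated over subareas, using a row-standardised binary contiguity weight matrix; the counts, weights and subarea layout are invented for the example and are not from the paper.

        import numpy as np

        def local_moran(x, W):
            """Local Moran's I (Anselin's LISA) for values x and spatial weights W.
            W is row-standardised inside the function."""
            x = np.asarray(x, dtype=float)
            z = x - x.mean()
            m2 = (z ** 2).sum() / len(x)          # second moment of the deviations
            W = W / W.sum(axis=1, keepdims=True)  # row-standardise the weights
            return (z / m2) * (W @ z)             # one statistic per subarea

        # Hypothetical example: 4 subareas on a line, neighbours share an edge.
        counts = np.array([12, 15, 3, 2])                  # event counts per subarea
        W = np.array([[0, 1, 0, 0],
                      [1, 0, 1, 0],
                      [0, 1, 0, 1],
                      [0, 0, 1, 0]], dtype=float)
        print(local_moran(counts, W))   # positive values: subarea resembles its neighbours (clustering)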

    Advanced methodologies for reliability-based design optimization and structural health prognostics

    Failures of engineered systems can lead to significant economic and societal losses. To minimize the losses, reliability must be ensured throughout the system's lifecycle in the presence of manufacturing variability and uncertain operational conditions. Many reliability-based design optimization (RBDO) techniques have been developed to ensure high reliability of engineered system design under manufacturing variability. Schedule-based maintenance, although expensive, has been a popular method to maintain highly reliable engineered systems under uncertain operational conditions. However, so far there is no cost-effective and systematic approach to ensure high reliability of engineered systems throughout their lifecycles while accounting for both the manufacturing variability and uncertain operational conditions. Inspired by the intrinsic ability of systems in ecology, economics, and other fields to proactively adjust their functioning to avoid potential failures, this dissertation attempts to adaptively manage an engineered system's reliability during its lifecycle by advancing two essential and closely related research areas: system RBDO and prognostics and health management (PHM). System RBDO ensures high reliability of an engineered system in the early design stage, whereas capitalizing on PHM technology enables the system to proactively avoid failures in its operation stage. Extensive literature reviews in these areas have identified four key research issues: (1) how system failure modes and their interactions can be analyzed in a statistical sense; (2) how limited data for input manufacturing variability can be used for RBDO; (3) how sensor networks can be designed to effectively monitor system health degradation under highly uncertain operational conditions; and (4) how accurate and timely remaining useful lives of systems can be predicted under highly uncertain operational conditions. To properly address these key research issues, this dissertation lays out four research thrusts in the following chapters: Chapter 3 - Complementary Intersection Method for System Reliability Analysis, Chapter 4 - Bayesian Approach to RBDO, Chapter 5 - Sensing Function Design for Structural Health Prognostics, and Chapter 6 - A Generic Framework for Structural Health Prognostics. Multiple engineering case studies are presented to demonstrate the feasibility and effectiveness of the proposed RBDO and PHM techniques for ensuring and improving the reliability of engineered systems throughout their lifecycles.
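
    As a minimal illustration of the reliability calculation that RBDO builds on (not a method from the dissertation), the following sketch estimates a failure probability by Monte Carlo: random variables representing manufacturing and operational variability are sampled, and failure is declared whenever an assumed limit-state function goes negative.

        import numpy as np

        def failure_probability(limit_state, mean, std, n_samples=200_000, seed=0):
            """Monte Carlo estimate of P[g(X) < 0] with X ~ Normal(mean, std) componentwise.
            The limit-state function and the distributions are illustrative assumptions."""
            rng = np.random.default_rng(seed)
            X = rng.normal(mean, std, size=(n_samples, len(mean)))
            return np.mean(limit_state(X) < 0.0)

        # Hypothetical limit state: the component fails when the load effect exceeds the capacity.
        def g(X):
            capacity, load = X[:, 0], X[:, 1]
            return capacity - load

        mean = np.array([10.0, 7.0])   # nominal capacity and load
        std = np.array([1.0, 1.5])     # manufacturing / operational variability
        print(f"Estimated P_f = {failure_probability(g, mean, std):.4f}")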

    Recursive analysis and estimation for the discrete Boolean random set model

    Random sets provide a powerful class of models for images containing randomly placed objects of random shapes and orientation. Those pixels within the foreground are members of a random set realization. The discrete Boolean model is the simplest general random set model, in which a Bernoulli point process (called a germ process) is coupled with an independent shape or grain process. A typical realization consists of many overlapping shapes. Estimation in these models is difficult owing to the fact that many outcomes of the process obscure other outcomes. The directional one-dimensional (1D) model, in which random-length line segments emanate to the right from germs on the line, is analyzed via recursive expressions to provide a complete characterization of these discrete models in terms of the distributions of their black and white runlengths. An analytic representation is given for the optimal windowed filter for the signal-union-noise process, where both signal and noise are Boolean models. Several of these results are extended to the nondirectional case where segments can emanate to the left and right. Sufficient conditions are presented for a two-dimensional (2D) discrete Boolean model to induce a one-dimensional Boolean model on an intersecting line. When inducement holds, the likelihood of runlength observations of the two-dimensional model is used to provide maximum-likelihood estimation of parameters of the 2D model. The 1D directional discrete Boolean model is equivalent to the discrete-time infinite-server queue. Analysis for the Boolean model is extended to provide densities for many random variables of interest in queueing theory.
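
    A minimal simulation of the 1D directional discrete Boolean model described above may help fix ideas; the germ probability and the uniform segment-length distribution below are arbitrary choices for illustration. Germs arrive as a Bernoulli process on the integer line, each germ covers a random-length segment to its right, and the black and white runlengths of the resulting union are then extracted.

        import numpy as np

        def simulate_boolean_1d(n, germ_prob, max_len, seed=0):
            """Directional 1D discrete Boolean model on {0, ..., n-1}.
            Germs ~ Bernoulli(germ_prob); each germ emits a segment of uniform random
            length (1..max_len) to the right; a pixel is black if any segment covers it."""
            rng = np.random.default_rng(seed)
            black = np.zeros(n, dtype=bool)
            for i in np.flatnonzero(rng.random(n) < germ_prob):
                length = rng.integers(1, max_len + 1)
                black[i:i + length] = True          # union of overlapping grains
            return black

        def runlengths(black):
            """Return (black_runs, white_runs) of a binary sequence."""
            runs, values = [], []
            start = 0
            for i in range(1, len(black) + 1):
                if i == len(black) or black[i] != black[start]:
                    runs.append(i - start)
                    values.append(bool(black[start]))
                    start = i
            runs, values = np.array(runs), np.array(values)
            return runs[values], runs[~values]

        black = simulate_boolean_1d(n=10_000, germ_prob=0.05, max_len=8)
        b_runs, w_runs = runlengths(black)
        print(b_runs.mean(), w_runs.mean())   # empirical mean black and white runlengths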

    Extreme events: dynamics, statistics and prediction


    Probability and Statistics for Data Science (데이터사이언스를 위한 확률과 통계)

    These notes are a published collection of the lecture slides for the course "Probability and Statistics for Data Science" (데이터사이언스를 위한 확률과 통계), which the author taught at the Seoul National University Graduate School of Data Science in the fall semester of 2020.

    ISIPTA'07: Proceedings of the Fifth International Symposium on Imprecise Probability: Theories and Applications


    Multivariate Simulation-based Forecasting for Intraday Power Markets: Modelling Cross-Product Price Effects

    Intraday electricity markets play an increasingly important role in balancing the intermittent generation of renewable energy resources, which creates a need for accurate probabilistic price forecasts. However, research to date has focused on univariate approaches, while in many European intraday electricity markets all delivery periods are traded in parallel. Thus, the dependency structure between different traded products and the corresponding cross-product effects cannot be ignored. We aim to fill this gap in the literature by using copulas to model the high-dimensional intraday price return vector. We model the marginal distribution as a zero-inflated Johnson's S_U distribution with location, scale and shape parameters that depend on market and fundamental data. The dependence structure is modelled using latent beta regression to account for the particular market structure of the intraday electricity market, such as overlapping but independent trading sessions for different delivery days. We allow the dependence parameter to be time-varying. We validate our approach in a simulation study for the German intraday electricity market and find that modelling the dependence structure improves the forecasting performance. Additionally, we shed light on the impact of the single intraday coupling (SIDC) on the trading activity and price distribution and interpret our results in light of the market efficiency hypothesis. The approach is directly applicable to other European electricity markets.
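
    To make the marginal model concrete, here is a small sketch (with arbitrary parameter values, not estimates from the paper) of sampling from a zero-inflated Johnson's S_U distribution: with probability p_zero the price return is exactly zero, and otherwise it is drawn from a Johnson's S_U distribution with the given shape, location and scale parameters.

        import numpy as np
        from scipy.stats import johnsonsu

        def sample_zi_johnson_su(n, p_zero, gamma, delta, loc, scale, seed=0):
            """Draw n samples from a zero-inflated Johnson's S_U distribution.
            With probability p_zero the return is exactly 0; otherwise it follows
            Johnson's S_U with shape parameters (gamma, delta), location loc and scale.
            The parameter values used below are illustrative, not fitted to market data."""
            rng = np.random.default_rng(seed)
            is_zero = rng.random(n) < p_zero
            draws = johnsonsu.rvs(gamma, delta, loc=loc, scale=scale,
                                  size=n, random_state=rng)
            return np.where(is_zero, 0.0, draws)

        returns = sample_zi_johnson_su(n=10_000, p_zero=0.3,
                                       gamma=-0.2, delta=1.5, loc=0.0, scale=0.8)
        print((returns == 0).mean(), returns.mean(), returns.std())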

    Probability models for information retrieval based on divergence from randomness

    This thesis devises a novel methodology based on probability theory, suitable for the construction of term-weighting models of Information Retrieval. Our term-weighting functions are created within a general framework made up of three components. Each of the three components is built independently from the others. We obtain the term-weighting functions from the general model in a purely theoretic way, instantiating each component with different probability distribution forms. The thesis begins by investigating the nature of the statistical inference involved in Information Retrieval. We explore the estimation problem underlying the process of sampling. De Finetti’s theorem is used to show how to convert the frequentist approach into Bayesian inference, and we display and employ the derived estimation techniques in the context of Information Retrieval. We initially pay great attention to the construction of the basic sample spaces of Information Retrieval. The notion of single or multiple sampling from different populations in the context of Information Retrieval is extensively discussed and used throughout the thesis. The language modelling approach and the standard probabilistic model are studied under the same foundational view and are experimentally compared to the divergence-from-randomness approach. In revisiting the main information retrieval models in the literature, we show that even the language modelling approach can be exploited to assign term-frequency normalization to the models of divergence from randomness. We finally introduce a novel framework for query expansion. This framework is based on the models of divergence from randomness and can be applied to arbitrary models of IR, divergence-based, language modelling and probabilistic models included. We have performed a very large number of experiments, and the results show that the framework generates highly effective Information Retrieval models.
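
    For illustration, the following sketch implements one widely used divergence-from-randomness instantiation, the PL2 weighting model (Poisson randomness model, Laplace after-effect, H2 term-frequency normalisation), in a common reference formulation; it is not one of the specific models derived in the thesis, and the hyperparameter c = 1.0 and the example statistics are assumptions.

        import math

        def pl2_weight(tf, doc_len, avg_doc_len, coll_freq, n_docs, c=1.0):
            """PL2 divergence-from-randomness term weight (one common formulation).
            tf          : term frequency in the document
            doc_len     : document length (tokens)
            avg_doc_len : average document length in the collection
            coll_freq   : total occurrences of the term in the collection
            n_docs      : number of documents in the collection
            c           : H2 normalisation hyperparameter (assumed 1.0 here)"""
            log2 = lambda x: math.log(x, 2)
            tfn = tf * log2(1.0 + c * avg_doc_len / doc_len)   # normalised term frequency (H2)
            lam = coll_freq / n_docs                           # Poisson rate of the term
            gain = (tfn * log2(tfn / lam)
                    + (lam + 1.0 / (12.0 * tfn) - tfn) * log2(math.e)
                    + 0.5 * log2(2.0 * math.pi * tfn))
            return gain / (tfn + 1.0)                          # Laplace after-effect normalisation

        # Hypothetical statistics for a single query term in one document.
        print(pl2_weight(tf=3, doc_len=250, avg_doc_len=300, coll_freq=1_200, n_docs=50_000))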