Estimating Discrete Markov Models From Various Incomplete Data Schemes
The parameters of a discrete stationary Markov model are transition
probabilities between states. Traditionally, data consist in sequences of
observed states for a given number of individuals over the whole observation
period. In such a case, the estimation of transition probabilities is
straightforwardly made by counting one-step moves from a given state to
another. In many real-life problems, however, the inference is much more
difficult as state sequences are not fully observed, namely the state of each
individual is known only for some given values of the time variable. A review
of the problem is given, focusing on Monte Carlo Markov Chain (MCMC) algorithms
to perform Bayesian inference and evaluate posterior distributions of the
transition probabilities in this missing-data framework. Leaning on the
dependence between the rows of the transition matrix, an adaptive MCMC
mechanism accelerating the classical Metropolis-Hastings algorithm is then
proposed and empirically studied.
Comment: 26 pages; preprint accepted on 20 February 2012 for publication in
Computational Statistics and Data Analysis (please cite the journal's paper).
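As a minimal illustration of the fully observed case described above (not the paper's adaptive MCMC scheme for incomplete data), the sketch below estimates transition probabilities by counting one-step moves and, under an assumed independent symmetric Dirichlet(alpha) prior on each row, draws exact posterior samples by conjugacy; the function names and the example sequences are hypothetical.

```python
import numpy as np

def transition_counts(sequences, n_states):
    """Count one-step moves i -> j across fully observed state sequences."""
    counts = np.zeros((n_states, n_states))
    for seq in sequences:
        for a, b in zip(seq[:-1], seq[1:]):
            counts[a, b] += 1
    return counts

def posterior_row_samples(counts, alpha=1.0, n_draws=1000, rng=None):
    """With an independent Dirichlet(alpha) prior on each row of the transition
    matrix, the posterior of row i is Dirichlet(alpha + counts[i]); conjugacy
    makes sampling exact, so no MCMC is needed in this complete-data case."""
    rng = np.random.default_rng() if rng is None else rng
    return np.stack([rng.dirichlet(alpha + row, size=n_draws) for row in counts],
                    axis=1)

# Example: two chains on 3 states observed at every time step.
sequences = [[0, 1, 1, 2, 0], [2, 2, 1, 0, 0]]
draws = posterior_row_samples(transition_counts(sequences, 3))
print(draws.mean(axis=0))  # posterior mean estimate of the transition matrix
```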
The most frequent N-k line outages occur in motifs that can improve contingency selection
Multiple line outages that occur together show a variety of spatial patterns
in the power transmission network. Some of these spatial patterns form network
contingency motifs, which we define as the patterns of multiple outages that
occur much more frequently than multiple outages chosen randomly from the
network. We show that choosing N-k contingencies from these commonly occurring
contingency motifs accounts for most of the probability of multiple initiating
line outages. This result is demonstrated using historical outage data for two
transmission systems. It enables N-k contingency lists that are much more
efficient in accounting for the likely multiple initiating outages than
exhaustive listing or random selection. The N-k contingency lists constructed
from motifs can improve risk estimation in cascading outage simulations and
help to confirm utility contingency selection.
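The frequency-based idea can be sketched roughly as follows (an illustrative ranking only, not the paper's formal motif definition or its comparison against randomly chosen outage sets); the function name and the historical events shown are assumptions, with each event given as a set of line identifiers that outaged together.

```python
from collections import Counter
from itertools import combinations

def frequent_contingencies(outage_events, k, top_n):
    """Rank size-k line-outage patterns by how often they appear inside
    historical multiple-outage events and keep the most frequent ones
    as an N-k contingency list."""
    counts = Counter()
    for event in outage_events:  # event: lines that outaged together
        for pattern in combinations(sorted(set(event)), k):
            counts[pattern] += 1
    return counts.most_common(top_n)

# Example with hypothetical line identifiers.
events = [{"L1", "L2"}, {"L1", "L2", "L7"}, {"L3", "L4"}, {"L1", "L2"}]
print(frequent_contingencies(events, k=2, top_n=3))
```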
Assessing levels of reliability for design criteria for hurricane and storm damage risk reduction structures
In the wake of Hurricane Katrina, the U.S. Army Corps of Engineers (USACE) updated design methodologies and required factors of safety for hurricane and storm damage risk reduction system (HSDRRS) structures to incorporate lessons learned from the system performance during Katrina and results of state-of-the-art research in storm surge modeling and foundation behavior. However, the criteria (USACE 2008) were not calibrated to a target reliability, which creates the need to understand the reliability provided by designs using those criteria, especially for pile-founded structures subject to global instability. This dissertation presents a methodology for quantifying the reliability of pile-founded structures that can be applied to hurricane risk reduction structures or more broadly to other types of pile-founded structures. The emphasis of this study is on a representative hurricane risk reduction structure designed using the new USACE criteria, for which the reliability is quantified for comparison to industry target reliabilities. A designer-friendly methodology for quantifying the reliability of hurricane risk reduction structures is presented, along with recommendations developed from a state-of-the-art review of geotechnical, hydraulic, and structural uncertainty data. This methodology utilizes commercial software and routine design methods for the development of inputs into an overarching framework that includes point estimate simulation models and event tree methods to quantify the structure’s system reliability. The methodology is also used to illustrate differences in analysis results with and without accounting for variance reductions due to spatial correlation, demonstrated through the stability and flowthrough limit states. Element reliabilities and overarching “system” reliabilities for a representative structure are quantified for hydrostatic hurricane storm surge loadings, soil loading, and dead loads. Wave loadings and impact loadings are not considered. The use of variance reductions on undrained shear strengths for point estimate simulations produced higher system reliability indices than the simulations not considering variance reductions for the stability and flowthrough limit states. Using the reduced variances, computed element and system reliabilities were above the industry target reliability indices presented in the literature.
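A heavily simplified sketch of the point estimate idea with a variance reduction factor is given below; it is a Rosenblueth-style two-point estimate of a reliability index for a single limit state, not the dissertation's event-tree framework, and the factor-of-safety function, input statistics, and reduction factor are hypothetical.

```python
import numpy as np
from itertools import product

def point_estimate_beta(fs_func, means, sigmas, variance_reduction=None):
    """Two-point (Rosenblueth-style) estimate of beta = (E[FS] - 1) / std[FS]
    for a factor-of-safety function of independent inputs; variance_reduction
    optionally scales selected input variances to mimic spatial averaging of
    soil strength."""
    means = np.asarray(means, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    if variance_reduction is not None:
        sigmas = sigmas * np.sqrt(np.asarray(variance_reduction, dtype=float))
    fs = [fs_func(means + np.array(signs) * sigmas)
          for signs in product([-1.0, 1.0], repeat=len(means))]
    return (np.mean(fs) - 1.0) / np.std(fs)

# Hypothetical sliding check: FS driven by undrained shear strength (kPa)
# and net lateral load (kN/m).
fs_func = lambda x: (x[0] * 10.0) / x[1]
beta_full = point_estimate_beta(fs_func, [40.0, 300.0], [10.0, 45.0])
beta_reduced = point_estimate_beta(fs_func, [40.0, 300.0], [10.0, 45.0],
                                   variance_reduction=[0.5, 1.0])
print(beta_full, beta_reduced)  # reduced strength variance gives a higher beta
```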
Uniformisation techniques for stochastic simulation of chemical reaction networks
This work considers the method of uniformisation for continuous-time Markov
chains in the context of chemical reaction networks. Previous work in the
literature has shown that uniformisation can be beneficial in the context of
time-inhomogeneous models, such as chemical reaction networks incorporating
extrinsic noise. This paper lays focus on the understanding of uniformisation
from the viewpoint of sample paths of chemical reaction networks. In
particular, an efficient pathwise stochastic simulation algorithm for
time-homogeneous models is presented which is complexity-wise equal to
Gillespie's direct method. This new approach therefore enlarges the class of
problems for which the uniformisation approach forms a computationally
attractive choice. Furthermore, as a new application of the uniformisation
method, we provide a novel variance reduction method for (raw) moment
estimators of chemical reaction networks based upon the combination of
stratification and uniformisation.
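A minimal sketch of uniformisation for a time-homogeneous network is shown below: jump epochs follow a Poisson process of rate lam, and at each epoch either a real reaction fires or a virtual jump leaves the state unchanged. The function names and the birth-death example are assumptions, and lam must upper-bound the total propensity along the path.

```python
import numpy as np

def uniformised_path(x0, stoich, propensities, lam, T, rng=None):
    """Simulate one sample path on [0, T] by uniformisation: the number of jump
    epochs is Poisson(lam * T) with uniformly distributed times; at each epoch
    reaction j fires with probability a_j(x) / lam, and with probability
    1 - sum_j a_j(x) / lam a 'virtual' jump leaves the state unchanged."""
    rng = np.random.default_rng() if rng is None else rng
    times = np.sort(rng.uniform(0.0, T, size=rng.poisson(lam * T)))
    x = np.array(x0, dtype=float)
    path = [(0.0, x.copy())]
    for t in times:
        a = propensities(x)
        if rng.uniform() < a.sum() / lam:          # real jump, otherwise virtual
            j = rng.choice(len(a), p=a / a.sum())
            x = x + stoich[j]
        path.append((t, x.copy()))
    return path

# Hypothetical birth-death network: 0 -> X at rate 5, X -> 0 at rate 0.1 * X;
# lam = 20 dominates the total propensity for the states visited here.
stoich = np.array([[1.0], [-1.0]])
propensities = lambda x: np.array([5.0, 0.1 * x[0]])
print(uniformised_path([0.0], stoich, propensities, lam=20.0, T=50.0)[-1])
```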
Approximate Data Analytics Systems
Today, most modern online services make use of big data analytics systems to extract useful information from the raw digital data. The data normally arrives as a continuous data stream at a high speed and in huge volumes. The cost of handling this massive data can be significant. Providing interactive latency when processing the data is often impractical because the data is growing exponentially, even faster than Moore’s law predicts. To overcome this problem, approximate computing has recently emerged as a promising solution. Approximate computing is based on the observation that many modern applications are amenable to an approximate, rather than an exact, output. Unlike traditional computing, approximate computing tolerates lower accuracy to achieve lower latency by computing over a partial subset instead of the entire input data. Unfortunately, the advancements in approximate computing are primarily geared towards batch analytics and cannot provide low-latency guarantees in the context of stream processing, where new data continuously arrives as an unbounded stream. In this thesis, we design and implement approximate computing techniques for processing and interacting with high-speed and large-scale stream data to achieve low latency and efficient utilization of resources.
To achieve these goals, we have designed and built the following approximate data analytics systems:
• StreamApprox—a data stream analytics system for approximate computing. This system supports approximate computing for low-latency stream analytics in a transparent way and can adapt to rapid fluctuations of input data streams. In this system, we designed an online adaptive stratified reservoir sampling algorithm to produce approximate output with bounded error.
• IncApprox—a data analytics system for incremental approximate computing. This system adopts approximate and incremental computing in stream processing to achieve high-throughput and low-latency with efficient resource utilization. In this system, we designed an online stratified sampling algorithm that uses self-adjusting computation to produce an incrementally updated approximate output with bounded error.
• PrivApprox—a data stream analytics system for privacy-preserving and approximate computing. This system supports high utility and low-latency data analytics and preserves user’s privacy at the same time. The system is based on the combination of privacy-preserving data analytics and approximate computing.
• ApproxJoin—an approximate distributed joins system. This system improves the performance of joins — critical but expensive operations in big data systems. In this system, we employed a sketching technique (Bloom filter) to avoid shuffling non-joinable data items through the network and proposed a novel sampling mechanism that executes during the join to obtain an unbiased representative sample of the join output; a simplified sketch of the Bloom-filter pre-filtering idea follows this list.
Our evaluation based on micro-benchmarks and real-world case studies shows that these systems can achieve significant performance speedups compared to state-of-the-art systems while tolerating negligible accuracy loss in the analytics output. In addition, our systems allow users to systematically trade off accuracy against throughput/latency, and they require no or only minor modifications to existing applications.
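The following is a minimal local sketch of the Bloom-filter pre-filtering idea mentioned for ApproxJoin, not the actual distributed implementation, and it omits the sampling of the join output; the class and function names are assumptions.

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter (bit array plus k hash functions) used only to
    pre-filter join keys; false positives are possible, false negatives are not."""
    def __init__(self, n_bits=1 << 20, n_hashes=4):
        self.n_bits, self.n_hashes = n_bits, n_hashes
        self.bits = bytearray(n_bits // 8)

    def _positions(self, key):
        for i in range(self.n_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))

def filtered_join(left, right):
    """Join two lists of (key, value) pairs, first dropping right-side records
    whose keys cannot occur on the left; in a distributed setting this filtering
    would happen before the shuffle, so non-joinable items never cross the network."""
    bloom = BloomFilter()
    for key, _ in left:
        bloom.add(key)
    right_kept = [(k, v) for k, v in right if bloom.might_contain(k)]
    index = {}
    for key, value in left:
        index.setdefault(key, []).append(value)
    return [(k, lv, rv) for k, rv in right_kept for lv in index.get(k, [])]

# Hypothetical usage.
print(filtered_join([("a", 1), ("b", 2)], [("a", 10), ("c", 30), ("b", 20)]))
```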
Multiple-Change-Point Modeling and Exact Bayesian Inference of Degradation Signal for Prognostic Improvement
Prognostics play an increasingly important role in modern engineering systems for smart maintenance decision-making. In parametric regression-based approaches, the parametric models are often too rigid to model degradation signals in many applications. In this paper, we propose a Bayesian multiple-change-point (CP) modeling framework to better capture the degradation path and improve the prognostics. At the offline modeling stage, a novel stochastic process is proposed to model the joint prior of CPs and positions. All hyperparameters are estimated through an empirical two-stage process. At the online monitoring and remaining useful life (RUL) prediction stage, a recursive updating algorithm is developed to exactly calculate the posterior distribution and RUL prediction sequentially. To control the computational cost, a fixed-support-size strategy in the online model updating and a partial Monte Carlo strategy in the RUL prediction are proposed. The effectiveness and advantages of the proposed method are demonstrated through thorough simulation and real case studies.
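To illustrate the flavour of exact recursive change-point updating, the sketch below implements a standard run-length filter for a piecewise-constant-mean Gaussian signal with a constant change-point probability per step; this is a simplified stand-in, not the paper's joint prior over the number and positions of CPs or its RUL prediction, and the hazard rate, known noise variance, and conjugate prior are assumptions.

```python
import numpy as np

def run_length_posterior(y, hazard=0.01, mu0=0.0, kappa0=1.0, sigma2=1.0):
    """Exact online run-length posterior for a piecewise-constant-mean signal
    with known noise variance sigma2 and a Normal(mu0, sigma2/kappa0) prior on
    each segment mean; a change point occurs at each step with probability
    `hazard`. Returns R with R[t, r] = P(run length r | y[0:t])."""
    T = len(y)
    R = np.zeros((T + 1, T + 1))
    R[0, 0] = 1.0
    mu, kappa = np.array([mu0]), np.array([kappa0])
    for t in range(T):
        # Predictive density of y[t] under each current run length.
        pred_var = sigma2 + sigma2 / kappa
        pred = np.exp(-0.5 * (y[t] - mu) ** 2 / pred_var) / np.sqrt(2 * np.pi * pred_var)
        R[t + 1, 1:t + 2] = R[t, :t + 1] * pred * (1.0 - hazard)  # run continues
        R[t + 1, 0] = np.sum(R[t, :t + 1] * pred * hazard)        # new change point
        R[t + 1] /= R[t + 1].sum()
        # Conjugate update of the per-run-length posterior means.
        mu = np.concatenate(([mu0], (kappa * mu + y[t]) / (kappa + 1.0)))
        kappa = np.concatenate(([kappa0], kappa + 1.0))
    return R

# Example: a mean shift halfway through a noisy degradation-like signal.
rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(0.0, 1.0, 100), rng.normal(3.0, 1.0, 100)])
print(run_length_posterior(y, hazard=0.02)[-1].argmax())  # about 100
```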
Treatment of Influential Observations in the Current Employment Statistics Survey
It is common for many establishment surveys that a sample contains a fraction of observations that may seriously affect survey estimates. Influential observations may appear in the sample due to imperfections of the survey design that cannot fully account for the dynamic and heterogeneous nature of the population of businesses. An observation may become influential due to a relatively large survey weight, extreme value, or combination of the weight and value.
We propose a Winsorized estimator with a choice of cutoff points that guarantees that the resulting mean squared error is lower than the variance of the original survey weighted estimator. This estimator is based on very un-restrictive modeling assumptions and can be safely used when the sample is sufficiently large.
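One common form of one-sided winsorization for weighted establishment-survey estimators is sketched below for illustration; the proposed estimator's key ingredient, the choice of cutoff that guarantees the MSE falls below the variance of the original weighted estimator, is not reproduced here, and the data and cutoff shown are hypothetical.

```python
import numpy as np

def winsorized_total(y, w, cutoff):
    """Survey-weighted total with one-sided winsorization: a value above the
    cutoff K contributes K plus only an unweighted share of the excess, which
    limits the influence of extreme weight-times-value contributions."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    y_star = np.where(y <= cutoff, y, cutoff + (y - cutoff) / w)
    return float(np.sum(w * y_star))

# Hypothetical sample with one influential unit (large value and weight).
y = np.array([12.0, 8.0, 15.0, 400.0])
w = np.array([30.0, 30.0, 30.0, 25.0])
print(np.sum(w * y), winsorized_total(y, w, cutoff=50.0))
```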
We consider a different approach when the sample is small. Estimation from small samples generally relies on strict model assumptions. Robustness here is understood as insensitivity of an estimator to model misspecification or to appearance of outliers. The proposed approach is a slight modification of the classical linear mixed model application to small area estimation. The underlying distribution of the random error term is a scale mixture of two normal distributions. This setup can describe outliers in individual observations. It is also suitable for a more general situation where units from two distinct populations are put together for estimation.
The mixture group indicator is not observed. The probabilities of observations coming from a group with a smaller or larger variance are estimated from the data. These conditional probabilities can serve as the basis for a formal test on outlyingness at the area level.
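For illustration, the conditional group probabilities can be computed as below once the mixture parameters are known; in the proposed approach these parameters would be estimated within the linear mixed model rather than fixed, so the values here are assumptions.

```python
import numpy as np
from scipy.stats import norm

def outlier_group_probabilities(residuals, p_large, sigma, scale):
    """Conditional probability that each residual belongs to the larger-variance
    component of a two-component scale mixture of normals:
    e ~ (1 - p) N(0, sigma^2) + p N(0, (scale * sigma)^2)."""
    f_small = norm.pdf(residuals, scale=sigma)
    f_large = norm.pdf(residuals, scale=scale * sigma)
    return p_large * f_large / (p_large * f_large + (1.0 - p_large) * f_small)

# Residuals of 0.5 and 6.0 with sigma = 1, a 5x scale factor, and a
# 10% prior probability of the larger-variance group.
print(outlier_group_probabilities(np.array([0.5, 6.0]), 0.1, 1.0, 5.0))
```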
Simulations are carried out to compare several alternative estimators under different scenarios. Performance of the bootstrap method for prediction confidence intervals is investigated using simulations. We also compare the proposed method with alternative existing methods in a study using data from the Current Employment Statistics Survey conducted by the U.S. Bureau of Labor Statistics.