Search CORE

382,538 research outputs found

Speculative Approximations for Terascale Analytics

Author: Qin Chengjie
Rusu Florin
Publication venue
Publication date: 31/12/2014
Field of study

Model calibration is a major challenge faced by the plethora of statistical analytics packages that are increasingly used in Big Data applications. Identifying the optimal model parameters is a time-consuming process that has to be executed from scratch for every dataset/model combination even by experienced data scientists. We argue that the incapacity to evaluate multiple parameter configurations simultaneously and the lack of support to quickly identify sub-optimal configurations are the principal causes. In this paper, we develop two database-inspired techniques for efficient model calibration. Speculative parameter testing applies advanced parallel multi-query processing methods to evaluate several configurations concurrently. The number of configurations is determined adaptively at runtime, while the configurations themselves are extracted from a distribution that is continuously learned following a Bayesian process. Online aggregation is applied to identify sub-optimal configurations early in the processing by incrementally sampling the training dataset and estimating the objective function corresponding to each configuration. We design concurrent online aggregation estimators and define halting conditions to accurately and timely stop the execution. We apply the proposed techniques to distributed gradient descent optimization -- batch and incremental -- for support vector machines and logistic regression models. We implement the resulting solutions in GLADE PF-OLA -- a state-of-the-art Big Data analytics system -- and evaluate their performance over terascale-size synthetic and real datasets. The results confirm that as many as 32 configurations can be evaluated concurrently almost as fast as one, while sub-optimal configurations are detected accurately in as little as a

1/20^{\text{th}}

fraction of the time

arXiv.org e-Print Archive

eScholarship - University of California

Reliability demonstration for safety-critical systems

Author: Bendell T
McCollin C
Tal O
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 24/08/2002
Field of study

This paper suggests a new model for reliability demonstration of safety-critical systems, based on the TRW Software Reliability Theory. The paper describes the model; the test equipment required and test strategies based on the various constraints occurring during software development. The paper also compares a new testing method, Single Risk Sequential Testing (SRST), with the standard Probability Ratio Sequential Testing method (PRST), and concludes that: • SRST provides higher chances of success than PRST • SRST takes less time to complete than PRST • SRST satisfies the consumer risk criterion, whereas PRST provides a much smaller consumer risk than the requirement

Crossref

Nottingham Trent Institutional Repository (IRep)

Gibbs Max-margin Topic Models with Data Augmentation

Author: Chen Ning
Perkins Hugh
Zhang Bo
Zhu Jun
Publication venue
Publication date: 10/10/2013
Field of study

Max-margin learning is a powerful approach to building classifiers and structured output predictors. Recent work on max-margin supervised topic models has successfully integrated it with Bayesian topic models to discover discriminative latent semantic structures and make accurate predictions for unseen testing data. However, the resulting learning problems are usually hard to solve because of the non-smoothness of the margin loss. Existing approaches to building max-margin supervised topic models rely on an iterative procedure to solve multiple latent SVM subproblems with additional mean-field assumptions on the desired posterior distributions. This paper presents an alternative approach by defining a new max-margin loss. Namely, we present Gibbs max-margin supervised topic models, a latent variable Gibbs classifier to discover hidden topic representations for various tasks, including classification, regression and multi-task learning. Gibbs max-margin supervised topic models minimize an expected margin loss, which is an upper bound of the existing margin loss derived from an expected prediction rule. By introducing augmented variables and integrating out the Dirichlet variables analytically by conjugacy, we develop simple Gibbs sampling algorithms with no restricting assumptions and no need to solve SVM subproblems. Furthermore, each step of the "augment-and-collapse" Gibbs sampling algorithms has an analytical conditional distribution, from which samples can be easily drawn. Experimental results demonstrate significant improvements on time efficiency. The classification performance is also significantly improved over competitors on binary, multi-class and multi-label classification tasks.Comment: 35 page

arXiv.org e-Print Archive

CiteSeerX

An ontology enhanced parallel SVM for scalable spam filter training

Author: Bauer
Blanco
Blanzieri
Blei
Breiman
Cao
Caruana
Chawla
Colas
Cristianini
Dean
Do
Gansterer
Godwin Caruana
Graf
Hall
Huang
Kearns
Kim
Maozhen Li
Mei
Platt
Suykens
Taura
Vapnik
Wang
Woodsend
Yang Liu
Zanghirati
Zhang
Publication venue: 'Elsevier BV'
Publication date: 01/05/2013
Field of study

This is the post-print version of the final paper published in Neurocomputing. The published article is available from the link below. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. Copyright @ 2013 Elsevier B.V.Spam, under a variety of shapes and forms, continues to inflict increased damage. Varying approaches including Support Vector Machine (SVM) techniques have been proposed for spam filter training and classification. However, SVM training is a computationally intensive process. This paper presents a MapReduce based parallel SVM algorithm for scalable spam filter training. By distributing, processing and optimizing the subsets of the training data across multiple participating computer nodes, the parallel SVM reduces the training time significantly. Ontology semantics are employed to minimize the impact of accuracy degradation when distributing the training data among a number of SVM classifiers. Experimental results show that ontology based augmentation improves the accuracy level of the parallel SVM beyond the original sequential counterpart

Crossref

Brunel University Research Archive

Bayesian sequential estimation of the reliability of a parallel-series system

Author: Benkamra
Berry
El Gohary
Hardwick
Hardwick
Lin
Mekki Terbeche
Mounir Tlemcani
Rekab
Rekab
Sarhan
Woodroofe
Zohra Benkamra
Publication venue: 'Elsevier BV'
Publication date: 02/04/2012
Field of study

We give a risk-averse solution to the problem of estimating the reliability of a parallel-series system. We adopt a beta-binomial model for components reliabilities, and assume that the total sample size for the experience is fixed. The allocation at subsystems or components level may be random. Based on the sampling schemes for parallel and series systems separately, we propose a hybrid sequential scheme for the parallel-series system. Asymptotic optimality of the Bayes risk associated with quadratic loss is proved with the help of martingale convergence properties.Comment: 12 page

arXiv.org e-Print Archive

Crossref