A Case Study in Matching Test and Proof Coverage
This paper studies the complementarity of test and deductive proof processes for Java programs specified in JML (the Java Modeling Language). The proof of a program may be long and difficult, especially when automatic provers give up. When a theorem is not proved automatically, there are two possibilities: either the theorem is correct but there is not enough information to complete the proof, or the theorem is incorrect. Testing techniques can be used to discriminate between these two alternatives. Here, we present experiments around the use of the JACK tool to prove Java programs annotated with JML assertions. When JACK fails to discharge proof obligations, we use a combinatorial testing tool, TOBIAS, to produce large test suites that exercise the unproved program parts. The key issue is to establish the relevance of the test suite with respect to the unproved proof obligations. Therefore, we use code coverage techniques: our approach takes advantage of the statement orientation of the JACK tool to compare the statements involved in the unproved proof obligations with the statements covered by the test suite. Finally, we build confidence in the test suites by evaluating their ability to kill program mutants. These techniques have been put into practice and are illustrated by a simple case study.
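The combinatorial generation step can be pictured as unfolding a test pattern into the cartesian product of its parameter domains. The sketch below is an illustrative toy, not TOBIAS's actual input language; the pattern and parameter names are hypothetical.

```python
from itertools import product

def unfold(pattern):
    """Unfold a test pattern (parameter -> domain of values) into the
    cartesian product of all concrete test cases, combinatorial-style."""
    keys = list(pattern)
    return [dict(zip(keys, combo)) for combo in product(*pattern.values())]

# Hypothetical pattern for a small case study: every combination of
# operation, argument value, and repetition count becomes one test case.
pattern = {"op": ["push", "pop"], "value": [0, 1, -1], "repeat": [1, 2]}
cases = unfold(pattern)
print(len(cases))  # 2 * 3 * 2 = 12 concrete test cases
```

Even tiny patterns unfold into large suites, which is why coverage analysis is needed to keep only the cases relevant to the unproved obligations.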
Objective Bayes and Conditional Frequentist Inference
Objective Bayesian methods have garnered considerable interest and support among statisticians,
particularly over the past two decades. It has often been ignored, however, that in
some cases the appropriate frequentist inference to match is a conditional one. We present
various methods for extending the probability matching prior (PMP) methods to conditional
settings. A method based on saddlepoint approximations is found to be the most
tractable and we demonstrate its use in the most common exact ancillary statistic models.
As part of this analysis, we give a proof of an exactness property of a particular PMP in
location-scale models. We use the proposed matching methods to investigate the relationships
between conditional and unconditional PMPs. A key component of our analysis is a
numerical study of the performance of probability matching priors from both a conditional
and unconditional perspective in exact ancillary models. In concluding remarks we
propose several routes for future research.
XRay: Enhancing the Web's Transparency with Differential Correlation
Today's Web services - such as Google, Amazon, and Facebook - leverage user
data for varied purposes, including personalizing recommendations, targeting
advertisements, and adjusting prices. At present, users have little insight
into how their data is being used. Hence, they cannot make informed choices
about the services they choose. To increase transparency, we developed XRay,
the first fine-grained, robust, and scalable personal data tracking system for
the Web. XRay predicts which data in an arbitrary Web account (such as emails,
searches, or viewed products) is being used to target which outputs (such as
ads, recommended products, or prices). XRay's core functions are service
agnostic and easy to instantiate for new services, and they can track data
within and across services. To make predictions independent of the audited
service, XRay relies on the following insight: by comparing outputs from
different accounts with similar, but not identical, subsets of data, one can
pinpoint targeting through correlation. We show both theoretically, and through
experiments on Gmail, Amazon, and YouTube, that XRay achieves high precision
and recall by correlating data from a surprisingly small number of extra
accounts.
Comment: Extended version of a paper presented at the 23rd USENIX Security
Symposium (USENIX Security 14).
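The differential correlation insight can be illustrated on toy data: several shadow accounts each hold a different subset of the user's data items, and an item is flagged as an ad's likely target when its presence correlates strongly with the ad appearing. The accounts, items, and scoring rule below are all made up for illustration; they are not XRay's actual implementation.

```python
accounts = {                     # shadow account -> data items placed in it
    "A": {"flights", "shoes"},
    "B": {"flights", "books"},
    "C": {"shoes", "books"},
    "D": {"flights"},
}
saw_ad = {"A", "B", "D"}         # accounts where a given ad appeared

def targeting_score(item):
    """Difference in ad frequency between accounts with and without item."""
    has = [a for a, items in accounts.items() if item in items]
    lacks = [a for a in accounts if a not in has]
    p_has = sum(a in saw_ad for a in has) / len(has)
    p_lacks = sum(a in saw_ad for a in lacks) / len(lacks) if lacks else 0.0
    return p_has - p_lacks

items = set().union(*accounts.values())
best = max(items, key=targeting_score)
print(best)  # "flights": present in exactly the accounts that saw the ad
```

Because each item's presence pattern across accounts is distinct, a small number of extra accounts suffices to disambiguate the target, which is the intuition behind XRay's scalability result.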
Synthesizing Program Input Grammars
We present an algorithm for synthesizing a context-free grammar encoding the
language of valid program inputs from a set of input examples and blackbox
access to the program. Our algorithm addresses shortcomings of existing grammar
inference algorithms, which both severely overgeneralize and are prohibitively
slow. Our implementation, GLADE, leverages the grammar synthesized by our
algorithm to fuzz test programs with structured inputs. We show that GLADE
substantially increases the incremental coverage on valid inputs compared to
two baseline fuzzers.
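Once a context-free grammar of valid inputs has been synthesized, structured test inputs can be produced by random derivation from it. The grammar and derivation strategy below are an illustrative stand-in for this idea, not GLADE's synthesized output or its fuzzing loop.

```python
import random

# A toy grammar of arithmetic expressions (hypothetical, not GLADE output).
GRAMMAR = {
    "<expr>": [["<term>", "+", "<expr>"], ["<term>"]],
    "<term>": [["(", "<expr>", ")"], ["<num>"]],
    "<num>":  [["0"], ["1"], ["2"]],
}

def derive(symbol, rng, depth=0):
    """Randomly expand a nonterminal; past a depth limit, prefer the
    non-recursive alternatives so the derivation terminates."""
    if symbol not in GRAMMAR:
        return symbol                      # terminal: emit as-is
    rules = GRAMMAR[symbol]
    rule = rng.choice(rules[1:] if depth > 8 and len(rules) > 1 else rules)
    return "".join(derive(s, rng, depth + 1) for s in rule)

rng = random.Random(0)
samples = [derive("<expr>", rng) for _ in range(5)]
print(samples)  # syntactically valid expressions, e.g. "1+(2)"
```

Feeding such grammar-derived strings to the program exercises deep parsing and evaluation paths that random byte-level fuzzing rarely reaches, which is the source of the coverage gains reported for GLADE.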
BIGMAC : breaking inaccurate genomes and merging assembled contigs for long read metagenomic assembly.
Background: The problem of de-novo assembly for metagenomes using only long reads is gaining attention. We study whether post-processing metagenomic assemblies with the original input long reads can result in quality improvement. Previous approaches have focused on pre-processing reads and optimizing assemblers. BIGMAC takes an alternative perspective to focus on the post-processing step. Results: Using both the assembled contigs and original long reads as input, BIGMAC first breaks the contigs at potentially mis-assembled locations and subsequently scaffolds contigs. Our experiments on metagenomes assembled from long reads show that BIGMAC can improve assembly quality by reducing the number of mis-assemblies while maintaining or increasing N50 and N75. Moreover, BIGMAC shows the largest N75 to number of mis-assemblies ratio on all tested datasets when compared to other post-processing tools. Conclusions: BIGMAC demonstrates the effectiveness of the post-processing approach in improving the quality of metagenomic assemblies.
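The break step can be illustrated with a deliberately simplified rule: treat positions where aligned long-read coverage drops sharply as candidate mis-assembly breakpoints and split the contig there. This is an assumed toy heuristic for illustration, not BIGMAC's actual breakpoint detection.

```python
def break_contig(contig, coverage, min_cov=3):
    """Split `contig` wherever per-base long-read `coverage` falls below
    min_cov, dropping the low-coverage bases (toy mis-assembly heuristic)."""
    pieces, start = [], 0
    for i, c in enumerate(coverage):
        if c < min_cov:
            if i > start:
                pieces.append(contig[start:i])
            start = i + 1
    if start < len(contig):
        pieces.append(contig[start:])
    return pieces

# Coverage dips at positions 3 and 8 mark two candidate breakpoints.
print(break_contig("ACGTACGTAC", [5, 5, 5, 1, 4, 4, 4, 4, 0, 6]))
# -> ['ACG', 'ACGT', 'C']
```

After breaking, a scaffolding pass (not shown) would reconnect pieces whose joins the long reads consistently support, which is how the pipeline can reduce mis-assemblies while preserving contiguity metrics such as N50.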
Frequentist and Bayesian measures of confidence via multiscale bootstrap for testing three regions
A new computation method of frequentist p-values and Bayesian posterior
probabilities based on the bootstrap probability is discussed for the
multivariate normal model with unknown expectation parameter vector. The null
hypothesis is represented as an arbitrary-shaped region. We introduce new
parametric models for the scaling-law of bootstrap probability so that the
multiscale bootstrap method, which was designed for one-sided tests, can also
compute confidence measures of two-sided tests, extending applicability to a
wider class of hypotheses. Parameter estimation is improved by the two-step
multiscale bootstrap and also by including higher-order terms. Model selection
is important not only as a motivating application of our method, but also as an
essential ingredient in the method. A compromise between the frequentist and
Bayesian approaches is attempted by showing that the Bayesian posterior
probability with a noninformative prior can be interpreted as the frequentist
p-value of a "zero-sided" test.
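The extrapolation at the heart of the multiscale bootstrap can be sketched as follows: bootstrap probabilities BP observed at several scales sigma^2 are transformed to z-values, the simplest scaling-law model psi(sigma^2) = d + c*sigma^2 is fit by least squares, and a corrected p-value is read off by extrapolating to sigma^2 = -1. The bootstrap probabilities below are made-up numbers, and this sketch omits the paper's two-step estimation and higher-order terms.

```python
from statistics import NormalDist
import math

nd = NormalDist()
sigma2 = [0.5, 0.75, 1.0, 1.25, 1.5]          # bootstrap scales
bp = [0.30, 0.26, 0.22, 0.19, 0.17]           # made-up bootstrap probabilities

# psi(sigma2) = sqrt(sigma2) * Phi^{-1}(1 - BP), modeled as d + c*sigma2.
psi = [math.sqrt(s) * nd.inv_cdf(1.0 - b) for s, b in zip(sigma2, bp)]

# Ordinary least-squares fit of psi against sigma2 (slope c, intercept d).
n = len(sigma2)
sx, sy = sum(sigma2), sum(psi)
sxx = sum(s * s for s in sigma2)
sxy = sum(s * p for s, p in zip(sigma2, psi))
c = (n * sxy - sx * sy) / (n * sxx - sx * sx)
d = (sy - c * sx) / n

# Extrapolation to sigma2 = -1 gives the corrected (one-sided) p-value.
p_corrected = 1.0 - nd.cdf(d - c)
print(round(p_corrected, 3))
```

The ordinary bootstrap probability corresponds to reading the curve at sigma^2 = 1; extrapolating to the negative scale is what removes the curvature bias of the region's boundary.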
Pattern Matching for sets of segments
In this paper we present algorithms for a number of problems in geometric
pattern matching where the input consists of collections of segments in the
plane. Our work consists of two main parts. In the first, we address problems
and measures that relate to collections of orthogonal line segments in the
plane. Such collections arise naturally from problems in mapping buildings and
robot exploration.
We propose a new measure of segment similarity called a coverage measure, and
present efficient algorithms for maximising this measure between sets of
axis-parallel segments under translations. Our algorithms run in time
O(n^3 polylog n) in the general case, and in time O(n^2 polylog n) for the case
when all segments are horizontal. In addition, we show that when restricted to
translations that are only vertical, the Hausdorff distance between two sets of
horizontal segments can be computed in time roughly O(n^{3/2} polylog n). These
algorithms are significant improvements over the general algorithm of Chew et
al. In the
second part of this paper we address the problem of matching polygonal chains.
We study the well-known Fréchet distance, and present the first algorithm for
computing the Fréchet distance under general translations. Our methods also
yield algorithms for computing a generalization of the Fréchet distance, and we
also present a simple approximation algorithm for the Fréchet distance that
runs in time O(n^2 polylog n).
Comment: To appear in the 12th ACM Symposium on Discrete Algorithms, Jan 200
Exact Post Model Selection Inference for Marginal Screening
We develop a framework for post model selection inference, via marginal
screening, in linear regression. At the core of this framework is a result that
characterizes the exact distribution of linear functions of the response y,
conditional on the model being selected (``condition on selection" framework).
This allows us to construct valid confidence intervals and hypothesis tests for
regression coefficients that account for the selection procedure. In contrast
to recent work in high-dimensional statistics, our results are exact
(non-asymptotic) and require no eigenvalue-like assumptions on the design
matrix X. Furthermore, the computational cost of marginal regression,
constructing confidence intervals and hypothesis testing is negligible compared
to the cost of linear regression, thus making our methods particularly suitable
for extremely large datasets. Although we focus on marginal screening to
illustrate the applicability of the condition on selection framework, this
framework is much more broadly applicable. We show how to apply the proposed
framework to several other selection procedures including orthogonal matching
pursuit, non-negative least squares, and marginal screening+Lasso.
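The selection step itself is simple: marginal screening keeps the k features whose marginal covariance with the response is largest in absolute value. The sketch below shows only that selection event; the paper's actual contribution, exact inference conditional on this event, is not reproduced here, and the data are synthetic.

```python
import random

def marginal_screen(X, y, k):
    """Return the indices of the k features with the largest absolute
    marginal covariance with the response (toy marginal screening)."""
    n = len(X)
    ybar = sum(y) / n
    scores = []
    for j in range(len(X[0])):
        xbar = sum(row[j] for row in X) / n
        cov = sum((row[j] - xbar) * (yi - ybar) for row, yi in zip(X, y))
        scores.append((abs(cov), j))
    return sorted(j for _, j in sorted(scores, reverse=True)[:k])

# Synthetic data in which the response depends only on feature 3.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(100)]
y = [2.0 * row[3] + rng.gauss(0, 0.1) for row in X]
sel = marginal_screen(X, y, 2)
print(sel)  # feature 3 should be among the selected indices
```

Naively fitting a regression on `sel` and reporting classical intervals ignores that `sel` was chosen using y; conditioning on the selection event, as the paper does, restores exact coverage.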