Uplift Modeling with Multiple Treatments and General Response Types
Randomized experiments have been used to assist decision-making in many
areas. They help decision-makers select the optimal treatment for a test
population with certain statistical guarantees. However, subjects can show significant
heterogeneity in response to treatments. The problem of customizing treatment
assignment based on subject characteristics is known as uplift modeling,
differential response analysis, or personalized treatment learning in the
literature. A key feature of uplift modeling is that the data are unlabeled. It
is impossible to know whether the chosen treatment is optimal for an individual
subject because response under alternative treatments is unobserved. This
presents a challenge to both the training and the evaluation of uplift models.
In this paper we describe how to obtain an unbiased estimate of the key
performance metric of an uplift model, the expected response. We present a new
uplift algorithm which creates a forest of randomized trees. The trees are
built with a splitting criterion designed to directly optimize their uplift
performance based on the proposed evaluation method. Both the evaluation method
and the algorithm apply to an arbitrary number of treatments and general response
types. Experimental results on synthetic data and industry-provided data show
that our algorithm leads to significant performance improvements over other
applicable methods.
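The unbiased expected-response estimate referred to above can be sketched with inverse-propensity weighting, assuming a randomized experiment with known assignment probabilities; the function and variable names here are illustrative, not the paper's implementation:

```python
import numpy as np

def expected_response(y, t, policy, propensity):
    """Unbiased estimate of a policy's expected response from randomized data.

    y          : observed responses, shape (n,)
    t          : treatment actually assigned to each subject, shape (n,)
    policy     : treatment the uplift model recommends per subject, shape (n,)
    propensity : propensity[k] = P(assigned treatment == k) in the experiment

    Subjects whose random assignment matches the recommendation are kept and
    reweighted by 1 / P(assignment), which corrects for the selection.
    """
    match = (t == policy)
    weights = match / propensity[t]
    return float(np.sum(weights * y) / len(y))
```

With two equiprobable treatments, for example, a subject contributes y_i / 0.5 when its random assignment matches the recommendation and 0 otherwise, so the average is unbiased for the response the policy would achieve.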
FairFuzz: Targeting Rare Branches to Rapidly Increase Greybox Fuzz Testing Coverage
In recent years, fuzz testing has proven itself to be one of the most
effective techniques for finding correctness bugs and security vulnerabilities
in practice. One particular fuzz testing tool, American Fuzzy Lop or AFL, has
become popular thanks to its ease-of-use and bug-finding power. However, AFL
remains limited in the depth of program coverage it achieves, in particular
because it does not consider which parts of program inputs should not be
mutated in order to maintain deep program coverage. We propose an approach,
FairFuzz, that helps alleviate this limitation in two key steps. First,
FairFuzz automatically prioritizes inputs exercising rare parts of the program
under test. Second, it automatically adjusts the mutation of inputs so that the
mutated inputs are more likely to exercise these same rare parts of the
program. We evaluate FairFuzz on real-world programs against state-of-the-art
versions of AFL, repeating experiments thoroughly to obtain reliable measures of
variability. We find that on certain benchmarks FairFuzz shows significant
coverage increases after 24 hours compared to state-of-the-art versions of AFL,
while on others it achieves high program coverage at a significantly faster
rate.
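The rare-branch prioritization in the first step can be sketched as a corpus statistic: count how many queued inputs exercise each branch, then select the inputs hitting the rarest one. This is a simplified illustration, not AFL's or FairFuzz's actual data structures:

```python
from collections import Counter

def rarest_branch(branch_hits):
    """Pick the branch exercised by the fewest corpus inputs.

    branch_hits maps an input id to the set of branch ids that input
    exercises. Returns (rarest branch id, list of inputs hitting it);
    those inputs would be prioritized for mutation.
    """
    counts = Counter(b for hits in branch_hits.values() for b in hits)
    rare = min(counts, key=counts.get)
    return rare, [i for i, hits in branch_hits.items() if rare in hits]
```

The second step would then restrict mutation of the returned inputs, keeping byte positions fixed whose modification makes the input stop exercising the rare branch.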
Change-point model on nonhomogeneous Poisson processes with application in copy number profiling by next-generation DNA sequencing
We propose a flexible change-point model for inhomogeneous Poisson processes,
which arise naturally from next-generation DNA sequencing, and derive score and
generalized likelihood statistics for shifts in intensity functions. We
construct a modified Bayesian information criterion (mBIC) to guide model
selection, and point-wise approximate Bayesian confidence intervals for
assessing the confidence in the segmentation. The model is applied to DNA Copy
Number profiling with sequencing data and evaluated on simulated spike-in and
real data sets.

Comment: Published at http://dx.doi.org/10.1214/11-AOAS517 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
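The generalized likelihood statistic for a shift in intensity can be sketched on binned counts: under a Poisson model, compare the maximized log-likelihood with one rate against two rates split at each candidate change-point. This is a simplified single-change-point version; the paper's model handles continuous intensity functions, multiple changes, and mBIC-guided selection:

```python
import numpy as np

def glr_changepoint(counts):
    """Scan for one shift in Poisson intensity over equally sized bins.

    counts: integer event counts per bin. Returns (best split index k,
    GLR statistic 2 * [max two-rate log-lik - one-rate log-lik]).
    """
    n = len(counts)

    def loglik(c, length):
        # Profile Poisson log-likelihood at the MLE rate c / length
        # (dropping the log-factorial term, which cancels in the ratio).
        return c * np.log(c / length) - c if c > 0 else 0.0

    full = loglik(counts.sum(), n)
    best_k, best_stat = None, -np.inf
    for k in range(1, n):
        stat = loglik(counts[:k].sum(), k) + loglik(counts[k:].sum(), n - k) - full
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k, 2.0 * best_stat
```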
Spatio-temporal epidemic modelling using additive-multiplicative intensity models
An extension of the stochastic susceptible-infectious-recovered (SIR) model is proposed in order to accommodate a regression context for modelling infectious disease surveillance data. The proposal is based on a multivariate counting process specified by conditional intensities, which contain an additive epidemic component and a multiplicative endemic component. This allows the analysis of endemic infectious diseases by quantifying risk factors for infection by external sources in addition to infective contacts. Simulation from the model is straightforward by Ogata's modified thinning algorithm. Inference can be performed by considering the full likelihood of the stochastic process with additional parameter restrictions to ensure non-negative conditional intensities.
As an illustration, we analyse data provided by the Federal Research Centre for Virus Diseases of Animals, Wusterhausen, Germany, on the incidence of the classical swine fever virus in Germany during 1993-2004.
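Simulation by Ogata's modified thinning, as mentioned above, can be sketched for a generic conditional intensity bounded by a constant. This is an illustrative sketch; the actual SIR intensities are the additive-multiplicative forms of the paper, and the bound would be chosen from them:

```python
import random

def ogata_thinning(intensity, t_max, lam_bound, seed=0):
    """Simulate event times on [0, t_max] from a conditional intensity.

    intensity(t, history) must be <= lam_bound everywhere. Candidate points
    are drawn from a homogeneous Poisson process of rate lam_bound, and each
    candidate is kept with probability intensity / lam_bound (thinning).
    """
    rng = random.Random(seed)
    t, events = 0.0, []
    while True:
        t += rng.expovariate(lam_bound)
        if t >= t_max:
            return events
        if rng.random() <= intensity(t, events) / lam_bound:
            events.append(t)
```

Because the history of accepted events is passed to the intensity, self-exciting (epidemic) components can be plugged in without changing the sampler.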
CrY2H-seq: a massively multiplexed assay for deep-coverage interactome mapping.
Broad-scale protein-protein interaction mapping is a major challenge given the cost, time, and sensitivity constraints of existing technologies. Here, we present a massively multiplexed yeast two-hybrid method, CrY2H-seq, which uses a Cre recombinase interaction reporter to intracellularly fuse the coding sequences of two interacting proteins and next-generation DNA sequencing to identify these interactions en masse. We applied CrY2H-seq to investigate sparsely annotated Arabidopsis thaliana transcription factor interactions. By performing ten independent screens testing a total of 36 million binary interaction combinations and uncovering a network of 8,577 interactions among 1,453 transcription factors, we demonstrate CrY2H-seq's improved screening capacity, efficiency, and sensitivity over those of existing technologies. The deep-coverage network resource, which we call AtTFIN-1, recapitulates one-third of previously reported interactions derived from diverse methods, expands the number of known plant transcription factor interactions threefold, and reveals previously unknown family-specific interaction module associations with plant reproductive development, root architecture, and circadian coordination.