Uplift Modeling with Multiple Treatments and General Response Types
Randomized experiments have been used to assist decision-making in many
areas. They help decision-makers select the optimal treatment for a test
population with certain statistical guarantees. However, subjects can show significant
heterogeneity in response to treatments. The problem of customizing treatment
assignment based on subject characteristics is known as uplift modeling,
differential response analysis, or personalized treatment learning in the
literature. A key feature of uplift modeling is that the data are unlabeled. It
is impossible to know whether the chosen treatment is optimal for an individual
subject because response under alternative treatments is unobserved. This
presents a challenge to both the training and the evaluation of uplift models.
In this paper we describe how to obtain an unbiased estimate of the key
performance metric of an uplift model, the expected response. We present a new
uplift algorithm which creates a forest of randomized trees. The trees are
built with a splitting criterion designed to directly optimize their uplift
performance based on the proposed evaluation method. Both the evaluation method
and the algorithm apply to an arbitrary number of treatments and general response
types. Experimental results on synthetic data and industry-provided data show
that our algorithm leads to significant performance improvements over other
applicable methods.
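The unbiased expected-response estimate referred to above can be sketched with inverse-propensity weighting, assuming a randomized experiment with known assignment probabilities; the function and variable names here are illustrative, not the paper's implementation:

```python
import numpy as np

def expected_response(y, t, policy, propensity):
    """Unbiased estimate of a policy's expected response from randomized data.

    y          : observed responses, shape (n,)
    t          : treatment actually assigned to each subject, shape (n,)
    policy     : treatment the uplift model recommends per subject, shape (n,)
    propensity : propensity[k] = P(assigned treatment == k) in the experiment

    Subjects whose random assignment matches the recommendation are kept and
    reweighted by 1 / P(assignment), which corrects for the selection.
    """
    match = (t == policy)
    weights = match / propensity[t]
    return float(np.sum(weights * y) / len(y))
```

With two equiprobable treatments, for example, a subject contributes y_i / 0.5 when its random assignment matches the recommendation and 0 otherwise, so the average is unbiased for the response the policy would achieve.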
FairFuzz: Targeting Rare Branches to Rapidly Increase Greybox Fuzz Testing Coverage
In recent years, fuzz testing has proven itself to be one of the most
effective techniques for finding correctness bugs and security vulnerabilities
in practice. One particular fuzz testing tool, American Fuzzy Lop or AFL, has
become popular thanks to its ease-of-use and bug-finding power. However, AFL
remains limited in the depth of program coverage it achieves, in particular
because it does not consider which parts of program inputs should not be
mutated in order to maintain deep program coverage. We propose an approach,
FairFuzz, that helps alleviate this limitation in two key steps. First,
FairFuzz automatically prioritizes inputs exercising rare parts of the program
under test. Second, it automatically adjusts the mutation of inputs so that the
mutated inputs are more likely to exercise these same rare parts of the
program. We evaluate FairFuzz on real-world programs against state-of-the-art
versions of AFL, repeating experiments thoroughly to obtain reliable measures of
variability. We find that on certain benchmarks FairFuzz shows significant
coverage increases after 24 hours compared to state-of-the-art versions of AFL,
while on others it achieves high program coverage at a significantly faster
rate.
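The rare-branch prioritization in the first step can be sketched as a corpus statistic: count how many queued inputs exercise each branch, then select the inputs hitting the rarest one. This is a simplified illustration, not AFL's or FairFuzz's actual data structures:

```python
from collections import Counter

def rarest_branch(branch_hits):
    """Pick the branch exercised by the fewest corpus inputs.

    branch_hits maps an input id to the set of branch ids that input
    exercises. Returns (rarest branch id, list of inputs hitting it);
    those inputs would be prioritized for mutation.
    """
    counts = Counter(b for hits in branch_hits.values() for b in hits)
    rare = min(counts, key=counts.get)
    return rare, [i for i, hits in branch_hits.items() if rare in hits]
```

The second step would then restrict mutation of the returned inputs, keeping byte positions fixed whose modification makes the input stop exercising the rare branch.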
Change-point model on nonhomogeneous Poisson processes with application in copy number profiling by next-generation DNA sequencing
We propose a flexible change-point model for inhomogeneous Poisson processes,
which arise naturally from next-generation DNA sequencing, and derive score and
generalized likelihood statistics for shifts in intensity functions. We
construct a modified Bayesian information criterion (mBIC) to guide model
selection, and point-wise approximate Bayesian confidence intervals for
assessing the confidence in the segmentation. The model is applied to DNA Copy
Number profiling with sequencing data and evaluated on simulated spike-in and
real data sets.

Comment: Published at http://dx.doi.org/10.1214/11-AOAS517 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
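The generalized likelihood statistic for a shift in intensity can be sketched on binned counts: under a Poisson model, compare the maximized log-likelihood with one rate against two rates split at each candidate change-point. This is a simplified single-change-point version; the paper's model handles continuous intensity functions, multiple changes, and mBIC-guided selection:

```python
import numpy as np

def glr_changepoint(counts):
    """Scan for one shift in Poisson intensity over equally sized bins.

    counts: integer event counts per bin. Returns (best split index k,
    GLR statistic 2 * [max two-rate log-lik - one-rate log-lik]).
    """
    n = len(counts)

    def loglik(c, length):
        # Profile Poisson log-likelihood at the MLE rate c / length
        # (dropping the log-factorial term, which cancels in the ratio).
        return c * np.log(c / length) - c if c > 0 else 0.0

    full = loglik(counts.sum(), n)
    best_k, best_stat = None, -np.inf
    for k in range(1, n):
        stat = loglik(counts[:k].sum(), k) + loglik(counts[k:].sum(), n - k) - full
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k, 2.0 * best_stat
```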
Spatio-temporal epidemic modelling using additive-multiplicative intensity models
An extension of the stochastic susceptible-infectious-recovered (SIR) model is proposed in order to accommodate a regression context for modelling infectious disease surveillance data. The proposal is based on a multivariate counting process specified by conditional intensities, which contain an additive epidemic component and a multiplicative endemic component. This allows the analysis of endemic infectious diseases by quantifying risk factors for infection by external sources in addition to infective contacts. Simulation from the model is straightforward by Ogata's modified thinning algorithm. Inference can be performed by considering the full likelihood of the stochastic process with additional parameter restrictions to ensure non-negative conditional intensities.
As an illustration, we analyse data provided by the Federal Research Centre for Virus Diseases of Animals, Wusterhausen, Germany, on the incidence of the classical swine fever virus in Germany during 1993-2004.
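Simulation by Ogata's modified thinning, as mentioned above, can be sketched for a generic conditional intensity bounded by a constant. This is an illustrative sketch; the actual SIR intensities are the additive-multiplicative forms of the paper, and the bound would be chosen from them:

```python
import random

def ogata_thinning(intensity, t_max, lam_bound, seed=0):
    """Simulate event times on [0, t_max] from a conditional intensity.

    intensity(t, history) must be <= lam_bound everywhere. Candidate points
    are drawn from a homogeneous Poisson process of rate lam_bound, and each
    candidate is kept with probability intensity / lam_bound (thinning).
    """
    rng = random.Random(seed)
    t, events = 0.0, []
    while True:
        t += rng.expovariate(lam_bound)
        if t >= t_max:
            return events
        if rng.random() <= intensity(t, events) / lam_bound:
            events.append(t)
```

Because the history of accepted events is passed to the intensity, self-exciting (epidemic) components can be plugged in without changing the sampler.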
CrY2H-seq: a massively multiplexed assay for deep-coverage interactome mapping.
Broad-scale protein-protein interaction mapping is a major challenge given the cost, time, and sensitivity constraints of existing technologies. Here, we present a massively multiplexed yeast two-hybrid method, CrY2H-seq, which uses a Cre recombinase interaction reporter to intracellularly fuse the coding sequences of two interacting proteins and next-generation DNA sequencing to identify these interactions en masse. We applied CrY2H-seq to investigate sparsely annotated Arabidopsis thaliana transcription factor interactions. By performing ten independent screens testing a total of 36 million binary interaction combinations and uncovering a network of 8,577 interactions among 1,453 transcription factors, we demonstrate CrY2H-seq's improved screening capacity, efficiency, and sensitivity over those of existing technologies. The deep-coverage network resource, which we call AtTFIN-1, recapitulates one-third of previously reported interactions derived from diverse methods, expands the number of known plant transcription factor interactions threefold, and reveals previously unknown family-specific interaction module associations with plant reproductive development, root architecture, and circadian coordination.