Cautionary Tales on Synthetic Controls in Survival Analyses
Synthetic control (SC) methods have gained rapid popularity in economics
recently, where they have been applied in the context of inferring the effects
of treatments on standard continuous outcomes assuming linear input-output
relations. In medical applications, conversely, survival outcomes are often of
primary interest, a setup in which both commonly assumed data-generating
processes (DGPs) and target parameters are different. In this paper, we
therefore investigate whether and when SCs could serve as an alternative to
matching methods in survival analyses. We find that, because SCs rely on a
linearity assumption, they will generally be biased for the true expected
survival time in commonly assumed survival DGPs -- even when taking into
account the possibility of linearity on another scale as in accelerated failure
time models. Additionally, we find that, because SC units follow distributions
with lower variance than real control units, summaries of their distributions,
such as survival curves, will be biased for the parameters of interest in many
survival analyses. Nonetheless, we also highlight that using SCs can still
improve upon matching whenever the biases described above are outweighed by
extrapolation biases exhibited by imperfect matches, and investigate the use of
regularization to trade off the shortcomings of both approaches.
Comment: To appear in the 3rd Conference on Causal Learning and Reasoning (CLeaR 2024).
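To make the setup concrete, here is a minimal sketch (my own illustration, not the paper's code) of standard synthetic-control weight estimation: nonnegative weights summing to one are chosen so that a convex combination of control units reproduces the treated unit's pre-treatment characteristics, with a ridge-style penalty `lam` standing in for the kind of regularization the paper investigates to trade off SC bias against matching's extrapolation bias.

```python
# Illustrative sketch of synthetic-control weight estimation; `lam` is a
# hypothetical regularization knob, not the paper's estimator.
import numpy as np
from scipy.optimize import minimize

def sc_weights(X_controls, x_treated, lam=0.0):
    """X_controls: (n_controls, n_features); x_treated: (n_features,)."""
    n_controls = X_controls.shape[0]

    def objective(w):
        resid = X_controls.T @ w - x_treated   # pre-treatment mismatch
        return resid @ resid + lam * (w @ w)   # penalty on the weights

    constraints = [{"type": "eq", "fun": lambda w: w.sum() - 1.0}]
    bounds = [(0.0, 1.0)] * n_controls
    w0 = np.full(n_controls, 1.0 / n_controls)
    result = minimize(objective, w0, bounds=bounds, constraints=constraints)
    return result.x

rng = np.random.default_rng(0)
X, x = rng.normal(size=(20, 5)), rng.normal(size=5)
print(sc_weights(X, x, lam=0.1).round(3))  # convex weights over controls
```

Because the SC unit is a convex average of controls, its outcome distribution has lower variance than any single control unit's, which is exactly the source of the survival-curve bias the abstract describes.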
Scenic: A Language for Scenario Specification and Scene Generation
We propose a new probabilistic programming language for the design and
analysis of perception systems, especially those based on machine learning.
Specifically, we consider the problems of training a perception system to
handle rare events, testing its performance under different conditions, and
debugging failures. We show how a probabilistic programming language can help
address these problems by specifying distributions encoding interesting types
of inputs and sampling these to generate specialized training and test sets.
More generally, such languages can be used for cyber-physical systems and
robotics to write environment models, an essential prerequisite to any formal
analysis. In this paper, we focus on systems like autonomous cars and robots,
whose environment is a "scene", a configuration of physical objects and agents.
We design a domain-specific language, Scenic, for describing "scenarios" that
are distributions over scenes. As a probabilistic programming language, Scenic
allows assigning distributions to features of the scene, as well as
declaratively imposing hard and soft constraints over the scene. We develop
specialized techniques for sampling from the resulting distribution, taking
advantage of the structure provided by Scenic's domain-specific syntax.
Finally, we apply Scenic in a case study on a convolutional neural network
designed to detect cars in road images, improving its performance beyond that
achieved by state-of-the-art synthetic data generation methods.
Comment: 41 pages, 36 figures. Full version of a PLDI 2019 paper (extending UC Berkeley EECS Department Tech Report No. UCB/EECS-2018-8).
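As a rough illustration of the programming model (plain Python, not Scenic syntax; all names here are invented for the example), a scenario assigns distributions to scene features and imposes hard constraints, and a naive sampler draws scenes by rejection. Scenic's specialized sampling techniques exploit the language's structure to do much better than this baseline.

```python
# Toy rendering of the scenario-as-distribution idea; `Car`, `sample_scene`,
# and the constraints are illustrative, not part of Scenic.
import random
from dataclasses import dataclass

@dataclass
class Car:
    x: float        # lateral offset from lane center (m)
    heading: float  # degrees relative to the road direction

def sample_scene():
    """One candidate scene: an ego car and one other car."""
    ego = Car(x=random.gauss(0.0, 0.3), heading=random.gauss(0.0, 5.0))
    other = Car(x=random.gauss(0.0, 0.5), heading=random.gauss(0.0, 10.0))
    return ego, other

def satisfies_constraints(ego, other):
    # Hard constraints, analogous to declarative `require` statements.
    return abs(ego.x - other.x) > 1.0 and abs(other.heading) < 20.0

def sample_valid_scene(max_tries=10_000):
    for _ in range(max_tries):
        ego, other = sample_scene()
        if satisfies_constraints(ego, other):
            return ego, other
    raise RuntimeError("constraints too tight for rejection sampling")

print(sample_valid_scene())
```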
Targeted Greybox Fuzzing with Static Lookahead Analysis
Automatic test generation typically aims to generate inputs that explore new
paths in the program under test in order to find bugs. Existing work has,
therefore, focused on guiding the exploration toward program parts that are
more likely to contain bugs by using an offline static analysis.
In this paper, we introduce a novel technique for targeted greybox fuzzing
using an online static analysis that guides the fuzzer toward a set of target
locations, for instance, located in recently modified parts of the program.
This is achieved by first semantically analyzing each program path that is
explored by an input in the fuzzer's test suite. The results of this analysis
are then used to control the fuzzer's specialized power schedule, which
determines how often to fuzz inputs from the test suite. We implemented our
technique by extending a state-of-the-art, industrial fuzzer for Ethereum smart
contracts and evaluated its effectiveness on 27 real-world benchmarks. Using an
online analysis is particularly suitable for the domain of smart contracts,
since it does not require any code instrumentation; instrumenting a contract
would change its semantics. Our experiments show that targeted fuzzing
significantly outperforms standard greybox fuzzing for reaching 83% of the
challenging target locations (up to a 14x median speed-up).
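A minimal sketch of the kind of power schedule described above (illustrative only; `distance` stands in for the result of the online semantic path analysis): inputs whose explored paths look closer to a target location are fuzzed more often.

```python
# Hypothetical distance-guided power schedule, not the paper's implementation.
import math

def energy(distance, base_energy=10, max_factor=8):
    """More fuzzing energy for inputs nearer a target; capped multiplier."""
    if distance == math.inf:             # target unreachable from this path
        return 0
    factor = max_factor / (1 + distance)
    return max(base_energy, int(base_energy * factor))

def schedule(test_suite):
    # test_suite: list of (input, distance) pairs; fuzz each input in
    # proportion to its assigned energy.
    return sorted(((inp, energy(d)) for inp, d in test_suite),
                  key=lambda pair: -pair[1])

suite = [("tx_a", 0), ("tx_b", 3), ("tx_c", math.inf)]
print(schedule(suite))  # tx_a gets the most energy, tx_c none
```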
Exploring the Boundaries of GPT-4 in Radiology
The recent success of general-domain large language models (LLMs) has
significantly changed the natural language processing paradigm towards a
unified foundation model across domains and applications. In this paper, we
focus on assessing the performance of GPT-4, the most capable LLM so far, on
the text-based applications for radiology reports, comparing against
state-of-the-art (SOTA) radiology-specific models. Exploring various prompting
strategies, we evaluated GPT-4 on a diverse range of common radiology tasks and
we found GPT-4 either outperforms or is on par with current SOTA radiology
models. With zero-shot prompting, GPT-4 already obtains substantial gains
(≈ 10% absolute improvement) over radiology models in temporal sentence
similarity classification (accuracy) and natural language inference (F1).
For tasks that require learning dataset-specific style or schema (e.g. findings
summarisation), GPT-4 improves with example-based prompting and matches
supervised SOTA. Our extensive error analysis with a board-certified
radiologist shows GPT-4 has a sufficient level of radiology knowledge with only
occasional errors in complex context that require nuanced domain knowledge. For
findings summarisation, GPT-4 outputs are found to be overall comparable with
existing manually-written impressions.
Comment: EMNLP 2023 main.
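To illustrate the two prompting regimes the abstract contrasts for findings summarisation (a sketch with invented prompt templates; `call_llm` is a placeholder, not a real API):

```python
# Zero-shot vs. example-based (few-shot) prompting; only prompt construction
# is shown, and all template text here is hypothetical.
ZERO_SHOT = (
    "Summarise the key findings of this radiology report as an "
    "impression section.\n\nFINDINGS:\n{findings}\n\nIMPRESSION:"
)

def few_shot_prompt(findings, examples):
    """Prepend (findings, impression) pairs so the model can pick up the
    dataset-specific style and schema."""
    shots = "\n\n".join(
        f"FINDINGS:\n{f}\n\nIMPRESSION:\n{i}" for f, i in examples
    )
    return f"{shots}\n\nFINDINGS:\n{findings}\n\nIMPRESSION:"

def call_llm(prompt):  # placeholder for whichever chat API is used
    raise NotImplementedError

report = "Heart size is mildly enlarged. No focal consolidation."
print(ZERO_SHOT.format(findings=report))
```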
Finishing the euchromatic sequence of the human genome
The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers ∼99% of the euchromatic genome and is accurate to an error rate of ∼1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome, including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.
Termination Proofs from Tests
We show how a test suite for a sequential program can be profitably used to construct a termination proof. In particular, we describe an algorithm TpT for proving termination of a program based on information derived from testing it. TpT iteratively alternates between two phases: (a) an infer phase and (b) a validate phase. In the infer phase, machine learning (specifically, linear regression) is used to efficiently compute a candidate loop bound for every loop in the program. These loop bounds are verified for correctness by an off-the-shelf checker. If a loop bound is invalid, the safety checker provides a test or a counterexample that is used to generate more data, which is subsequently used by the next infer phase to compute better estimates for loop bounds. If, on the other hand, all loop bounds are valid, we have a proof of termination. We also describe a simple extension to our approach that allows us to infer polynomial loop bounds automatically. We have evaluated TpT on two benchmark sets: microbenchmarks obtained from recent literature on program termination, and Windows device drivers. Our results are promising: on the microbenchmarks, TpT proves termination on 15% more benchmarks than any previously known technique, and our evaluation on Windows device drivers demonstrates TpT's ability to analyze and scale to real-world applications.
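A compact sketch of the infer/validate loop described above (my reconstruction, not the paper's implementation; `check_bound` stands in for the off-the-shelf safety checker):

```python
# Infer phase: fit a candidate linear loop bound from (input size, iteration
# count) test observations. Validate phase: ask a checker; a counterexample
# becomes a new data point for the next round.
import numpy as np

def infer_bound(samples):
    """Least-squares fit of iterations <= a*n + b from observations."""
    n = np.array([s[0] for s in samples], dtype=float)
    iters = np.array([s[1] for s in samples], dtype=float)
    A = np.vstack([n, np.ones_like(n)]).T
    a, b = np.linalg.lstsq(A, iters, rcond=None)[0]
    return a, b + 1.0  # pad the intercept so the candidate over-approximates

def tpt(samples, check_bound, max_rounds=10):
    for _ in range(max_rounds):
        a, b = infer_bound(samples)
        verdict = check_bound(a, b)      # None = valid, else a counterexample
        if verdict is None:
            return ("terminates", a, b)  # the loop bound was validated
        samples.append(verdict)          # refine the fit with the new test
    return ("unknown", None, None)

# Toy checker for a loop that runs 2*n + 3 times (illustrative stand-in).
import random
def fake_checker(a, b):
    for _ in range(100):
        n = random.randint(0, 1000)
        if 2*n + 3 > a*n + b:
            return (n, 2*n + 3)  # counterexample: bound too small here
    return None

print(tpt([(1, 5), (4, 11)], fake_checker))
```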
Unifying Views of Tail-Biting Trellis Constructions for Linear Block Codes
In this paper, we present new ways of describing and constructing linear tail-biting trellises for block codes. We extend the well-known Bahl–Cocke–Jelinek–Raviv (BCJR) construction for conventional trellises to tail-biting trellises. The BCJR-like labeling scheme yields a simple specification for the tail-biting trellis of the dual code, with the dual trellis having the same state-complexity profile as that of the primal code. Finally, we show that Forney's algebraic specification of state spaces for conventional trellises has a natural extension to tail-biting trellises.
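As a hedged illustration of the labeling idea (notation mine; the paper's conventions may differ): in a BCJR-style construction, the state after the first i symbols of a path is the partial syndrome accumulated so far.

```latex
% Sketch of BCJR-style state labeling (illustrative notation).
% H is a parity-check matrix with columns h_1, ..., h_n; x is a path label.
\[
  s_i(\mathbf{x}) \;=\; s_0 + \sum_{j=1}^{i} x_j \mathbf{h}_j ,
  \qquad i = 0, 1, \dots, n .
\]
% Conventional trellis: s_0 = 0, and the valid paths are those with
% s_n(x) = 0, i.e. exactly the codewords. Tail-biting trellis: s_0 may be
% nonzero, and a path is valid when it returns to its starting state:
\[
  s_n(\mathbf{x}) \;=\; s_0(\mathbf{x}) .
\]
```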