162,153 research outputs found
FlashProfile: A Framework for Synthesizing Data Profiles
We address the problem of learning a syntactic profile for a collection of
strings, i.e. a set of regex-like patterns that succinctly describe the
syntactic variations in the strings. Real-world datasets, typically curated
from multiple sources, often contain data in various syntactic formats. Thus,
any data processing task is preceded by the critical step of data format
identification. However, manual inspection of data to identify the different
formats is infeasible in standard big-data scenarios.
Prior techniques are restricted to a small set of pre-defined patterns (e.g.
digits, letters, words, etc.), and provide no control over granularity of
profiles. We define syntactic profiling as a problem of clustering strings
based on syntactic similarity, followed by identifying patterns that succinctly
describe each cluster. We present a technique for synthesizing such profiles
over a given language of patterns, that also allows for interactive refinement
by requesting a desired number of clusters.
Using a state-of-the-art inductive synthesis framework, PROSE, we have
implemented our technique as FlashProfile. Across tasks over large
real datasets, we observe a median profiling time of only s.
Furthermore, we show that access to syntactic profiles may allow for more
accurate synthesis of programs, i.e. using fewer examples, in
programming-by-example (PBE) workflows such as FlashFill.Comment: 28 pages, SPLASH (OOPSLA) 201
Domain-Type-Guided Refinement Selection Based on Sliced Path Prefixes
Abstraction is a successful technique in software verification, and
interpolation on infeasible error paths is a successful approach to
automatically detect the right level of abstraction in counterexample-guided
abstraction refinement. Because the interpolants have a significant influence
on the quality of the abstraction, and thus, the effectiveness of the
verification, an algorithm for deriving the best possible interpolants is
desirable. We present an analysis-independent technique that makes it possible
to extract several alternative sequences of interpolants from one given
infeasible error path, if there are several reasons for infeasibility in the
error path. We take as input the given infeasible error path and apply a
slicing technique to obtain a set of error paths that are more abstract than
the original error path but still infeasible, each for a different reason. The
(more abstract) constraints of the new paths can be passed to a standard
interpolation engine, in order to obtain a set of interpolant sequences, one
for each new path. The analysis can then choose from this set of interpolant
sequences and select the most appropriate, instead of being bound to the single
interpolant sequence that the interpolation engine would normally return. For
example, we can select based on domain types of variables in the interpolants,
prefer to avoid loop counters, or compare with templates for potential loop
invariants, and thus control what kind of information occurs in the abstraction
of the program. We implemented the new algorithm in the open-source
verification framework CPAchecker and show that our proof-technique-independent
approach yields a significant improvement of the effectiveness and efficiency
of the verification process.Comment: 10 pages, 5 figures, 1 table, 4 algorithm
Time-sensitive Information Flow Control in Timed Event-B
Protecting confidential data in todayâs computing\ud
environments is an important problem. Information flow\ud
control can help to avoid information leakage and violations\ud
introduced by executing the software applications. In software\ud
development cycle, it is important to handle security related\ud
issues from the beginning specifications at the level of abstract.\ud
Mu [1] investigated the problem of preserving information flow\ud
security in the Event-B specification models. A typed Event-\ud
B model was presented to enforce information flow security\ud
and to prevent direct flows introduced by the system. However,\ud
in practice, timing behaviours of programs can also introduce\ud
a covert flow. The problem of run-time flow monitoring and\ud
controlling must also be addressed. This paper investigates\ud
information flow control in the Event-B specification language\ud
with timing constructs. We present a timed Event-B system\ud
by introducing timers and relevant time constraints into the\ud
system events. We suggest a time-sensitive flow security condition\ud
for the timed Event-B systems, and present a type system\ud
to close the covert channels of timing flows for the system by\ud
ensuring the security condition. We then investigate how to\ud
refine timed events during the stepwise refinement modelling\ud
to satisfy the security condition
- âŠ