On Optimal Multiple Changepoint Algorithms for Large Data
There is an increasing need for algorithms that can accurately detect
changepoints in long time-series or equivalent data. Many common approaches
to detecting changepoints, for example based on penalised likelihood or minimum
description length, can be formulated in terms of minimising a cost over
segmentations. Dynamic programming methods exist to solve this minimisation
problem exactly, but these tend to scale at least quadratically in the length
of the time-series. Algorithms, such as Binary Segmentation, exist that have a
computational cost that is close to linear in the length of the time-series,
but these are not guaranteed to find the optimal segmentation. Recently, pruning
ideas have been suggested that can speed up the dynamic programming algorithms
whilst still being guaranteed to find the true minimum of the cost function. Here
we extend these pruning methods, and introduce two new algorithms for
segmenting data, FPOP and SNIP. Empirical results show that FPOP is
substantially faster than existing dynamic programming methods, and unlike the
existing methods its computational efficiency is robust to the number of
changepoints in the data. We evaluate the method on detecting Copy Number
Variations and observe that FPOP has a computational cost that is competitive
with that of Binary Segmentation.
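The abstract gives no pseudo-code; as a rough sketch of the pruned dynamic programming it refers to, here is a PELT-style inequality-pruned optimal partitioning in Python, assuming a Gaussian change-in-mean segment cost and a user-chosen penalty beta (FPOP's functional pruning is more involved than this):

```python
import numpy as np

def segment_cost(cum, cum2, s, t):
    # Gaussian change-in-mean cost of y[s:t]: residual sum of squares
    # around the segment mean, computed in O(1) from cumulative sums.
    total = cum[t] - cum[s]
    return (cum2[t] - cum2[s]) - total * total / (t - s)

def pruned_op(y, beta):
    """Optimal partitioning with inequality-based (PELT-style) pruning.

    Exactly minimises sum-of-segment-costs + beta per changepoint, while
    discarding candidate split points that can never be optimal again.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    cum = np.concatenate(([0.0], np.cumsum(y)))
    cum2 = np.concatenate(([0.0], np.cumsum(y * y)))
    F = np.full(n + 1, np.inf)
    F[0] = -beta                       # so the first segment pays no penalty
    last = np.zeros(n + 1, dtype=int)  # back-pointers for recovery
    candidates = [0]
    for t in range(1, n + 1):
        costs = [F[s] + segment_cost(cum, cum2, s, t) + beta
                 for s in candidates]
        i = int(np.argmin(costs))
        F[t], last[t] = costs[i], candidates[i]
        # Prune: if F[s] + C(s, t) already exceeds F[t], then s cannot be
        # the last changepoint of any optimal segmentation of a longer
        # series (valid for segment-additive costs such as this one).
        candidates = [s for s, c in zip(candidates, costs)
                      if c - beta <= F[t]] + [t]
    cps, t = [], n                     # backtrack the changepoints
    while last[t] > 0:
        t = last[t]
        cps.append(t)
    return sorted(cps), F[n]
```

When changepoints are frequent the candidate set stays small, which is the regime in which the abstract reports FPOP's efficiency remaining robust.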
Towards a Holistic Integration of Spreadsheets with Databases: A Scalable Storage Engine for Presentational Data Management
Spreadsheet software is the tool of choice for interactive ad-hoc data
management, with adoption by billions of users. However, spreadsheets are not
scalable, unlike database systems. On the other hand, database systems, while
highly scalable, do not support interactivity as a first-class primitive. We
are developing DataSpread to holistically integrate spreadsheets as a
front-end interface with databases as a back-end datastore, providing
scalability to spreadsheets, and interactivity to databases, an integration we
term presentational data management (PDM). In this paper, we make a first step
towards this vision: developing a storage engine for PDM, studying how to
flexibly represent spreadsheet data within a database and how to support and
maintain access by position. We first conduct an extensive survey of
spreadsheet use to motivate our functional requirements for a storage engine
for PDM. We develop a natural set of mechanisms for flexibly representing
spreadsheet data and demonstrate that identifying the optimal representation is
NP-Hard; however, we develop an efficient approach to identify the optimal
representation from an important and intuitive subclass of representations. We
extend our mechanisms with positional access mechanisms that do not suffer from
cascading update issues, leading to constant-time access and modification
performance. We evaluate these representations on a workload of typical
spreadsheets and spreadsheet operations, providing up to 20% reduction in
storage, and up to 50% reduction in formula evaluation time.
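The abstract leaves the positional mechanism unspecified; one common way to avoid cascading renumbering is to key rows by sparse positional values rather than dense sequential ones. The toy sketch below (all names hypothetical) illustrates that general idea only, not DataSpread's actual storage engine:

```python
from bisect import insort

class PositionalIndex:
    """Toy row index keyed by sparse positional values, so that inserting
    a row in the middle assigns one new key instead of renumbering every
    subsequent row (the cascading-update problem).  Purely illustrative.
    """
    GAP = 1 << 16  # spacing between freshly appended keys

    def __init__(self):
        self.keys = []   # sorted positional keys
        self.rows = {}   # positional key -> row payload

    def append(self, row):
        key = (self.keys[-1] if self.keys else 0) + self.GAP
        self.keys.append(key)
        self.rows[key] = row

    def insert_at(self, pos, row):
        # Choose a key midway between the two neighbours of `pos`.
        lo = self.keys[pos - 1] if pos > 0 else 0
        hi = self.keys[pos] if pos < len(self.keys) else lo + 2 * self.GAP
        key = (lo + hi) // 2
        if key == lo or key == hi:
            # Gap exhausted: a real engine would relabel a small local
            # neighbourhood here rather than the whole table.
            raise RuntimeError("local rebalance needed")
        insort(self.keys, key)
        self.rows[key] = row

    def get(self, pos):
        # Access by position is a direct array lookup.
        return self.rows[self.keys[pos]]

idx = PositionalIndex()
idx.append("row A"); idx.append("row C")
idx.insert_at(1, "row B")               # only "row B" gets a new key
print([idx.get(i) for i in range(3)])   # ['row A', 'row B', 'row C']
```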
Survey on Combinatorial Register Allocation and Instruction Scheduling
Register allocation (mapping variables to processor registers or memory) and
instruction scheduling (reordering instructions to increase instruction-level
parallelism) are essential tasks for generating efficient assembly code in a
compiler. In the last three decades, combinatorial optimization has emerged as
an alternative to traditional, heuristic algorithms for these two tasks.
Combinatorial optimization approaches can deliver optimal solutions according
to a model, can precisely capture trade-offs between conflicting decisions, and
are more flexible at the expense of increased compilation time.
This paper provides an exhaustive literature review and a classification of
combinatorial optimization approaches to register allocation and instruction
scheduling, with a focus on the techniques that are most applied in this
context: integer programming, constraint programming, partitioned Boolean
quadratic programming, and enumeration. Researchers in compilers and
combinatorial optimization can benefit from identifying developments, trends,
and challenges in the area; compiler practitioners may discern opportunities
and grasp the potential benefit of applying combinatorial optimization.
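To give a flavour of the integer-programming style the survey covers, the following is a deliberately small spill-minimising register-allocation model written with the PuLP library; the formulation, names, and costs are illustrative assumptions, not a reproduction of any specific approach from the survey:

```python
# Toy spill-minimising register-allocation ILP (requires: pip install pulp).
from pulp import LpProblem, LpVariable, LpMinimize, lpSum, LpBinary, PULP_CBC_CMD

def allocate(variables, registers, interference, spill_cost):
    prob = LpProblem("regalloc", LpMinimize)
    # x[v, r] = 1 iff program variable v is assigned to register r.
    x = {(v, r): LpVariable(f"x_{v}_{r}", cat=LpBinary)
         for v in variables for r in registers}
    # Objective: total spill cost of variables left without a register.
    prob += lpSum(spill_cost[v] * (1 - lpSum(x[v, r] for r in registers))
                  for v in variables)
    # Each variable occupies at most one register (otherwise it is spilled).
    for v in variables:
        prob += lpSum(x[v, r] for r in registers) <= 1
    # Simultaneously live (interfering) variables cannot share a register.
    for u, v in interference:
        for r in registers:
            prob += x[u, r] + x[v, r] <= 1
    prob.solve(PULP_CBC_CMD(msg=False))
    return {v: next((r for r in registers if x[v, r].value() > 0.5), "spill")
            for v in variables}

# Three pairwise-interfering variables, two registers: the cheapest
# variable ("b") is spilled.
print(allocate(["a", "b", "c"], ["r0", "r1"],
               [("a", "b"), ("b", "c"), ("a", "c")],
               {"a": 3.0, "b": 1.0, "c": 2.0}))
```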
Bicriteria Network Design Problems
We study a general class of bicriteria network design problems. A generic
problem in this class is as follows: Given an undirected graph and two
minimization objectives (under different cost functions), with a budget
specified on the first, find a subgraph from a given subgraph-class that
minimizes the second objective subject to the budget on the first. We consider
three different criteria: the total edge cost, the diameter, and the maximum
degree of the network. Here, we present the first polynomial-time approximation
algorithms for a large class of bicriteria network design problems for the
above mentioned criteria. The following general types of results are presented.
First, we develop a framework for bicriteria problems and their
approximations. Second, when the two criteria are the same (note that the cost
functions continue to be different), we present a "black box" parametric
search technique. This black box takes in as input an (approximation) algorithm
for the unicriterion situation and generates an approximation algorithm for the
bicriteria case with only a constant factor loss in the performance guarantee.
Third, when the two criteria are the diameter and the total edge costs we use a
cluster-based approach to devise approximation algorithms; the solutions
output violate both the criteria by a logarithmic factor. Finally, for the
class of treewidth-bounded graphs, we provide pseudopolynomial-time algorithms
for a number of bicriteria problems using dynamic programming. We show how
these pseudopolynomial-time algorithms can be converted to fully
polynomial-time approximation schemes using a scaling technique.
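As a schematic illustration of the "black box" idea (the abstract gives no pseudo-code), the sketch below binary-searches a multiplier that blends the two cost functions and repeatedly calls a supplied unicriterion solver; the bracket, iteration count, and all names are assumptions, and the paper's analysis is what turns such a scheme into a provable performance guarantee:

```python
def parametric_search(solve_unicriterion, c1, c2, budget, rounds=60):
    """Wrap a single-criterion (approximation) algorithm to respect a
    budget on a second edge-cost function c1 while minimising c2.

    solve_unicriterion(cost) -> a solution (set of edges) minimising a
    single cost function, possibly approximately.
    """
    lo, hi = 1e-6, 1e6    # assumed bracket for the multiplier gamma
    best = None
    for _ in range(rounds):
        gamma = (lo * hi) ** 0.5                    # geometric midpoint
        combined = {e: gamma * c1[e] + c2[e] for e in c1}
        sol = solve_unicriterion(combined)
        if sum(c1[e] for e in sol) <= budget:
            best, hi = sol, gamma                   # feasible: relax c1 weight
        else:
            lo = gamma                              # over budget: weight c1 more
    return best
```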
A synthesis of logic and bio-inspired techniques in the design of dependable systems
Much of the development of model-based design and dependability analysis in the design of dependable systems, including software-intensive systems, can be attributed to advances in formal logic and their application to fault forecasting and verification of systems. In parallel, work on bio-inspired technologies has shown potential for the evolutionary design of engineering systems via automated exploration of potentially large design spaces. We have not yet seen the emergence of a design paradigm that effectively combines these two techniques, schematically founded on the two pillars of formal logic and biology, from the early stages of, and throughout, the design lifecycle. Such a design paradigm would apply these techniques synergistically and systematically to enable optimal refinement of new designs, driven effectively by dependability requirements. The paper sketches such a model-centric paradigm for the design of dependable systems, presented in the scope of the HiP-HOPS tool and technique, that brings these technologies together to realise their combined potential benefits. The paper begins by identifying current challenges in model-based safety assessment and then overviews the use of meta-heuristics at various stages of the design lifecycle, covering topics that span from allocation of dependability requirements, through dependability analysis, to multi-objective optimisation of system architectures and maintenance schedules.
User-friendly Support for Common Concepts in a Lightweight Verifier
Machine verification of formal arguments can only increase our confidence in the correctness of those arguments, but the costs of employing machine verification still outweigh the benefits for some common kinds of formal reasoning activities. As a result, usability is becoming increasingly important in the design of formal verification tools. We describe the "aartifact" lightweight verification system, designed for processing formal arguments involving basic, ubiquitous mathematical concepts. The system is a prototype for investigating potential techniques for improving the usability of formal verification systems. It leverages techniques drawn both from existing work and from our own efforts. In addition to a parser for a familiar concrete syntax and a mechanism for automated syntax lookup, the system integrates (1) a basic logical inference algorithm, (2) a database of propositions governing common mathematical concepts, and (3) a data structure that computes congruence closures of expressions involving relations found in this database. Together, these components allow the system to better accommodate the expectations of users interested in verifying formal arguments involving algebraic and logical manipulations of numbers, sets, vectors, and related operators and predicates. We demonstrate the reasonable performance of this system on typical formal arguments and briefly discuss how the system's design contributed to its usability in two case studies.
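Of the three components, the congruence-closure data structure is the most algorithmically concrete; the following is a minimal, naive congruence closure in Python to illustrate the idea, and it should not be read as the aartifact system's actual implementation:

```python
class CongruenceClosure:
    """Minimal congruence closure over ground terms.  Terms are atoms
    like 'a' or tuples like ('f', 'a'); the fixpoint loop is quadratic
    but easy to follow.  Illustrative only.
    """
    def __init__(self):
        self.parent = {}   # union-find over terms
        self.terms = []    # every compound term registered so far

    def find(self, t):
        self.parent.setdefault(t, t)
        while self.parent[t] != t:                      # path halving
            self.parent[t] = self.parent[self.parent[t]]
            t = self.parent[t]
        return t

    def _add(self, t):
        if isinstance(t, tuple):
            for arg in t[1:]:
                self._add(arg)
            if t not in self.terms:
                self.terms.append(t)
        self.find(t)

    def merge(self, a, b):
        self._add(a); self._add(b)
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb
        self._propagate()

    def _propagate(self):
        # Re-merge until stable: two terms with the same operator and
        # pairwise-equal argument classes must land in the same class.
        changed = True
        while changed:
            changed = False
            sig = {}
            for t in self.terms:
                key = (t[0],) + tuple(self.find(a) for a in t[1:])
                if key in sig and self.find(sig[key]) != self.find(t):
                    self.parent[self.find(t)] = self.find(sig[key])
                    changed = True
                sig.setdefault(key, t)

    def equal(self, a, b):
        self._add(a); self._add(b)
        self._propagate()
        return self.find(a) == self.find(b)

cc = CongruenceClosure()
cc.merge('a', 'b')
print(cc.equal(('f', 'a'), ('f', 'b')))   # True, by congruence of f
print(cc.equal(('f', 'a'), ('g', 'a')))   # False
```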