895 research outputs found
Shared Arrangements: practical inter-query sharing for streaming dataflows
Current systems for data-parallel, incremental processing and view
maintenance over high-rate streams isolate the execution of independent
queries. This creates unwanted redundancy and overhead in the presence of
concurrent incrementally maintained queries: each query must independently
maintain the same indexed state over the same input streams, and new queries
must build this state from scratch before they can begin to emit their first
results. This paper introduces shared arrangements: indexed views of maintained
state that allow concurrent queries to reuse the same in-memory state without
compromising data-parallel performance and scaling. We implement shared
arrangements in a modern stream processor and show order-of-magnitude
improvements in query response time and resource consumption for interactive
queries against high-throughput streams, while also significantly improving
performance in other domains including business analytics, graph processing,
and program analysis
The Knowledge Level in Cognitive Architectures: Current Limitations and Possible Developments
In this paper we identify and characterize an analysis of two problematic aspects affecting the representational level of cognitive architectures (CAs), namely: the limited size and the homogeneous typology of the encoded and processed knowledge.
We argue that such aspects may constitute not only a technological problem that, in our opinion, should be addressed in order to build articial agents able to exhibit intelligent behaviours in general scenarios, but also an epistemological one, since they limit the plausibility of the comparison of the CAs' knowledge representation and processing mechanisms with those executed by humans in their everyday activities. In the final part of the paper further directions of research will be explored, trying to address current limitations and
future challenges
Tupleware: Redefining Modern Analytics
There is a fundamental discrepancy between the targeted and actual users of
current analytics frameworks. Most systems are designed for the data and
infrastructure of the Googles and Facebooks of the world---petabytes of data
distributed across large cloud deployments consisting of thousands of cheap
commodity machines. Yet, the vast majority of users operate clusters ranging
from a few to a few dozen nodes, analyze relatively small datasets of up to a
few terabytes, and perform primarily compute-intensive operations. Targeting
these users fundamentally changes the way we should build analytics systems.
This paper describes the design of Tupleware, a new system specifically aimed
at the challenges faced by the typical user. Tupleware's architecture brings
together ideas from the database, compiler, and programming languages
communities to create a powerful end-to-end solution for data analysis. We
propose novel techniques that consider the data, computations, and hardware
together to achieve maximum performance on a case-by-case basis. Our
experimental evaluation quantifies the impact of our novel techniques and shows
orders of magnitude performance improvement over alternative systems
Finding The Lazy Programmer's Bugs
Traditionally developers and testers created huge numbers of explicit tests, enumerating interesting cases, perhaps
biased by what they believe to be the current boundary conditions of the function being tested. Or at
least, they were supposed to.
A major step forward was the development of property testing. Property testing requires the user to write a few
functional properties that are used to generate tests, and requires an external library or tool to create test data
for the tests. As such many thousands of tests can be created for a single property. For the purely functional
programming language Haskell there are several such libraries; for example QuickCheck [CH00], SmallCheck
and Lazy SmallCheck [RNL08].
Unfortunately, property testing still requires the user to write explicit tests. Fortunately, we note there are
already many implicit tests present in programs. Developers may throw assertion errors, or the compiler may
silently insert runtime exceptions for incomplete pattern matches.
We attempt to automate the testing process using these implicit tests. Our contributions are in four main
areas: (1) We have developed algorithms to automatically infer appropriate constructors and functions needed
to generate test data without requiring additional programmer work or annotations. (2) To combine the
constructors and functions into test expressions we take advantage of Haskell's lazy evaluation semantics by
applying the techniques of needed narrowing and lazy instantiation to guide generation. (3) We keep the type
of test data at its most general, in order to prevent committing too early to monomorphic types that cause
needless wasted tests. (4) We have developed novel ways of creating Haskell case expressions to inspect elements
inside returned data structures, in order to discover exceptions that may be hidden by laziness, and to make
our test data generation algorithm more expressive.
In order to validate our claims, we have implemented these techniques in Irulan, a fully automatic tool for
generating systematic black-box unit tests for Haskell library code. We have designed Irulan to generate high
coverage test suites and detect common programming errors in the process
Quantifying Eventual Consistency with PBS
Data replication results in a fundamental trade-off between operation latency and consistency. At the weak end of the spectrum of possible consistency models is eventual consistency, which provides no limit to the staleness of data returned. However, anecdotally, eventual consistency is often “good enough ” for practitioners given its latency and availability benefits. In this work, we explain this phenomenon and demonstrate that, despite their weak guarantees, eventually consistent systems regularly return consistent data while providing lower latency than their strongly consistent counterparts. To quantify the behavior of eventually consistent stores, we introduce Probabilistically Bounded Staleness (PBS), a consistency model that provides expected bounds on data staleness with respect to both versions and wall clock time. We derive a closed-form solution for version-based staleness and model real-time staleness for a large class of quorum replicated, Dynamo-style stores. Using PBS, we measure the trade-off between latency and consistency for partial, non-overlapping quorum systems under Internet production workloads. We quantitatively demonstrate how and why eventually consistent systems frequently return consistent data within tens of milliseconds while offering large latency benefits. 1
The generation of concurrent code for declarative languages
PhD ThesisThis thesis presents an approach to the implementation of declarative languages
on a simple, general purpose concurrent architecture. The safe exploitation of
the available concurrency is managed by relatively sophisticated code generation
techniques to transform programs into an intermediate concurrent machine
code. Compilation techniques are discussed for 1'-HYBRID, a strongly typed
applicative language, and for 'c-HYBRID, a concurrent, nondeterministic logic
language.
An approach is presented for 1'- HYBRID whereby the style of programming
influences the concurrency utilised when a program executes. Code transformation
techniques are presented which generalise tail-recursion optimisation,
allowing many recursive functions to be modelled by perpetual processes. A
scheme is also presented to allow parallelism to be increased by the use of local
declarations, and constrained by the use of special forms of identity function.
In order to preserve determinism in the language, a novel fault handling
mechanism is used, whereby exceptions generated at run-time are treated as a
special class of values within the language.
A description is given of ,C-HYBRID, a dialect of the nondeterministic logic
language Concurrent Prolog. The language is embedded within the applicative
language 1'-HYBRID, yielding a combined applicative and logic programming
language. Various cross-calling techniques are described, including the use of
applicative scoping rules to allow local logical assertions.
A description is given of a polymorphic typechecking algorithm for logic
programs, which allows different instances of clauses to unify objects of different
types. The concept of a method is derived to allow unification Information to
be passed as an implicit argument to clauses which require it. In addition,
the typechecking algorithm permits higher-order objects such as functions to be
passed within arguments to clauses.
Using Concurrent Prolog's model of concurrency, techniques are described
which permit compilation of 'c-HYBRID programs to abstract machine code derived
from that used for the applicative language. The use of methods allows
polymorphic logic programs to execute without the need for run-time type information
in data structures.The Science and Engineering Research Council
Knowledge sharing and collaboration in translational research, and the DC-THERA Directory
Biomedical research relies increasingly on large collections of data sets and knowledge whose generation, representation and analysis often require large collaborative and interdisciplinary efforts. This dimension of ‘big data’ research calls for the development of computational tools to manage such a vast amount of data, as well as tools that can improve communication and access to information from collaborating researchers and from the wider community. Whenever research projects have a defined temporal scope, an additional issue of data management arises, namely how the knowledge generated within the project can be made available beyond its boundaries and life-time. DC-THERA is a European ‘Network of Excellence’ (NoE) that spawned a very large collaborative and interdisciplinary research community, focusing on the development of novel immunotherapies derived from fundamental research in dendritic cell immunobiology. In this article we introduce the DC-THERA Directory, which is an information system designed to support knowledge management for this research community and beyond. We present how the use of metadata and Semantic Web technologies can effectively help to organize the knowledge generated by modern collaborative research, how these technologies can enable effective data management solutions during and beyond the project lifecycle, and how resources such as the DC-THERA Directory fit into the larger context of e-science
- …