Lightweight Asynchronous Snapshots for Distributed Dataflows
Distributed stateful stream processing enables the deployment and execution
of large scale continuous computations in the cloud, targeting both low latency
and high throughput. One of the most fundamental challenges of this paradigm is
providing processing guarantees under potential failures. Existing approaches
rely on periodic global state snapshots that can be used for failure recovery.
Those approaches suffer from two main drawbacks. First, they often stall the
overall computation which impacts ingestion. Second, they eagerly persist all
records in transit along with the operation states which results in larger
snapshots than required. In this work we propose Asynchronous Barrier
Snapshotting (ABS), a lightweight algorithm suited for modern dataflow
execution engines that minimises space requirements. ABS persists only operator
states on acyclic execution topologies while keeping a minimal record log on
cyclic dataflows. We implemented ABS on Apache Flink, a distributed analytics
engine that supports stateful stream processing. Our evaluation shows that our
algorithm does not have a heavy impact on the execution, maintaining linear
scalability and performing well with frequent snapshots. Comment: 8 pages, 7 figures
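The barrier handling described in the abstract above can be sketched for the acyclic case as follows. This is a minimal illustration under stated assumptions, not the authors' or Apache Flink's implementation: the `SummingOperator` class, the `BARRIER` marker, and the in-memory `snapshots` and `output` lists are hypothetical stand-ins for a real task, its durable storage, and its downstream channel. An input that has delivered the barrier is blocked until the barrier has arrived on every input; only then is the operator state persisted and the barrier forwarded.

```python
from collections import deque

BARRIER = object()  # hypothetical marker injected into the streams at each snapshot epoch


class SummingOperator:
    """Hypothetical stateful operator with several inputs that keeps a running sum."""

    def __init__(self, num_inputs):
        self.num_inputs = num_inputs
        self.total = 0            # the operator state
        self.blocked = set()      # inputs that already delivered the barrier
        self.pending = {i: deque() for i in range(num_inputs)}  # records held back
        self.snapshots = []       # stands in for durable storage (assumption)
        self.output = []          # stands in for the downstream channel (assumption)

    def receive(self, channel, record):
        if channel in self.blocked:
            # The input is blocked until the snapshot is taken: hold the record back.
            self.pending[channel].append(record)
        elif record is BARRIER:
            self.blocked.add(channel)
            if len(self.blocked) == self.num_inputs:
                self._snapshot_and_release()
        else:
            self.total += record
            self.output.append(self.total)

    def _snapshot_and_release(self):
        self.snapshots.append(self.total)  # persist only the operator state
        self.output.append(BARRIER)        # forward the barrier downstream
        self.blocked.clear()               # unblock all inputs
        held, self.pending = self.pending, {i: deque() for i in range(self.num_inputs)}
        for channel, records in held.items():
            for record in records:         # replay held-back records in arrival order
                self.receive(channel, record)


op = SummingOperator(num_inputs=2)
op.receive(0, 1)
op.receive(0, BARRIER)   # input 0 is now blocked
op.receive(0, 10)        # held back, not part of the snapshot
op.receive(1, 2)
op.receive(1, BARRIER)   # barrier seen on all inputs: snapshot, then replay
```

In this run the snapshot holds the sum of the pre-barrier records only; the record that arrived on the blocked input is applied after the barrier is forwarded, so no in-transit records need to be stored with the snapshot.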
Relations between automata and the simple k-path problem
Let G be a directed graph on n vertices. Given an integer k, the
SIMPLE k-PATH problem asks whether there exists a simple k-path in G. In
case G is weighted, the MIN-WT SIMPLE k-PATH problem asks for a simple
k-path in G of minimal weight. The fastest currently known deterministic
algorithm for MIN-WT SIMPLE k-PATH by Fomin, Lokshtanov and Saurabh runs in
time for graphs with integer weights in
the range . This is also the best currently known deterministic
algorithm for SIMPLE k-PATH, where the running time is the same without the
factor. We define to be the set of words of
length k whose symbols are all distinct. We show that an explicit
construction of a non-deterministic automaton (NFA) of size for implies an algorithm of running time for MIN-WT SIMPLE k-PATH when the weights are
non-negative or the constructed NFA is acyclic as a directed graph. We show
that the algorithm of Kneis et al. and its derandomization by Chen et al. for
SIMPLE k-PATH can be used to construct an acyclic NFA for of size
.
We show, on the other hand, that any NFA for must be size at least
. We thus propose closing this gap and determining the smallest NFA for
as an interesting open problem that might lead to faster algorithms
for MIN-WT SIMPLE k-PATH.
We use a relation between SIMPLE k-PATH and non-deterministic xor automata
(NXA) to give another direction for a deterministic algorithm with running time
for SIMPLE k-PATH.
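The link between automata and simple paths in the abstract above can be illustrated with a brute-force sketch. It assumes that a simple k-path is a path visiting k distinct vertices, and it uses the trivial automaton whose state is the set of symbols seen so far; this automaton is exponentially larger than the constructions the abstract discusses and is not the authors' algorithm. The function name `min_weight_simple_k_path` and the edge-dictionary encoding are hypothetical. Walking the graph and the automaton in lockstep, layer by layer, yields the minimum weight of a simple path on k vertices.

```python
def min_weight_simple_k_path(n, edges, k):
    """Return the minimum weight of a simple path on k vertices, or None.

    n     -- number of vertices, labelled 0 .. n-1
    edges -- dict mapping a directed edge (u, v) to its weight
    k     -- number of vertices on the path (assumed k >= 1)
    """
    # A state pairs the last vertex with the automaton state: the set of
    # vertices (symbols) read so far. Words with a repeated symbol are rejected.
    best = {(v, frozenset([v])): 0 for v in range(n)}
    for _ in range(k - 1):
        layer = {}
        for (u, seen), weight in best.items():
            for (a, b), w in edges.items():
                if a == u and b not in seen:
                    key = (b, seen | {b})
                    cand = weight + w
                    if key not in layer or cand < layer[key]:
                        layer[key] = cand
        best = layer
    return min(best.values()) if best else None


# Small example graph (made up for illustration).
example_edges = {(0, 1): 1, (1, 2): 1, (0, 2): 5, (2, 0): 1}
```

Because the product is explored in k layers, it is acyclic, so negative weights do not cause problems in this sketch. For the example graph, `min_weight_simple_k_path(3, example_edges, 3)` finds a path of weight 2, and asking for k = 4 on three vertices yields `None`.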
The data-exchange chase under the microscope
In this paper we take a closer look at recent developments for the chase
procedure, and provide additional results. Our analysis allows us to create a
taxonomy of the chase variations and the properties they satisfy. Two of the
most central problems regarding the chase are termination and the discovery of
restricted classes of sets of dependencies that guarantee termination of the
chase. The search for the restricted classes has been motivated by a fairly
recent result that shows that it is undecidable to determine whether the chase
with a given dependency set will terminate on a given instance. There is a
small dissonance here, since the quest has been for classes of sets of
dependencies guaranteeing termination of the chase on all instances, even
though the latter problem was not known to be undecidable. We resolve the
dissonance in this paper by showing that determining whether the chase with a
given set of dependencies terminates on all instances is coRE-complete. For the
hardness proof we use a reduction from word rewriting systems, thereby also
showing the close connection between the chase and word rewriting. The same
reduction also gives us the aforementioned instance-dependent RE-completeness
result as a byproduct. For one of the restricted classes guaranteeing
termination on all instances, the stratified sets of dependencies, we provide new
complexity results for the problem of testing whether a given set of
dependencies belongs to it. These results rectify some previous claims that
have occurred in the literature. Comment: arXiv admin note: substantial text overlap with arXiv:1303.668
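Why termination depends on both the dependency set and the instance can be seen in a small sketch. It assumes a single dependency of the fixed shape body(x, y) -> exists z. head(y, z) and a chase step that fires only when the head is not yet satisfied; the function `standard_chase`, the tuple encoding of facts, and the `max_steps` cut-off are hypothetical choices for illustration, not taken from the paper.

```python
from itertools import count


def standard_chase(facts, body_rel, head_rel, max_steps=100):
    """Chase with the single dependency body_rel(x, y) -> exists z. head_rel(y, z).

    facts is a set of (relation, first, second) tuples. Returns the resulting
    set of facts and a flag telling whether the chase stopped within max_steps.
    """
    facts = set(facts)
    fresh = count(1)  # supplies names for labelled nulls
    for _ in range(max_steps):
        # Look for a body fact whose required head fact is still missing.
        trigger = next(
            (y for (rel, _, y) in sorted(facts)
             if rel == body_rel
             and not any(r == head_rel and a == y for (r, a, _) in facts)),
            None,
        )
        if trigger is None:
            return facts, True  # every body fact is satisfied: the chase terminated
        facts.add((head_rel, trigger, f"_n{next(fresh)}"))  # fire with a fresh null
    return facts, False  # step budget exhausted


# R(x, y) -> exists z. S(y, z) stops after one step on {R(a, b)}.
done_facts, done = standard_chase({("R", "a", "b")}, "R", "S")

# E(x, y) -> exists z. E(y, z) keeps creating nulls on {E(a, b)} ...
open_facts, open_done = standard_chase({("E", "a", "b")}, "E", "E", max_steps=10)

# ... but stops immediately on {E(a, a)}, where the head is already satisfied.
loop_facts, loop_done = standard_chase({("E", "a", "a")}, "E", "E")
```

The second and third calls use the same dependency on different instances, which is the instance-dependent behaviour the abstract refers to; the step bound only cuts the run off and says nothing about termination in general.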
Large-scale Parallel Stratified Defeasible Reasoning
We are currently experiencing an unprecedented explosion of available data from the Web, sensor readings, scientific databases, government authorities and more. Such datasets could benefit from the introduction of rule sets encoding commonly accepted rules or facts, application- or domain-specific rules, commonsense knowledge, etc. This raises the question of whether, how, and to what extent knowledge representation methods are capable of handling huge amounts of data for these applications. In this paper, we consider inconsistency-tolerant reasoning in the form of defeasible logic, and analyze how parallelization, using the MapReduce framework, can be used to reason with defeasible rules over huge datasets. We extend previous work by dealing with predicates of arbitrary arity, under the assumption of stratification. Moving from unary to multi-arity predicates is a decisive step towards practical applications, e.g. reasoning with linked open (RDF) data. Our experimental results demonstrate that defeasible reasoning over millions of facts performs well and has the potential to scale to billions of facts.
Minimally supervised induction of morphology through bitexts
A knowledge of morphology can be useful for many natural language processing systems. Thus, much effort has been expended in developing accurate computational tools for morphology that lemmatize, segment and generate new forms. The most powerful and accurate of these have been manually encoded, such endeavors being without exception expensive and time-consuming. Consequently, there have been many attempts to reduce this cost through the development of unsupervised or minimally supervised algorithms and learning methods for the acquisition of morphology. These efforts have yet to produce a tool that approaches the performance of manually encoded systems.
Here, I present a strategy for dealing with morphological clustering and segmentation in a minimally supervised manner but one that will be more linguistically informed than previous unsupervised approaches. That is, this study will attempt to induce clusters of words from an unannotated text that are inflectional variants of each other. Then a set of inflectional suffixes by part-of-speech will be induced from these clusters. This level of detail is made possible by a method known as alignment and transfer (AT), among other names, an approach that uses aligned bitexts to transfer linguistic resources developed for one language (the source language) to another language (the target). This approach has a further advantage in that it allows a reduction in the amount of training data without a significant degradation in performance, making it useful in applications targeted at data collected from endangered languages. In the current study, however, I use English as the source and German as the target for ease of evaluation and for certain typological properties of German. The two main tasks, clustering and segmentation, are approached sequentially, with the clustering informing the segmentation to allow for greater accuracy in morphological analysis.
While the performance of these methods does not exceed the current roster of unsupervised or minimally supervised approaches to morphology acquisition, this study attempts to integrate more learning methods than previous studies. Furthermore, it attempts to learn inflectional morphology as opposed to derivational morphology, which is a crucial distinction in linguistics. Linguistic