Linear tail-biting trellises: Characteristic generators and the BCJR-construction
We investigate the constructions of tail-biting trellises for linear block
codes introduced by Koetter/Vardy (2003) and Nori/Shankar (2006). For a given
code we will define the sets of characteristic generators more generally than
by Koetter/Vardy and we will investigate how the choice of characteristic
generators affects the set of resulting product trellises, called KV-trellises.
Furthermore, we will show that each KV-trellis is a BCJR-trellis, defined in a
slightly stronger sense than by Nori/Shankar, and that the latter are always
non-mergeable. Finally, we will address a duality conjecture of Koetter/Vardy
by making use of a dualization technique of BCJR-trellises and prove the
conjecture for minimal trellises. Comment: 28 pages
Spectral Methods for Learning Multivariate Latent Tree Structure
This work considers the problem of learning the structure of multivariate
linear tree models, which include a variety of directed tree graphical models
with continuous, discrete, and mixed latent variables such as linear-Gaussian
models, hidden Markov models, Gaussian mixture models, and Markov evolutionary
trees. The setting is one where we only have samples from certain observed
variables in the tree, and our goal is to estimate the tree structure (i.e.,
the graph of how the underlying hidden variables are connected to each other
and to the observed variables). We propose the Spectral Recursive Grouping
algorithm, an efficient and simple bottom-up procedure for recovering the tree
structure from independent samples of the observed variables. Our finite sample
size bounds for exact recovery of the tree structure reveal certain natural
dependencies on statistical and structural properties of the
underlying joint distribution. Furthermore, our sample complexity guarantees
have no explicit dependence on the dimensionality of the observed variables,
making the algorithm applicable to many high-dimensional settings. At the heart
of our algorithm is a spectral quartet test for determining the relative
topology of a quartet of variables from second-order statistics.
Convergent types for shared memory
Master's dissertation in Computer Science
It is well known that consistency in shared-memory concurrent programming comes at
the price of degraded performance and scalability. Some of the existing solutions to this
problem are highly complex and not programmer-friendly.
We present a simple and well-defined approach to obtain relevant results for shared memory
environments through relaxing synchronization. For that, we will look into Mergeable
Data Types, data structures analogous to Conflict-Free Replicated Data Types but designed to
perform in shared memory.
CRDTs were the first formal approach, backed by a solid theoretical study, to eventual
consistency in distributed systems, addressing the problem posed by the CAP Theorem and providing
high availability. With CRDTs, updates are unsynchronized, and replicas eventually converge
to a correct common state. However, CRDTs are not designed to perform in shared
memory. In large-scale distributed systems the merge cost is negligible when compared to
network mediated synchronization. Therefore, we have migrated the concept by developing
the already existent Mergeable Data Types through formally defining a programming
model that we named Global-Local View. Furthermore, we have created a portfolio of MDTs
and demonstrated that in the appropriate scenarios we can benefit greatly from the model.
It is well known that guaranteeing consistency in concurrent programs in a shared-memory
environment sacrifices performance and scalability. Some of the existing methods
for achieving significant results introduce high complexity and are not practical.
Our goal is to provide a simple and well-defined approach for achieving
notable results in shared-memory environments, compared with existing
methods, by relaxing consistency. To that end, we analyze the concept of the Mergeable
Data Type, structures analogous to Conflict-Free Replicated Data Types but designed for
shared memory.
CRDTs were the first approach to develop a formal study of eventual consistency,
responding to the problem described in the CAP Theorem and guaranteeing high availability.
With CRDTs, updates are not synchronous and replicas eventually converge
to a correct, common state. However, they were not designed to operate
in shared memory. In large-scale distributed systems, the cost of the merge
operation is negligible compared with global synchronization. Therefore, we migrated
the concept by developing the already existing Mergeable Data Types through the
formalization of a programming model that we call Global-Local
View. Furthermore, we created a portfolio of MDTs and demonstrated that, in the
appropriate scenarios, we can benefit greatly from the model.
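The abstract does not spell out the Global-Local View model, so the following is only a minimal Python sketch of the general idea it describes: updates go to an unsynchronized thread-local view and are folded into a shared global view by an explicit merge. The class `MergeableCounter`, its methods, and the helper `run_workers` are hypothetical names chosen for illustration, not the dissertation's API.

```python
import threading


class MergeableCounter:
    """Hypothetical mergeable counter: per-thread local deltas plus one global value."""

    def __init__(self):
        self._global = 0
        self._lock = threading.Lock()
        self._local = threading.local()

    def increment(self, amount=1):
        # Unsynchronized: only the calling thread's local view is touched.
        self._local.delta = getattr(self._local, "delta", 0) + amount

    def merge(self):
        # Synchronized: fold the local delta into the global view, then reset it.
        delta = getattr(self._local, "delta", 0)
        with self._lock:
            self._global += delta
        self._local.delta = 0

    def global_value(self):
        with self._lock:
            return self._global


def run_workers(counter, workers=4, increments=1000):
    """Run several threads that update locally and merge once at the end."""

    def work():
        for _ in range(increments):
            counter.increment()
        counter.merge()  # one synchronization per thread instead of one per update

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.global_value()
```

In this sketch the lock is taken once per thread rather than once per increment, which is the kind of relaxed synchronization the abstract alludes to; updates that have not been merged yet are simply not visible in the global view.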
Approximation Algorithms for Polynomial-Expansion and Low-Density Graphs
We study the family of intersection graphs of low density objects in low
dimensional Euclidean space. This family is quite general, and includes planar
graphs. We prove that such graphs have small separators. Next, we present
efficient $(1+\varepsilon)$-approximation algorithms for these graphs, for
Independent Set, Set Cover, and Dominating Set problems, among others. We also
prove corresponding hardness of approximation for some of these optimization
problems, providing a characterization of their intractability in terms of
density.
A Canonical Form for PROV Documents and its Application to Equality, Signature, and Validation
We present a canonical form for PROV that is a normalized way of representing PROV documents as mathematical expressions. As opposed to the normal form specified by the PROV-CONSTRAINTS recommendation, the canonical form we present is defined for all PROV documents, irrespective of their validity, and it can be serialized in a unique way. The article makes the case for a canonical form for PROV and its potential uses, namely comparison of PROV documents in different formats, validation, and signature of PROV documents. A signature of a PROV document allows the integrity and the author of provenance to be ascertained; since the signature is based on the canonical form, these checks are not tied to a particular encoding, but can be performed on any representation of PROV.
Spherical metrics with conical singularities on a 2-sphere: angle constraints
In this article we give a criterion for the existence of a metric of
curvature $1$ on a $2$-sphere with conical singularities of prescribed
angles and non-coaxial holonomy. Such a
necessary and sufficient condition is expressed in terms of linear inequalities
in the prescribed angles. Comment: 38 pages, 17 figures
Decompressing Lempel-Ziv Compressed Text
We consider the problem of decompressing the Lempel-Ziv 77 representation of
a string $S$ of length $n$ using a working space as close as possible to the
size $z$ of the input. The folklore solution for the problem runs in $O(n)$
time but requires random access to the whole decompressed text. Another
folklore solution is to convert LZ77 into a grammar of size $O(z\log(n/z))$ and
then stream $S$ in linear time. In this paper, we show that $O(n)$ time and
$O(z)$ working space can be achieved for constant-size alphabets. On general
alphabets of size $\sigma$, we describe (i) a trade-off achieving
$O(n\log^{\delta}\sigma)$ time and $O(z\log^{1-\delta}\sigma)$ space for any
$0\leq\delta\leq 1$, and (ii) a solution achieving $O(n)$ time and
$O(z\log\log(n/z))$ space. The latter solution, in particular, dominates both
folklore algorithms for the problem. Our solutions can, more generally, extract
any specified subsequence of $S$ with little overhead on top of the linear
running time and working space. As an immediate corollary, we show that our
techniques yield improved results for pattern matching problems on
LZ77-compressed text.
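For orientation, the first folklore solution mentioned above can be sketched in a few lines of Python. This is not the paper's space-efficient algorithm; it is the simple linear-time decoder that keeps the whole decompressed text in memory. The phrase encoding as `(source, length, next_char)` triples and the function name `lz77_decode` are assumptions made for this illustration.

```python
def lz77_decode(phrases):
    """Decode a list of (source, length, next_char) LZ77 phrases.

    source    -- start position of the copied substring in the output so far
    length    -- number of characters to copy (0 for a literal-only phrase)
    next_char -- explicit character appended after the copy, or None
    """
    out = []
    for source, length, next_char in phrases:
        # Copy one character at a time so that a phrase may overlap
        # the text it is currently producing (self-referential copies).
        for i in range(length):
            out.append(out[source + i])
        if next_char is not None:
            out.append(next_char)
    return "".join(out)
```

The decoder does constant work per output character, but it needs random access to everything written so far, which is exactly the working-space cost the abstract sets out to avoid.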
Merging Queries in OLTP Workloads
OLTP applications are usually executed by a large number of clients in parallel and typically face high throughput demand as well as a constrained latency requirement for individual statements. In enterprise scenarios, they often have to deal with overload spikes resulting from events such as Cyber Monday or Black Friday. The traditional way to avoid running out of resources, and thus to cope with such spikes, is to significantly over-provision the underlying infrastructure. In this thesis, we analyze real enterprise OLTP workloads with respect to statement types, complexity, and hot-spot statements. Interestingly, our findings reveal that workloads are often read-heavy and comprise similar query patterns, which offers the potential to share work among statements belonging to different transactions. In the past, resource sharing has been extensively studied for OLAP workloads. Naturally, the question arises why studies mainly focus on OLAP and not on OLTP workloads.
At first sight, OLTP queries often consist of simple calculations, such as index look-ups, with little sharing potential. Consequently, because of their short execution time, such queries may not offer enough benefit to offset the additional overhead of sharing. In addition, OLTP workloads execute not only read operations but also updates. Therefore, sharing work needs to obey transactional semantics, such as the given isolation level and read-your-own-writes.
This thesis presents THE LEVIATHAN, a novel batching scheme for OLTP workloads that merges read statements within interactively submitted multi-statement transactions consisting of reads and updates. Our main idea is to merge the execution of statements by merging their plans, which makes it possible to merge not only complex calculations but also simple ones, such as the aforementioned index look-up. We identify mergeable statements by pattern matching of prepared statement plans, which comes with low overhead. To obey the isolation level properties and provide read-your-own-writes, we first define a formal framework for merging transactions running under a given isolation level and then provide insights into a prototypical implementation of merging within a commercial database system.
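As a rough illustration of the merging idea only, the sketch below folds several point look-ups that share one statement pattern into a single `IN` query and routes each row back to the requesting transaction. It uses Python's standard `sqlite3` module rather than the commercial system from the thesis, it ignores isolation levels and read-your-own-writes entirely, and the table `subscriber`, its columns, and the functions `demo_connection` and `merged_lookup` are hypothetical.

```python
import sqlite3


def demo_connection():
    """Build a small in-memory table to run the sketch against (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE subscriber (s_id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO subscriber VALUES (?, ?)",
        [(1, "alice"), (2, "bob"), (3, "carol")],
    )
    return conn


def merged_lookup(conn, requests):
    """Answer many single-key look-ups with one merged query.

    requests -- list of (transaction_id, key) pairs that all match the pattern
                SELECT s_id, name FROM subscriber WHERE s_id = ?
    Returns a list of (transaction_id, row_or_None) in request order.
    """
    if not requests:
        return []
    # Deduplicate keys so that identical look-ups share one index probe.
    keys = sorted({key for _, key in requests})
    placeholders = ",".join("?" for _ in keys)
    sql = f"SELECT s_id, name FROM subscriber WHERE s_id IN ({placeholders})"
    rows_by_key = {row[0]: row for row in conn.execute(sql, keys)}
    # Split the merged result back into one answer per original request.
    return [(tx_id, rows_by_key.get(key)) for tx_id, key in requests]
```

For example, `merged_lookup(demo_connection(), [("t1", 2), ("t2", 1), ("t3", 2)])` issues one query for the keys 1 and 2 instead of three separate statements.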
Our experimental evaluation shows that, depending on the isolation level, the load in the system, and the read share of the workload, transaction throughput can improve by up to a factor of 2.5 without compromising the transactional semantics. We also show that our strategy increases the throughput of a real enterprise workload by 20%.
1 INTRODUCTION
1.1 Summary of Contributions
1.2 Outline
2 WORKLOAD ANALYSIS
2.1 Analyzing OLTP Benchmarks
2.1.1 YCSB
2.1.2 TATP
2.1.3 TPC Benchmark Scenarios
2.1.4 Summary
2.2 Analyzing OLTP Workloads from Open Source Projects
2.2.1 Characteristics of Workloads
2.2.2 Summary
2.3 Analyzing Enterprise OLTP Workloads
2.3.1 Overview of Reports about OLTP Workload Characteristics
2.3.2 Analysis of SAP Hybris Workload
2.3.3 Summary
2.4 Conclusion
3 RELATED WORK ON QUERY MERGING
3.1 Merging the Execution of Operators
3.2 Merging the Execution of Subplans
3.3 Merging the Results of Subplans
3.4 Merging the Execution of Full Plans
3.5 Miscellaneous Works on Merging
3.6 Discussion
4 MERGING STATEMENTS IN MULTI STATEMENT TRANSACTIONS
4.1 Overview of Our Approach
4.1.1 Examples
4.1.2 Why Naïve Merging Fails
4.2 THE LEVIATHAN Approach
4.3 Formalizing THE LEVIATHAN Approach
4.3.1 Transaction Theory
4.3.2 Merging Under MVCC
4.4 Merging Reads Under Different Isolation Levels
4.4.1 Read Uncommitted
4.4.2 Read Committed
4.4.3 Repeatable Read
4.4.4 Snapshot Isolation
4.4.5 Serializable
4.4.6 Discussion
4.5 Merging Writes Under Different Isolation Levels
4.5.1 Read Uncommitted
4.5.2 Read Committed
4.5.3 Snapshot Isolation
4.5.4 Serializable
4.5.5 Handling Dependencies
4.5.6 Discussion
5 SYSTEM MODEL
5.1 Definition of the Term “Overload”
5.2 Basic Queuing Model
5.2.1 Option (1): Replacement with a Merger Thread
5.2.2 Option (2): Adding Merger Thread
5.2.3 Using Multiple Merger Threads
5.2.4 Evaluation
5.3 Extended Queue Model
5.3.1 Option (1): Replacement with a Merger Thread
5.3.2 Option (2): Adding Merger Thread
5.3.3 Evaluation
6 IMPLEMENTATION
6.1 Background: SAP HANA
6.2 System Design
6.2.1 Read Committed
6.2.2 Snapshot Isolation
6.3 Merger Component
6.3.1 Overview
6.3.2 Dequeuing
6.3.3 Merging
6.3.4 Sending
6.3.5 Updating MTx State
6.4 Challenges in the Implementation of Merging Writes
6.4.1 SQL String Implementation
6.4.2 Update Count
6.4.3 Error Propagation
6.4.4 Abort and Rollback
7 EVALUATION
7.1 Benchmark Settings
7.2 System Settings
7.2.1 Experiment I: End-to-end Response Time Within a SAP Hybris System
7.2.2 Experiment II: Dequeuing Strategy
7.2.3 Experiment III: Merging Improvement on Different Statement, Transaction and Workload Types
7.2.4 Experiment IV: End-to-End Latency in YCSB
7.2.5 Experiment V: Breakdown of Execution in YCSB
7.2.6 Discussion of System Settings
7.3 Merging in Interactive Transactions
7.3.1 Experiment VI: Merging TATP in Read Uncommitted
7.3.2 Experiment VII: Merging TATP in Read Committed
7.3.3 Experiment VIII: Merging TATP in Snapshot Isolation
7.4 Merging Queries in Stored Procedures
7.4.1 Experiment IX: Merging TATP Stored Procedures in Read Committed
7.5 Merging SAP Hybris
7.5.1 Experiment X: CPU-time Breakdown on HANA Components
7.5.2 Experiment XI: Merging Media Query in SAP Hybris
7.5.3 Discussion of our Results in Comparison with Related Work
8 CONCLUSION
8.1 Summary
8.2 Future Research Directions
REFERENCES
A UML CLASS DIAGRAM
- …