
    Linear tail-biting trellises: Characteristic generators and the BCJR-construction

    We investigate the constructions of tail-biting trellises for linear block codes introduced by Koetter/Vardy (2003) and Nori/Shankar (2006). For a given code, we define the sets of characteristic generators more generally than Koetter/Vardy did, and we investigate how the choice of characteristic generators affects the set of resulting product trellises, called KV-trellises. Furthermore, we show that each KV-trellis is a BCJR-trellis, defined in a slightly stronger sense than by Nori/Shankar, and that the latter are always non-mergeable. Finally, we address a duality conjecture of Koetter/Vardy by means of a dualization technique for BCJR-trellises and prove the conjecture for minimal trellises. Comment: 28 pages
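    A KV-trellis is a product of elementary trellises, one per chosen generator, so its state-space dimension at each time index is the number of generators whose (circular) span is active there. The sketch below shows only that bookkeeping; the half-open time-index convention (a, b] and the [4, 3] even-weight example are assumptions for illustration, and it does not perform the selection of characteristic generators that the paper studies.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # circular span [a, b] of a generator, 0-based coordinates


def active_times(span: Span, n: int) -> List[int]:
    """Time indices where the elementary trellis of a generator with circular span
    [a, b] has a nontrivial (one-dimensional over F_q) state space:
    a+1, ..., b taken mod n. The half-open indexing is an assumed convention."""
    a, b = span
    length = (b - a) % n
    return [(a + k) % n for k in range(1, length + 1)]


def state_dimension_profile(spans: List[Span], n: int) -> List[int]:
    """State-space dimension at each time index of the product of the elementary
    trellises: every generator adds one dimension wherever it is active."""
    profile = [0] * n
    for span in spans:
        for t in active_times(span, n):
            profile[t] += 1
    return profile


# [4, 3] binary even-weight code with basis 1100, 0110 and either 0011 or 1001.
print(state_dimension_profile([(0, 1), (1, 2), (2, 3)], 4))  # [0, 1, 1, 1] (conventional spans)
print(state_dimension_profile([(0, 1), (1, 2), (3, 0)], 4))  # [1, 1, 1, 0] (one circular span)
```

    Choosing the circular span [3, 0] for 1001 moves state complexity from the end of the trellis to the wrap-around point, which is the kind of effect the choice of characteristic generators has on the resulting product trellis.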

    Spectral Methods for Learning Multivariate Latent Tree Structure

    This work considers the problem of learning the structure of multivariate linear tree models, which include a variety of directed tree graphical models with continuous, discrete, and mixed latent variables, such as linear-Gaussian models, hidden Markov models, Gaussian mixture models, and Markov evolutionary trees. The setting is one where we only have samples from certain observed variables in the tree, and our goal is to estimate the tree structure (i.e., the graph of how the underlying hidden variables are connected to each other and to the observed variables). We propose the Spectral Recursive Grouping algorithm, an efficient and simple bottom-up procedure for recovering the tree structure from independent samples of the observed variables. Our finite-sample bounds for exact recovery of the tree structure reveal natural dependencies on statistical and structural properties of the underlying joint distribution. Furthermore, our sample complexity guarantees have no explicit dependence on the dimensionality of the observed variables, making the algorithm applicable to many high-dimensional settings. At the heart of our algorithm is a spectral quartet test for determining the relative topology of a quartet of variables from second-order statistics.
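    To make the idea of a quartet test concrete, here is a minimal sketch of its scalar special case only, not the paper's spectral test for multivariate observations. It assumes a tree model in which correlations multiply along tree paths, so that d = -log|rho| is an additive tree metric; the correct pairing {i, j}|{k, l} is then the one with the largest |rho_ij * rho_kl| (the four-point condition). The function names and the example covariance are hypothetical.

```python
import math
from typing import FrozenSet, Sequence, Tuple

Matrix = Sequence[Sequence[float]]
Split = FrozenSet[FrozenSet[int]]


def correlation(cov: Matrix, i: int, j: int) -> float:
    return cov[i][j] / math.sqrt(cov[i][i] * cov[j][j])


def quartet_topology(cov: Matrix, quartet: Tuple[int, int, int, int]) -> Split:
    """Return the pairing {{i, j}, {k, l}} with the largest |rho_ij * rho_kl|,
    i.e. the smallest d_ij + d_kl for d = -log|rho| (scalar four-point condition)."""
    a, b, c, d = quartet
    pairings = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]

    def score(pairing):
        (p, q), (r, s) = pairing
        return abs(correlation(cov, p, q) * correlation(cov, r, s))

    (p, q), (r, s) = max(pairings, key=score)
    return frozenset({frozenset({p, q}), frozenset({r, s})})


# Illustrative covariance (unit variances) for the tree ((x0, x1), (x2, x3)):
# edge correlations 0.9, 0.8, 0.7, 0.6 to the leaves and 0.5 between the hidden nodes.
example_cov = [
    [1.0, 0.72, 0.315, 0.27],
    [0.72, 1.0, 0.28, 0.24],
    [0.315, 0.28, 1.0, 0.42],
    [0.27, 0.24, 0.42, 1.0],
]
print(quartet_topology(example_cov, (0, 1, 2, 3)))
```

    In practice the covariances would be estimated from samples, so a robust test would also need a margin for deciding when the evidence is too weak to resolve the quartet; this sketch always returns the best-scoring pairing.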

    Convergent types for shared memory

    Master's dissertation in Computer Science. It is well known that consistency in shared-memory concurrent programming comes at the price of degraded performance and scalability. Some existing solutions to this problem introduce high complexity and are not programmer-friendly. We present a simple and well-defined approach that achieves relevant results in shared-memory environments by relaxing synchronization. To that end, we look into Mergeable Data Types (MDTs), data structures analogous to Conflict-Free Replicated Data Types (CRDTs) but designed for shared memory. CRDTs were the first formal approach backed by a solid theoretical study of eventual consistency in distributed systems, addressing the problem described by the CAP theorem and providing high availability. With CRDTs, updates are unsynchronized, and replicas eventually converge to a correct common state. However, CRDTs are not designed for shared memory. In large-scale distributed systems, the merge cost is negligible compared with network-mediated synchronization. Therefore, we migrated the concept by further developing the existing Mergeable Data Types, formally defining a programming model that we named Global-Local View. Furthermore, we created a portfolio of MDTs and demonstrated that, in appropriate scenarios, applications can benefit greatly from the model.
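    As a rough illustration of relaxing synchronization in the spirit of a global-local split, the sketch below keeps a shared global counter plus a private per-thread delta: increments touch only local state, and a lock is taken only when a thread merges its delta. The class and method names are hypothetical and this is not the dissertation's MDT API or its formal Global-Local View model; also note that CPython's global interpreter lock means the example shows the design, not a real parallel speedup.

```python
import threading


class MergeableCounter:
    """Hypothetical mergeable counter: threads update a private local delta
    and periodically merge it into a shared global value."""

    def __init__(self) -> None:
        self._global = 0
        self._lock = threading.Lock()
        self._local = threading.local()  # per-thread state

    def increment(self, amount: int = 1) -> None:
        # Local update: touches only this thread's state, no lock taken.
        self._local.delta = getattr(self._local, "delta", 0) + amount

    def merge(self) -> None:
        # Merge: fold this thread's delta into the global value under the lock.
        delta = getattr(self._local, "delta", 0)
        with self._lock:
            self._global += delta
        self._local.delta = 0

    def local_view(self) -> int:
        # The calling thread sees the last merged global value plus its own unmerged updates.
        with self._lock:
            merged = self._global
        return merged + getattr(self._local, "delta", 0)

    def global_value(self) -> int:
        with self._lock:
            return self._global


def worker(counter: MergeableCounter, n: int, merge_every: int = 100) -> None:
    for i in range(n):
        counter.increment()
        if (i + 1) % merge_every == 0:
            counter.merge()
    counter.merge()  # publish any remaining local updates


counter = MergeableCounter()
threads = [threading.Thread(target=worker, args=(counter, 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.global_value())  # 4000
```

    The merge frequency (`merge_every`, an assumed parameter) trades how stale other threads' view of the global value can be against how often the lock is contended.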

    Approximation Algorithms for Polynomial-Expansion and Low-Density Graphs

    We study the family of intersection graphs of low-density objects in low-dimensional Euclidean space. This family is quite general and includes planar graphs. We prove that such graphs have small separators. Next, we present efficient $(1+\varepsilon)$-approximation algorithms for these graphs for Independent Set, Set Cover, and Dominating Set, among other problems. We also prove corresponding hardness-of-approximation results for some of these optimization problems, characterizing their intractability in terms of density.
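    Small separators typically lead to approximation schemes through a divide-and-conquer pattern in the style of Lipton and Tarjan: split the graph with a separator, recurse on the parts, solve small pieces exactly, and discard the separator vertices. The sketch below shows that generic pattern for Independent Set on a toy path graph; it is not the paper's algorithm, and the separator routine is an assumed input rather than a construction for low-density graphs.

```python
from itertools import combinations
from typing import Callable, Dict, List, Set, Tuple

Graph = Dict[int, Set[int]]
# A separator routine returns (separator, part_a, part_b) with no edges between part_a and part_b.
SeparatorFn = Callable[[List[int]], Tuple[List[int], List[int], List[int]]]


def is_independent(graph: Graph, vertices) -> bool:
    return all(v not in graph[u] for u, v in combinations(vertices, 2))


def exact_mis(graph: Graph, vertices: List[int]) -> Set[int]:
    # Brute force, largest subsets first; only ever called on small pieces.
    for size in range(len(vertices), 0, -1):
        for subset in combinations(vertices, size):
            if is_independent(graph, subset):
                return set(subset)
    return set()


def separator_mis(graph: Graph, vertices: List[int], separator: SeparatorFn,
                  leaf_size: int) -> Set[int]:
    if len(vertices) <= leaf_size:
        return exact_mis(graph, vertices)
    _sep, part_a, part_b = separator(vertices)
    # Separator vertices are discarded; since no edge joins part_a and part_b,
    # the union of the two recursive solutions is still an independent set.
    return (separator_mis(graph, part_a, separator, leaf_size)
            | separator_mis(graph, part_b, separator, leaf_size))


# Toy example: the path 0-1-...-19 with the middle vertex as separator.
n = 20
path: Graph = {v: {u for u in (v - 1, v + 1) if 0 <= u < n} for v in range(n)}


def middle_separator(vertices: List[int]):
    vs = sorted(vertices)
    m = len(vs) // 2
    return [vs[m]], vs[:m], vs[m + 1:]


solution = separator_mis(path, list(range(n)), middle_separator, leaf_size=5)
print(len(solution))  # 9 (the optimum on this path is 10)
```

    In the classical analysis (for example, for planar graphs), the separators are small relative to the pieces they split, so the discarded vertices cost only a small fraction of an optimal solution; larger leaf pieces shrink that loss at the price of more expensive exact solving.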

    A Canonical Form for PROV Documents and its Application to Equality, Signature, and Validation

    We present a canonical form for PROV, a normalized way of representing PROV documents as mathematical expressions. As opposed to the normal form specified by the PROV-CONSTRAINTS recommendation, the canonical form we present is defined for all PROV documents, irrespective of their validity, and it can be serialized in a unique way. The article makes the case for a canonical form for PROV and its potential uses, namely comparison of PROV documents in different formats, validation, and signature of PROV documents. A signature of a PROV document allows the integrity and the author of the provenance to be ascertained; since the signature is based on the canonical form, these checks are not tied to a particular encoding but can be performed on any representation of PROV.
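    The reason a canonical form makes signatures encoding-independent can be shown with a toy stand-in: if every representation of a document normalizes to the same text, then a digest of that text (which a signature scheme would then sign) is the same for all of them. In the sketch below, statements are assumed to be plain tuples `(relation, arg1, ...)` and normalization is just deduplication plus sorting; this is not the canonical form defined in the paper, and the example identifiers are hypothetical.

```python
import hashlib
from typing import Iterable, Tuple

# Assumed toy representation: a statement is (relation, arg1, arg2, ...).
Statement = Tuple[str, ...]


def canonical_form(statements: Iterable[Statement]) -> str:
    # Drop duplicates and sort, so the text no longer depends on input order or repetition.
    unique = sorted(set(statements))
    return "\n".join(f"{rel}({', '.join(args)})" for rel, *args in unique)


def document_digest(statements: Iterable[Statement]) -> str:
    # A signature scheme would sign this digest instead of any particular serialization.
    return hashlib.sha256(canonical_form(statements).encode("utf-8")).hexdigest()


doc_a = [("entity", "ex:report"), ("activity", "ex:compile"),
         ("wasGeneratedBy", "ex:report", "ex:compile")]
# Same content, different order, with a repeated statement.
doc_b = [("wasGeneratedBy", "ex:report", "ex:compile"), ("activity", "ex:compile"),
         ("entity", "ex:report"), ("entity", "ex:report")]
print(canonical_form(doc_a))
print(document_digest(doc_a) == document_digest(doc_b))  # True
```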

    Spherical metrics with conical singularities on a 2-sphere: angle constraints

    In this article, we give a criterion for the existence of a metric of curvature $1$ on a $2$-sphere with $n$ conical singularities of prescribed angles $2\pi\vartheta_1,\dots,2\pi\vartheta_n$ and non-coaxial holonomy. This necessary and sufficient condition is expressed in terms of linear inequalities in $\vartheta_1,\dots,\vartheta_n$. Comment: 38 pages, 17 figures

    Decompressing Lempel-Ziv Compressed Text

    We consider the problem of decompressing the Lempel-Ziv 77 (LZ77) representation of a string $S$ of length $n$ using a working space as close as possible to the size $z$ of the input. The folklore solution for the problem runs in $O(n)$ time but requires random access to the whole decompressed text. Another folklore solution is to convert LZ77 into a grammar of size $O(z\log(n/z))$ and then stream $S$ in linear time. In this paper, we show that $O(n)$ time and $O(z)$ working space can be achieved for constant-size alphabets. On general alphabets of size $\sigma$, we describe (i) a trade-off achieving $O(n\log^\delta \sigma)$ time and $O(z\log^{1-\delta}\sigma)$ space for any $0\leq \delta\leq 1$, and (ii) a solution achieving $O(n)$ time and $O(z\log\log (n/z))$ space. The latter solution, in particular, dominates both folklore algorithms for the problem. More generally, our solutions can extract any specified subsequence of $S$ with little overhead on top of the linear running time and working space. As an immediate corollary, we show that our techniques yield improved results for pattern matching problems on LZ77-compressed text.
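    For reference, the first folklore solution can be sketched in a few lines: replay the phrases left to right, copying each referenced substring out of the text decoded so far. It runs in $O(n)$ time, but it needs random access to the entire output, i.e. $\Theta(n)$ working space rather than $O(z)$. The phrase encoding used here (a single literal character, or an absolute source position plus a length) is an assumption; LZ77 variants differ in how phrases are stored.

```python
from typing import List, Tuple, Union

# Assumed phrase format: a one-character literal, or (source position, length).
Phrase = Union[str, Tuple[int, int]]


def lz77_decode(phrases: List[Phrase]) -> str:
    out: List[str] = []
    for phrase in phrases:
        if isinstance(phrase, str):
            out.append(phrase)
        else:
            source, length = phrase
            # Copy character by character so overlapping (self-referential)
            # phrases work: the source may extend into text being produced now.
            for k in range(length):
                out.append(out[source + k])
    return "".join(out)


print(lz77_decode(["a", "b", "r", (0, 1), "c", (0, 1), "d", (0, 4)]))  # abracadabra
print(lz77_decode(["a", "b", (0, 6)]))  # abababab
```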

    Merging Queries in OLTP Workloads

    OLTP applications are usually executed by a high number of clients in parallel and typically face high throughput demands as well as strict latency requirements for individual statements. In enterprise scenarios, they often have to deal with overload spikes resulting from events such as Cyber Monday or Black Friday. The traditional way to avoid running out of resources, and thus to cope with such spikes, is significant over-provisioning of the underlying infrastructure. In this thesis, we analyze real enterprise OLTP workloads with respect to statement types, complexity, and hot-spot statements. Interestingly, our findings reveal that workloads are often read-heavy and comprise similar query patterns, which offers potential to share work among statements belonging to different transactions. In the past, resource sharing has been studied extensively for OLAP workloads. Naturally, the question arises why studies mainly focus on OLAP rather than OLTP workloads. At first sight, OLTP queries often consist of simple calculations, such as index look-ups, with little sharing potential. Consequently, due to their short execution time, such queries may not gain enough from sharing to justify the additional overhead. In addition, OLTP workloads execute not only read operations but also updates. Therefore, sharing work needs to obey transactional semantics, such as the given isolation level and read-your-own-writes. This thesis presents THE LEVIATHAN, a novel batching scheme for OLTP workloads: an approach for merging read statements within interactively submitted multi-statement transactions consisting of reads and updates. Our main idea is to merge the execution of statements by merging their plans, which makes it possible to merge the execution of not only complex but also simple calculations, such as the aforementioned index look-up.
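    The basic idea of sharing work among instances of the same simple statement can be illustrated outside the database: several pending requests that match the same prepared point lookup are answered by one `IN` query, and each row is routed back to the request that asked for it. This is only a client-side sketch with a hypothetical `customer` table; THE LEVIATHAN merges plans inside the database system and respects isolation levels and read-your-own-writes, none of which this example attempts.

```python
import sqlite3
from typing import List, Optional, Sequence, Tuple

Row = Tuple[int, str]


def merged_point_lookups(conn: sqlite3.Connection, keys: Sequence[int]) -> List[Optional[Row]]:
    """Answer several instances of the same prepared lookup
    'SELECT id, name FROM customer WHERE id = ?' with one merged query,
    then route each row back to the request that asked for it."""
    if not keys:
        return []
    unique = sorted(set(keys))
    placeholders = ", ".join("?" for _ in unique)
    rows = conn.execute(
        f"SELECT id, name FROM customer WHERE id IN ({placeholders})", unique
    ).fetchall()
    by_id = {row[0]: row for row in rows}
    # One result per original request, in request order (None if no row matched).
    return [by_id.get(key) for key in keys]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(1, "Ada"), (2, "Grace"), (3, "Edsger")])
results = merged_point_lookups(conn, [3, 1, 3, 42])
print(results)  # [(3, 'Edsger'), (1, 'Ada'), (3, 'Edsger'), None]
```

    Four requests cost one statement execution here; the merged form only pays off when the matching and routing overhead stays below the per-statement cost it saves, which is the trade-off the abstract raises for short OLTP queries.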
We identify mergeable statements by pattern matching of prepared statement plans, which incurs low overhead. To obey the isolation level properties and provide read-your-own-writes, we first define a formal framework for merging transactions running under a given isolation level and then provide insights into a prototypical implementation of merging within a commercial database system. Our experimental evaluation shows that, depending on the isolation level, the load in the system, and the read share of the workload, transaction throughput can be improved by up to 2.5x without compromising transactional semantics. We also show that our strategy increases the throughput of a real enterprise workload by 20%.

Contents:
  1 INTRODUCTION
    1.1 Summary of Contributions
    1.2 Outline
  2 WORKLOAD ANALYSIS
    2.1 Analyzing OLTP Benchmarks
      2.1.1 YCSB
      2.1.2 TATP
      2.1.3 TPC Benchmark Scenarios
      2.1.4 Summary
    2.2 Analyzing OLTP Workloads from Open Source Projects
      2.2.1 Characteristics of Workloads
      2.2.2 Summary
    2.3 Analyzing Enterprise OLTP Workloads
      2.3.1 Overview of Reports about OLTP Workload Characteristics
      2.3.2 Analysis of SAP Hybris Workload
      2.3.3 Summary
    2.4 Conclusion
  3 RELATED WORK ON QUERY MERGING
    3.1 Merging the Execution of Operators
    3.2 Merging the Execution of Subplans
    3.3 Merging the Results of Subplans
    3.4 Merging the Execution of Full Plans
    3.5 Miscellaneous Works on Merging
    3.6 Discussion
  4 MERGING STATEMENTS IN MULTI STATEMENT TRANSACTIONS
    4.1 Overview of Our Approach
      4.1.1 Examples
      4.1.2 Why Naïve Merging Fails
    4.2 THE LEVIATHAN Approach
    4.3 Formalizing THE LEVIATHAN Approach
      4.3.1 Transaction Theory
      4.3.2 Merging Under MVCC
    4.4 Merging Reads Under Different Isolation Levels
      4.4.1 Read Uncommitted
      4.4.2 Read Committed
      4.4.3 Repeatable Read
      4.4.4 Snapshot Isolation
      4.4.5 Serializable
      4.4.6 Discussion
    4.5 Merging Writes Under Different Isolation Levels
      4.5.1 Read Uncommitted
      4.5.2 Read Committed
      4.5.3 Snapshot Isolation
      4.5.4 Serializable
      4.5.5 Handling Dependencies
      4.5.6 Discussion
  5 SYSTEM MODEL
    5.1 Definition of the Term “Overload”
    5.2 Basic Queuing Model
      5.2.1 Option (1): Replacement with a Merger Thread
      5.2.2 Option (2): Adding Merger Thread
      5.2.3 Using Multiple Merger Threads
      5.2.4 Evaluation
    5.3 Extended Queue Model
      5.3.1 Option (1): Replacement with a Merger Thread
      5.3.2 Option (2): Adding Merger Thread
      5.3.3 Evaluation
  6 IMPLEMENTATION
    6.1 Background: SAP HANA
    6.2 System Design
      6.2.1 Read Committed
      6.2.2 Snapshot Isolation
    6.3 Merger Component
      6.3.1 Overview
      6.3.2 Dequeuing
      6.3.3 Merging
      6.3.4 Sending
      6.3.5 Updating MTx State
    6.4 Challenges in the Implementation of Merging Writes
      6.4.1 SQL String Implementation
      6.4.2 Update Count
      6.4.3 Error Propagation
      6.4.4 Abort and Rollback
  7 EVALUATION
    7.1 Benchmark Settings
    7.2 System Settings
      7.2.1 Experiment I: End-to-end Response Time Within a SAP Hybris System
      7.2.2 Experiment II: Dequeuing Strategy
      7.2.3 Experiment III: Merging Improvement on Different Statement, Transaction and Workload Types
      7.2.4 Experiment IV: End-to-End Latency in YCSB
      7.2.5 Experiment V: Breakdown of Execution in YCSB
      7.2.6 Discussion of System Settings
    7.3 Merging in Interactive Transactions
      7.3.1 Experiment VI: Merging TATP in Read Uncommitted
      7.3.2 Experiment VII: Merging TATP in Read Committed
      7.3.3 Experiment VIII: Merging TATP in Snapshot Isolation
    7.4 Merging Queries in Stored Procedures
      Experiment IX: Merging TATP Stored Procedures in Read Committed
    7.5 Merging SAP Hybris
      7.5.1 Experiment X: CPU-time Breakdown on HANA Components
      7.5.2 Experiment XI: Merging Media Query in SAP Hybris
      7.5.3 Discussion of our Results in Comparison with Related Work
  8 CONCLUSION
    8.1 Summary
    8.2 Future Research Directions
  REFERENCES
  A UML CLASS DIAGRAM