
    A Node-Positioning Algorithm for General Trees

    Drawing a tree consists of two stages: determining the position of each node, and actually rendering the individual nodes and interconnecting branches. The algorithm described in this paper is concerned with the first stage: given a list of nodes, an indication of the hierarchical relationship among them, and their shape and size, where should each node be positioned for optimal aesthetic effect? This algorithm determines the positions of the nodes for any arbitrary general tree, yielding the most desirable positioning with respect to certain widely accepted heuristics. The positioning, specified in x, y coordinates, minimizes the width of the tree. In a general tree there is no limit on the number of offspring per node; this contrasts with binary and ternary trees, for example, which limit each node to 2 and 3 offspring respectively. The algorithm operates in time O(N), where N is the number of nodes in the tree. Previously, most tree drawings have been positioned by the sure hand of a human graphic designer, and many computer-generated positionings have been either trivial or contained irregularities. Earlier work by Wetherell and Shannon (1979) and Tilford (1981), upon which this algorithm builds, failed to correctly position the interior nodes of some trees. Radack (1988), also building on Tilford's work, solved the same problem with a different method that makes four passes. The algorithm presented here correctly positions a tree's nodes using only two passes. It also handles several practical considerations: alternate orientations of the tree, variable node sizes, and out-of-bounds conditions.
    Keywords: Aesthetics, computer graphics, drawing methods, tree drawing, tree structures
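
    The two-pass structure is easy to sketch. Below is a much-simplified Python illustration of the preliminary-position/modifier scheme that the two walks use; it omits the contour-based subtree shifting the full algorithm performs to separate adjacent subtrees, and all names (Node, first_walk, second_walk) are invented for this sketch.

    class Node:
        def __init__(self, label, children=None):
            self.label = label
            self.children = children or []
            self.prelim = 0.0   # preliminary x position from the first walk
            self.mod = 0.0      # shift to propagate to all descendants
            self.x = 0.0        # final x coordinate from the second walk
            self.y = 0          # depth (level) in the tree

    def first_walk(node, depth=0, index=0, spacing=1.0):
        """Post-order: place siblings left to right, centre parents."""
        node.y = depth
        for i, child in enumerate(node.children):
            first_walk(child, depth + 1, i, spacing)
        node.prelim = index * spacing
        if node.children:
            mid = (node.children[0].prelim + node.children[-1].prelim) / 2.0
            node.mod = node.prelim - mid  # keep children centred under us

    def second_walk(node, mod_sum=0.0):
        """Pre-order: final x = preliminary x + sum of ancestors' modifiers."""
        node.x = node.prelim + mod_sum
        for child in node.children:
            second_walk(child, mod_sum + node.mod)

    The modifier fields are what make such algorithms linear: a parent never revisits its descendants to re-centre them; it records the shift once, and the second walk applies the accumulated sum in a single pass.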

    Space-Efficient Data Structures for Collections of Textual Data

    This thesis focuses on the design of succinct and compressed data structures for collections of string-based data, specifically sequences of semi-structured documents in textual format, sets of strings, and sequences of strings. The study of such collections is motivated by a large number of applications both in theory and practice. For textual semi-structured data, we introduce the concept of the semi-index, a succinct construction that speeds up access to documents encoded in textual semi-structured formats such as JSON and XML by storing separately a compact description of their parse trees, hence avoiding the need to re-parse the documents every time they are read. For string dictionaries, we describe a data structure based on a path decomposition of the compacted trie built on the string set. The tree topology is encoded using succinct data structures, while the node labels are compressed using a simple dictionary-based scheme. We also describe a variant of the path-decomposed trie for scored string sets, where each string has a score. This data structure can efficiently support top-k completion queries: given a string p and an integer k, return the k highest-scored strings among those prefixed by p. For sequences of strings, we introduce the problem of compressed indexed sequences of strings, that is, representing indexed sequences of strings in nearly optimal compressed space, both in the static and dynamic settings, while supporting random access, searching, and counting operations, both for exact matches and prefix search. We present a new data structure, the Wavelet Trie, that solves the problem by combining a Patricia trie with a wavelet tree. The Wavelet Trie improves on the state-of-the-art compressed data structures for sequences by supporting a dynamic alphabet and prefix queries. Finally, we discuss the practical implementation of the succinct primitives used throughout the thesis for the experiments. These primitives are implemented as part of a publicly available library, Succinct, using state-of-the-art algorithms along with some improvements.
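
    As an aside, the top-k completion query itself is easy to state in code. The sketch below implements only the query semantics over a plain sorted list of (string, score) pairs; it says nothing about the compressed path-decomposed trie the thesis actually builds, and it assumes strings never contain the maximal code point used as a sentinel.

    import bisect
    import heapq

    def top_k_completions(entries, p, k):
        """entries: list of (string, score) pairs, sorted by string.
        Return the k highest-scored strings among those prefixed by p."""
        lo = bisect.bisect_left(entries, (p,))
        hi = bisect.bisect_right(entries, (p + "\U0010FFFF",))
        return heapq.nlargest(k, entries[lo:hi], key=lambda e: e[1])

    entries = sorted([("car", 10), ("card", 5), ("care", 7), ("cat", 3)])
    print(top_k_completions(entries, "car", 2))  # [('car', 10), ('care', 7)]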

    Throughput and energy efficiency of lock-free data structures: Execution Models and Analyses

    Concurrent data structures are key program components to harness the available parallelism in multi-core processors. Lock-free algorithmic implementations of concurrent data structures offer high scalability and possess desirable properties such as immunity to deadlocks, convoying, and priority inversion. In this thesis, we develop analytical tools to model and analyze the throughput and energy consumption of concurrent lock-free data structures. We start our study with a general class of lock-free data structures, then target more specialized designs for lock-free queues, and finally focus on search data structures, which possess different characteristics compared to the previously mentioned data structures.

    Performance of lock-free data structures: This thesis contributes to the problem of making ends meet between theoretical bounds and actual measured throughput. As the first step, we consider a general class of lock-free data structures and propose three analytical frameworks with different flavors. Analyses of this class also cover efficient implementations of a set of fundamental data structures that suffer from inherent sequential bottlenecks. We model the executions and examine the impact of contention on the throughput of these algorithms. Our analyses lead to optimization methods for memory management and back-off strategies.

    Performance and energy efficiency of lock-free queues: We take a step further to model the throughput of lock-free operations and their interaction. Considering shared queues as a key paradigm for data sharing, operations (Enqueue, Dequeue) access the opposite ends of a queue. Operations of the same type might contend with each other on a non-empty queue; however, all types of operations are subject to interaction when the queue is empty. We first decorrelate the throughput of dequeuers and enqueuers into several uncorrelated basic throughputs, and reconstruct the main throughputs as a function of these basic throughputs. In addition, we model the power dissipation and integrate it with the throughput estimations to extract the energy consumption of applications that utilize lock-free queues.

    Performance of lock-free search data structures: Lock-free designs that utilize fine-grained synchronization have produced efficient implementations of search data structures. These designs reveal different characteristics compared to the previous set of lock-free data structures with inherent sequential bottlenecks. We introduce a new way of modeling and analyzing the throughput of search data structures under stationary and memoryless access patterns.
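
    To make the back-off optimization concrete, the sketch below shows the CAS-retry-with-back-off skeleton that such analyses tune, using a Treiber-stack push as the running example; the same retry pattern underlies lock-free queue operations such as Enqueue and Dequeue. CPython has no hardware compare-and-swap, so AtomicRef emulates one with a lock purely for illustration (real lock-free code would use, e.g., C++ std::atomic); all names and back-off constants here are invented for the sketch.

    import random
    import threading
    import time

    class AtomicRef:
        """Stand-in for a hardware CAS word (illustration only)."""
        def __init__(self, value):
            self._value = value
            self._lock = threading.Lock()

        def get(self):
            return self._value

        def compare_and_swap(self, expected, new):
            with self._lock:           # emulates atomicity
                if self._value is expected:
                    self._value = new
                    return True
                return False

    def push(top, item, max_backoff=1e-3):
        """Lock-free-style push: retry the CAS, backing off on contention."""
        backoff = 1e-6
        while True:
            old = top.get()
            if top.compare_and_swap(old, (item, old)):  # node = (value, next)
                return
            # A failed CAS signals contention: wait a random, growing interval
            # before retrying, reducing pressure on the shared top pointer.
            time.sleep(random.uniform(0, backoff))
            backoff = min(backoff * 2, max_backoff)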

    Synthesizing stream control

    For the management of reactive systems, controllers must coordinate time, data streams, and data transformations, all joined by the high-level perspective of their control flow. This control flow is required to drive the system correctly and continuously, which turns the development into a challenge: the process is error-prone, time-consuming, unintuitive, and costly. An attractive alternative is to synthesize the system instead, where the developer only needs to specify the desired behavior; the synthesis engine then automatically takes care of all the technical details. However, while current algorithms for the synthesis of reactive systems are well suited to handle control, they fail on complex data transformations due to the size of the data space. Thus, to overcome the challenge of explicitly handling the data, we must separate data and control. We introduce Temporal Stream Logic (TSL), a logic which argues exclusively about the control of the controller, while treating data and functional transformations as interchangeable black boxes. In TSL it is possible to specify control-flow properties independently of the complexity of the handled data. Furthermore, with TSL at hand a synthesis engine can check for realizability even without a concrete implementation of the data transformations. We present a modular development framework that first uses synthesis to identify the high-level control flow of a program. If successful, the created control flow is then extended with concrete data transformations in order to be compiled into a final executable. Our results also show that current synthesis approaches cannot immediately replace existing manual development workflows. During the development of a reactive system, the developer may still start from incomplete or faulty specifications that need to be refined after subsequent inspection. In the worst case, constraints are contradictory or miss important assumptions, which leads to unrealizable specifications. In both scenarios, the developer needs additional feedback from the synthesis engine to debug errors and ultimately improve the system specification. To this end, we explore two further improvements. On the one hand, we consider output-sensitive synthesis metrics, which make it possible to synthesize simple and well-structured solutions that help the developer to understand and verify the underlying behavior quickly. On the other hand, we consider an extension with delay, the need for which is a frequent reason for unrealizability. With both methods at hand, we resolve the aforementioned problems and thus help the developer create a safe and correct reactive system.
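
    The control/data separation at the heart of TSL can be illustrated with a toy: the controller below decides only which black-box transformation to apply and when, driven by boolean predicate results, and never inspects the data itself; the concrete functions can be swapped without touching the control flow. This is only a loose Python analogy for the idea, not TSL syntax or semantics, and every name in it is invented.

    def controller(inputs, predicates, functions):
        """inputs: a stream of values; predicates/functions: black boxes."""
        cell = functions["init"]()              # internal state cell
        for value in inputs:
            if predicates["trigger"](value):    # control sees only booleans
                cell = functions["update"](cell, value)
            yield functions["output"](cell)

    # The same control flow works for any instantiation of the black boxes:
    demo = controller(
        [1, -2, 3, -4],
        predicates={"trigger": lambda v: v > 0},
        functions={"init": lambda: 0,
                   "update": lambda c, v: c + v,
                   "output": lambda c: c},
    )
    print(list(demo))  # [1, 1, 4, 4]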

    The Automation of Glycopeptide Discovery in High Throughput MS/MS Data

    Glycosylation, the addition of one or more carbohydrate molecules to a protein, is crucial for many cellular processes. Aberrant glycosylation is a key marker for various diseases such as cancer and rheumatoid arthritis. It has also recently been discovered that glycosylation is important in the ability of the Human Immunodeficiency Virus (HIV) to evade recognition by the immune system. Given the importance of glycosylation in disease, major efforts are underway in life science research to investigate the glycome, the entire glycosylation profile of an organelle, cell, or tissue type. To date, little bioinformatics research has been performed in glycomics due to the complexity of glycan structures and the low throughput of carbohydrate analysis. Recent advances in mass spectrometry (MS) have greatly facilitated the analysis of the glycome. Increasingly, this technology is preferred over traditional methods of carbohydrate analysis, which are often laborious and unsuitable for low-abundance glycoproteins. When subjected to mass spectrometry with collision-induced dissociation, glycopeptides produce characteristic MS/MS spectra that can be detected by visual inspection. However, given the high volume of data output from proteome studies today, manually searching for glycopeptides is an impractical task. In this thesis, we present a tool to automate the identification of glycopeptide spectra from MS/MS data. Further, we discuss some methodologies to automate the elucidation of the structure of the carbohydrate moiety of glycopeptides by adapting traditional MS/MS ion-searching techniques employed in peptide sequence determination. MS/MS ion searching, a common technique in proteomics, aims to interpret MS/MS spectra by correlating structures from a database with the patterns represented in the spectrum. The tool was tested on high-throughput proteomics data and was shown to identify 97% of all glycopeptides present in the test data. Further, the tool assigned correct carbohydrate structures to many of these glycopeptide MS/MS spectra. Applications of the tool in a proteomics environment for the analysis of glycopeptide expression in cancer tissue are also presented.
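
    The automated screening step can be sketched as follows: flag an MS/MS spectrum as a candidate glycopeptide spectrum when it contains characteristic glycan oxonium ions. The m/z values below are commonly cited monoisotopic oxonium-ion masses; the tolerance and the two-ion rule are illustrative choices, not the criteria used by the thesis's tool.

    # Commonly cited glycan oxonium marker ions (monoisotopic m/z).
    OXONIUM_MZ = {
        "HexNAc": 204.0867,
        "Hex": 163.0601,
        "HexHexNAc": 366.1395,
        "NeuAc": 292.1027,
    }

    def is_candidate_glycopeptide(peaks, tol=0.02, min_hits=2):
        """peaks: list of (mz, intensity) pairs from one MS/MS spectrum.
        Returns (flag, list of marker ions observed)."""
        hits = set()
        for mz, _intensity in peaks:
            for name, ref in OXONIUM_MZ.items():
                if abs(mz - ref) <= tol:
                    hits.add(name)
        return len(hits) >= min_hits, sorted(hits)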

    Compression methods for graph algorithms

    Two compression methods for representing graphs are presented, in conjunction with algorithms applying these methods. A decomposition technique for networks that can be generated in O(m) time is presented. The components of the decomposition and the shortest-path matrix of the compressed network can be used to find the shortest path between any pair of vertices in the original network in linear time. A compression method for boolean matrices and a method for applying the compression to boolean matrix multiplication are developed. The algorithms have an expected running time of O(n² log₂ n). From this compression method a simple heuristic has been developed that may be applied to any algorithm for boolean matrix multiplication; this heuristic improves the average running time of boolean matrix multiplication algorithms. An order-of-magnitude analysis of the results published by Loukakis and Tsouris [1981] on the efficiency of algorithms for finding all maximal independent sets of a graph has been performed. This analysis showed that their conclusions, which are based on a direct comparison of the running times of the algorithms, do not take implementation factors into account. An average constant-factor improvement is developed for the algorithm of Tsukiyama, Ide, Ariyoshi and Shirakawa [1977] for finding all maximal independent sets of a graph. Analysis of the running-time results from the algorithm comparisons presented in this thesis shows that the Bron-Kerbosch algorithm has the smallest rate of increase in running time as the size of the graphs increases.
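
    To illustrate the general flavour of row-oriented boolean matrix tricks (this is not the thesis's compression scheme), the sketch below multiplies boolean matrices whose rows are stored as bitsets, here Python integers: bit j of A[i] means A[i][j] = 1. Rows with no set bits contribute nothing and are skipped outright, in the spirit of the simple average-case heuristics mentioned above.

    def bool_matmul(A, B):
        """C[i][j] = OR over k of (A[i][k] AND B[k][j]); rows are int bitsets."""
        C = []
        for row in A:
            acc, k = 0, 0
            while row:                # visit only the set bits of the row
                if row & 1:
                    acc |= B[k]       # OR in row k of B
                row >>= 1
                k += 1
            C.append(acc)
        return C

    A = [0b01, 0b10]                  # row 0 selects B[0]; row 1 selects B[1]
    print([bin(r) for r in bool_matmul(A, [0b11, 0b10])])  # ['0b11', '0b10']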

    Structured layout design
