326 research outputs found

    HPQS: A fast, high-capacity, hybrid priority queuing system for high-speed networking devices

    In this paper, we present a fast hybrid priority queue architecture intended for scheduling and prioritizing packets in a network data plane. Due to increasing traffic and the tight requirements of high-speed networking devices, a high-capacity priority queue with constant latency and guaranteed performance is needed. We aim at reducing latency to best support the upcoming 5G wireless standards. The proposed hybrid priority queuing system (HPQS) enables pipelined queue operations with almost constant time complexity in practice. The proposed architecture is implemented in C++ and synthesized with the Vivado High-Level Synthesis (HLS) tool. Two configurations are proposed. The first is intended for scheduling with a multi-queuing system, for which implementation results for 64 up to 512 independent queues are reported. The second is intended for large-capacity priority queues, which are placed and routed on a ZC706 board and an XCVU440-FLGB2377-3-E Xilinx FPGA supporting a total capacity of half a million packet tags. The reported results are compared with existing approaches across a range of priority queue depths and performance metrics. The proposed HPQS supports links operating at 40 Gb/s.

    Managing Unbounded-Length Keys in Comparison-Driven Data Structures with Applications to On-Line Indexing

    This paper presents a general technique for optimally transforming any dynamic data structure that operates on atomic, indivisible keys by constant-time comparisons into a data structure that handles unbounded-length keys whose comparison cost is not constant. Examples of such keys are strings, multi-dimensional points, multiple-precision numbers, multi-key data (e.g. records), XML paths, URL addresses, etc. The technique is more general than previous work, as no particular exploitation of the underlying structure is required. The only requirement is that the insertion of a key must identify its predecessor or its successor. Using the proposed technique, an online suffix tree can be constructed in worst-case time $O(\log n)$ per input symbol (as opposed to amortized $O(\log n)$ time per symbol, achieved by previously known algorithms). To our knowledge, our algorithm is the first that achieves $O(\log n)$ worst-case time per input symbol. Searching for a pattern of length $m$ in the resulting suffix tree takes $O(\min(m \log |\Sigma|, m + \log n) + tocc)$ time, where $tocc$ is the number of occurrences of the pattern. The paper also describes further applications and shows how to obtain alternative methods for dealing with suffix sorting, dynamic lowest common ancestors and order maintenance.

    String Searching with Ranking Constraints and Uncertainty

    Strings play an important role in many areas of computer science. Searching for a pattern in a string or string collection is one of the most classic problems. Variations of this problem such as document retrieval, ranked document retrieval, and dictionary matching have been well studied. The enormous growth of the internet, large genomic projects, sensor networks, and digital libraries necessitates not just efficient algorithms and data structures for general string indexing, but also indexes for texts with fuzzy information and support for queries with different constraints. This dissertation addresses some of these problems and proposes indexing solutions. One such variation is the document retrieval query for included and excluded/forbidden patterns, where the objective is to retrieve all the relevant documents that contain the included patterns and do not contain the excluded patterns. We continue the previous work done on this problem and propose a more efficient solution. We conjecture that any significant improvement over these results is highly unlikely. We also consider the scenario where the query consists of more than two patterns. The forbidden pattern problem suffers from the drawback that linear-space (in words) solutions are unlikely to yield a solution better than O(sqrt(n/occ)) per-document reporting time, where n is the total length of the documents and occ is the number of output documents. Continuing this path, we introduce a new variation, namely document retrieval with forbidden extension query, where the forbidden pattern is an extension of the included pattern. We also address the more general top-k version of the problem, which retrieves the top k documents, where the ranking is based on the PageRank relevance metric. This problem is motivated by search applications. It also holds theoretical interest, as we show that the hardness of the forbidden pattern problem is alleviated in this problem.
We achieve linear space and optimal query time for this variation. We also propose succinct indexes for both these problems. Position-restricted pattern matching considers the scenario where only part of the text is searched. We propose a succinct index for this problem with efficient query time. An important application of this problem stems from searching in genomic sequences, where only part of the gene sequence is searched for interesting patterns. The problem of computing discriminating (resp. generic) words is to report all minimal (resp. maximal) extensions of a query pattern that are contained in at most (resp. at least) a given number of documents. These problems are motivated by applications in computational biology, text mining and automated text classification. We propose succinct indexes for these problems. Strings with uncertainty and fuzzy information play an important role in an increasing number of applications. We propose a general framework for indexing uncertain strings such that a deterministic query string can be searched efficiently. String matching becomes a probabilistic event when a string contains uncertainty, i.e. each position of the string can have different probable characters, with an associated probability of occurrence for each character. Such uncertain strings are prevalent in applications such as biological sequence data, event monitoring and automatic ECG annotations. We consider two basic problems of string searching, namely substring searching and string listing. We formulate these well-known problems for the uncertain strings paradigm and propose exact and approximate solutions for them. We also discuss a constrained variation of orthogonal range searching. Given a set of points, the task of orthogonal range searching is to build a data structure such that all the points inside an orthogonal query region can be reported.
We introduce a new variation, namely shared constraint range searching, which arises naturally in constrained pattern matching applications. Shared constraint range searching is a special four-sided range reporting query problem in which two of the constraints are shared, effectively reducing the number of independent constraints. For this problem, we propose a linear-space index that matches the best known bound for the three-dimensional dominance reporting problem. We also extend our data structure to the external memory model.
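
The uncertain-string model in the abstract above (each position carries several candidate characters, each with a probability) can be illustrated with a brute-force sketch. This is not the indexing framework the dissertation proposes: it scans every position, assumes that positions are independent, and uses a hypothetical probability threshold `threshold` to decide what counts as an occurrence. The function names are illustrative only.

```python
from math import prod


def match_probability(uncertain, pattern, start):
    """Probability that `pattern` occurs at index `start` of `uncertain`.

    `uncertain` is a list of dicts mapping character -> probability.
    Assumption (not from the abstract): positions are independent, so the
    match probability is the product of the per-position probabilities.
    """
    if start < 0 or start + len(pattern) > len(uncertain):
        return 0.0
    return prod(uncertain[start + i].get(ch, 0.0) for i, ch in enumerate(pattern))


def substring_search(uncertain, pattern, threshold):
    """Brute-force scan: start positions whose match probability >= threshold."""
    last_start = len(uncertain) - len(pattern)
    return [s for s in range(last_start + 1)
            if match_probability(uncertain, pattern, s) >= threshold]


# Hypothetical uncertain string of length 3.
u = [{"A": 0.5, "C": 0.5}, {"G": 1.0}, {"A": 0.25, "T": 0.75}]
print(match_probability(u, "AG", 0))
print(substring_search(u, "G", 0.9))
```

An index is needed precisely because this scan costs time proportional to the text length times the pattern length for every query.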

    FPGA Based Binary Heap Implementation: With an Application to Web Based Anomaly Prioritization

    This thesis investigates a prioritization mechanism for web-based anomaly detection. We propose a hardware realization of a parallel binary heap, with an application to web-based anomaly prioritization. The heap is implemented in a pipelined fashion on an FPGA platform. The proposed design takes O(1) time for all operations by ensuring minimum waiting time between two consecutive operations. We present the various design issues and the hardware complexity. We explicitly analyze the design trade-offs of the proposed priority queue implementations.
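
For reference, the binary heap that the thesis realizes in hardware can be sketched in software as an array-based min-heap. This is a plain sequential version, assumed here only to illustrate the data structure: each operation takes O(log n) time, whereas the pipelined FPGA design described above is reported to achieve O(1) time between consecutive operations. The class name `BinaryMinHeap` is hypothetical.

```python
class BinaryMinHeap:
    """Array-based binary min-heap; children of index i are 2i+1 and 2i+2."""

    def __init__(self):
        self._a = []

    def __len__(self):
        return len(self._a)

    def push(self, key):
        """Append the key, then sift it up until its parent is not larger."""
        a = self._a
        a.append(key)
        i = len(a) - 1
        while i > 0:
            parent = (i - 1) // 2
            if a[parent] <= a[i]:
                break
            a[parent], a[i] = a[i], a[parent]
            i = parent

    def pop(self):
        """Remove and return the smallest key, then sift the new root down."""
        a = self._a
        if not a:
            raise IndexError("pop from empty heap")
        top = a[0]
        last = a.pop()
        if a:
            a[0] = last
            i, n = 0, len(a)
            while True:
                left, right = 2 * i + 1, 2 * i + 2
                smallest = i
                if left < n and a[left] < a[smallest]:
                    smallest = left
                if right < n and a[right] < a[smallest]:
                    smallest = right
                if smallest == i:
                    break
                a[i], a[smallest] = a[smallest], a[i]
                i = smallest
        return top


heap = BinaryMinHeap()
for priority in [5, 1, 4, 2, 3]:
    heap.push(priority)
print([heap.pop() for _ in range(len(heap))])
```

A hardware pipeline can overlap the level-by-level sift steps of successive operations, which is one common way such designs hide the logarithmic depth.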

    Efficient Data Structures for Partial Orders, Range Modes, and Graph Cuts

    This thesis considers the study of data structures from the perspective of the theoretician, with a focus on simplicity and practicality. We consider both the time complexity and the space usage of the proposed solutions. The topics discussed fall into three main categories: partial order representation, range modes, and graph cuts. We consider two problems in partial order representation. The first is a data structure to represent a lattice. A lattice is a partial order in which the elements larger than any two elements x and y are all larger than an element z, known as the join of x and y; a similar condition holds for elements smaller than any two elements. Our data structure is the first correct solution that can simultaneously compute joins and the inverse meet operation in sublinear time while also using subquadratic space. The second is a data structure to support queries on a dynamic set of one-dimensional ordered data; that is, essentially any operation computable on a binary search tree. We develop a data structure that is able to interpolate between binary search trees and efficient priority queues, offering more efficient insertion times than the former when the query distribution is non-uniform. We also consider static and dynamic exact and approximate range mode. Given one-dimensional data, the range mode problem is to compute the mode of a subinterval of the data. In the dynamic range mode problem, insertions and deletions are permitted. For the approximate problem, the element returned is to have frequency no less than a factor (1+epsilon) of the true mode, for some epsilon > 0. Our results include a linear-space dynamic exact range mode data structure that simultaneously improves on the best previous operation complexity and an exact dynamic range mode data structure that breaks the Theta(n^(2/3)) time per operation barrier.
For approximate range mode, we develop a static succinct data structure offering a logarithmic-factor space improvement and give the first dynamic approximate range mode data structure. We also consider approximate range selection. The final category discussed is graph and dynamic graph algorithms. We develop an optimal offline data structure for dynamic 2- and 3-edge and vertex connectivity. Here, the data structure is given the entire sequence of operations in advance, and the dynamic operations are edge insertion and removal. Finally, we give a simplification of Karger's near-linear time minimum cut algorithm, utilizing heavy-light decomposition and iteration in place of dynamic programming in the subroutine to find a minimum cut of a graph G that cuts at most two edges of a spanning tree T of G.
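
The range mode problem defined in the abstract above can be made concrete with a naive baseline that recounts the subinterval on every query. This is only a definition-level sketch, not any of the thesis's data structures, and it takes time linear in the interval length per query. The approximate check reads "frequency no less than a factor (1+epsilon) of the true mode" as frequency >= true frequency / (1 + epsilon), which is an assumption about the intended meaning. The function names are hypothetical.

```python
from collections import Counter


def range_mode(data, lo, hi):
    """Return (value, frequency) of a most frequent element in data[lo..hi].

    Both bounds are inclusive. Ties are broken by first occurrence.
    """
    counts = Counter(data[lo:hi + 1])
    return max(counts.items(), key=lambda kv: kv[1])


def is_approx_mode(data, lo, hi, candidate, eps):
    """True if `candidate` has frequency >= (true mode frequency) / (1 + eps)."""
    _, best = range_mode(data, lo, hi)
    return data[lo:hi + 1].count(candidate) * (1 + eps) >= best


data = [1, 2, 2, 3, 3, 3, 2]
print(range_mode(data, 0, 3))
print(is_approx_mode(data, 3, 5, 2, 0.5))
```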

    Succinct and Self-Indexed Data Structures for the Exploitation and Representation of Moving Objects

    Programa Oficial de Doutoramento en Computación. 5009V01 [Abstract] This thesis deals with the efficient representation and exploitation of trajectories of objects that move in space without any type of restriction (airplanes, birds, boats, etc.). Currently, this is a very relevant problem due to the proliferation of GPS devices, which makes it possible to collect a large number of trajectories. However, until now there has been no efficient way to properly store and exploit them. In this thesis, we propose eight structures that meet two fundamental objectives. First, they are capable of storing the space-time data describing the trajectories in reduced space, so that their exploitation takes advantage of the memory hierarchy. Second, the structures allow the information to be exploited through two types of queries: object queries, which, given an object, retrieve the position or trajectory of that object over a time interval; and space-time range queries, which, given a region of space and a time interval, return the objects that are within the region at that time. It should be noted that state-of-the-art solutions are only capable of efficiently answering one of the two types of queries. All of these data structures have a common nexus: they all use two elements, snapshots and logs. Each snapshot works as a spatial index that periodically indexes the absolute position of each object or the Minimum Bounding Rectangle (MBR) of its trajectory. Snapshots serve to speed up the spatio-temporal range queries. We have implemented two types of snapshots, based on k²-trees or on R-trees. The log represents the trajectory (sequence of movements) of each object. It is the main element of the structures and facilitates the resolution of object and spatio-temporal range queries. Four strategies have been implemented to represent the log in a compressed form: ScdcCT, GraCT, ContaCT and RCT. With the combination of these two elements we build eight different structures for the representation of trajectories. All of them have been implemented and evaluated experimentally, showing that they reduce the space required by traditional methods by up to two orders of magnitude. Furthermore, they are all competitive in solving object queries as well as spatio-temporal ones.
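
The snapshot-plus-log idea described in this abstract can be sketched for a single object as follows. This is a simplified illustration under several assumptions that are not from the thesis: the log stores uncompressed (dx, dy) displacements rather than one of the compressed representations (ScdcCT, GraCT, ContaCT, RCT), the snapshots are a plain dictionary rather than a k²-tree or R-tree, and the names `Trajectory`, `snapshot_every` and `position_at` are hypothetical.

```python
class Trajectory:
    """One object's trajectory: periodic absolute snapshots plus a movement log."""

    def __init__(self, start, moves, snapshot_every):
        # moves[i] is the (dx, dy) displacement between time i and time i + 1.
        self.moves = list(moves)
        self.k = snapshot_every
        self.snapshots = {}  # time -> absolute (x, y), stored every k steps
        x, y = start
        for t in range(len(self.moves) + 1):
            if t % self.k == 0:
                self.snapshots[t] = (x, y)
            if t < len(self.moves):
                dx, dy = self.moves[t]
                x, y = x + dx, y + dy

    def position_at(self, t):
        """Object query: start at the latest snapshot <= t, then replay the log."""
        if not 0 <= t <= len(self.moves):
            raise IndexError("time out of range")
        base = (t // self.k) * self.k
        x, y = self.snapshots[base]
        for dx, dy in self.moves[base:t]:
            x, y = x + dx, y + dy
        return (x, y)


tr = Trajectory((0, 0), [(1, 0), (1, 1), (0, 2), (-1, 0), (3, 3)], snapshot_every=2)
print(tr.position_at(3))
```

The snapshot period trades space for query time: a query replays at most `snapshot_every - 1` log entries, so sparser snapshots save space but lengthen the replay.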