
    An approximate search engine for structure

    As the size of structural databases grows, so does the need to search them efficiently. Thanks to previous and ongoing research, searching by attribute-value and by text has become commonplace in these databases. However, searching by topological or physical structure, especially in large databases and especially for approximate matches, is still an art. In this dissertation, efficient search techniques are presented for retrieving trees from a database that are similar to a given query tree. Rooted ordered labeled trees, rooted unordered labeled trees and free trees are considered. Ordered labeled trees are trees in which each node has a label and the left-to-right order among siblings matters. Unordered labeled trees are trees in which the parent-child relationship is significant, but the order among siblings is unimportant. Free trees (unrooted unordered trees) are acyclic graphs. These trees find many applications in bioinformatics, Web log analysis, phyloinformatics, XML processing, etc. Two types of similarity measures are investigated: (i) counting the mismatching paths between the query tree and a data tree, and (ii) measuring the topological relationship between the trees. The proposed approaches include storing the paths of trees in a suffix array, employing hashing techniques to speed up retrieval, and counting the number of up-down operations needed to move a token from one node to another in a tree. Various filters for accelerating a search, different strategies for parallelizing these search algorithms, and applications of these algorithms to XML and phylogenetic data management are discussed. The proposed techniques have been implemented in a phylogenetic search engine which is fully operational and available on the World Wide Web. Experimental results comparing the similarity measures with existing tree metrics and evaluating the efficiency of the search techniques demonstrate the effectiveness of the search engine. Future work includes extending the techniques to other structural data, as well as developing new filters and algorithms for speeding up searching and mining in complex structures.
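
    As a concrete illustration of the first similarity measure, the Python sketch below compares two labeled trees by counting paths that occur in one but not the other. It is a minimal rendering of the path-mismatch idea under stated assumptions only: trees are given as (label, children) tuples, just root-to-leaf paths are enumerated, and none of the dissertation's suffix-array or hashing machinery for doing this at database scale is reproduced.

        from collections import Counter

        def root_to_leaf_paths(tree):
            """Enumerate root-to-leaf label paths of a (label, children) tree."""
            label, children = tree
            if not children:
                yield (label,)
            for child in children:
                for path in root_to_leaf_paths(child):
                    yield (label,) + path

        def path_mismatch_distance(t1, t2):
            """Count paths occurring in one tree but not the other
            (symmetric difference of the two path multisets)."""
            p1 = Counter(root_to_leaf_paths(t1))
            p2 = Counter(root_to_leaf_paths(t2))
            return sum(((p1 - p2) + (p2 - p1)).values())

        a = ("r", [("x", []), ("y", [("z", [])])])
        b = ("r", [("x", []), ("y", [("w", [])])])
        print(path_mismatch_distance(a, b))  # -> 2: paths (r,y,z) and (r,y,w) mismatch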

    Boyer-Moore strategy to efficient approximate string matching

    We propose a simple but efficient algorithm for searching for all occurrences of a pattern, or a class of patterns, of length m in a text of length n with at most k mismatches. The algorithm relies on the Shift-Add algorithm of Baeza-Yates and Gonnet [6], which represents the current state of the search as a bit number and exploits the ability of programming languages to handle bit words. The state representation should therefore not exceed the word size w, that is, m(⌈log₂(k+1)⌉+1) ≤ w. The algorithm consists of a preprocessing step and a searching step. It is linear and performs 3n operations during the searching step. Notions of shift and character skip found in the Boyer-Moore (BM) [9] approach are introduced into this algorithm. Provided that the alphabet is large enough compared to the pattern length, the average number of operations performed by our algorithm during the searching step becomes n(2 + (k+4)/(m-k)).
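
    For reference, here is a minimal Python sketch of the underlying Shift-Add search with at most k mismatches. It packs one mismatch counter per pattern position into a single integer; to keep the example short it sidesteps the original algorithm's overflow masking by making each counter wide enough to never overflow, and it omits the Boyer-Moore-style shift and character-skip enhancements that the paper adds.

        def shift_add_search(pattern, text, k):
            """Yield end positions in `text` where `pattern` matches with
            at most k mismatches (Hamming distance)."""
            m = len(pattern)
            b = (m + 1).bit_length()          # counter width: wide enough to never overflow
            block = (1 << b) - 1              # mask for a single counter
            mask = (1 << (b * m)) - 1         # keep exactly m counters
            # T[c] has a 1 in block i whenever pattern[i] != c
            T = {c: sum(1 << (b * i) for i, p in enumerate(pattern) if p != c)
                 for c in set(text)}
            state = 0
            for j, c in enumerate(text):
                # shift counters up one block, add this character's mismatches
                state = ((state << b) + T[c]) & mask
                # top counter = mismatches of the full pattern ending at j
                if j >= m - 1 and (state >> (b * (m - 1))) & block <= k:
                    yield j

        print(list(shift_add_search("abc", "xabcyabd", 1)))  # -> [3, 7]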

    Hardware support for real-time network security and packet classification using field programmable gate arrays

    Deep packet inspection and packet classification are the most computationally expensive operations in a Network Intrusion Detection (NID) system. Deep packet inspection involves content matching, where the payload of incoming packets is matched against a set of signatures in a database. Packet classification involves inspection of the packet header fields and is essentially a multi-dimensional matching problem. Any matching in software is very slow compared to current network speeds, and both problems need a solution that is scalable and works at high speed. Due to the high complexity of these matching problems, only Field-Programmable Gate Array (FPGA) or Application-Specific Integrated Circuit (ASIC) platforms can facilitate efficient designs. Two novel FPGA-based NID solutions were developed and implemented that not only carry out pattern matching at high speed but also allow changes to the set of stored patterns without resource/hardware reconfiguration; to their advantage, the solutions can easily be adopted by software or ASIC approaches as well. In both solutions, the proposed NID system can run while pattern updates occur. The designs operate at 2.4 Gbps line rates, with a memory consumption of around 17 bits per character and a logic cell usage of around 0.05 logic cells per character, the smallest of any existing FPGA-based solution. In addition to these pattern-matching solutions, a novel packet classification algorithm was developed and implemented on an FPGA. The method matches two header fields at a time and then combines the constituent results to identify longer matches involving more header fields. The design achieves a throughput above 9.72 Gbps and consumes around 256 Kbytes of on-chip memory when handling more than 10,000 rules (without using external RAM), the lowest memory consumption among all previously proposed FPGA-based designs for packet classification.
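
    The two-field combining strategy can be pictured in software as follows: index the rule set on pairs of header fields, look up each pair independently, and intersect the candidate sets to recover rules that match on all fields. This Python sketch is only a toy model of the idea; it assumes exact-match fields (no prefixes, ranges, or wildcards) and says nothing about the FPGA memory layout of the actual design.

        def build_pair_tables(rules, field_pairs):
            """rules: {rule_id: {field: value}}. Build one exact-match
            table per pair of header fields."""
            tables = {pair: {} for pair in field_pairs}
            for rid, fields in rules.items():
                for f1, f2 in field_pairs:
                    key = (fields[f1], fields[f2])
                    tables[(f1, f2)].setdefault(key, set()).add(rid)
            return tables

        def classify(packet, tables):
            """Intersect the per-pair candidate sets; survivors match every field."""
            matched = None
            for (f1, f2), table in tables.items():
                candidates = table.get((packet[f1], packet[f2]), set())
                matched = candidates if matched is None else matched & candidates
            return matched or set()

        rules = {
            1: {"src": "10.0.0.1", "dst": "10.0.0.2", "sport": 1234, "dport": 80},
            2: {"src": "10.0.0.1", "dst": "10.0.0.2", "sport": 1234, "dport": 22},
        }
        tables = build_pair_tables(rules, [("src", "dst"), ("sport", "dport")])
        packet = {"src": "10.0.0.1", "dst": "10.0.0.2", "sport": 1234, "dport": 80}
        print(classify(packet, tables))  # -> {1}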

    Parallelism in declarative languages

    Imperative programming languages were initially built for uniprocessor systems that evolved out of the von Neumann machine model. This storage-oriented model of computation blocks parallelism and increases the cost of developing and porting parallel programs. Declarative languages, based on mathematical models of computation, seem more suitable for the development of parallel programs. In the first part of this thesis we examine different language families under the declarative paradigm: functional, logic, and constraint languages. Functional languages are based on the abstract model of functions and λ-calculus. They were initially developed for symbolic computation, but today they are commonly used in numerical analysis and many other application areas. Pure Lisp is a widely known member of this class. Logic languages are based on first-order predicate calculus. Although they were initially developed for theorem proving, fifth-generation operating systems are written in them. Most logic languages are descendants or distant relatives of Prolog. Constraint languages are related to logic languages. In a constraint language, a program object is defined by placing constraints on its structure and its behavior, as the sketch below illustrates. They were initially used in graphics applications, but today researchers work on using them in parallel computation. Here we compare and contrast the language classes above, locate advantages and deficiencies, and explain different choices made by language implementors. In the second part of the thesis we describe a front end for CONSUL, a prototype constraint language for programming multiprocessors. The most important features of the front end are compact representation of constraints, type definitions, functional use of relations, and the ability to split programs into multiple files.
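
    The following toy Python sketch illustrates the constraint style in general (it is not CONSUL syntax): instead of computing a result step by step, one states the relation a + b = c and lets propagation fill in whichever value is missing.

        class Sum:
            """Constraint a + b = c over three named slots."""
            def __init__(self):
                self.v = {"a": None, "b": None, "c": None}

            def set(self, name, value):
                self.v[name] = value
                self._propagate()

            def _propagate(self):
                # Whenever two slots are known, derive the third.
                a, b, c = self.v["a"], self.v["b"], self.v["c"]
                if a is not None and b is not None and c is None:
                    self.v["c"] = a + b
                elif a is not None and c is not None and b is None:
                    self.v["b"] = c - a
                elif b is not None and c is not None and a is None:
                    self.v["a"] = c - b

        s = Sum()
        s.set("a", 3)
        s.set("c", 10)
        print(s.v["b"])  # -> 7: derived from the constraint, not from control flow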

    Functional object-types as a foundation of complex knowledge-based systems


    Reliable file transfer across a 10 megabit ethernet

    The Ethernet communications network is a broadcast, multi-access system for local computing networks. Such a network was used to connect six 68000-based Charles River Data Systems for the purpose of file transfer. Each system required hardware installation and connection to the Ethernet cable. The software is an implementation that conforms to the Xerox PUP File Transfer Protocol specifications. This required writing two programs, the FTP user and the FTP server. Each program was built upon common communication packages which also had to be written. These communication routines transferred data over the Ethernet using the PARC Universal Packet (PUP) format.
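
    To give a flavour of the packet format involved, the sketch below assembles a PUP datagram in Python. It is a minimal sketch under assumptions to verify against the published PUP specification: the 20-byte header layout (length, transport control, type, identifier, then network/host/socket triples for destination and source) and the convention that an all-ones checksum means "no checksum". The thesis software itself targeted 68000-based systems and is not reproduced here.

        import struct

        NO_CHECKSUM = 0xFFFF  # assumed PUP convention: all ones = "no checksum"

        def build_pup(pup_type, pup_id, dst, src, payload):
            """Assemble a PUP datagram. dst and src are (network, host, socket)
            triples; network and host are 8-bit, sockets 32-bit (assumed widths)."""
            length = 20 + len(payload) + 2      # header + data + checksum word
            header = struct.pack(">HBBIBBIBBI",
                                 length,
                                 0,             # transport control
                                 pup_type,
                                 pup_id,
                                 dst[0], dst[1], dst[2],
                                 src[0], src[1], src[2])
            return header + payload + struct.pack(">H", NO_CHECKSUM)

        pkt = build_pup(pup_type=1, pup_id=42,
                        dst=(1, 2, 0x30), src=(1, 5, 0x30), payload=b"hello")
        print(len(pkt))  # -> 27 bytes: 20 header + 5 data + 2 checksum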

    JPEG-like Image Compression using Neural-network-based Block Classification and Adaptive Reordering of Transform Coefficients

    The research described in this thesis addresses aspects of the coding of discrete cosine transform (DCT) coefficients that are present in a variety of transform-based digital-image-compression schemes such as JPEG. Coefficient reordering, which directly affects the symbol statistics for entropy coding and therefore the effectiveness of entropy coding, is investigated. Adaptive zigzag reordering, a novel versatile technique that achieves efficient reordering by processing variable-size rectangular sub-blocks of coefficients, is developed. Classification of blocks of DCT coefficients using an artificial neural network (ANN) prior to adaptive zigzag reordering is also considered. Some established digital-image-compression techniques are reviewed, and the JPEG standard for the DCT-based method is studied in more detail. An introduction to artificial neural networks is provided. Lossless conversion of blocks of coefficients using adaptive zigzag reordering is investigated, and experimental results are presented. A versatile algorithm that generates zigzag scan paths for sub-blocks of any dimensions using a binary decision tree is developed. An implementation of the algorithm based on programmable logic devices (PLDs) is described, demonstrating the feasibility of hardware implementations. Coding of the sub-block dimensions, which need to be retained in order to reconstruct a sub-block during decoding, based on the scan-path length is developed. Lossy conversion of blocks of coefficients is also considered, and experimental results are presented. A two-layer feedforward artificial neural network, trained using an error-backpropagation algorithm, that determines the sub-block dimensions is described. Isolated nonzero coefficients of small significance are discarded in some blocks, and therefore smaller sub-blocks are generated.
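
    The core operation, generating a zigzag scan path for a rectangular sub-block of arbitrary dimensions, can be sketched briefly in Python. The thesis derives such paths with a binary decision tree suited to hardware; the version below simply walks the anti-diagonals in alternating directions, which reproduces the standard JPEG order for 8x8 blocks, and is offered only as an illustration of what the generated paths look like.

        def zigzag_path(rows, cols):
            """Return the zigzag scan order for a rows x cols sub-block
            as a list of (row, col) coordinates."""
            path = []
            for d in range(rows + cols - 1):      # walk the anti-diagonals
                cells = [(r, d - r) for r in range(rows) if 0 <= d - r < cols]
                if d % 2 == 0:
                    cells.reverse()               # even diagonals run bottom-left to top-right
                path.extend(cells)
            return path

        print(zigzag_path(2, 3))
        # -> [(0, 0), (0, 1), (1, 0), (1, 1), (0, 2), (1, 2)]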