1,299 research outputs found
Boyer-Moore strategy to efficient approximate string matching
International audienceWe propose a simple but e cient algorithm for searching all occurrences of a pattern or a class of patterns (length m) in a text (length n) with at most k mismatches. This algorithm relies on the Shift-Add algorithm of Baeza-Yates and Gonnet [6], which involves representing by a bit number the current state of the search and uses the ability of programming languages to handle bit words. State representation should not, therefore, exceeds the word size w, that is, m(âlog2(k+1)â+1 )â€w. This algorithm consists in a preprocessing step and a searching step. It is linear and performs 3n operations during the searching step. Notions of shift and character skip found in the Boyer-Moore (BM) [9] approach, are introduced in this algorithm. Provided that the considered alphabet is large enough (compared to the Pattern length), the average number of operations performed by our algorithm during the searching step becomes n(2+(k+4)/(m-k))
An approximate search engine for structure
As the size of structural databases grows, the need for efficiently searching these databases arises. Thanks to previous and ongoing research, searching by attribute-value and by text has become commonplace in these databases. However, searching by topological or physical structure, especially for large databases and especially for approximate matches, is still an art.
In this dissertation, efficient search techniques are presented for retrieving trees from a database that are similar to a given query tree. Rooted ordered labeled trees, rooted unordered labeled trees and free trees are considered. Ordered labeled trees are trees in which each node has a label and the left-to-right order among siblings matters. Unordered labeled trees are trees in which the parent-child relationship is significant, but the order among siblings is unimportant. Free trees (unrooted unordered trees) are acyclic graphs. These trees find many applications in bioinformatics, Web log analysis, phyloinformatics, XML processing, etc.
Two types of similarity measures are investigated: (i) counting the mismatching paths in the query tree and a data tree, and (ii) measuring the topological relationship between the trees. The proposed approaches include storing the paths of trees in a suffix array, employing hashing techniques to speed up retrieval, and counting the number of up-down operations to move a token from one node to another node in a tree. Various filters for accelerating a search, different strategies for parallelizing these search algorithms and applications of these algorithms to XML and phylogenetic data management are discussed.
The proposed techniques have been implemented into a phylogenetic search engine which is fully operational and is available on the World Wide Web. Experimental results on comparing the similarity measures with existing tree metrics and on evaluating the efficiency of the search techniques demonstrate the effectiveness of the search engine. Future work includes extending the techniques to other structural data, as well as developing new filters and algorithms for speeding up searching and mining in complex structures
A Computational Model for the Acquisition and Use of Phonological Knowledge
Does knowledge of language consist of symbolic rules? How do children learn and use their linguistic knowledge? To elucidate these questions, we present a computational model that acquires phonological knowledge from a corpus of common English nouns and verbs. In our model the phonological knowledge is encapsulated as boolean constraints operating on classical linguistic representations of speech sounds in term of distinctive features. The learning algorithm compiles a corpus of words into increasingly sophisticated constraints. The algorithm is incremental, greedy, and fast. It yields one-shot learning of phonological constraints from a few examples. Our system exhibits behavior similar to that of young children learning phonological knowledge. As a bonus the constraints can be interpreted as classical linguistic rules. The computational model can be implemented by a surprisingly simple hardware mechanism. Our mechanism also sheds light on a fundamental AI question: How are signals related to symbols
Approximations by OBDDs and the Variable Ordering Problem
Ordered binary decision diagrams (OBDDs) and their variants are motivated by the need to represent Boolean functions in applications. Research concerning these applications leads also to problems and results interesting from theoretical point of view. In this paper, methods from communication complexity and information theory are combined to prove that the direct storage access function and the inner product function have the following property. They have linear pi-OBDD size for some variable ordering pi and, for most variable orderings pi0 , all functions which approximate them on considerably more than half of the inputs, need exponential pi0-OBDD size. These results have implications for the use of OBDDs in genetic programming
WAQS : a web-based approximate query system
The Web is often viewed as a gigantic database holding vast stores of information and provides ubiquitous accessibility to end-users. Since its inception, the Internet has experienced explosive growth both in the number of users and the amount of content available on it. However, searching for information on the Web has become increasingly difficult. Although query languages have long been part of database management systems, the standard query language being the Structural Query Language is not suitable for the Web content retrieval.
In this dissertation, a new technique for document retrieval on the Web is presented. This technique is designed to allow a detailed retrieval and hence reduce the amount of matches returned by typical search engines. The main objective of this technique is to allow the query to be based on not just keywords but also the location of the keywords within the logical structure of a document. In addition, the technique also provides approximate search capabilities based on the notion of Distance and Variable Length Don\u27t Cares. The proposed techniques have been implemented in a system, called Web-Based Approximate Query System, which contains an SQL-like query language called Web-Based Approximate Query Language.
Web-Based Approximate Query Language has also been integrated with EnviroDaemon, an environmental domain specific search engine. It provides EnviroDaemon with more detailed searching capabilities than just keyword-based search. Implementation details, technical results and future work are presented in this dissertation
Synthesis and Optimization of Reversible Circuits - A Survey
Reversible logic circuits have been historically motivated by theoretical
research in low-power electronics as well as practical improvement of
bit-manipulation transforms in cryptography and computer graphics. Recently,
reversible circuits have attracted interest as components of quantum
algorithms, as well as in photonic and nano-computing technologies where some
switching devices offer no signal gain. Research in generating reversible logic
distinguishes between circuit synthesis, post-synthesis optimization, and
technology mapping. In this survey, we review algorithmic paradigms ---
search-based, cycle-based, transformation-based, and BDD-based --- as well as
specific algorithms for reversible synthesis, both exact and heuristic. We
conclude the survey by outlining key open challenges in synthesis of reversible
and quantum logic, as well as most common misconceptions.Comment: 34 pages, 15 figures, 2 table
Efficient and secure generalized pattern matching via fast fourier transform
International audienceWe present simple protocols for secure two-party computation of generalized pattern matching in the presence of malicious parties. The problem is to determine all positions in a text T where a pattern P occurs (or matches with few mismatches) allowing possibly both T and P to contain single character wildcards. We propose constant-round protocols that exhibit linear communication and quasilinear computational costs with simulation-based security. Our constructions rely on a well-known technique for pattern matching proposed by Fischer and Paterson in 1974 and based on the Fast Fourier Transform. The security of the new schemes is reduced to the semantic security of the ElGamal encryption scheme
DISCOVERING INTERESTING PATTERNS FOR INVESTMENT DECISION MAKING WITH GLOWER C - A GENETIC LEARNER OVERLAID WITH ENTROPY REDUCTION
Prediction in financial domains is notoriously difficult for a number of reasons. First, theories tend to be
weak or non-existent, which makes problem formulation open-ended by forcing us to consider a large
number of independent variables and thereby increasing the dimensionality of the search space. Second, the
weak relationships among variables tend to be nonlinear, and may hold only in limited areas of the search
space. Third, in financial practice, where analysts conduct extensive manual analysis of historically well
performing indicators, a key is to find the hidden interactions among variables that perform well in
combination. Unfortunately, these are exactly the patterns that the greedy search biases incorporated by
many standard rule algorithms will miss. In this paper, we describe and evaluate several variations of a new
genetic learning algorithm (GLOWER) on a variety of data sets. The design of GLOWER has been motivated
by financial prediction problems, but incorporates successful ideas from tree induction and rule learning.
We examine the performance of several GLOWER variants on two UCI data sets as well as on a standard
financial prediction problem (S&P500 stock returns), using the results to identify and use one of the better
variants for further comparisons. We introduce a new (to KDD) financial prediction problem (predicting
positive and negative earnings surprises), and experiment withGLOWER, contrasting it with tree- and rule-induction
approaches. Our results are encouraging, showing that GLOWER has the ability to uncover
effective patterns for difficult problems that have weak structure and significant nonlinearities.Information Systems Working Papers Serie
- âŠ