7,036 research outputs found
TopSig: Topology Preserving Document Signatures
Performance comparisons between File Signatures and Inverted Files for text
retrieval have previously shown several significant shortcomings of file
signatures relative to inverted files. The inverted file approach underpins
most state-of-the-art search engine algorithms, such as Language and
Probabilistic models. It has been widely accepted that traditional file
signatures are inferior alternatives to inverted files. This paper describes
TopSig, a new approach to the construction of file signatures. Many advances in
semantic hashing and dimensionality reduction have been made in recent times,
but these were not so far linked to general purpose, signature file based,
search engines. This paper introduces a different signature file approach that
builds upon and extends these recent advances. We are able to demonstrate
significant improvements in the performance of signature file based indexing
and retrieval, performance that is comparable to that of state of the art
inverted file based systems, including Language models and BM25. These findings
suggest that file signatures offer a viable alternative to inverted files in
suitable settings and from the theoretical perspective it positions the file
signatures model in the class of Vector Space retrieval models.Comment: 12 pages, 8 figures, CIKM 201
String Matching Problems with Parallel Approaches An Evaluation for the Most Recent Studies
In recent years string matching plays a functional role in many application like information retrieval, gene analysis, pattern recognition, linguistics, bioinformatics etc. For understanding the functional requirements of string matching algorithms, we surveyed the real time parallel string matching patterns to handle the current trends. Primarily, in this paper, we focus on present developments of parallel string matching, and the central ideas of the algorithms and their complexities. We present the performance of the different algorithms and their effectiveness. Finally this analysis helps the researchers to develop the better techniques
Extending functional databases for use in text-intensive applications
This thesis continues research exploring the benefits of using functional
databases based around the functional data model for advanced database
applications-particularly those supporting investigative systems. This is a
growing generic application domain covering areas such as criminal and military
intelligence, which are characterised by significant data complexity, large data
sets and the need for high performance, interactive use. An experimental
functional database language was developed to provide the requisite semantic
richness. However, heavy use in a practical context has shown that language
extensions and implementation improvements are required-especially in the
crucial areas of string matching and graph traversal. In addition, an
implementation on multiprocessor, parallel architectures is essential to meet the
performance needs arising from existing and projected database sizes in the
chosen application area. [Continues.
Exact string matching algorithms : survey, issues, and future research directions
String matching has been an extensively studied research domain in the past two decades due to its various applications in the fields of text, image, signal, and speech processing. As a result, choosing an appropriate string matching algorithm for current applications and addressing challenges is difficult. Understanding different string matching approaches (such as exact string matching and approximate string matching algorithms), integrating several algorithms, and modifying algorithms to address related issues are also difficult. This paper presents a survey on single-pattern exact string matching algorithms. The main purpose of this survey is to propose new classification, identify new directions and highlight the possible challenges, current trends, and future works in the area of string matching algorithms with a core focus on exact string matching algorithms. © 2013 IEEE
QuickCSG: Fast Arbitrary Boolean Combinations of N Solids
QuickCSG computes the result for general N-polyhedron boolean expressions
without an intermediate tree of solids. We propose a vertex-centric view of the
problem, which simplifies the identification of final geometric contributions,
and facilitates its spatial decomposition. The problem is then cast in a single
KD-tree exploration, geared toward the result by early pruning of any region of
space not contributing to the final surface. We assume strong regularity
properties on the input meshes and that they are in general position. This
simplifying assumption, in combination with our vertex-centric approach,
improves the speed of the approach. Complemented with a task-stealing
parallelization, the algorithm achieves breakthrough performance, one to two
orders of magnitude speedups with respect to state-of-the-art CPU algorithms,
on boolean operations over two to dozens of polyhedra. The algorithm also
outperforms GPU implementations with approximate discretizations, while
producing an output without redundant facets. Despite the restrictive
assumptions on the input, we show the usefulness of QuickCSG for applications
with large CSG problems and strong temporal constraints, e.g. modeling for 3D
printers, reconstruction from visual hulls and collision detection
Prospects and limitations of full-text index structures in genome analysis
The combination of incessant advances in sequencing technology producing large amounts of data and innovative bioinformatics approaches, designed to cope with this data flood, has led to new interesting results in the life sciences. Given the magnitude of sequence data to be processed, many bioinformatics tools rely on efficient solutions to a variety of complex string problems. These solutions include fast heuristic algorithms and advanced data structures, generally referred to as index structures. Although the importance of index structures is generally known to the bioinformatics community, the design and potency of these data structures, as well as their properties and limitations, are less understood. Moreover, the last decade has seen a boom in the number of variant index structures featuring complex and diverse memory-time trade-offs. This article brings a comprehensive state-of-the-art overview of the most popular index structures and their recently developed variants. Their features, interrelationships, the trade-offs they impose, but also their practical limitations, are explained and compared
DC: a highly efficient and flexible exact pattern-matching algorithm
ware of the need for faster and flexible searching algorithms in fields such as web searching or bioinformatics, we propose DC - a high-performance algorithm for exact pattern matching. Emphasizing the analysis of pattern peculiarities in the pre-processing phase, the algorithm en- compasses a novel search logic based on the examination of multiple alignments within a larger window, selectively tested after a powerful heuristic called compatibility rule is verified. The new algorithm’s performance is, on average, above its best-rated competitors when testing different data types and using a complete suite of pattern extensions and compositions. The flexibility is remarkable and the efficiency is more relevant in quaternary or greater alphabets. Keywords: exact pattern-match, searching algorithms
- …