45,841 research outputs found
The NASA Astrophysics Data System: Architecture
The powerful discovery capabilities available in the ADS bibliographic
services are possible thanks to the design of a flexible search and retrieval
system based on a relational database model. Bibliographic records are stored
as a corpus of structured documents containing fielded data and metadata, while
discipline-specific knowledge is segregated in a set of files independent of
the bibliographic data itself.
The creation and management of links to both internal and external resources
associated with each bibliography in the database is made possible by
representing them as a set of document properties and their attributes.
To improve global access to the ADS data holdings, a number of mirror sites
have been created by cloning the database contents and software on a variety of
hardware and software platforms.
The procedures used to create and manage the database and its mirrors have
been written as a set of scripts that can be run in either an interactive or
unsupervised fashion.
The ADS can be accessed at http://adswww.harvard.eduComment: 25 pages, 8 figures, 3 table
On the Benefit of Merging Suffix Array Intervals for Parallel Pattern Matching
We present parallel algorithms for exact and approximate pattern matching
with suffix arrays, using a CREW-PRAM with processors. Given a static text
of length , we first show how to compute the suffix array interval of a
given pattern of length in
time for . For approximate pattern matching with differences or
mismatches, we show how to compute all occurrences of a given pattern in
time, where is the size of the alphabet
and . The workhorse of our algorithms is a data structure
for merging suffix array intervals quickly: Given the suffix array intervals
for two patterns and , we present a data structure for computing the
interval of in sequential time, or in
parallel time. All our data structures are of size bits (in addition to
the suffix array)
Instrumenting self-modifying code
Adding small code snippets at key points to existing code fragments is called
instrumentation. It is an established technique to debug certain otherwise hard
to solve faults, such as memory management issues and data races. Dynamic
instrumentation can already be used to analyse code which is loaded or even
generated at run time.With the advent of environments such as the Java Virtual
Machine with optimizing Just-In-Time compilers, a new obstacle arises:
self-modifying code. In order to instrument this kind of code correctly, one
must be able to detect modifications and adapt the instrumentation code
accordingly, preferably without incurring a high penalty speedwise. In this
paper we propose an innovative technique that uses the hardware page protection
mechanism of modern processors to detect such modifications. We also show how
an instrumentor can adapt the instrumented version depending on the kind of
modificiations as well as an experimental evaluation of said techniques.Comment: In M. Ronsse, K. De Bosschere (eds), proceedings of the Fifth
International Workshop on Automated Debugging (AADEBUG 2003), September 2003,
Ghent. cs.SE/030902
Broadword Implementation of Parenthesis Queries
We continue the line of research started in "Broadword Implementation of
Rank/Select Queries" proposing broadword (a.k.a. SWAR, "SIMD Within A
Register") algorithms for finding matching closed parentheses and the k-th far
closed parenthesis. Our algorithms work in time O(log w) on a word of w bits,
and contain no branch and no test instruction. On 64-bit (and wider)
architectures, these algorithms make it possible to avoid costly tabulations,
while providing a very significant speedup with respect to for-loop
implementations
Simple, compact and robust approximate string dictionary
This paper is concerned with practical implementations of approximate string
dictionaries that allow edit errors. In this problem, we have as input a
dictionary of strings of total length over an alphabet of size
. Given a bound and a pattern of length , a query has to
return all the strings of the dictionary which are at edit distance at most
from , where the edit distance between two strings and is defined as
the minimum-cost sequence of edit operations that transform into . The
cost of a sequence of operations is defined as the sum of the costs of the
operations involved in the sequence. In this paper, we assume that each of
these operations has unit cost and consider only three operations: deletion of
one character, insertion of one character and substitution of a character by
another. We present a practical implementation of the data structure we
recently proposed and which works only for one error. We extend the scheme to
. Our implementation has many desirable properties: it has a very
fast and space-efficient building algorithm. The dictionary data structure is
compact and has fast and robust query time. Finally our data structure is
simple to implement as it only uses basic techniques from the literature,
mainly hashing (linear probing and hash signatures) and succinct data
structures (bitvectors supporting rank queries).Comment: Accepted to a journal (19 pages, 2 figures
- …