Average-Case Optimal Approximate Circular String Matching
Approximate string matching is the problem of finding all factors of a text t
of length n that are at a distance at most k from a pattern x of length m.
Approximate circular string matching is the problem of finding all factors of t
that are at a distance at most k from x or from any of its rotations. In this
article, we present a new algorithm for approximate circular string matching
under the edit distance model with optimal average-case search time O(n(k + log
m)/m). Optimal average-case search time can also be achieved by the algorithms
for multiple approximate string matching (Fredriksson and Navarro, 2004) using
x and its rotations as the set of multiple patterns. Here we reduce the
preprocessing time and space requirements compared to that approach.
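To make the problem statement above concrete, here is a minimal baseline sketch. It is not the average-case optimal algorithm from the abstract. It assumes the simplest possible approach: run the classical column-wise edit distance dynamic programme (Sellers' algorithm) once per rotation of x, at a cost of roughly O(n m^2) time. The function names are hypothetical and chosen only for illustration.

```python
def rotations(x):
    """Return the set of distinct rotations of the pattern x."""
    return {x[i:] + x[:i] for i in range(len(x))}


def approx_end_positions(t, x, k):
    """End positions (0-based, inclusive) of factors of t whose edit
    distance to x is at most k, via Sellers' dynamic programme."""
    m = len(x)
    col = list(range(m + 1))  # column for the empty text prefix: D[i] = i
    ends = []
    for j, c in enumerate(t):
        prev_diag = col[0]    # value of D[i-1][j-1] for i = 1
        col[0] = 0            # an occurrence may start at any text position
        for i in range(1, m + 1):
            cur = col[i]      # D[i][j-1], needed as the next diagonal
            col[i] = min(prev_diag + (x[i - 1] != c),  # match or substitution
                         cur + 1,                      # extra text character
                         col[i - 1] + 1)               # missing pattern character
            prev_diag = cur
        if col[m] <= k:
            ends.append(j)
    return ends


def approx_circular_end_positions(t, x, k):
    """Naive approximate circular matching: union over all rotations of x."""
    ends = set()
    for r in rotations(x):
        ends.update(approx_end_positions(t, r, k))
    return sorted(ends)
```

For example, `approx_circular_end_positions("xxbcaxx", "abc", 0)` reports the occurrence of the rotation `bca`, which ends at index 4. The abstract's contribution is to avoid this per-rotation cost and reach O(n(k + log m)/m) average-case search time.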
NASA automatic subject analysis technique for extracting retrievable multi-terms (NASA TERM) system
Current methods for information processing and retrieval used at the NASA Scientific and Technical Information Facility are reviewed. A more cost-effective computer-aided indexing system is proposed which automatically generates print terms (phrases) from the natural text. Satisfactory print terms can be generated in a primarily automatic manner to produce a thesaurus (NASA TERMS) which extends all the mappings presently applied by indexers, specifies the worth of each posting term in the thesaurus, and indicates the areas of use of the thesaurus entry phrase. These print terms enable the computer to determine which of several terms in a hierarchy is desirable and to differentiate ambiguous terms. Steps in the NASA TERMS algorithm are discussed, and the processing of surrogate entry phrases is demonstrated using four STAR abstracts that had previously been indexed manually, for comparison. The simulation shows phrase isolation, text phrase reduction, NASA terms selection, and RECON display.
Data curation standards and social science occupational information resources
Occupational information resources - data about the characteristics of different occupational positions - are widely used in the social sciences, across a range of disciplines and international contexts. They are available in many formats, most often constituting small electronic files that are made freely downloadable from academic web-pages. However, there are several challenges associated with how occupational information resources are distributed to, and exploited by, social researchers. In this paper we describe features of occupational information resources, and indicate the role digital curation can play in exploiting them. We report upon the strategies used in the GEODE research project (Grid Enabled Occupational Data Environment, http://www.geode.stir.ac.uk). This project attempts to develop long-term standards for the distribution of occupational information resources, by providing a standardized framework-based electronic depository for occupational information resources, and by providing a data indexing service, based on e-Science middleware, which collates occupational information resources and makes them readily accessible to non-specialist social scientists.
Porqpine: a peer-to-peer search engine
In this paper, we present a fully distributed and collaborative search
engine for web pages: Porqpine. This system uses a novel query-based model
and collaborative filtering techniques in order to obtain user-customized
results. All knowledge about users and profiles is stored in each user
node's application. Overall the system is a multi-agent system that runs on
the computers of the user community. The nodes interact in a peer-to-peer
fashion in order to create a real distributed search engine where
information is completely distributed among all the nodes in the network.
Moreover, the system preserves the privacy of user queries and results by
maintaining the anonymity of the queries' consumers and results' producers.
The knowledge required by the system to work is implicitly caught through
the monitoring of users' actions, not only within the system's interface but
also within one of the most popular web browsers. Thus, users are not
required to explicitly feed knowledge about their interests into the system
since this process is done automatically. In this manner, users obtain the
benefits of a personalized search engine just by installing the application
on their computer. Porqpine does not intend to shun conventional centralized
search engines completely but to complement them by issuing more accurate
and personalized results.
Postprint (published version)