Search CORE

48,119 research outputs found

Auditing scholarly journals published in Malaysia and assessing their visibility

Authors: Edzan N. N.; Koh A. P.; Sanni S. A.; Zainab A. N.
Publication date: 01/01/2012

The problem with the identification of Malaysian scholarly journals lies in the lack of a current and complete listing of journals published in Malaysia. As a result, librarians are deprived of a tool that can be used for journal selection and for identifying gaps in their serials collections. This study describes an audit of scholarly journals, with the objectives (a) to trace and characterize scholarly journal titles published in Malaysia, and (b) to determine their visibility in international and national indexing databases. A total of 464 titles were traced, and their yearly trends, publisher and publishing characteristics, bibliometrics, and indexation in national, international and subject-based indexes are described.

arXiv.org e-Print Archive

UM Digital Repository

String Indexing for Patterns with Wildcards

Authors: A. Tam; B. Chazelle; D. Harel; D. Tsur; G. Chen; G. Landau; G. Navarro; H.L. Chan; K. Hofmann; L.P. Coelho; M. Lewenstein; M. Maas; M.L. Fredman; P. Bille; P. Clifford; T.-W. Lam; Z. Galil
Publication date: 01/01/2012

We consider the problem of indexing a string $t$ of length $n$ to report the occurrences of a query pattern $p$ containing $m$ characters and $j$ wildcards. Let $occ$ be the number of occurrences of $p$ in $t$, and $\sigma$ the size of the alphabet. We obtain the following results.

- A linear space index with query time $O(m+\sigma^j \log \log n + occ)$. This significantly improves the previously best known linear space index by Lam et al. [ISAAC 2007], which requires query time $\Theta(jn)$ in the worst case.
- An index with query time $O(m+j+occ)$ using space $O(\sigma^{k^2} n \log^k \log n)$, where $k$ is the maximum number of wildcards allowed in the pattern. This is the first non-trivial bound with this query time.
- A time-space trade-off, generalizing the index by Cole et al. [STOC 2004].

We also show that these indexes can be generalized to allow variable length gaps in the pattern. Our results are obtained using a novel combination of well-known and new techniques, which could be of independent interest.
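
To make the problem statement concrete, the following is a minimal brute-force sketch of wildcard matching. It is not one of the indexes described in the abstract: it scans the text directly in O(n(m+j)) time and serves only as a baseline that shows what an occurrence of a pattern with wildcards means. The wildcard symbol `*` and the function name `wildcard_occurrences` are assumptions made for illustration.

```python
WILDCARD = "*"  # assumed wildcard symbol; the abstract does not fix one


def wildcard_occurrences(t: str, p: str) -> list[int]:
    """Return every start position in t where p matches.

    A WILDCARD in p matches any single character of t; every other
    character of p must match exactly.
    """
    hits = []
    # Try each alignment of p against t (no index, plain scan).
    for start in range(len(t) - len(p) + 1):
        if all(pc == WILDCARD or pc == t[start + k] for k, pc in enumerate(p)):
            hits.append(start)
    return hits


print(wildcard_occurrences("abracadabra", "a*ra"))  # [0, 7]
```

Here the pattern has m = 3 characters and j = 1 wildcard, and occ = 2. An index of the kind the abstract describes aims to answer the same query without rescanning the whole text.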

arXiv.org e-Print Archive

Crossref

Online Research Database In Technology

High-Performance Reachability Query Processing under Index Size Restrictions

Authors: Anand Avishek; Bedathur Srikanta; Seufert Stephan; Weikum Gerhard
Publication date: 01/01/2012

In this paper, we propose a scalable and highly efficient index structure for the reachability problem over graphs. We build on the well-known node interval labeling scheme where the set of vertices reachable from a particular node is compactly encoded as a collection of node identifier ranges. We impose an explicit bound on the size of the index and flexibly assign approximate reachability ranges to nodes of the graph such that the number of index probes to answer a query is minimized. The resulting tunable index structure generates a better range labeling if the space budget is increased, thus providing direct control over the trade-off between index size and query processing performance. By using a fast recursive querying method in conjunction with our index structure, we show that in practice, reachability queries can be answered in the order of microseconds on an off-the-shelf computer, even for massive-scale real-world graphs. Our claims are supported by an extensive set of experimental results using a multitude of benchmark and real-world web-scale graph datasets. Comment: 30 pages
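
The node interval labeling idea mentioned in the abstract can be sketched in its simplest setting, a rooted tree, where one exact range per node suffices. This is an illustrative simplification and not the paper's index: the approximate ranges, the space budget, and the handling of general graphs are not shown. The function names and the `children` dictionary layout are assumptions.

```python
def label_tree(children: dict[str, list[str]], root: str) -> dict[str, tuple[int, int]]:
    """Label each node with (low, post).

    post is the node's postorder id; low is the smallest postorder id
    in its subtree. The subtree of a node is exactly the id range
    [low, post].
    """
    labels: dict[str, tuple[int, int]] = {}
    counter = 0

    def visit(node: str) -> int:
        nonlocal counter
        low = None
        for child in children.get(node, []):
            child_low = visit(child)
            if low is None:
                low = child_low  # the first child holds the smallest ids
        post = counter
        counter += 1
        labels[node] = (post if low is None else low, post)
        return labels[node][0]

    visit(root)
    return labels


def reachable(labels: dict[str, tuple[int, int]], u: str, v: str) -> bool:
    """u reaches v iff v's postorder id falls inside u's range."""
    low, high = labels[u]
    return low <= labels[v][1] <= high


tree = {"a": ["b", "c"], "b": ["d"]}
labels = label_tree(tree, "a")
print(reachable(labels, "a", "d"), reachable(labels, "b", "c"))  # True False
```

A reachability query is then a single range comparison. In a general graph a node may need several ranges, which is where a bound on index size becomes relevant.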

arXiv.org e-Print Archive

MPG.PuRe

Universal Indexes for Highly Repetitive Document Collections

Authors: Claude Francisco; Fariña Antonio; Martínez-Prieto Miguel A.; Navarro Gonzalo
Publication venue: 'Elsevier BV'
Publication date: 01/01/2016

Indexing highly repetitive collections has become a relevant problem with the emergence of large repositories of versioned documents, among other applications. These collections may reach huge sizes, but are formed mostly of documents that are near-copies of others. Traditional techniques for indexing these collections fail to properly exploit their regularities in order to reduce space. We introduce new techniques for compressing inverted indexes that exploit this near-copy regularity. They are based on run-length, Lempel-Ziv, or grammar compression of the differential inverted lists, instead of the usual practice of gap-encoding them. We show that, in this highly repetitive setting, our compression methods significantly reduce the space obtained with classical techniques, at the price of moderate slowdowns. Moreover, our best methods are universal, that is, they do not need to know the versioning structure of the collection, nor that a clear versioning structure even exists. We also introduce compressed self-indexes in the comparison. These are designed for general strings (not only natural language texts) and represent the text collection plus the index structure (not an inverted index) in integrated form. We show that these techniques can compress much further, using a small fraction of the space required by our new inverted indexes. Yet, they are orders of magnitude slower. Comment: This research has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie Actions H2020-MSCA-RISE-2015 BIRDS GA No. 69094
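
As a rough illustration of why run-length compression of differential inverted lists can help on versioned collections, the sketch below turns a posting list into gaps and then collapses runs of equal gaps. It is a toy example under stated assumptions, not the authors' method: document ids are assumed to be positive integers in increasing order, the first gap is measured from 0, and all function names are hypothetical.

```python
from itertools import accumulate, groupby


def to_gaps(postings: list[int]) -> list[int]:
    """Differential list: each id minus the previous one (the first minus 0)."""
    return [cur - prev for prev, cur in zip([0] + postings, postings)]


def run_length(gaps: list[int]) -> list[tuple[int, int]]:
    """Collapse consecutive equal gaps into (gap, count) pairs."""
    return [(gap, len(list(run))) for gap, run in groupby(gaps)]


def decode(runs: list[tuple[int, int]]) -> list[int]:
    """Invert run_length and to_gaps: expand the runs, then take prefix sums."""
    gaps = [gap for gap, count in runs for _ in range(count)]
    return list(accumulate(gaps))


# A term that occurs in consecutive versions yields long runs of gap 1.
postings = [3, 4, 5, 6, 7, 20, 21, 22]
runs = run_length(to_gaps(postings))
print(runs)  # [(3, 1), (1, 4), (13, 1), (1, 2)]
assert decode(runs) == postings
```

When a term persists across many near-copies, its gaps are mostly 1, so a handful of (gap, count) pairs replaces a long list of individually encoded gaps.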

arXiv.org e-Print Archive

Repositorio da Universidade da Coruña

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio Académico de la Universidad de Chile

Improved Orientation Sampling for Indexing Diffraction Patterns of Polycrystalline Materials

Authors: Larsen Peter Mahler; Schmidt Søren
Publication date: 01/01/2017

Orientation mapping is a widely used technique for revealing the microstructure of a polycrystalline sample. The crystalline orientation at each point in the sample is determined by analysis of the diffraction pattern, a process known as pattern indexing. A recent development in pattern indexing is the use of a brute-force approach, whereby diffraction patterns are simulated for a large number of crystalline orientations, and compared against the experimentally observed diffraction pattern in order to determine the most likely orientation. Whilst this method can robustly identify orientations in the presence of noise, it has very high computational requirements. In this article, the computational burden is reduced by developing a method for nearly optimal sampling of orientations. By using the quaternion representation of orientations, it is shown that the optimal sampling problem is equivalent to that of optimally distributing points on a four-dimensional sphere. In doing so, the number of orientation samples needed to achieve a desired indexing accuracy is significantly reduced. Orientation sets at a range of sizes are generated in this way for all Laue groups, and are made available online for easy use. Comment: 11 pages, 7 figures
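
The quaternion view of orientations can be illustrated with a small sketch. It draws unit quaternions uniformly at random, which is a simple baseline and not the nearly optimal sampling developed in the article, and measures the rotation angle between two orientations. Crystal symmetry (the Laue groups) is ignored here, and the function names are assumptions.

```python
import math
import random


def random_unit_quaternion(rng: random.Random) -> tuple[float, ...]:
    """Uniform random point on the unit sphere in 4D (normalized Gaussian vector)."""
    q = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(c * c for c in q))
    return tuple(c / norm for c in q)


def misorientation_angle(q1, q2) -> float:
    """Rotation angle in radians between two orientations given as unit quaternions.

    q and -q encode the same rotation, hence the absolute value of the
    dot product. No crystal symmetry is applied.
    """
    dot = abs(sum(a * b for a, b in zip(q1, q2)))
    return 2.0 * math.acos(min(1.0, dot))


identity = (1.0, 0.0, 0.0, 0.0)
quarter_turn_z = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
print(math.degrees(misorientation_angle(identity, quarter_turn_z)))  # approximately 90

samples = [random_unit_quaternion(random.Random(seed)) for seed in range(5)]
```

With random sampling, the largest angle from an arbitrary orientation to its nearest sample shrinks slowly as samples are added. A more even distribution of points on the sphere reaches a given accuracy with fewer samples.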

arXiv.org e-Print Archive

Online Research Database In Technology