Auditing scholarly journals published in Malaysia and assessing their visibility
The problem with the identification of Malaysian scholarly journals lies in
the lack of a current and complete listing of journals published in Malaysia.
As a result, librarians are deprived of a tool that can be used for journal
selection and identification of gaps in their serials collection. This study
describes the audit carried out on scholarly journals, with the objectives (a)
to trace and characterize scholarly journal titles published in Malaysia, and
(b) to determine their visibility in international and national indexing
databases. A total of 464 titles were traced and their yearly trends, publisher
and publishing characteristics, bibliometrics and indexation in national,
international and subject-based indexes were described.
String Indexing for Patterns with Wildcards
We consider the problem of indexing a string $t$ of length $n$ to report the
occurrences of a query pattern $p$ containing $m$ characters and $j$ wildcards.
Let $occ$ be the number of occurrences of $p$ in $t$, and $\sigma$ the size of
the alphabet. We obtain the following results.
- A linear space index with query time $O(m + \sigma^j \log\log n + occ)$.
This significantly improves the previously best known linear space index by Lam
et al. [ISAAC 2007], which requires query time $\Theta(jn)$ in the worst case.
- An index with query time $O(m + j + occ)$ using space $O(\sigma^{k^2} n \log^k \log n)$, where $k$ is the maximum number of wildcards allowed in the pattern.
This is the first non-trivial bound with this query time.
- A time-space trade-off, generalizing the index by Cole et al. [STOC 2004].
We also show that these indexes can be generalized to allow variable length
gaps in the pattern. Our results are obtained using a novel combination of
well-known and new techniques, which could be of independent interest.
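To make the query in this abstract concrete, the following is a minimal sketch of the naive baseline that such an index is meant to beat: a direct scan of the text, where a wildcard matches any single character. It is not the index described in the paper. The function name `wildcard_occurrences` and the choice of `*` as the wildcard symbol are assumptions made for illustration.

```python
def wildcard_occurrences(text: str, pattern: str, wildcard: str = "*") -> list[int]:
    """Return every start position at which `pattern` matches `text`.

    A wildcard position in the pattern matches any single character.
    This is a brute-force scan: for a text of length n and a pattern of
    length m it compares up to m characters at each of n - m + 1 positions.
    """
    n, m = len(text), len(pattern)
    return [
        start
        for start in range(n - m + 1)
        if all(pc == wildcard or pc == text[start + k] for k, pc in enumerate(pattern))
    ]


if __name__ == "__main__":
    print(wildcard_occurrences("banana", "a*a"))
```

The scan needs no preprocessing but touches the whole text on every query, which is the cost an index avoids by preprocessing the string once.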
High-Performance Reachability Query Processing under Index Size Restrictions
In this paper, we propose a scalable and highly efficient index structure for
the reachability problem over graphs. We build on the well-known node interval
labeling scheme where the set of vertices reachable from a particular node is
compactly encoded as a collection of node identifier ranges. We impose an
explicit bound on the size of the index and flexibly assign approximate
reachability ranges to nodes of the graph such that the number of index probes
to answer a query is minimized. The resulting tunable index structure generates
a better range labeling if the space budget is increased, thus providing
direct control over the trade-off between index size and query processing
performance. By using a fast recursive querying method in conjunction with our
index structure, we show that in practice, reachability queries can be answered
in the order of microseconds on an off-the-shelf computer - even for the case
of massive-scale real world graphs. Our claims are supported by an extensive
set of experimental results using a multitude of benchmark and real-world
web-scale graph datasets.
Comment: 30 pages
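The node interval labeling scheme that this abstract builds on can be sketched as follows, assuming an acyclic input graph given as a dictionary of successor lists. Each node receives a post-order number, and its label is the merged set of number ranges of everything it can reach. This sketch computes exact labels only; it does not implement the size-bounded, approximate ranges proposed in the paper. The names `build_interval_labels`, `merge_ranges` and `reachable` are hypothetical.

```python
def merge_ranges(ranges):
    """Merge overlapping or adjacent integer ranges into a sorted, disjoint list."""
    merged = []
    for lo, hi in sorted(ranges):
        if merged and lo <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], hi)
        else:
            merged.append([lo, hi])
    return [tuple(r) for r in merged]


def build_interval_labels(graph):
    """Label every node of a DAG with the post-order ranges of its reachable nodes.

    `graph` maps each node to a list of successors. The input is assumed to be
    acyclic; a cyclic graph would first have to be condensed into its strongly
    connected components. Recursion keeps the sketch short, so very deep
    graphs would need an iterative traversal instead.
    """
    post, labels = {}, {}

    def visit(u):
        if u in labels:
            return
        ranges = []
        for v in graph.get(u, ()):
            visit(v)
            ranges.extend(labels[v])
        post[u] = len(post)  # post-order number, assigned when u is finished
        ranges.append((post[u], post[u]))
        labels[u] = merge_ranges(ranges)

    for u in graph:
        visit(u)
    return post, labels


def reachable(post, labels, u, v):
    """True if v is reachable from u (every node reaches itself)."""
    return any(lo <= post[v] <= hi for lo, hi in labels[u])
```

A query is a membership test on the ranges of the source node, so the number of ranges per node determines both the index size and the cost of a probe.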
Universal Indexes for Highly Repetitive Document Collections
Indexing highly repetitive collections has become a relevant problem with the
emergence of large repositories of versioned documents, among other
applications. These collections may reach huge sizes, but are formed mostly of
documents that are near-copies of others. Traditional techniques for indexing
these collections fail to properly exploit their regularities in order to
reduce space.
We introduce new techniques for compressing inverted indexes that exploit
this near-copy regularity. They are based on run-length, Lempel-Ziv, or grammar
compression of the differential inverted lists, instead of the usual practice
of gap-encoding them. We show that, in this highly repetitive setting, our
compression methods significantly reduce the space obtained with classical
techniques, at the price of moderate slowdowns. Moreover, our best methods are
universal, that is, they do not need to know the versioning structure of the
collection, nor that a clear versioning structure even exists.
We also introduce compressed self-indexes in the comparison. These are
designed for general strings (not only natural language texts) and represent
the text collection plus the index structure (not an inverted index) in
integrated form. We show that these techniques can compress much further, using
a small fraction of the space required by our new inverted indexes. Yet, they
are orders of magnitude slower.
Comment: This research has received funding from the European Union's Horizon
2020 research and innovation programme under the Marie Skłodowska-Curie
Actions H2020-MSCA-RISE-2015 BIRDS GA No. 69094
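The contrast this abstract draws between gap-encoding an inverted list and run-length compressing its differential form can be illustrated with a small sketch. It assumes a posting list of strictly increasing document identifiers. When a term occurs in many consecutive versions of a document, the gaps contain long runs of 1, which collapse into a single (value, count) pair. This is an illustration of the general idea only, not the encoders evaluated in the paper, and the function names are hypothetical.

```python
from itertools import accumulate, groupby


def to_gaps(postings):
    """Differential (gap) form: the first id, then differences between neighbours."""
    return [b - a for a, b in zip([0] + postings[:-1], postings)]


def run_length(gaps):
    """Collapse each run of equal gaps into a (gap, run_length) pair."""
    return [(gap, len(list(run))) for gap, run in groupby(gaps)]


def decode(runs):
    """Invert run_length followed by to_gaps, recovering the posting list."""
    gaps = [gap for gap, count in runs for _ in range(count)]
    return list(accumulate(gaps))


if __name__ == "__main__":
    postings = [3, 4, 5, 6, 7, 20, 21, 22]
    runs = run_length(to_gaps(postings))
    print(runs)
    print(decode(runs) == postings)
```

On a list without consecutive identifiers the run-length form is no smaller than the plain gap list, which is why the benefit depends on the collection being highly repetitive.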
Improved Orientation Sampling for Indexing Diffraction Patterns of Polycrystalline Materials
Orientation mapping is a widely used technique for revealing the
microstructure of a polycrystalline sample. The crystalline orientation at each
point in the sample is determined by analysis of the diffraction pattern, a
process known as pattern indexing. A recent development in pattern indexing is
the use of a brute-force approach, whereby diffraction patterns are simulated
for a large number of crystalline orientations, and compared against the
experimentally observed diffraction pattern in order to determine the most
likely orientation. Whilst this method can robustly identify orientations in the
presence of noise, it has very high computational requirements. In this
article, the computational burden is reduced by developing a method for
nearly-optimal sampling of orientations. By using the quaternion representation
of orientations, it is shown that the optimal sampling problem is equivalent to
that of optimally distributing points on a four-dimensional sphere. In doing
so, the number of orientation samples needed to achieve a desired indexing
accuracy is significantly reduced. Orientation sets at a range of sizes are
generated in this way for all Laue groups, and are made available online for
easy use.
Comment: 11 pages, 7 figures
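The link between orientations and points on a sphere in four dimensions can be made concrete with a short sketch. It draws unit quaternions uniformly at random, by normalising four independent Gaussian values, and measures the rotation angle separating two orientations. Uniform random sampling is used here only as a simple baseline; it is not the nearly-optimal sampling developed in the article, and crystal symmetry (the Laue group) is ignored. The function names are hypothetical.

```python
import math
import random


def random_unit_quaternion(rng: random.Random):
    """Draw a point uniformly from the unit sphere in four dimensions.

    Normalising a vector of four independent standard Gaussian values
    gives a uniformly distributed direction.
    """
    while True:
        q = [rng.gauss(0.0, 1.0) for _ in range(4)]
        norm = math.sqrt(sum(c * c for c in q))
        if norm > 1e-12:  # guard against a degenerate near-zero draw
            return tuple(c / norm for c in q)


def misorientation_angle(q1, q2) -> float:
    """Rotation angle in radians between two orientations given as unit quaternions.

    q and -q represent the same rotation, hence the absolute value of the
    dot product. Crystal symmetry is not taken into account.
    """
    dot = abs(sum(a * b for a, b in zip(q1, q2)))
    return 2.0 * math.acos(min(1.0, dot))


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [random_unit_quaternion(rng) for _ in range(1000)]
    query = random_unit_quaternion(rng)
    nearest = min(misorientation_angle(query, s) for s in samples)
    print(f"angle to nearest of 1000 random samples: {math.degrees(nearest):.2f} degrees")
```

The angle from a query orientation to its nearest sample is the quantity a good sample set keeps small, so a more even distribution of points reaches a given accuracy with fewer samples.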