48,119 research outputs found

    Auditing scholarly journals published in Malaysia and assessing their visibility

    Get PDF
    The problem with the identification of Malaysian scholarly journals lies in the lack of a current and complete listing of journals published in Malaysia. As a result, librarians are deprived of a tool that can be used for journal selection and identification of gaps in their serials collection. This study describes the audit carried out on scholarly journals, with the objectives (a) to trace and characterized scholarly journal titles published in Malaysia, and (b) to determine their visibility in international and national indexing databases. A total of 464 titles were traced and their yearly trends, publisher and publishing characteristics, bibliometrics and indexation in national, international and subject-based indexes were described

    String Indexing for Patterns with Wildcards

    Get PDF
    We consider the problem of indexing a string tt of length nn to report the occurrences of a query pattern pp containing mm characters and jj wildcards. Let occocc be the number of occurrences of pp in tt, and σ\sigma the size of the alphabet. We obtain the following results. - A linear space index with query time O(m+σjloglogn+occ)O(m+\sigma^j \log \log n + occ). This significantly improves the previously best known linear space index by Lam et al. [ISAAC 2007], which requires query time Θ(jn)\Theta(jn) in the worst case. - An index with query time O(m+j+occ)O(m+j+occ) using space O(σk2nlogklogn)O(\sigma^{k^2} n \log^k \log n), where kk is the maximum number of wildcards allowed in the pattern. This is the first non-trivial bound with this query time. - A time-space trade-off, generalizing the index by Cole et al. [STOC 2004]. We also show that these indexes can be generalized to allow variable length gaps in the pattern. Our results are obtained using a novel combination of well-known and new techniques, which could be of independent interest

    High-Performance Reachability Query Processing under Index Size Restrictions

    Full text link
    In this paper, we propose a scalable and highly efficient index structure for the reachability problem over graphs. We build on the well-known node interval labeling scheme where the set of vertices reachable from a particular node is compactly encoded as a collection of node identifier ranges. We impose an explicit bound on the size of the index and flexibly assign approximate reachability ranges to nodes of the graph such that the number of index probes to answer a query is minimized. The resulting tunable index structure generates a better range labeling if the space budget is increased, thus providing a direct control over the trade off between index size and the query processing performance. By using a fast recursive querying method in conjunction with our index structure, we show that in practice, reachability queries can be answered in the order of microseconds on an off-the-shelf computer - even for the case of massive-scale real world graphs. Our claims are supported by an extensive set of experimental results using a multitude of benchmark and real-world web-scale graph datasets.Comment: 30 page

    Universal Indexes for Highly Repetitive Document Collections

    Get PDF
    Indexing highly repetitive collections has become a relevant problem with the emergence of large repositories of versioned documents, among other applications. These collections may reach huge sizes, but are formed mostly of documents that are near-copies of others. Traditional techniques for indexing these collections fail to properly exploit their regularities in order to reduce space. We introduce new techniques for compressing inverted indexes that exploit this near-copy regularity. They are based on run-length, Lempel-Ziv, or grammar compression of the differential inverted lists, instead of the usual practice of gap-encoding them. We show that, in this highly repetitive setting, our compression methods significantly reduce the space obtained with classical techniques, at the price of moderate slowdowns. Moreover, our best methods are universal, that is, they do not need to know the versioning structure of the collection, nor that a clear versioning structure even exists. We also introduce compressed self-indexes in the comparison. These are designed for general strings (not only natural language texts) and represent the text collection plus the index structure (not an inverted index) in integrated form. We show that these techniques can compress much further, using a small fraction of the space required by our new inverted indexes. Yet, they are orders of magnitude slower.Comment: This research has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sk{\l}odowska-Curie Actions H2020-MSCA-RISE-2015 BIRDS GA No. 69094

    Improved Orientation Sampling for Indexing Diffraction Patterns of Polycrystalline Materials

    Get PDF
    Orientation mapping is a widely used technique for revealing the microstructure of a polycrystalline sample. The crystalline orientation at each point in the sample is determined by analysis of the diffraction pattern, a process known as pattern indexing. A recent development in pattern indexing is the use of a brute-force approach, whereby diffraction patterns are simulated for a large number of crystalline orientations, and compared against the experimentally observed diffraction pattern in order to determine the most likely orientation. Whilst this method can robust identify orientations in the presence of noise, it has very high computational requirements. In this article, the computational burden is reduced by developing a method for nearly-optimal sampling of orientations. By using the quaternion representation of orientations, it is shown that the optimal sampling problem is equivalent to that of optimally distributing points on a four-dimensional sphere. In doing so, the number of orientation samples needed to achieve a indexing desired accuracy is significantly reduced. Orientation sets at a range of sizes are generated in this way for all Laue groups, and are made available online for easy use.Comment: 11 pages, 7 figure
    corecore