11,648 research outputs found
Main Memory Adaptive Indexing for Multi-core Systems
Adaptive indexing is a concept that considers index creation in databases as
a by-product of query processing; as opposed to traditional full index creation
where the indexing effort is performed up front before answering any queries.
Adaptive indexing has received a considerable amount of attention, and several
algorithms have been proposed over the past few years; including a recent
experimental study comparing a large number of existing methods. Until now,
however, most adaptive indexing algorithms have been designed single-threaded,
yet with multi-core systems already well established, the idea of designing
parallel algorithms for adaptive indexing is very natural. In this regard only
one parallel algorithm for adaptive indexing has recently appeared in the
literature: The parallel version of standard cracking. In this paper we
describe three alternative parallel algorithms for adaptive indexing, including
a second variant of a parallel standard cracking algorithm. Additionally, we
describe a hybrid parallel sorting algorithm, and a NUMA-aware method based on
sorting. We then thoroughly compare all these algorithms experimentally; along
a variant of a recently published parallel version of radix sort. Parallel
sorting algorithms serve as a realistic baseline for multi-threaded adaptive
indexing techniques. In total we experimentally compare seven parallel
algorithms. Additionally, we extensively profile all considered algorithms. The
initial set of experiments considered in this paper indicates that our parallel
algorithms significantly improve over previously known ones. Our results
suggest that, although adaptive indexing algorithms are a good design choice in
single-threaded environments, the rules change considerably in the parallel
case. That is, in future highly-parallel environments, sorting algorithms could
be serious alternatives to adaptive indexing.Comment: 26 pages, 7 figure
Resource sharing and hybrid libraries: the MALIBU Project
The British electronic libraries programme has funded four projects to develop thinking on so-called hybrid libraries and one of these is MALIBU (Modernising Academic Libraries in British Universities), based at King's College London. Working with the libraries of Oxford and Southampton Universities it will take a very rich set of humanities resources from archives and incunabula to digital products and networked resources and create a seamless single access point to all the available resources. It will also explore how strategies can be developed to make resources available locally as operational services rather than unreliably over the Internet. It does not set out to create its own new tools but rather to find ways of integrating tools already in existence or being developed
Practical Sparse Matrices in C++ with Hybrid Storage and Template-Based Expression Optimisation
Despite the importance of sparse matrices in numerous fields of science,
software implementations remain difficult to use for non-expert users,
generally requiring the understanding of underlying details of the chosen
sparse matrix storage format. In addition, to achieve good performance, several
formats may need to be used in one program, requiring explicit selection and
conversion between the formats. This can be both tedious and error-prone,
especially for non-expert users. Motivated by these issues, we present a
user-friendly and open-source sparse matrix class for the C++ language, with a
high-level application programming interface deliberately similar to the widely
used MATLAB language. This facilitates prototyping directly in C++ and aids the
conversion of research code into production environments. The class internally
uses two main approaches to achieve efficient execution: (i) a hybrid storage
framework, which automatically and seamlessly switches between three underlying
storage formats (compressed sparse column, Red-Black tree, coordinate list)
depending on which format is best suited and/or available for specific
operations, and (ii) a template-based meta-programming framework to
automatically detect and optimise execution of common expression patterns.
Empirical evaluations on large sparse matrices with various densities of
non-zero elements demonstrate the advantages of the hybrid storage framework
and the expression optimisation mechanism.Comment: extended and revised version of an earlier conference paper
arXiv:1805.0338
Practical Sparse Matrices in C++ with Hybrid Storage and Template-Based Expression Optimisation
Despite the importance of sparse matrices in numerous fields of science,
software implementations remain difficult to use for non-expert users,
generally requiring the understanding of underlying details of the chosen
sparse matrix storage format. In addition, to achieve good performance, several
formats may need to be used in one program, requiring explicit selection and
conversion between the formats. This can be both tedious and error-prone,
especially for non-expert users. Motivated by these issues, we present a
user-friendly and open-source sparse matrix class for the C++ language, with a
high-level application programming interface deliberately similar to the widely
used MATLAB language. This facilitates prototyping directly in C++ and aids the
conversion of research code into production environments. The class internally
uses two main approaches to achieve efficient execution: (i) a hybrid storage
framework, which automatically and seamlessly switches between three underlying
storage formats (compressed sparse column, Red-Black tree, coordinate list)
depending on which format is best suited and/or available for specific
operations, and (ii) a template-based meta-programming framework to
automatically detect and optimise execution of common expression patterns.
Empirical evaluations on large sparse matrices with various densities of
non-zero elements demonstrate the advantages of the hybrid storage framework
and the expression optimisation mechanism.Comment: extended and revised version of an earlier conference paper
arXiv:1805.0338
Machine Learning of User Profiles: Representational Issues
As more information becomes available electronically, tools for finding
information of interest to users becomes increasingly important. The goal of
the research described here is to build a system for generating comprehensible
user profiles that accurately capture user interest with minimum user
interaction. The research described here focuses on the importance of a
suitable generalization hierarchy and representation for learning profiles
which are predictively accurate and comprehensible. In our experiments we
evaluated both traditional features based on weighted term vectors as well as
subject features corresponding to categories which could be drawn from a
thesaurus. Our experiments, conducted in the context of a content-based
profiling system for on-line newspapers on the World Wide Web (the IDD News
Browser), demonstrate the importance of a generalization hierarchy and the
promise of combining natural language processing techniques with machine
learning (ML) to address an information retrieval (IR) problem.Comment: 6 page
- …