4,163 research outputs found
Automated user modeling for personalized digital libraries
Digital libraries (DL) have become one of the most typical ways of accessing any kind of digitalized information. Due to this key role, users welcome any improvements on the services they receive from digital libraries. One trend used to
improve digital services is through personalization. Up to now, the most common approach for personalization in digital libraries has been user-driven. Nevertheless, the design of efficient personalized services has to be done, at least in part, in
an automatic way. In this context, machine learning techniques automate the process of constructing user models. This paper proposes a new approach to construct digital libraries that satisfy user’s necessity for information: Adaptive Digital Libraries, libraries that automatically learn user preferences and goals and personalize their interaction using this information
Efficient Iterative Processing in the SciDB Parallel Array Engine
Many scientific data-intensive applications perform iterative computations on
array data. There exist multiple engines specialized for array processing.
These engines efficiently support various types of operations, but none
includes native support for iterative processing. In this paper, we develop a
model for iterative array computations and a series of optimizations. We
evaluate the benefits of an optimized, native support for iterative array
processing on the SciDB engine and real workloads from the astronomy domain
From Frequency to Meaning: Vector Space Models of Semantics
Computers understand very little of the meaning of human language. This
profoundly limits our ability to give instructions to computers, the ability of
computers to explain their actions to us, and the ability of computers to
analyse and process text. Vector space models (VSMs) of semantics are beginning
to address these limits. This paper surveys the use of VSMs for semantic
processing of text. We organize the literature on VSMs according to the
structure of the matrix in a VSM. There are currently three broad classes of
VSMs, based on term-document, word-context, and pair-pattern matrices, yielding
three classes of applications. We survey a broad range of applications in these
three categories and we take a detailed look at a specific open source project
in each category. Our goal in this survey is to show the breadth of
applications of VSMs for semantics, to provide a new perspective on VSMs for
those who are already familiar with the area, and to provide pointers into the
literature for those who are less familiar with the field
A Relation-Based Page Rank Algorithm for Semantic Web Search Engines
With the tremendous growth of information available to end users through the Web, search engines come to play ever a more critical role. Nevertheless, because of their general-purpose approach, it is always less uncommon that obtained result sets provide a burden of useless pages. The next-generation Web architecture, represented by the Semantic Web, provides the layered architecture possibly allowing overcoming this limitation. Several search engines have been proposed, which allow increasing information retrieval accuracy by exploiting a key content of Semantic Web resources, that is, relations. However, in order to rank results, most of the existing solutions need to work on the whole annotated knowledge base. In this paper, we propose a relation-based page rank algorithm to be used in conjunction with Semantic Web search engines that simply relies on information that could be extracted from user queries and on annotated resources. Relevance is measured as the probability that a retrieved resource actually contains those relations whose existence was assumed by the user at the time of query definitio
REX: Recursive, Delta-Based Data-Centric Computation
In today's Web and social network environments, query workloads include ad
hoc and OLAP queries, as well as iterative algorithms that analyze data
relationships (e.g., link analysis, clustering, learning). Modern DBMSs support
ad hoc and OLAP queries, but most are not robust enough to scale to large
clusters. Conversely, "cloud" platforms like MapReduce execute chains of batch
tasks across clusters in a fault tolerant way, but have too much overhead to
support ad hoc queries.
Moreover, both classes of platform incur significant overhead in executing
iterative data analysis algorithms. Most such iterative algorithms repeatedly
refine portions of their answers, until some convergence criterion is reached.
However, general cloud platforms typically must reprocess all data in each
step. DBMSs that support recursive SQL are more efficient in that they
propagate only the changes in each step -- but they still accumulate each
iteration's state, even if it is no longer useful. User-defined functions are
also typically harder to write for DBMSs than for cloud platforms.
We seek to unify the strengths of both styles of platforms, with a focus on
supporting iterative computations in which changes, in the form of deltas, are
propagated from iteration to iteration, and state is efficiently updated in an
extensible way. We present a programming model oriented around deltas, describe
how we execute and optimize such programs in our REX runtime system, and
validate that our platform also handles failures gracefully. We experimentally
validate our techniques, and show speedups over the competing methods ranging
from 2.5 to nearly 100 times.Comment: VLDB201
- …