26,386 research outputs found
Harvesting Entities from the Web Using Unique Identifiers -- IBEX
In this paper we study the prevalence of unique entity identifiers on the
Web. These are, e.g., ISBNs (for books), GTINs (for commercial products), DOIs
(for documents), email addresses, and others. We show how these identifiers can
be harvested systematically from Web pages, and how they can be associated with
human-readable names for the entities at large scale.
Starting with a simple extraction of identifiers and names from Web pages, we
show how we can use the properties of unique identifiers to filter out noise
and clean up the extraction result on the entire corpus. The end result is a
database of millions of uniquely identified entities of different types, with
an accuracy of 73--96% and a very high coverage compared to existing knowledge
bases. We use this database to compute novel statistics on the presence of
products, people, and other entities on the Web.Comment: 30 pages, 5 figures, 9 tables. Complete technical report for A.
Talaika, J. A. Biega, A. Amarilli, and F. M. Suchanek. IBEX: Harvesting
Entities from the Web Using Unique Identifiers. WebDB workshop, 201
Learning to Rank Academic Experts in the DBLP Dataset
Expert finding is an information retrieval task that is concerned with the
search for the most knowledgeable people with respect to a specific topic, and
the search is based on documents that describe people's activities. The task
involves taking a user query as input and returning a list of people who are
sorted by their level of expertise with respect to the user query. Despite
recent interest in the area, the current state-of-the-art techniques lack in
principled approaches for optimally combining different sources of evidence.
This article proposes two frameworks for combining multiple estimators of
expertise. These estimators are derived from textual contents, from
graph-structure of the citation patterns for the community of experts, and from
profile information about the experts. More specifically, this article explores
the use of supervised learning to rank methods, as well as rank aggregation
approaches, for combing all of the estimators of expertise. Several supervised
learning algorithms, which are representative of the pointwise, pairwise and
listwise approaches, were tested, and various state-of-the-art data fusion
techniques were also explored for the rank aggregation framework. Experiments
that were performed on a dataset of academic publications from the Computer
Science domain attest the adequacy of the proposed approaches.Comment: Expert Systems, 2013. arXiv admin note: text overlap with
arXiv:1302.041
Psychometrics in Practice at RCEC
A broad range of topics is dealt with in this volume: from combining the psychometric generalizability and item response theories to the ideas for an integrated formative use of data-driven decision making, assessment for learning and diagnostic testing. A number of chapters pay attention to computerized (adaptive) and classification testing. Other chapters treat the quality of testing in a general sense, but for topics like maintaining standards or the testing of writing ability, the quality of testing is dealt with more specifically.\ud
All authors are connected to RCEC as researchers. They present one of their current research topics and provide some insight into the focus of RCEC. The selection of the topics and the editing intends that the book should be of special interest to educational researchers, psychometricians and practitioners in educational assessment
The Best Trail Algorithm for Assisted Navigation of Web Sites
We present an algorithm called the Best Trail Algorithm, which helps solve
the hypertext navigation problem by automating the construction of memex-like
trails through the corpus. The algorithm performs a probabilistic best-first
expansion of a set of navigation trees to find relevant and compact trails. We
describe the implementation of the algorithm, scoring methods for trails,
filtering algorithms and a new metric called \emph{potential gain} which
measures the potential of a page for future navigation opportunities.Comment: 11 pages, 11 figure
- âŠ