23,744 research outputs found
Ranking Archived Documents for Structured Queries on Semantic Layers
Archived collections of documents (like newspaper and web archives) serve as
important information sources in a variety of disciplines, including Digital
Humanities, Historical Science, and Journalism. However, the absence of
efficient and meaningful exploration methods still remains a major hurdle in
the way of turning them into usable sources of information. A semantic layer is
an RDF graph that describes metadata and semantic information about a
collection of archived documents, which in turn can be queried through a
semantic query language (SPARQL). This allows running advanced queries by
combining metadata of the documents (like publication date) and content-based
semantic information (like entities mentioned in the documents). However, the
results returned by such structured queries can be numerous and moreover they
all equally match the query. In this paper, we deal with this problem and
formalize the task of "ranking archived documents for structured queries on
semantic layers". Then, we propose two ranking models for the problem at hand
which jointly consider: i) the relativeness of documents to entities, ii) the
timeliness of documents, and iii) the temporal relations among the entities.
The experimental results on a new evaluation dataset show the effectiveness of
the proposed models and allow us to understand their limitation
Exploiting Social Annotation for Automatic Resource Discovery
Information integration applications, such as mediators or mashups, that
require access to information resources currently rely on users manually
discovering and integrating them in the application. Manual resource discovery
is a slow process, requiring the user to sift through results obtained via
keyword-based search. Although search methods have advanced to include evidence
from document contents, its metadata and the contents and link structure of the
referring pages, they still do not adequately cover information sources --
often called ``the hidden Web''-- that dynamically generate documents in
response to a query. The recently popular social bookmarking sites, which allow
users to annotate and share metadata about various information sources, provide
rich evidence for resource discovery. In this paper, we describe a
probabilistic model of the user annotation process in a social bookmarking
system del.icio.us. We then use the model to automatically find resources
relevant to a particular information domain. Our experimental results on data
obtained from \emph{del.icio.us} show this approach as a promising method for
helping automate the resource discovery task.Comment: 6 pages, submitted to AAAI07 workshop on Information Integration on
the We
Knowledge will Propel Machine Understanding of Content: Extrapolating from Current Examples
Machine Learning has been a big success story during the AI resurgence. One
particular stand out success relates to learning from a massive amount of data.
In spite of early assertions of the unreasonable effectiveness of data, there
is increasing recognition for utilizing knowledge whenever it is available or
can be created purposefully. In this paper, we discuss the indispensable role
of knowledge for deeper understanding of content where (i) large amounts of
training data are unavailable, (ii) the objects to be recognized are complex,
(e.g., implicit entities and highly subjective content), and (iii) applications
need to use complementary or related data in multiple modalities/media. What
brings us to the cusp of rapid progress is our ability to (a) create relevant
and reliable knowledge and (b) carefully exploit knowledge to enhance ML/NLP
techniques. Using diverse examples, we seek to foretell unprecedented progress
in our ability for deeper understanding and exploitation of multimodal data and
continued incorporation of knowledge in learning techniques.Comment: Pre-print of the paper accepted at 2017 IEEE/WIC/ACM International
Conference on Web Intelligence (WI). arXiv admin note: substantial text
overlap with arXiv:1610.0770
- …