An Ontology-Based Recommender System with an Application to the Star Trek Television Franchise
Collaborative filtering based recommender systems have proven to be extremely
successful in settings where user preference data on items is abundant.
However, collaborative filtering algorithms are hindered by their weakness
against the item cold-start problem and general lack of interpretability.
Ontology-based recommender systems exploit hierarchical organizations of users
and items to enhance browsing, recommendation, and profile construction. While
ontology-based approaches address the shortcomings of their collaborative
filtering counterparts, ontological organizations of items can be difficult to
obtain for items that mostly belong to the same category (e.g., television
series episodes). In this paper, we present an ontology-based recommender
system that integrates the knowledge represented in a large ontology of
literary themes to produce fiction content recommendations. The main novelty of
this work is an ontology-based method for computing similarities between items
and its integration with the classical Item-KNN (K-nearest neighbors)
algorithm. As a case study, we evaluated the proposed method against other
approaches by performing the classical rating prediction task on a collection
of Star Trek television series episodes in an item cold-start scenario. This
transverse evaluation provides insights into the utility of different
information resources and methods for the initial stages of recommender system
development. We found our proposed method to be a convenient alternative to
collaborative filtering approaches for collections of mostly similar items,
particularly when other content-based approaches are not applicable or
otherwise unavailable. Aside from the new methods, this paper contributes a
testbed for future research and an online framework to collaboratively extend
the ontology of literary themes to cover other narrative content.
Comment: 25 pages, 6 figures, 5 tables, minor revision
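The Item-KNN step described in this abstract can be illustrated with a minimal sketch. The abstract does not specify the ontology-based similarity, so the code below substitutes a plain Jaccard overlap between sets of theme labels as a stand-in. The episode identifiers, theme labels, and function names are hypothetical and serve illustration only.

```python
from typing import Dict, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Stand-in item similarity: overlap of theme sets (an assumption, not the paper's measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_rating(
    user_ratings: Dict[str, float],
    target_item: str,
    themes: Dict[str, Set[str]],
    k: int = 2,
) -> float:
    """Item-KNN: similarity-weighted mean over the k rated items most similar to the target."""
    if not user_ratings:
        raise ValueError("the user must have rated at least one item")
    # Pair each rated item's similarity to the target with the user's rating of it.
    scored = [
        (jaccard(themes[target_item], themes[item]), rating)
        for item, rating in user_ratings.items()
        if item != target_item
    ]
    # Keep only the k most similar neighbours with positive similarity.
    top = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)[:k]
    if not top:
        # No usable neighbour: fall back to the user's mean rating.
        return sum(user_ratings.values()) / len(user_ratings)
    return sum(sim * rating for sim, rating in top) / sum(sim for sim, _ in top)


# Hypothetical data: the target episode has no ratings at all (item cold start),
# but its theme annotations still allow a prediction.
themes = {
    "ep_a": {"time travel", "first contact"},
    "ep_b": {"time travel", "loyalty"},
    "ep_c": {"courtroom drama"},
}
user_ratings = {"ep_b": 4.0, "ep_c": 2.0}
prediction = predict_rating(user_ratings, "ep_a", themes)
```

Because the similarity is computed from item annotations rather than from co-ratings, a prediction is possible for an item nobody has rated yet, which is the cold-start setting the abstract evaluates.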
Toward the automation of business process ontology generation
Semantic Business Process Management (SBPM) utilises semantic technologies (e.g., ontologies) to model and query process representations. At times, such models must be reconstructed from existing textual documentation. In this scenario the automated generation of ontological models would be preferable; however, current methods and technology are still not capable of automatically generating accurate semantic process models from textual descriptions. This research attempts to automate the process as much as possible by proposing a method that drives the transformation through the joint use of a foundational ontology and lexico-semantic analysis. The method is presented, demonstrated and evaluated. The original dataset represents 150 business activities related to the procurement processes of a case study company. As the evaluation shows, the proposed method can map the linguistic patterns of the process descriptions to semantic patterns of the foundational ontology with a high level of accuracy. However, further research is required to reduce the level of human intervention, to expand the method so that it recognises further patterns of the foundational ontology, and to develop a tool that assists the business process modeller in the semi-automated generation of process models.
An Experimental Digital Library Platform - A Demonstrator Prototype for the DigLib Project at SICS
Within the framework of the Digital Library project at SICS, this thesis describes the implementation of a demonstrator prototype of a digital library (DigLib), an experimental platform integrating several functions in one common interface. It includes descriptions of the structure and formats of the digital library collection, the tailoring of the search engine Dienst, the construction of a keyword extraction tool, and the design and development of the interface. The platform was realised through sicsDAIS, an agent interaction and presentation system, and is to be used for testing and evaluating various tools for information seeking. The platform supports various user interaction strategies by providing: search in bibliographic records (Dienst); an index of keywords (the Keyword Extraction Function (KEF)); and browsing through the hierarchical structure of the collection. KEF was developed for this thesis work, and extracts and presents keywords from Swedish documents. Although based on a comparatively simple algorithm, KEF meets a long-felt need in the area of Information Retrieval. Evaluations of the tasks and the interface remain to be done, but the digital library is up and running. Because the platform is implemented through sicsDAIS, DigLib can deploy additional tools and search engines without interfering with modules that are already running. If desired, agents providing services other than those SICS can supply can be plugged in.
Connotation Frames: A Data-Driven Investigation
Through a particular choice of a predicate (e.g., "x violated y"), a writer
can subtly connote a range of implied sentiments and presupposed facts about
the entities x and y: (1) writer's perspective: projecting x as an
"antagonist" and y as a "victim", (2) entities' perspective: y probably dislikes
x, (3) effect: something bad happened to y, (4) value: y is something valuable,
and (5) mental state: y is distressed by the event. We introduce connotation
frames as a representation formalism to organize these rich dimensions of
connotation using typed relations. First, we investigate the feasibility of
obtaining connotative labels through crowdsourcing experiments. We then present
models for predicting the connotation frames of verb predicates based on their
distributional word representations and the interplay between different types
of connotative relations. Empirical results confirm that connotation frames can
be induced from various data sources that reflect how people use language and
give rise to the connotative meanings. We conclude with analytical results that
show the potential use of connotation frames for analyzing subtle biases in
online news media.
Comment: 11 pages, published in Proceedings of ACL 201
Topic Similarity Networks: Visual Analytics for Large Document Sets
We investigate ways in which to improve the interpretability of LDA topic
models by better analyzing and visualizing their outputs. We focus on examining
what we refer to as topic similarity networks: graphs in which nodes represent
latent topics in text collections and links represent similarity among topics.
We describe efficient and effective approaches to both building and labeling
such networks. Visualizations of topic models based on these networks are shown
to be a powerful means of exploring, characterizing, and summarizing large
collections of unstructured text documents. They help to "tease out"
non-obvious connections among different sets of documents and provide insights
into how topics form larger themes. We demonstrate the efficacy and
practicality of these approaches through two case studies: 1) NSF grants for
basic research spanning a 14 year period and 2) the entire English portion of
Wikipedia.
Comment: 9 pages; 2014 IEEE International Conference on Big Data (IEEE BigData 2014)
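The idea of a topic similarity network (nodes are topics, links connect similar topics) can be sketched as follows. The abstract does not state which similarity measure or linking rule the authors use, so this example assumes cosine similarity between topic-word distributions and a fixed threshold. The topic names and probability values are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine(p: List[float], q: List[float]) -> float:
    """Cosine similarity between two topic-word vectors of equal length."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def build_topic_network(
    topics: Dict[str, List[float]], threshold: float
) -> List[Tuple[str, str, float]]:
    """Return weighted edges (topic_a, topic_b, similarity) for pairs at or above the threshold."""
    names = sorted(topics)
    edges = []
    for i, a in enumerate(names):
        # Compare each unordered pair of topics exactly once.
        for b in names[i + 1:]:
            sim = cosine(topics[a], topics[b])
            if sim >= threshold:
                edges.append((a, b, sim))
    return edges


# Hypothetical topic-word distributions over a four-word vocabulary.
topics = {
    "t0": [0.5, 0.5, 0.0, 0.0],
    "t1": [0.4, 0.6, 0.0, 0.0],
    "t2": [0.0, 0.0, 0.5, 0.5],
}
edges = build_topic_network(topics, threshold=0.5)
```

Here `t0` and `t1` share their high-probability words and are linked, while `t2` uses a disjoint part of the vocabulary and stays isolated. The all-pairs loop is quadratic in the number of topics, which is acceptable for a sketch but would need a more efficient construction for very large models.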
Verb Physics: Relative Physical Knowledge of Actions and Objects
Learning commonsense knowledge from natural language text is nontrivial due
to reporting bias: people rarely state the obvious, e.g., "My house is bigger
than me." However, while rarely stated explicitly, this trivial everyday
knowledge does influence the way people talk about the world, which provides
indirect clues to reason about the world. For example, a statement like, "Tyler
entered his house" implies that his house is bigger than Tyler.
In this paper, we present an approach to infer relative physical knowledge of
actions and objects along five dimensions (e.g., size, weight, and strength)
from unstructured natural language text. We frame knowledge acquisition as
joint inference over two closely related problems: learning (1) relative
physical knowledge of object pairs and (2) physical implications of actions
when applied to those object pairs. Empirical results demonstrate that it is
possible to extract knowledge of actions and objects from language and that
joint inference over different types of knowledge improves performance.
Comment: 11 pages, published in Proceedings of ACL 201
Inference and Evaluation of the Multinomial Mixture Model for Text Clustering
In this article, we investigate the use of a probabilistic model for
unsupervised clustering in text collections. Unsupervised clustering has become
a basic module for many intelligent text processing applications, such as
information retrieval, text classification or information extraction. The model
considered in this contribution consists of a mixture of multinomial
distributions over the word counts, each component corresponding to a different
theme. We present and contrast various estimation procedures, which apply both
in supervised and unsupervised contexts. In supervised learning, this work
suggests a criterion for evaluating the posterior odds of new documents which
is more statistically sound than the "naive Bayes" approach. In an unsupervised
context, we propose measures to set up a systematic evaluation framework and
start by examining the Expectation-Maximization (EM) algorithm as the basic
tool for inference. We discuss the importance of initialization and the
influence of other features such as the smoothing strategy or the size of the
vocabulary, thereby illustrating the difficulties incurred by the high
dimensionality of the parameter space. We also propose a heuristic algorithm
based on iterative EM with vocabulary reduction to solve this problem. Using
the fact that the latent variables can be analytically integrated out, we
finally show that a Gibbs sampling algorithm is tractable and compares favorably
to the basic expectation maximization approach.
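As an illustration of the basic EM procedure this abstract refers to, the sketch below fits a mixture of multinomials to word-count vectors. It is a minimal textbook version under stated assumptions, not the authors' implementation: the additive smoothing constant, the fixed iteration count, the explicit initial responsibilities, and the toy counts are all choices made for this example.

```python
import math
from typing import List, Tuple


def em_multinomial_mixture(
    counts: List[List[int]],
    init_resp: List[List[float]],
    n_iter: int = 20,
    smoothing: float = 0.1,
) -> Tuple[List[float], List[List[float]], List[List[float]]]:
    """Fit a multinomial mixture by EM; returns (weights, word_probs, responsibilities)."""
    n_docs, n_words = len(counts), len(counts[0])
    n_comp = len(init_resp[0])
    resp = [row[:] for row in init_resp]
    weights: List[float] = []
    word_probs: List[List[float]] = []
    for _ in range(n_iter):
        # M-step: mixture weights are the average responsibilities.
        weights = [sum(resp[d][k] for d in range(n_docs)) / n_docs for k in range(n_comp)]
        # M-step: smoothed expected word counts, normalised per component.
        word_probs = []
        for k in range(n_comp):
            totals = [
                smoothing + sum(resp[d][k] * counts[d][w] for d in range(n_docs))
                for w in range(n_words)
            ]
            z = sum(totals)
            word_probs.append([t / z for t in totals])
        # E-step: posterior over components for each document, computed in log space.
        # The multinomial coefficient is identical across components and cancels.
        for d in range(n_docs):
            log_post = [
                math.log(weights[k] + 1e-300)
                + sum(counts[d][w] * math.log(word_probs[k][w]) for w in range(n_words))
                for k in range(n_comp)
            ]
            m = max(log_post)
            unnorm = [math.exp(lp - m) for lp in log_post]
            z = sum(unnorm)
            resp[d] = [u / z for u in unnorm]
    return weights, word_probs, resp


# Hypothetical toy corpus: two documents use words 0-1, two use words 2-3.
counts = [[5, 1, 0, 0], [4, 2, 0, 0], [0, 0, 3, 3], [0, 1, 2, 4]]
init_resp = [[0.9, 0.1], [0.6, 0.4], [0.4, 0.6], [0.1, 0.9]]
weights, word_probs, resp = em_multinomial_mixture(counts, init_resp)
```

The initial responsibilities are passed in explicitly so that the run is deterministic. Changing `init_resp` is a simple way to observe the sensitivity to initialization that the abstract discusses.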