Target Type Identification for Entity-Bearing Queries
Identifying the target types of entity-bearing queries can help improve
retrieval performance as well as the overall search experience. In this work,
we address the problem of automatically detecting the target types of a query
with respect to a type taxonomy. We propose a supervised learning approach with
a rich variety of features. Using a purpose-built test collection, we show that
our approach outperforms existing methods by a remarkable margin. This is an
extended version of the article published with the same title in the
Proceedings of SIGIR'17.
Comment: Extended version of SIGIR'17 short paper, 5 page
Tools for producing formal specifications : a view of current architectures and future directions
During the last decade, one important contribution towards requirements engineering has been the advent of formal specification languages. They offer a well-defined notation that can improve consistency and avoid ambiguity in specifications.
However, the process of obtaining formal specifications that are consistent with the requirements is itself a difficult activity. Hence various researchers are developing systems that aid the transition from informal to formal specifications.
The kinds of problems tackled and the contributions made by these proposed systems are very diverse. This paper brings these studies together to provide a vision for future architectures that aim to aid the transition from informal to formal specifications. The new architecture, which is based on the strengths of existing studies, tackles a number of key issues in requirements engineering such as identifying ambiguities, incompleteness, and reusability.
The paper concludes with a discussion of the research problems that need to be addressed in order to realise the proposed architecture.
Knowledge management support for enterprise distributed systems
The explosion of information and the increasing demands placed on semantic processing web applications have pushed software systems to their limits. To address the problem we propose a semantic-based formal framework (ADP) that makes use of promising technologies to enable knowledge generation and retrieval. We argue that this approach is cost-effective, as it reuses and builds on existing knowledge and structure. It is also a good starting point for creating an organisational memory and providing knowledge management functions.
Cross-lingual document retrieval categorisation and navigation based on distributed services
The widespread use of the Internet across countries has increased the need for access to document collections
that are often written in languages different from a user’s native language. In this paper we describe Clarity, a
Cross Language Information Retrieval (CLIR) system for English, Finnish, Swedish, Latvian and Lithuanian.
Clarity is a fully-fledged retrieval system that supports the user during the whole process of query formulation,
text retrieval and document browsing. We address four of the major aspects of Clarity: (i) the user-driven
methodology that formed the basis for the iterative design cycle and framework in the project, (ii) the system
architecture that was developed to support the interaction and coordination of Clarity’s distributed services, (iii)
the data resources and methods for query translation, and (iv) the support for Baltic languages. Clarity is an
example of a distributed CLIR system built with minimal translation resources and, to our knowledge, the only
such system that currently supports Baltic languages.
Multiple Models for Recommending Temporal Aspects of Entities
Entity aspect recommendation is an emerging task in semantic search that
helps users discover serendipitous and prominent information with respect to an
entity, of which salience (e.g., popularity) is the most important factor in
previous work. However, entity aspects are temporally dynamic and often driven
by events happening over time. For such cases, aspect suggestion based solely
on salience features can give unsatisfactory results, for two reasons. First,
salience is often accumulated over a long time period and does not account for
recency. Second, many aspects related to an event entity are strongly
time-dependent. In this paper, we study the task of temporal aspect
recommendation for a given entity, which aims at recommending the most relevant
aspects and takes into account time in order to improve search experience. We
propose a novel event-centric ensemble ranking method that learns from multiple
time and type-dependent models and dynamically trades off salience and recency
characteristics. Through extensive experiments on real-world query logs, we
demonstrate that our method is robust and achieves better effectiveness than
competitive baselines.
Comment: In proceedings of the 15th Extended Semantic Web Conference (ESWC 2018)
A distributional and syntactic approach to fine-grained opinion mining
This thesis contributes to a larger social science research program of
analyzing the diffusion of IT innovations. We show how to
automatically discriminate portions of text dealing with opinions
about innovations by finding {source, target, opinion} triples in text.
In this context, we can discern a list of innovations as targets from
the domain itself. We can then use this list as an anchor for finding
the other two members of the triple at a "fine-grained"
level, that is, paragraph contexts or less.
We first demonstrate a vector space model for finding opinionated
contexts in which the innovation targets are mentioned. We can find
paragraph-level contexts by searching for an
"expresses-an-opinion-about" relation between sources and targets
using a supervised model with an SVM that uses features derived from a
general-purpose subjectivity lexicon and a corpus indexing tool. We
show that our algorithm correctly filters the domain-relevant subset
of subjectivity terms so that they are more highly valued.
We then turn to identifying the opinion. Typically, opinions in
opinion mining are taken to be positive or negative. We discuss a
crowdsourcing technique developed to create the seed data describing
human perception of opinion-bearing language needed for our supervised
learning algorithm. Our user interface successfully limited the
meta-subjectivity inherent in the task ("What is an opinion?") while
reliably retrieving relevant opinionated words using workers who are
not experts in the domain.
Finally, we developed a new data structure and modeling technique for
connecting targets with the correct within-sentence opinionated
language. Syntactic relatedness tries (SRTs) contain all paths from a
dependency graph of a sentence that connect a target expression to a
candidate opinionated word. We use factor graphs to model how far a
path through the SRT must be followed in order to connect the right
targets to the right words. It turns out that we can correctly label
significant portions of these tries with very rudimentary features
such as part-of-speech tags and dependency labels with minimal
processing. This technique uses the data from the crowdsourcing
technique we developed as training data.
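As a rough illustration of the idea of a trie over dependency paths, the following Python sketch builds one for a single sentence. It is a minimal sketch under stated assumptions, not the thesis's implementation: the function name build_srt, the (head, dependent, label) edge format, the "^-1" suffix for edges followed against their direction, and the "END" marker are all hypothetical choices made for this example, and every word in the sentence is assumed to be unique. The factor-graph labelling step is not modelled here.

```python
from collections import deque


def build_srt(edges, target, candidates):
    """Build a trie of dependency paths from `target` to each candidate word.

    edges: list of (head, dependent, label) triples for one sentence.
    Returns a nested dict. Each key is a (label, word) step, and the
    special key "END" marks a node at which a candidate word is reached.
    """
    # Treat the dependency tree as undirected so that a path may move
    # from a dependent up to its head as well as down to a dependent.
    neighbours = {}
    for head, dep, label in edges:
        neighbours.setdefault(head, []).append((label, dep))
        neighbours.setdefault(dep, []).append((label + "^-1", head))

    # Breadth-first search from the target, recording for every reached
    # word the previous word and the edge label used to reach it.
    parent = {target: None}
    queue = deque([target])
    while queue:
        word = queue.popleft()
        for label, nxt in neighbours.get(word, []):
            if nxt not in parent:
                parent[nxt] = (word, label)
                queue.append(nxt)

    trie = {}
    for cand in candidates:
        if cand == target or cand not in parent:
            continue  # unreachable or trivial candidates are skipped
        # Walk back from the candidate to the target, then reverse.
        steps = []
        node = cand
        while parent[node] is not None:
            prev, label = parent[node]
            steps.append((label, node))
            node = prev
        steps.reverse()
        # Insert the path; shared prefixes are merged by the trie.
        cursor = trie
        for step in steps:
            cursor = cursor.setdefault(step, {})
        cursor["END"] = True
    return trie


# Hypothetical parse of "platform is reliable and cheap".
edges = [
    ("reliable", "platform", "nsubj"),
    ("reliable", "is", "cop"),
    ("reliable", "cheap", "conj"),
]
print(build_srt(edges, "platform", ["reliable", "cheap"]))
```

In this toy sentence the path to "cheap" passes through "reliable", so both candidates share the first step of the trie. Deciding how far along such a shared path the target's opinion actually extends is the question the abstract assigns to the factor-graph model.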
We conclude by placing our work in the context of a larger sentiment
classification pipeline and by describing a model for learning from
the data structures produced by our work. This work contributes to
computational linguistics by proposing and verifying new data
gathering techniques and applying recent developments in machine
learning to inference over grammatical structures for highly
subjective purposes. It applies a suffix tree-based data structure to
model opinion in a specific domain by imposing a restriction on the
order in which the data is stored in the structure.