13,858 research outputs found
Finding Patterns in a Knowledge Base using Keywords to Compose Table Answers
We aim to provide table answers to keyword queries against knowledge bases.
For queries referring to multiple entities, like "Washington cities population"
and "Mel Gibson movies", it is better to represent each relevant answer as a
table which aggregates a set of entities or entity-joins within the same table
scheme or pattern. In this paper, we study how to find highly relevant patterns
in a knowledge base for user-given keyword queries to compose table answers. A
knowledge base can be modeled as a directed graph called knowledge graph, where
nodes represent entities in the knowledge base and edges represent the
relationships among them. Each node/edge is labeled with type and text. A
pattern is an aggregation of subtrees which contain all keywords in the texts
and have the same structure and types on node/edges. We propose efficient
algorithms to find patterns that are relevant to the query for a class of
scoring functions. We show the hardness of the problem in theory, and propose
path-based indexes that are affordable in memory. Two query-processing
algorithms are proposed: one is fast in practice for small queries (with small
patterns as answers) by utilizing the indexes; and the other one is better in
theory, with running time linear in the sizes of indexes and answers, which can
handle large queries better. We also conduct extensive experimental study to
compare our approaches with a naive adaption of known techniques.Comment: VLDB 201
Constellation Queries over Big Data
A geometrical pattern is a set of points with all pairwise distances (or,
more generally, relative distances) specified. Finding matches to such patterns
has applications to spatial data in seismic, astronomical, and transportation
contexts. For example, a particularly interesting geometric pattern in
astronomy is the Einstein cross, which is an astronomical phenomenon in which a
single quasar is observed as four distinct sky objects (due to gravitational
lensing) when captured by earth telescopes. Finding such crosses, as well as
other geometric patterns, is a challenging problem as the potential number of
sets of elements that compose shapes is exponentially large in the size of the
dataset and the pattern. In this paper, we denote geometric patterns as
constellation queries and propose algorithms to find them in large data
applications. Our methods combine quadtrees, matrix multiplication, and
unindexed join processing to discover sets of points that match a geometric
pattern within some additive factor on the pairwise distances. Our distributed
experiments show that the choice of composition algorithm (matrix
multiplication or nested loops) depends on the freedom introduced in the query
geometry through the distance additive factor. Three clearly identified blocks
of threshold values guide the choice of the best composition algorithm.
Finally, solving the problem for relative distances requires a novel
continuous-to-discrete transformation. To the best of our knowledge this paper
is the first to investigate constellation queries at scale
User Intent Prediction in Information-seeking Conversations
Conversational assistants are being progressively adopted by the general
population. However, they are not capable of handling complicated
information-seeking tasks that involve multiple turns of information exchange.
Due to the limited communication bandwidth in conversational search, it is
important for conversational assistants to accurately detect and predict user
intent in information-seeking conversations. In this paper, we investigate two
aspects of user intent prediction in an information-seeking setting. First, we
extract features based on the content, structural, and sentiment
characteristics of a given utterance, and use classic machine learning methods
to perform user intent prediction. We then conduct an in-depth feature
importance analysis to identify key features in this prediction task. We find
that structural features contribute most to the prediction performance. Given
this finding, we construct neural classifiers to incorporate context
information and achieve better performance without feature engineering. Our
findings can provide insights into the important factors and effective methods
of user intent prediction in information-seeking conversations.Comment: Accepted to CHIIR 201
Open-domain web-based multiple document : question answering for list questions with support for temporal restrictors
Tese de doutoramento, Informática (Ciências da Computação), Universidade de Lisboa, Faculdade de Ciências, 2015With the growth of the Internet, more people are searching for information on the Web. The combination of web growth and improvements in Information Technology has reignited the interest in Question Answering (QA) systems. QA is a type of information retrieval combined with natural language processing techniques that aims at finding answers to natural language questions. List questions have been widely studied in the QA field. These are questions that require a list of correct answers, making the task of correctly answering them more complex. In List questions, the answers may lie in the same document or spread over multiple documents. In the latter case, a QA system able to answer List questions has to deal with the fusion of partial answers. The current Question Answering state-of-the-art does not provide yet a good way to tackle this complex problem of collecting the exact answers from multiple documents. Our goal is to provide better QA solutions to users, who desire direct answers, using approaches that deal with the complex problem of extracting answers found spread over several documents. The present dissertation address the problem of answering Open-domain List questions by exploring redundancy and combining it with heuristics to improve QA accuracy. Our approach uses the Web as information source, since it is several orders of magnitude larger than other document collections. Besides handling List questions, we develop an approach with special focus on questions that include temporal information. In this regard, the current work addresses a topic that was lacking specific research. A additional purpose of this dissertation is to report on important results of the research combining Web-based QA, List QA and Temporal QA. Besides the evaluation of our approach itself we compare our system with other QA systems in order to assess its performance relative to the state-of-the-art. Finally, our approaches to answer List questions and List questions with temporal information are implemented into a fully-fledged Open-domain Web-based Question Answering System that provides answers retrieved from multiple documents.Com o crescimento da Internet cada vez mais pessoas buscam informações usando a Web. A combinação do crescimento da Internet com melhoramentos na Tecnologia da Informação traz como consequência o renovado interesse em Sistemas de Respostas a Perguntas (SRP). SRP combina técnicas de recuperação de informação com ferramentas de apoio à linguagem natural com o objetivo de encontrar respostas para perguntas em linguagem natural. Perguntas do tipo lista têm sido largamente estudadas nesta área. Neste tipo de perguntas é esperada uma lista de respostas corretas, o que torna a tarefa de responder a perguntas do tipo lista ainda mais complexa. As respostas para este tipo de pergunta podem ser encontradas num único documento ou espalhados em múltiplos documentos. No último caso, um SRP deve estar preparado para lidar com a fusão de respostas parciais. Os SRP atuais ainda não providenciam uma boa forma de lidar com este complexo problema de coletar respostas de múltiplos documentos. Nosso objetivo é prover melhores soluções para utilizadores que desejam buscar respostas diretas usando abordagens para extrair respostas de múltiplos documentos. Esta dissertação aborda o problema de responder a perguntas de domínio aberto explorando redundância combinada com heurísticas. Nossa abordagem usa a Internet como fonte de informação uma vez que a Web é a maior coleção de documentos da atualidade. Para além de responder a perguntas do tipo lista, nós desenvolvemos uma abordagem para responder a perguntas com restrição temporal. Neste sentido, o presente trabalho aborda este tema onde há pouca investigação específica. Adicionalmente, esta dissertação tem o propósito de informar sobre resultados importantes desta pesquisa que combina várias áreas: SRP com base na Web, SRP especialmente desenvolvidos para responder perguntas do tipo lista e também com restrição temporal. Além da avaliação da nossa própria abordagem, comparamos o nosso sistema com outros SRP, a fim de avaliar o seu desempenho em relação ao estado da arte. Por fim, as nossas abordagens para responder a perguntas do tipo lista e perguntas do tipo lista com informações temporais são implementadas em um Sistema online de Respostas a Perguntas de domínio aberto que funciona diretamente sob a Web e que fornece respostas extraídas de múltiplos documentos.Fundação para a Ciência e a Tecnologia (FCT), SFRH/BD/65647/2009; European Commission, projeto QTLeap (Quality Translation by Deep Language Engineering Approache
On-the-fly Table Generation
Many information needs revolve around entities, which would be better
answered by summarizing results in a tabular format, rather than presenting
them as a ranked list. Unlike previous work, which is limited to retrieving
existing tables, we aim to answer queries by automatically compiling a table in
response to a query. We introduce and address the task of on-the-fly table
generation: given a query, generate a relational table that contains relevant
entities (as rows) along with their key properties (as columns). This problem
is decomposed into three specific subtasks: (i) core column entity ranking,
(ii) schema determination, and (iii) value lookup. We employ a feature-based
approach for entity ranking and schema determination, combining deep semantic
features with task-specific signals. We further show that these two subtasks
are not independent of each other and can assist each other in an iterative
manner. For value lookup, we combine information from existing tables and a
knowledge base. Using two sets of entity-oriented queries, we evaluate our
approach both on the component level and on the end-to-end table generation
task.Comment: The 41st International ACM SIGIR Conference on Research and
Development in Information Retrieva
Early aspects: aspect-oriented requirements engineering and architecture design
This paper reports on the third Early Aspects: Aspect-Oriented Requirements Engineering and Architecture Design Workshop, which has been held in Lancaster, UK, on March 21, 2004. The workshop included a presentation session and working sessions in which the particular topics on early aspects were discussed. The primary goal of the workshop was to focus on challenges to defining methodical software development processes for aspects from early on in the software life cycle and explore the potential of proposed methods and techniques to scale up to industrial applications
Helping to keep history relevant : mulitmedia and authentic learning
The subject based curriculum attracts lively debate in many countries being accused of fragmenting teaching and learning, erecting artificial barriers and failing to teach the skills required in the twenty first century (Hazlewood 2005). Cross-curricular rich tasks are increasingly seen as the means to develop relevant knowledge, understanding and skills. Over the past fourteen years we have developed and evaluated a series of interactive multi-media resources for primary and secondary schools on themes within Scottish History. The generally positive evaluation given to these resources by pupils and teachers points to some ways in which subjects such as history can remain challenging and relevant. The relevance has largely stemmed, in the case of the multi-media resources, from combining the historian's traditional role of problemising the past, with a wide range of primary and secondary sources, new technologies and learning tasks encompassing critical skills/authentic learning. Consequently, we argue that subjects must in future embrace new technologies and authentic learning to maintain their place in the school curriculum
Automatic Music Composition using Answer Set Programming
Music composition used to be a pen and paper activity. These these days music
is often composed with the aid of computer software, even to the point where
the computer compose parts of the score autonomously. The composition of most
styles of music is governed by rules. We show that by approaching the
automation, analysis and verification of composition as a knowledge
representation task and formalising these rules in a suitable logical language,
powerful and expressive intelligent composition tools can be easily built. This
application paper describes the use of answer set programming to construct an
automated system, named ANTON, that can compose melodic, harmonic and rhythmic
music, diagnose errors in human compositions and serve as a computer-aided
composition tool. The combination of harmonic, rhythmic and melodic composition
in a single framework makes ANTON unique in the growing area of algorithmic
composition. With near real-time composition, ANTON reaches the point where it
can not only be used as a component in an interactive composition tool but also
has the potential for live performances and concerts or automatically generated
background music in a variety of applications. With the use of a fully
declarative language and an "off-the-shelf" reasoning engine, ANTON provides
the human composer a tool which is significantly simpler, more compact and more
versatile than other existing systems. This paper has been accepted for
publication in Theory and Practice of Logic Programming (TPLP).Comment: 31 pages, 10 figures. Extended version of our ICLP2008 paper.
Formatted following TPLP guideline
- …