14,780 research outputs found
Automatic case acquisition from texts for process-oriented case-based reasoning
This paper introduces a method for the automatic acquisition of a rich case
representation from free text for process-oriented case-based reasoning. Case
engineering is among the most complicated and costly tasks in implementing a
case-based reasoning system. This is especially so for process-oriented
case-based reasoning, where more expressive case representations are generally
used and, in our opinion, actually required for satisfactory case adaptation.
In this context, the ability to acquire cases automatically from procedural
texts is a major step forward in order to reason on processes. We therefore
detail a methodology that makes case acquisition from processes described as
free text possible, with special attention given to assembly instruction texts.
This methodology extends the techniques we used to extract actions from cooking
recipes. We argue that techniques taken from natural language processing are
required for this task, and that they give satisfactory results. An evaluation
based on our implemented prototype extracting workflows from recipe texts is
provided.Comment: Sous presse, publication pr\'evue en 201
Query Expansion for Survey Question Retrieval in the Social Sciences
In recent years, the importance of research data and the need to archive and
to share it in the scientific community have increased enormously. This
introduces a whole new set of challenges for digital libraries. In the social
sciences typical research data sets consist of surveys and questionnaires. In
this paper we focus on the use case of social science survey question reuse and
on mechanisms to support users in the query formulation for data sets. We
describe and evaluate thesaurus- and co-occurrence-based approaches for query
expansion to improve retrieval quality in digital libraries and research data
archives. The challenge here is to translate the information need and the
underlying sociological phenomena into proper queries. As we can show retrieval
quality can be improved by adding related terms to the queries. In a direct
comparison automatically expanded queries using extracted co-occurring terms
can provide better results than queries manually reformulated by a domain
expert and better results than a keyword-based BM25 baseline.Comment: to appear in Proceedings of 19th International Conference on Theory
and Practice of Digital Libraries 2015 (TPDL 2015
Advanced Knowledge Technologies at the Midterm: Tools and Methods for the Semantic Web
The University of Edinburgh and research sponsors are authorised to reproduce and distribute reprints and on-line copies for their purposes notwithstanding any copyright annotation hereon. The views and conclusions contained herein are the author’s and shouldn’t be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of other parties.In a celebrated essay on the new electronic media, Marshall McLuhan wrote in 1962:Our private senses are not closed systems but are endlessly translated into each other in that experience which we call consciousness. Our extended senses, tools, technologies, through the ages, have been closed systems incapable of interplay or collective awareness. Now, in the electric age, the very
instantaneous nature of co-existence among our technological instruments has created a crisis quite new in human history. Our extended faculties and senses now constitute a single field of experience which demands that they become collectively conscious. Our technologies, like our private senses, now demand an interplay and ratio that makes rational co-existence possible. As long as our technologies were as slow as the wheel or the alphabet or money, the fact that
they were separate, closed systems was socially and psychically supportable. This is not true now when sight and sound and movement are simultaneous and global in extent. (McLuhan 1962, p.5, emphasis in original)Over forty years later, the seamless interplay that McLuhan demanded between our
technologies is still barely visible. McLuhan’s predictions of the spread, and increased importance, of electronic media have of course been borne out, and the worlds of business, science and knowledge storage and transfer have been revolutionised. Yet
the integration of electronic systems as open systems remains in its infancy.Advanced Knowledge Technologies (AKT) aims to address this problem, to create a view of knowledge and its management across its lifecycle, to research and create the
services and technologies that such unification will require. Half way through its sixyear span, the results are beginning to come through, and this paper will explore some of the services, technologies and methodologies that have been developed. We hope to give a sense in this paper of the potential for the next three years, to discuss the insights and lessons learnt in the first phase of the project, to articulate the challenges and issues that remain.The WWW provided the original context that made the AKT approach to knowledge
management (KM) possible. AKT was initially proposed in 1999, it brought together an interdisciplinary consortium with the technological breadth and complementarity to create the conditions for a unified approach to knowledge across its lifecycle. The
combination of this expertise, and the time and space afforded the consortium by the
IRC structure, suggested the opportunity for a concerted effort to develop an approach
to advanced knowledge technologies, based on the WWW as a basic infrastructure.The technological context of AKT altered for the better in the short period between the development of the proposal and the beginning of the project itself with the development of the semantic web (SW), which foresaw much more intelligent manipulation and querying of knowledge. The opportunities that the SW provided for e.g., more intelligent retrieval, put AKT in the centre of information technology innovation and knowledge management services; the AKT skill set would clearly be central for the exploitation of those opportunities.The SW, as an extension of the WWW, provides an interesting set of constraints to
the knowledge management services AKT tries to provide. As a medium for the
semantically-informed coordination of information, it has suggested a number of ways in which the objectives of AKT can be achieved, most obviously through the
provision of knowledge management services delivered over the web as opposed to the creation and provision of technologies to manage knowledge.AKT is working on the assumption that many web services will be developed and provided for users. The KM problem in the near future will be one of deciding which services are needed and of coordinating them. Many of these services will be largely or entirely legacies of the WWW, and so the capabilities of the services will vary. As well as providing useful KM services in their own right, AKT will be aiming to exploit this opportunity, by reasoning over services, brokering between them, and providing essential meta-services for SW knowledge service management.Ontologies will be a crucial tool for the SW. The AKT consortium brings a lot of expertise on ontologies together, and ontologies were always going to be a key part of the strategy. All kinds of knowledge sharing and transfer activities will be mediated by ontologies, and ontology management will be an important enabling task. Different
applications will need to cope with inconsistent ontologies, or with the problems that will follow the automatic creation of ontologies (e.g. merging of pre-existing
ontologies to create a third). Ontology mapping, and the elimination of conflicts of
reference, will be important tasks. All of these issues are discussed along with our
proposed technologies.Similarly, specifications of tasks will be used for the deployment of knowledge services over the SW, but in general it cannot be expected that in the medium term there will be standards for task (or service) specifications. The brokering metaservices
that are envisaged will have to deal with this heterogeneity.The emerging picture of the SW is one of great opportunity but it will not be a wellordered, certain or consistent environment. It will comprise many repositories of legacy data, outdated and inconsistent stores, and requirements for common understandings across divergent formalisms. There is clearly a role for standards to play to bring much of this context together; AKT is playing a significant role in these efforts. But standards take time to emerge, they take political power to enforce, and they have been known to stifle innovation (in the short term). AKT is keen to understand the balance between principled inference and statistical processing of web content. Logical inference on the Web is tough. Complex queries using traditional AI inference methods bring most distributed computer systems to their knees. Do we set up semantically well-behaved areas of the Web? Is any part of the Web in which
semantic hygiene prevails interesting enough to reason in? These and many other
questions need to be addressed if we are to provide effective knowledge technologies
for our content on the web
Friendships, Rivalries, and Trysts: Characterizing Relations between Ideas in Texts
Understanding how ideas relate to each other is a fundamental question in
many domains, ranging from intellectual history to public communication.
Because ideas are naturally embedded in texts, we propose the first framework
to systematically characterize the relations between ideas based on their
occurrence in a corpus of documents, independent of how these ideas are
represented. Combining two statistics --- cooccurrence within documents and
prevalence correlation over time --- our approach reveals a number of different
ways in which ideas can cooperate and compete. For instance, two ideas can
closely track each other's prevalence over time, and yet rarely cooccur, almost
like a "cold war" scenario. We observe that pairwise cooccurrence and
prevalence correlation exhibit different distributions. We further demonstrate
that our approach is able to uncover intriguing relations between ideas through
in-depth case studies on news articles and research papers.Comment: 11 pages, 9 figures, to appear in Proceedings of ACL 2017, code and
data available at https://chenhaot.com/pages/idea-relations.html (fixed a
typo
- …