Using hybrid algorithmic-crowdsourcing methods for academic knowledge acquisition
such as Figures, Tables, Definitions, Algorithms, etc., which are called Knowledge Cells hereafter. An advanced academic search engine that could take advantage of Knowledge Cells and their various relationships to obtain more accurate search results is expected. Further, it is expected to provide fine-grained search over Knowledge Cells for deep-level information discovery and exploration. Therefore, it is important to identify and extract the Knowledge Cells and their various relationships, which are often intrinsic and implicit in articles. With the exponential growth of scientific publications, discovering and acquiring such useful academic knowledge poses practical challenges. For example, existing algorithmic methods can hardly be extended to handle the diverse layouts of journals, nor scaled up to process massive numbers of documents. As crowdsourcing has become a powerful paradigm for large-scale problem solving, especially for tasks that are difficult for computers but easy for humans, we treat academic knowledge discovery and acquisition as a crowdsourced database problem and present a hybrid framework that combines the accuracy of crowdsourcing workers with the speed of automatic algorithms. In this paper, we introduce our current system implementation, a platform for academic knowledge discovery and acquisition (PANDA), as well as some interesting observations and promising future directions.
Peer reviewed
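The abstract does not say how PANDA divides work between algorithms and crowd workers. One common hybrid pattern, assumed here and not confirmed as PANDA's design, is confidence-based routing: accept an algorithm's label when it is confident and send only uncertain candidates to humans. The names `Extraction`, `hybrid_label`, `ask_crowd`, and the `0.8` threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Extraction:
    """A candidate Knowledge Cell found by an algorithm (hypothetical schema)."""
    cell_id: str
    label: str         # predicted type, e.g. "Figure", "Table", "Definition"
    confidence: float  # algorithm's confidence in [0, 1]


def hybrid_label(
    candidates: List[Extraction],
    ask_crowd: Callable[[Extraction], str],
    threshold: float = 0.8,
) -> Tuple[Dict[str, str], int]:
    """Keep confident algorithmic labels; route uncertain ones to crowd workers.

    Returns the final labels and the number of crowd tasks issued.
    """
    labels: Dict[str, str] = {}
    crowd_tasks = 0
    for c in candidates:
        if c.confidence >= threshold:
            labels[c.cell_id] = c.label       # fast path: trust the algorithm
        else:
            labels[c.cell_id] = ask_crowd(c)  # slow path: ask a human worker
            crowd_tasks += 1
    return labels, crowd_tasks


if __name__ == "__main__":
    cands = [
        Extraction("p1-fig1", "Figure", 0.95),
        Extraction("p1-tab1", "Figure", 0.40),
        Extraction("p2-def1", "Definition", 0.85),
    ]
    # Stand-in for a real crowdsourcing platform: a fixed answer key.
    answers = {"p1-tab1": "Table"}
    labels, n = hybrid_label(cands, lambda c: answers.get(c.cell_id, c.label))
    print(labels, n)
```

Raising the threshold sends more items to the crowd, trading speed and cost for accuracy, which is the balance the abstract describes.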
Towards an Automatic Turing Test: Learning to Evaluate Dialogue Responses
Automatically evaluating the quality of dialogue responses for unstructured
domains is a challenging problem. Unfortunately, existing automatic evaluation
metrics are biased and correlate very poorly with human judgements of response
quality. Yet having an accurate automatic evaluation procedure is crucial for
dialogue research, as it allows rapid prototyping and testing of new models
with fewer expensive human evaluations. In response to this challenge, we
formulate automatic dialogue evaluation as a learning problem. We present an
evaluation model (ADEM) that learns to predict human-like scores for input
responses, using a new dataset of human response scores. We show that the ADEM
model's predictions correlate significantly with human judgements at both the
utterance and system level, and at a level much higher than word-overlap
metrics such as BLEU. We also show that ADEM can generalize to evaluating
dialogue models unseen during training, an important step for automatic
dialogue evaluation.
Comment: ACL 201
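The claim above rests on correlating an automatic metric's scores with human scores. A minimal sketch of that measurement, using standard Pearson and Spearman correlation, is below; it does not implement ADEM itself, and the example scores are invented for illustration.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    human = [1, 3, 4, 2, 5]              # hypothetical human ratings per response
    metric = [0.1, 0.4, 0.35, 0.2, 0.9]  # hypothetical automatic scores
    print(pearson(human, metric), spearman(human, metric))
```

This computes an utterance-level correlation. For a system-level figure, one would first average the scores per dialogue system and then correlate those averages.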
TiFi: Taxonomy Induction for Fictional Domains [Extended version]
Taxonomies are important building blocks of structured knowledge bases, and their construction from text sources and Wikipedia has received much attention. In this paper we focus on the construction of taxonomies for fictional domains, using noisy category systems from fan wikis or text extraction as input. Such fictional domains are archetypes of entity universes that are poorly covered by Wikipedia; other examples include enterprise-specific knowledge bases and highly specialized verticals. Our fiction-targeted approach, called TiFi, consists of three phases: (i) category cleaning, by identifying candidate categories that truly represent classes in the domain of interest, (ii) edge cleaning, by selecting subcategory relationships that correspond to class subsumption, and (iii) top-level construction, by mapping classes onto a subset of high-level WordNet categories. A comprehensive evaluation shows that TiFi is able to construct taxonomies for a diverse range of fictional domains, such as Lord of the Rings, The Simpsons, or Greek Mythology, with very high precision, and that it outperforms state-of-the-art baselines for taxonomy induction by a substantial margin.
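The three phases can be sketched as a pipeline over a category graph. This is a minimal illustration, not TiFi's implementation: the abstract does not give its decision procedures, so `is_class` and `is_subsumption` are hypothetical caller-supplied predicates, and phase (iii) is reduced to a lookup table `top_level_map` from root classes to WordNet-style synset names.

```python
from typing import Callable, Dict, Set, Tuple

Graph = Dict[str, Set[str]]  # category -> its direct subcategories


def induce_taxonomy(
    categories: Graph,
    is_class: Callable[[str], bool],
    is_subsumption: Callable[[str, str], bool],
    top_level_map: Dict[str, str],
) -> Tuple[Graph, Dict[str, str]]:
    # Phase (i), category cleaning: keep only categories that denote classes.
    kept = {c for c in categories if is_class(c)}
    # Phase (ii), edge cleaning: keep parent -> child edges between kept
    # classes that the predicate accepts as true class subsumption.
    taxonomy: Graph = {
        parent: {child for child in categories[parent]
                 if child in kept and is_subsumption(parent, child)}
        for parent in kept
    }
    # Phase (iii), top-level construction: attach every remaining root
    # (a class with no parent left) to a high-level concept, if one is known.
    children = {c for subs in taxonomy.values() for c in subs}
    roots = kept - children
    top = {r: top_level_map[r] for r in roots if r in top_level_map}
    return taxonomy, top


if __name__ == "__main__":
    # Hypothetical fan-wiki category system.
    categories = {
        "Characters": {"Hobbits", "Elves", "Images of characters"},
        "Elves": {"Elvish languages"},
        "Hobbits": set(),
        "Elvish languages": set(),
        "Images of characters": set(),
    }
    bad_edges = {("Elves", "Elvish languages")}  # languages are not elves
    taxonomy, top = induce_taxonomy(
        categories,
        is_class=lambda c: not c.startswith("Images"),
        is_subsumption=lambda p, c: (p, c) not in bad_edges,
        top_level_map={"Characters": "person.n.01",
                       "Elvish languages": "language.n.01"},
    )
    print(taxonomy, top)
```

Removing the non-subsumption edge in phase (ii) turns "Elvish languages" into a root, which phase (iii) then attaches to a separate top-level concept.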
Towards Automatically Extracting UML Class Diagrams from Natural Language Specifications
In model-driven engineering (MDE), UML class diagrams serve as a means for
developers to plan and communicate. However, creating them is complex and
resource-consuming. We propose an automated approach for the extraction of UML
class diagrams from natural language software specifications. To develop our
approach, we create a dataset of UML class diagrams and their English
specifications with the help of volunteers. Our approach is a pipeline of steps
consisting of the segmentation of the input into sentences, the classification
of the sentences, the generation of UML class diagram fragments from sentences,
and the composition of these fragments into one UML class diagram. We develop a
quantitative testing framework specific to UML class diagram extraction. Our
approach yields low precision and recall but serves as a benchmark for future
research.
Comment: 8 pages, 7 tables, 9 figures, 2 algorithms, to be published in MODELS
'22 Companion
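The four-step pipeline described above can be sketched end to end. This toy version is an assumption throughout: the authors classify sentences with their own approach, whereas here the hypothetical `classify` uses two regular-expression patterns ("A X has a Y" for attributes, "A X is a Y" for generalization), and a diagram is just a dict of classes plus a set of generalization pairs.

```python
import re
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical lexical patterns standing in for a trained sentence classifier.
ATTR = re.compile(r"^an? (\w+) has an? (\w+)", re.IGNORECASE)
GEN = re.compile(r"^an? (\w+) is an? (\w+)", re.IGNORECASE)

# A diagram (or fragment): class name -> attributes, plus (child, parent) pairs.
Fragment = Tuple[Dict[str, Set[str]], Set[Tuple[str, str]]]


def segment(text: str) -> List[str]:
    """Step 1: split the specification into sentences."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def classify(sentence: str) -> str:
    """Step 2: assign each sentence a type using the toy patterns."""
    if ATTR.match(sentence):
        return "attribute"
    if GEN.match(sentence):
        return "generalization"
    return "other"


def to_fragment(sentence: str, kind: str) -> Optional[Fragment]:
    """Step 3: turn one classified sentence into a class diagram fragment."""
    if kind == "attribute":
        cls, attr = ATTR.match(sentence).groups()
        return {cls.capitalize(): {attr.lower()}}, set()
    if kind == "generalization":
        child, parent = (g.capitalize() for g in GEN.match(sentence).groups())
        return {child: set(), parent: set()}, {(child, parent)}
    return None


def compose(fragments: List[Fragment]) -> Fragment:
    """Step 4: merge fragments into one diagram, unioning attributes per class."""
    classes: Dict[str, Set[str]] = {}
    generalizations: Set[Tuple[str, str]] = set()
    for cls_part, gen_part in fragments:
        for name, attrs in cls_part.items():
            classes.setdefault(name, set()).update(attrs)
        generalizations |= gen_part
    return classes, generalizations


def extract(text: str) -> Fragment:
    """Run all four steps on a natural-language specification."""
    fragments = []
    for sentence in segment(text):
        fragment = to_fragment(sentence, classify(sentence))
        if fragment is not None:
            fragments.append(fragment)
    return compose(fragments)


if __name__ == "__main__":
    spec = "A student has a name. A student is a person. A person has an email."
    print(extract(spec))
```

Sentences that match neither pattern are dropped, which hints at why recall suffers when specifications use varied phrasing.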
Open semantic service networks
Online service marketplaces will soon become part of the economy, scaling the provision of specialized multi-party services through automation and standardization. Current research, such as the *-USDL service description language family, is already defining the basic building blocks to model the next generation of business services. Nonetheless, current developments do not aim to interconnect services via service relationships. Without the concept of relationships, marketplaces will be seen as mere functional silos containing service descriptions. Yet, in real economies, all services are related and connected. Therefore, to address this gap, we introduce the concept of an open semantic service network (OSSN), concerned with the establishment of rich relationships between services. These networks will provide valuable knowledge on the global service economy, which can be exploited for many socio-economic and scientific purposes, such as service network analysis, management, and control.