2,964 research outputs found

    Unsupervised Extraction of Representative Concepts from Scientific Literature

    This paper studies the automated categorization and extraction of scientific concepts from titles of scientific articles, in order to gain a deeper understanding of their key contributions and facilitate the construction of a generic academic knowledgebase. Towards this goal, we propose an unsupervised, domain-independent, and scalable two-phase algorithm to type and extract key concept mentions into aspects of interest (e.g., Techniques, Applications, etc.). In the first phase of our algorithm we propose PhraseType, a probabilistic generative model which exploits textual features and limited POS tags to broadly segment text snippets into aspect-typed phrases. We extend this model to simultaneously learn aspect-specific features and identify academic domains in multi-domain corpora, since the two tasks mutually enhance each other. In the second phase, we propose an approach based on adaptor grammars to extract fine-grained concept mentions from the aspect-typed phrases without the need for any external resources or human effort, in a purely data-driven manner. We apply our technique to study literature from diverse scientific domains and show significant gains over state-of-the-art concept extraction techniques. We also present a qualitative analysis of the results obtained. Comment: Published as a conference paper at CIKM 201

    Semi-supervised prediction of protein interaction sentences exploiting semantically encoded metrics

    Protein-protein interaction (PPI) identification is an integral component of many biomedical research and database curation tools. Automation of this task through classification is one of the key goals of text mining (TM). However, labelled PPI corpora required to train classifiers are generally small. In order to overcome this sparsity in the training data, we propose a novel method of integrating corpora that do not contain relevance judgements. Our approach uses a semantic language model to gather word similarity from a large unlabelled corpus. This additional information is integrated into the sentence classification process using kernel transformations and has a re-weighting effect on the training features that leads to an 8% improvement in F-score over the baseline results. Furthermore, we discover that some words which are generally considered indicative of interactions are actually neutralised by this process.

    SRL4ORL: Improving Opinion Role Labeling using Multi-task Learning with Semantic Role Labeling

    For over a decade, machine learning has been used to extract opinion-holder-target structures from text to answer the question "Who expressed what kind of sentiment towards what?". Recent neural approaches do not outperform the state-of-the-art feature-based models for Opinion Role Labeling (ORL). We suspect this is due to the scarcity of labeled training data and address this issue using different multi-task learning (MTL) techniques with a related task which has substantially more data, i.e. Semantic Role Labeling (SRL). We show that two MTL models improve significantly over the single-task model for labeling of both holders and targets, on the development and the test sets. We found that the vanilla MTL model, which makes predictions using only shared ORL and SRL features, performs the best. With deeper analysis we determine what works and what might be done to make further improvements for ORL. Comment: Published in NAACL 201

    Improving Hypernymy Extraction with Distributional Semantic Classes

    In this paper, we show how distributionally-induced semantic classes can be helpful for extracting hypernyms. We present methods for inducing sense-aware semantic classes using distributional semantics and using these induced semantic classes for filtering noisy hypernymy relations. Denoising of hypernyms is performed by labeling each semantic class with its hypernyms. On the one hand, this allows us to filter out wrong extractions using the global structure of distributionally similar senses. On the other hand, we infer missing hypernyms via label propagation to cluster terms. We conduct a large-scale crowdsourcing study showing that processing of automatically extracted hypernyms using our approach improves the quality of the hypernymy extraction in terms of both precision and recall. Furthermore, we show the utility of our method in the domain taxonomy induction task, achieving state-of-the-art results on a SemEval'16 task on taxonomy induction. Comment: In Proceedings of the 11th Conference on Language Resources and Evaluation (LREC 2018). Miyazaki, Japan

    Automatic extraction of robotic surgery actions from text and kinematic data

    The latest generation of robotic systems is becoming increasingly autonomous due to technological advancements and artificial intelligence. The medical field, particularly surgery, is also interested in these technologies because automation would benefit surgeons and patients. While the research community is active in this direction, commercial surgical robots do not currently operate autonomously due to the risks involved in dealing with human patients: it is still considered safer to rely on human surgeons' intelligence for decision-making issues. This means that robots must possess human-like intelligence, including various reasoning capabilities and extensive knowledge, to become more autonomous and credible. As demonstrated by current research in the field, indeed, one of the most critical aspects in developing autonomous systems is the acquisition and management of knowledge. In particular, a surgical robot must base its actions on solid procedural surgical knowledge to operate autonomously, safely, and expertly. This thesis investigates different possibilities for automatically extracting and managing knowledge from text and kinematic data. In the first part, we investigated the possibility of extracting procedural surgical knowledge from real intervention descriptions available in textbooks and academic papers on the robotic-surgical domains, by exploiting Transformer-based pre-trained language models. In particular, we released SurgicBERTa, a RoBERTa-based pre-trained language model for surgical literature understanding. It has been used to detect procedural sentences in books and extract procedural elements from them. Then, with some use cases, we explored the possibilities of translating written instructions into logical rules usable for robotic planning. Since not all the knowledge required for automatizing a procedure is written in texts, we introduce the concept of surgical commonsense, showing how it relates to different autonomy levels. 
    In the second part of the thesis, we analyzed surgical procedures from a lower granularity level, showing how each surgical gesture is associated with a given combination of kinematic data.

    A comparative analysis of recommender systems based on item aspect opinions extracted from user reviews

    In popular applications such as e-commerce sites and social media, users provide online reviews giving personal opinions about a wide array of items, such as products, services and people. These reviews are usually in the form of free text, and represent a rich source of information about the users’ preferences. Among the information elements that can be extracted from reviews, opinions about particular item aspects (i.e., characteristics, attributes or components) have been shown to be effective for user modeling and personalized recommendation. In this paper, we investigate the aspect-based recommendation problem by separately addressing three tasks, namely identifying references to item aspects in user reviews, classifying the sentiment orientation of the opinions about such aspects in the reviews, and exploiting the extracted aspect opinion information to provide enhanced recommendations. Unlike previous work, we integrate and empirically evaluate several state-of-the-art and novel methods for each of the above tasks. We conduct extensive experiments on standard datasets and several domains, analyzing distinct recommendation quality metrics and characteristics of the datasets, domains and extracted aspects. As a result of our investigation, we not only derive conclusions about which combination of methods is most appropriate according to the above issues, but also provide a number of valuable resources for opinion mining and recommendation purposes, such as domain aspect vocabularies and domain-dependent, aspect-level lexicons. This work was supported by the Spanish Ministry of Economy, Industry and Competitiveness (TIN2016-80630-P).

    Extracting, managing, and exploiting the semantics of mechanical CAD models in assembly tasks

    The manufacturing of mechanical products is increasingly assisted by technologies that exploit the CAD model of the final assembly to address complex tasks in an automated and simplified way, to reduce development time and costs. However, it is proven that industrial CAD models are heterogeneous objects, involving different design conventions, providing geometric data on parts but often lacking explicit semantic information on their functionalities. As a consequence, existing approaches are mainly mathematics-based or need expert intervention to interpret assembly components, and this is limiting. The work presented in the thesis is placed in this context and aims at automatically extracting high-level semantic information from B-rep models of mechanical products in standard format (e.g. STEP) and leveraging it in industrial applications. This makes possible the development of promising knowledge-intensive processes that take into account the engineering meaning of the parts and their relationships. The guiding idea is to define a rule-based approach that matches the shape features, the dimensional relations, and the mounting schemes strictly governing real mechanical assemblies with the geometric and topological properties that can be retrieved in CAD models of assemblies. In practice, a standalone system is implemented which carries out two distinct operations, namely data extraction and data exploitation. The first involves all the steps necessary to process and analyze the geometric objects representing the parts of the assembly to infer their engineering meaning. It returns an enriched product model representation based on a new data structure, denoted as liaison, containing all the extracted information. The new product model representation then forms the basis of the data exploitation phase, where assembly tasks, such as subassembly identification, assembly planning, and design for assembly, are addressed in a more effective way.

    Prerequisites for Affective Signal Processing (ASP)

    Although emotions are embraced by science, their recognition has not reached a satisfying level. Through a concise overview of affect, its signals, features, and classification methods, we provide an understanding of the problems encountered. Next, we identify the prerequisites for successful Affective Signal Processing: validation (e.g., mapping of constructs on signals), triangulation, a physiology-driven approach, and contributions of the signal processing community. Using these directives, a critical analysis of a real-world case is provided. This illustrates that the prerequisites can become a valuable guide for Affective Signal Processing (ASP).

    Event Detection in Videos
