2,752 research outputs found

    Automatic case acquisition from texts for process-oriented case-based reasoning

    Get PDF
    This paper introduces a method for the automatic acquisition of a rich case representation from free text for process-oriented case-based reasoning. Case engineering is among the most complicated and costly tasks in implementing a case-based reasoning system. This is especially so for process-oriented case-based reasoning, where more expressive case representations are generally used and, in our opinion, actually required for satisfactory case adaptation. In this context, the ability to acquire cases automatically from procedural texts is a major step forward in order to reason on processes. We therefore detail a methodology that makes case acquisition from processes described as free text possible, with special attention given to assembly instruction texts. This methodology extends the techniques we used to extract actions from cooking recipes. We argue that techniques taken from natural language processing are required for this task, and that they give satisfactory results. An evaluation based on our implemented prototype extracting workflows from recipe texts is provided.Comment: Sous presse, publication pr\'evue en 201

    The Materials Science Procedural Text Corpus: Annotating Materials Synthesis Procedures with Shallow Semantic Structures

    Full text link
    Materials science literature contains millions of materials synthesis procedures described in unstructured natural language text. Large-scale analysis of these synthesis procedures would facilitate deeper scientific understanding of materials synthesis and enable automated synthesis planning. Such analysis requires extracting structured representations of synthesis procedures from the raw text as a first step. To facilitate the training and evaluation of synthesis extraction models, we introduce a dataset of 230 synthesis procedures annotated by domain experts with labeled graphs that express the semantics of the synthesis sentences. The nodes in this graph are synthesis operations and their typed arguments, and labeled edges specify relations between the nodes. We describe this new resource in detail and highlight some specific challenges to annotating scientific text with shallow semantic structure. We make the corpus available to the community to promote further research and development of scientific information extraction systems.Comment: Accepted as a long paper at the Linguistic Annotation Workshop (LAW) at ACL 201

    Recipe instruction semantics corpus (RISeC) : resolving semantic structure and zero anaphora in recipes

    Get PDF
    We propose a newly annotated dataset for information extraction on recipes. Unlike previous approaches to machine comprehension of procedural texts, we avoid a priori pre-defining domain-specific predicates to recognize (e.g., the primitive instructionsin MILK) and focus on basic understanding of the expressed semantics rather than directly reduce them to a simplified state representation (e.g., ProPara). We thus frame the semantic comprehension of procedural text such as recipes, as fairly generic NLP subtasks, covering (i) entity recognition (ingredients, tools and actions), (ii) relation extraction (what ingredients and tools are involved in the actions), and (iii) zero anaphora resolution (link actions to implicit arguments, e.g., results from previous recipe steps). Further, our Recipe Instruction Semantic Corpus (RISeC) dataset includes textual descriptions for the zero anaphora, to facilitate language generation thereof. Besides the dataset itself, we contribute a pipeline neural architecture that addresses entity and relation extractionas well an identification of zero anaphora. These basic building blocks can facilitate more advanced downstream applications (e.g., question answering, conversational agents)

    Data Mining a Medieval Medical Text Reveals Patterns in Ingredient Choice That Reflect Biological Activity against Infectious Agents

    Get PDF
    We used established methodologies from network science to identify patterns in medicinal ingredient combinations in a key medieval text, the 15th-century Lylye of Medicynes, focusing on recipes for topical treatments for symptoms of microbial infection. We conducted experiments screening the antimicrobial activity of selected ingredients. These experiments revealed interesting examples of ingredients that potentiated or interfered with each other’s activity and that would be useful bases for future, more detailed experiments. Our results highlight (i) the potential to use methodologies from network science to analyze medieval data sets and detect patterns of ingredient combination, (ii) the potential of interdisciplinary collaboration to reveal different aspects of the ethnopharmacology of historical medical texts, and (iii) the potential development of novel therapeutics inspired by premodern remedies in a time of increased need for new antibiotics.The pharmacopeia used by physicians and laypeople in medieval Europe has largely been dismissed as placebo or superstition. While we now recognize that some of the materia medica used by medieval physicians could have had useful biological properties, research in this area is limited by the labor-intensive process of searching and interpreting historical medical texts. Here, we demonstrate the potential power of turning medieval medical texts into contextualized electronic databases amenable to exploration by the use of an algorithm. We used established methodologies from network science to reveal patterns in ingredient selection and usage in a key text, the 15th-century Lylye of Medicynes, focusing on remedies to treat symptoms of microbial infection. In providing a worked example of data-driven textual analysis, we demonstrate the potential of this approach to encourage interdisciplinary collaboration and to shine a new light on the ethnopharmacology of historical medical texts
    • …
    corecore