
    Annotated dataset creation through large language models for non-English medical NLP

    Obtaining text datasets with semantic annotations is an effortful process, yet crucial for supervised training in natural language processing (NLP). In general, developing and applying new NLP pipelines in domain-specific contexts often requires custom-designed datasets to address the task at hand in a supervised machine learning fashion. When operating in non-English languages for medical data processing, this exposes several interconnected problems, both minor and major, such as the lack of task-matching datasets and of task-specific pre-trained models. In our work, we suggest leveraging pre-trained large language models for training data acquisition in order to obtain sufficiently large datasets for training smaller, more efficient models for use-case-specific tasks. To demonstrate the effectiveness of our approach, we create a custom dataset that we use to train GPTNERMED, a medical NER model for German texts; our method nevertheless remains language-independent in principle. Our dataset and our pre-trained models are publicly available at https://github.com/frankkramer-lab/GPTNERMED
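
    The general recipe described above can be sketched in a few lines: prompt a large language model to emit annotated sentences, parse its output, and collect the results as training data for a smaller token-classification model. The sketch below is an illustration only; the OpenAI client, the gpt-4o-mini model name, the entity classes (MEDICATION, DOSAGE, DIAGNOSIS), and the JSON output format are assumptions, not the exact GPTNERMED pipeline.

        # Illustrative sketch: LLM-generated NER training data (not the exact GPTNERMED setup).
        # Assumes the OpenAI Python SDK (>=1.0) and an OPENAI_API_KEY in the environment.
        import json
        from openai import OpenAI

        client = OpenAI()

        PROMPT = (
            "Generate 5 German medical example sentences. Return JSON with key 'samples': "
            "a list of objects with 'text' and 'entities', where each entity has "
            "'start', 'end', and 'label' (one of MEDICATION, DOSAGE, DIAGNOSIS)."
        )

        response = client.chat.completions.create(
            model="gpt-4o-mini",                      # illustrative model choice
            messages=[{"role": "user", "content": PROMPT}],
            response_format={"type": "json_object"},  # ask for machine-readable output
        )

        samples = json.loads(response.choices[0].message.content)["samples"]
        for sample in samples:
            # Each generated sentence plus its character-offset entities becomes one
            # training example for a smaller, more efficient NER model.
            print(sample["text"], sample["entities"])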

    A conceptual architecture for interactive educational multimedia

    Learning is more than knowledge acquisition; it often involves the active participation of the learner in a variety of knowledge- and skills-based learning and training activities. Interactive multimedia technology can support the variety of interaction channels and languages required to facilitate interactive learning and teaching. A conceptual architecture for interactive educational multimedia can support the development of such multimedia systems. Such an architecture needs to embed multimedia technology into a coherent educational context. A framework based on an integrated interaction model is needed to capture learning and training activities in an online setting from an educational perspective, to describe them in the human-computer context, and to integrate them with mechanisms and principles of multimedia interaction.

    CML: the CommonKADS conceptual modelling language

    We present a structured language for the specification of knowledge models according to the CommonKADS methodology. This language is called CML (Conceptual Modelling Language) and provides both a structured textual notation and a diagrammatic notation for expertise models. The use of CML is illustrated with a variety of examples taken from the VT elevator design system.

    Ontologies and Information Extraction

    This report argues that, even in the simplest cases, IE is an ontology-driven process. It is not a mere text-filtering method based on simple pattern matching and keywords, because the extracted pieces of text are interpreted with respect to a predefined partial domain model. The report shows that, depending on the nature and the depth of the interpretation needed to extract the information, more or less knowledge must be involved. The discussion is mainly illustrated with examples from biology, a domain in which there is a critical need for content-based exploration of the scientific literature and which is becoming a major application domain for IE.
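
    As a toy illustration of the difference between surface pattern matching and ontology-driven interpretation, the sketch below only turns a matched string into an extraction result once it is attached to the types and roles of a small partial domain model. The ontology, the pattern, and the example sentence are invented for illustration and are not taken from the report.

        # Toy sketch: the match only becomes information once it is interpreted
        # against a (partial) domain model that defines types and roles.
        import re

        ONTOLOGY = {
            "GeneProduct": {"is_a": "BiologicalEntity"},
            "Interaction": {"roles": ("agent", "target"), "role_type": "GeneProduct"},
        }

        PATTERN = re.compile(r"(\w+) (?:activates|inhibits) (\w+)")

        def extract(sentence):
            match = PATTERN.search(sentence)
            if match is None:
                return None
            agent, target = match.groups()
            # Assign the extracted strings to the roles the domain model defines.
            return {
                "type": "Interaction",
                "agent": {"text": agent, "class": ONTOLOGY["Interaction"]["role_type"]},
                "target": {"text": target, "class": ONTOLOGY["Interaction"]["role_type"]},
            }

        print(extract("RAS activates RAF in this signalling pathway."))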

    Towards using web-crawled data for domain adaptation in statistical machine translation

    This paper reports on ongoing work focused on domain adaptation of statistical machine translation using domain-specific data obtained by domain-focused web crawling. We present a strategy for crawling monolingual and parallel data and for exploiting them for testing, language modelling, and system tuning in a phrase-based machine translation framework. The proposed approach is evaluated on the domains of Natural Environment and Labour Legislation and on two language pairs: English–French and English–Greek.
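
    A minimal sketch of the domain-focused crawling step is given below: fetch candidate pages and keep only those that contain enough seed terms for the target domain. The seed terms, the threshold, the placeholder URL, and the use of requests and BeautifulSoup are assumptions made for illustration; they do not reproduce the paper's crawler.

        # Minimal sketch: keep a crawled page only if it looks like the target domain.
        # Seed terms and threshold are illustrative; the URL is a placeholder.
        import requests
        from bs4 import BeautifulSoup

        SEED_TERMS = {"emission", "pollution", "habitat", "biodiversity"}  # "Natural Environment"
        THRESHOLD = 3  # minimum number of distinct seed terms a page must contain

        def is_in_domain(url):
            html = requests.get(url, timeout=10).text
            text = BeautifulSoup(html, "html.parser").get_text().lower()
            hits = {term for term in SEED_TERMS if term in text}
            return len(hits) >= THRESHOLD

        candidate_urls = ["https://example.org/page1"]  # placeholder crawl frontier
        in_domain_pages = [url for url in candidate_urls if is_in_domain(url)]
        print(in_domain_pages)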

    Acquiring Correct Knowledge for Natural Language Generation

    Natural language generation (NLG) systems are computer software systems that produce texts in English and other human languages, often from non-linguistic input data. NLG systems, like most AI systems, need substantial amounts of knowledge. However, our experience in two NLG projects suggests that it is difficult to acquire correct knowledge for NLG systems; indeed, every knowledge acquisition (KA) technique we tried had significant problems. In general terms, these problems were due to the complexity, novelty, and poorly understood nature of the tasks our systems attempted, and were worsened by the fact that people write so differently. This meant in particular that corpus-based KA approaches suffered because it was impossible to assemble a sizable corpus of high-quality consistent manually written texts in our domains; and structured expert-oriented KA techniques suffered because experts disagreed and because we could not get enough information about special and unusual cases to build robust systems. We believe that such problems are likely to affect many other NLG systems as well. In the long term, we hope that new KA techniques may emerge to help NLG system builders. In the shorter term, we believe that understanding how individual KA techniques can fail, and using a mixture of different KA techniques with different strengths and weaknesses, can help developers acquire NLG knowledge that is mostly correct.

    Applying semantic web technologies to knowledge sharing in aerospace engineering

    This paper details an integrated methodology to optimise knowledge reuse and sharing, illustrated with a use case in the aeronautics domain. It uses ontologies as a central modelling strategy for the capture of knowledge from legacy documents via automated means, or directly in systems interfacing with knowledge workers via user-defined, web-based forms. The domain ontologies used for knowledge capture also guide the retrieval of the knowledge extracted from the data, using a semantic search system that supports multiple modalities during search. This approach has been applied and evaluated successfully within the aerospace domain, and is currently being extended for use in other domains on an increasingly large scale.
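
    A hedged sketch of the general pattern is shown below: knowledge captured from documents or forms is stored as ontology-typed RDF triples, and the same vocabulary then guides retrieval via SPARQL. The namespace, classes, properties, and query are invented for illustration and are not the project's actual aerospace ontology.

        # Sketch: ontology-typed capture and retrieval with rdflib (illustrative vocabulary).
        from rdflib import Graph, Namespace, Literal
        from rdflib.namespace import RDF

        AERO = Namespace("http://example.org/aero#")
        g = Graph()

        # Knowledge captured from a legacy document or a web-based form becomes typed triples.
        g.add((AERO.Bracket17, RDF.type, AERO.Component))
        g.add((AERO.Bracket17, AERO.material, Literal("Ti-6Al-4V")))

        # The same ontology vocabulary guides retrieval.
        results = g.query(
            """
            PREFIX aero: <http://example.org/aero#>
            SELECT ?component ?material
            WHERE { ?component a aero:Component ; aero:material ?material . }
            """
        )
        for component, material in results:
            print(component, material)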

    THE "POWER" OF TEXT PRODUCTION ACTIVITY IN COLLABORATIVE MODELING : NINE RECOMMENDATIONS TO MAKE A COMPUTER SUPPORTED SITUATION WORK

    Language is not a direct translation of a speaker’s or writer’s knowledge or intentions. Various complex processes and strategies are involved in serving the needs of the audience: planning the message, describing some features of a model and not others, organizing an argument, adapting to the knowledge of the reader, meeting linguistic constraints, etc. As a consequence, when communicating about a model, or about knowledge, there is a complex interaction between knowledge and language. In this contribution, we address the question of the role of language in modeling, in the specific case of collaboration over a distance via electronic exchange of written textual information. What are the problems and dimensions a language user has to deal with when communicating a (mental) model? What is the relationship between the nature of the knowledge to be communicated and linguistic production? What is the relationship between representations and produced text? In what sense can interactive learning systems serve as mediators or as obstacles to these processes?