
387 research outputs found

A literature survey of active machine learning in the context of natural language processing

Author: Olsson Fredrik
Publication venue: Swedish Institute of Computer Science
Publication date: 01/01/2009

Active learning is a supervised machine learning technique in which the learner is in control of the data used for learning. The learner uses that control to ask an oracle, typically a human with extensive knowledge of the domain at hand, about the classes of the instances for which the model learned so far makes unreliable predictions. The active learning process takes as input a set of labeled examples, as well as a larger set of unlabeled examples, and produces a classifier and a relatively small set of newly labeled data. The overall goal is to create as good a classifier as possible without having to mark up and supply the learner with more data than necessary. The learning process aims to keep the human annotation effort to a minimum, asking for advice only where the training utility of the result of such a query is high. Active learning has been successfully applied to a number of natural language processing tasks, such as information extraction, named entity recognition, text categorization, part-of-speech tagging, parsing, and word sense disambiguation. This report is a literature survey of active learning from the perspective of natural language processing.
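The process described in this abstract (a small labeled set, a larger unlabeled pool, and queries to an oracle about the least reliable predictions) can be illustrated with a minimal sketch. It is not taken from the survey: the one-dimensional threshold classifier, the simulated oracle, and the names `oracle`, `fit_threshold`, and `active_learn` are all assumptions made for illustration, and "unreliable" is approximated here by distance to the current decision boundary.

```python
def oracle(x):
    # Hypothetical stand-in for the human annotator: the true class is 1
    # when x >= 0.6. A real oracle would be a domain expert.
    return 1 if x >= 0.6 else 0


def fit_threshold(labeled):
    # Toy classifier: the decision boundary is the midpoint between the
    # largest example labeled 0 and the smallest example labeled 1.
    # Assumes the labeled set contains at least one example of each class.
    zeros = [x for x, y in labeled if y == 0]
    ones = [x for x, y in labeled if y == 1]
    return (max(zeros) + min(ones)) / 2.0


def active_learn(labeled, pool, budget):
    # Pool-based active learning with uncertainty sampling: at each step,
    # query the oracle about the pool item closest to the current boundary,
    # i.e. the item whose predicted class is least reliable.
    labeled = list(labeled)
    pool = list(pool)
    newly_labeled = []
    for _ in range(budget):
        if not pool:
            break
        threshold = fit_threshold(labeled)
        query = min(pool, key=lambda x: abs(x - threshold))
        pool.remove(query)
        pair = (query, oracle(query))
        labeled.append(pair)
        newly_labeled.append(pair)
    # Output: the final classifier (here, its threshold) and the small
    # set of newly labeled data.
    return fit_threshold(labeled), newly_labeled


seed = [(0.0, 0), (1.0, 1)]            # initial labeled examples
pool = [i / 100 for i in range(1, 100)]  # larger set of unlabeled examples
threshold, queried = active_learn(seed, pool, budget=10)
print(threshold, len(queried))
```

In this toy setting the learner labels only 10 of the 99 pool items, yet the learned boundary ends up close to the oracle's true boundary, which is the kind of annotation saving the abstract describes.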

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

A Review of Reinforcement Learning for Natural Language Processing, and Applications in Healthcare

Author: Hoetzlein Rama
Hou Yu
Li Mingchen
Liu Ying
Wang Fang
Wang Haozhu
Zhang Rui
Zhou Huixue
Zhou Sicheng
Publication date: 23/10/2023

Reinforcement learning (RL) has emerged as a powerful approach for tackling complex medical decision-making problems such as treatment planning, personalized medicine, and optimizing the scheduling of surgeries and appointments. It has gained significant attention in the field of Natural Language Processing (NLP) due to its ability to learn optimal strategies for tasks such as dialogue systems, machine translation, and question-answering. This paper presents a review of RL techniques in NLP, highlighting key advancements, challenges, and applications in healthcare. The review begins by visualizing a roadmap of machine learning and its applications in healthcare. It then explores the integration of RL with NLP tasks. We examine dialogue systems where RL enables the learning of conversational strategies, RL-based machine translation models, question-answering systems, text summarization, and information extraction. Additionally, ethical considerations and biases in RL-NLP systems are addressed.

arXiv.org e-Print Archive

Statistical analysis of high-dimensional biomedical data: a gentle introduction to analytical goals, common approaches and challenges

Author: Ambrogi Federico
Benner Axel
Binder Harald
Boulesteix Anne-Laure
De Bin Riccardo
Lusa Lara
McShane Lisa
Michiels Stefan
Migliavacca Eugenia
Rahnenführer Jörg
Sauerbrei Willi
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 01/01/2023

Background: In high-dimensional data (HDD) settings, the number of variables associated with each observation is very large. Prominent examples of HDD in biomedical research include omics data with a large number of variables such as many measurements across the genome, proteome, or metabolome, as well as electronic health records data that have large numbers of variables recorded for each patient. The statistical analysis of such data requires knowledge and experience, sometimes of complex methods adapted to the respective research questions. Methods: Advances in statistical methodology and machine learning methods offer new opportunities for innovative analyses of HDD, but at the same time require a deeper understanding of some fundamental statistical concepts. Topic group TG9 “High-dimensional data” of the STRATOS (STRengthening Analytical Thinking for Observational Studies) initiative provides guidance for the analysis of observational studies, addressing particular statistical challenges and opportunities for the analysis of studies involving HDD. In this overview, we discuss key aspects of HDD analysis to provide a gentle introduction for non-statisticians and for classically trained statisticians with little experience specific to HDD. Results: The paper is organized with respect to subtopics that are most relevant for the analysis of HDD, in particular initial data analysis, exploratory data analysis, multiple testing, and prediction. For each subtopic, main analytical goals in HDD settings are outlined. For each of these goals, basic explanations for some commonly used analysis methods are provided. Situations are identified where traditional statistical methods cannot, or should not, be used in the HDD setting, or where adequate analytic tools are still lacking. Many key references are provided. Conclusions: This review aims to provide a solid statistical foundation for researchers, including statisticians and non-statisticians, who are new to research with HDD or simply want to better evaluate and understand the results of HDD analyses.

Open Access LMU

HAL UVSQ

Event Extraction: A Survey

Author: Lai Viet Dac
Publication date: 10/10/2022

Extracting the reported events from text is one of the key research themes in natural language processing. This process includes several tasks such as event detection, argument extraction, and role labeling. As one of the most important topics in natural language processing and natural language understanding, event extraction has applications that span a wide range of domains such as newswire, the biomedical domain, history and the humanities, and cyber security. This report presents a comprehensive survey of event detection from textual documents. In this report, we provide the task definition, the evaluation method, as well as the benchmark datasets and a taxonomy of methodologies for event extraction. We also present our vision of future research directions in event detection.

arXiv.org e-Print Archive

A CASE-BASED REASONING SYSTEM FOR THE DIAGNOSIS OF INDIVIDUAL SENSITIVITY TO STRESS IN PSYCHOPHYSIOLOGY

Author: Shahina Begum
Publication date: 02/04/2020

Abstract Stress is an increasing problem in our present world. Negative stress in particular can cause serious health problems if it remains undiagnosed or misdiagnosed and untreated. In stress medicine, clinicians measure blood pressure, ECG, finger temperature and breathing rate during a number of exercises to diagnose stress-related disorders. One of the physiological parameters for quantifying stress levels is the finger temperature measurement, which helps clinicians in the diagnosis and treatment of stress. However, in practice, it is difficult and tedious for a clinician to understand, interpret and analyze complex, lengthy sequential sensor signals. There are only a few experts who are able to diagnose and predict stress-related problems. A system that can help the clinician in diagnosing stress is important, but the large individual variations make it difficult to build such a system. This research work has investigated several artificial intelligence techniques for the purpose of developing an intelligent, integrated sensor system for establishing diagnoses and treatment plans in the psychophysiological domain. To diagnose individual sensitivity to stress, case-based reasoning is applied as a core technique to facilitate experience reuse by retrieving previous similar cases. Furthermore, fuzzy techniques are employed and incorporated into the case-based reasoning system to handle the vagueness and uncertainty inherent in clinicians' reasoning process. The validation of the approach is based on close collaboration with experts and measurements from 24 persons used as reference. 39 time series from these 24 persons have been used to evaluate the approach (in terms of the matching algorithms), and an expert has ranked and estimated the similarity. The result shows that the system reaches a level of performance close to that of an expert. The proposed system could be used as an expert for a less experienced clinician or as a second opinion for an experienced clinician in their decision-making process in stress diagnosis.
Summary The rising stress level in our society, with ever higher demands and a fast pace, comes at a high price. Stress-related problems and illness are a major cost to society, and negative stress in particular, if it remains undetected, or is not correctly identified/diagnosed and is left untreated for a long time, can have serious health effects for the individual, which can lead to long-term sick leave. In stress medicine, clinicians measure blood pressure, ECG, finger temperature and breathing in different situations in order to diagnose stress. Stress diagnosis based on finger temperature (FT) is something a skilled clinician can perform, which is consistent with research in clinical psychophysiology. In practice, however, it is very difficult and laborious for a clinician to follow and analyze long series of measurements in detail, and there are only very few experts who are competent to diagnose and/or predict stress problems. A system that can help clinicians diagnose stress is therefore important. However, the large individual variations and the lack of precise diagnostic rules make it difficult to use a computer-based system. This research work has looked at several techniques and methods in artificial intelligence in order to find a way toward an intelligent sensor-based system for diagnosis and the design of treatment plans in the stress domain. For diagnosing individual stress, case-based reasoning has proven successful; it is a technique that makes it possible to reuse experience and explain decisions by retrieving previous similar finger temperature profiles. Furthermore, fuzzy logic is used so that the system can handle the vagueness inherent in the domain. Methods and algorithms have been developed for this. The validation of the approach is based on close collaboration with experts and measurements from twenty-four users. Thirty-nine time series from these 24 persons formed the basis for the evaluation of the approach; an experienced clinician classified all cases, and the system has been shown to produce results close to those of an expert. The proposed system can be used as a reference for a less experienced clinician or as a second opinion for an experienced clinician in their decision-making process. In addition, finger temperature has proven well suited for use at home for training or monitoring, which becomes possible with a computer-based stress classification system on, for example, a PC with a USB finger temperature sensor.

CiteSeerX

Bootstrapping Named Entity Annotation by Means of Active Machine Learning: A Method for Creating Corpora

Author: Olsson Fredrik
Publication date: 01/01/2008

This thesis describes the development and in-depth empirical investigation of a method, called BootMark, for bootstrapping the marking up of named entities in textual documents. The reason for working with documents, as opposed to for instance sentences or phrases, is that the BootMark method is concerned with the creation of corpora. The claim made in the thesis is that BootMark requires a human annotator to manually annotate fewer documents in order to produce a named entity recognizer with a given performance than would be needed if the documents forming the basis for the recognizer were randomly drawn from the same corpus. The intention is then to use the created named entity recognizer as a pre-tagger and thus eventually turn the manual annotation process into one in which the annotator reviews system-suggested annotations rather than creating new ones from scratch. The BootMark method consists of three phases: (1) manual annotation of a set of documents; (2) bootstrapping, that is, active machine learning for the purpose of selecting which document to annotate next; (3) marking up the remaining unannotated documents of the original corpus using pre-tagging with revision. Five emerging issues are identified, described and empirically investigated in the thesis. Their common denominator is that they all depend on the realization of the named entity recognition task, and as such, require the context of a practical setting in order to be properly addressed. The emerging issues are related to: (1) the characteristics of the named entity recognition task and the base learners used in conjunction with it; (2) the constitution of the set of documents annotated by the human annotator in phase one in order to start the bootstrapping process; (3) the active selection of the documents to annotate in phase two; (4) the monitoring and termination of the active learning carried out in phase two, including a new intrinsic stopping criterion for committee-based active learning; and (5) the applicability of the named entity recognizer created during phase two as a pre-tagger in phase three. The outcomes of the empirical investigations concerning the emerging issues support the claim made in the thesis. The results also suggest that while the recognizer produced in phases one and two is as useful for pre-tagging as a recognizer created from randomly selected documents, the applicability of the recognizer as a pre-tagger is best investigated by conducting a user study involving real annotators working on a real named entity recognition task.

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

Software institutes' Online Digital Archive

Göteborgs universitets publikationer - e-publicering och e-arkiv

Semantically intelligent semi-automated ontology integration

Author: Umer Qasim
Publication date: 01/01/2012

An ontology is a way of categorizing and storing information. Web ontologies help in retrieving precisely the required information over the web. However, heterogeneity between ontologies may arise when multiple ontologies of the same domain are used. Ontology integration addresses this heterogeneity problem and, with it, the problem of interoperability in knowledge-based systems. It provides a mechanism to find the semantic association between a pair of reference ontologies based on their concepts. Many researchers have been working on the problem of ontology integration; however, multiple issues related to ontology integration are still not addressed. This dissertation investigates the ontology integration problem and proposes a layer-based enhanced framework as a solution. In the concept matching process of ontology integration, the comparison between concepts of the reference ontologies is based on their semantics along with their syntax. The semantic relationship of a concept with other concepts between ontologies and the provision of user confirmation (only for the problematic cases) are also taken into account in this process. The proposed framework is implemented and validated by comparing the proposed concept matching technique with existing techniques. Test case scenarios are provided in order to compare and analyse the proposed framework in the analysis phase. The results of the completed experiments demonstrate the efficacy and success of the proposed framework.

Repository@Hull - Worktribe