5 research outputs found

    ALEC: Active learning with ensemble of classifiers for clinical diagnosis of coronary artery disease

    Invasive angiography is the reference standard for coronary artery disease (CAD) diagnosis but is expensive and associated with certain risks. Machine learning (ML) using clinical and noninvasive imaging parameters can be used for CAD diagnosis to avoid the side effects and cost of angiography. However, ML methods require labeled samples for efficient training. The scarcity of labeled data and high labeling costs can be mitigated by active learning, which selectively queries challenging samples for labeling. To the best of our knowledge, active learning has not been used for CAD diagnosis yet. An Active Learning with Ensemble of Classifiers (ALEC) method is proposed for CAD diagnosis, consisting of four classifiers. Three of these classifiers determine whether a patient’s three main coronary arteries are stenotic or not. The fourth classifier predicts whether the patient has CAD or not. ALEC is first trained using labeled samples. For each unlabeled sample, if the outputs of the classifiers are consistent, the sample along with its predicted label is added to the pool of labeled samples. Inconsistent samples are manually labeled by medical experts before being added to the pool. The training is performed once more using the samples labeled so far. The interleaved phases of labeling and training are repeated until all samples are labeled. Compared with 19 other active learning algorithms, ALEC combined with a support vector machine classifier attained superior performance with 97.01% accuracy. Our method is justified mathematically as well. We also comprehensively analyze the CAD dataset used in this paper. As part of the dataset analysis, pairwise feature correlations are computed. The top 15 features contributing to CAD and to stenosis of the three main coronary arteries are determined. The relationship between stenosis of the main arteries is presented using conditional probabilities. The effect of considering the number of stenotic arteries on sample discrimination is investigated. The discrimination power over dataset samples is visualized, treating each of the three main coronary arteries in turn as the sample label and the two remaining arteries as sample features.
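The interleaved labeling and training loop described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not define "consistent", so the rule used here (CAD is predicted exactly when at least one artery is predicted stenotic) is a guess; processing the unlabeled pool in fixed-size batches is also an assumption; and the `NearestCentroid` class is a tiny stand-in for the SVM the abstract reports. The names `alec_loop`, `is_consistent`, and `true_labels` are hypothetical.

```python
class NearestCentroid:
    """Tiny stand-in binary classifier (the abstract reports an SVM instead)."""

    def fit(self, X, y):
        self.centroids = {}
        for cls in (0, 1):
            rows = [x for x, label in zip(X, y) if label == cls]
            if rows:  # a class with no training rows gets no centroid
                self.centroids[cls] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(self.centroids, key=lambda cls: sq_dist(self.centroids[cls]))


def is_consistent(pred):
    # ASSUMPTION: labels are (artery1, artery2, artery3, cad); they agree
    # when CAD is predicted iff at least one artery is predicted stenotic.
    return pred[3] == int(any(pred[:3]))


def alec_loop(labeled_X, labeled_Y, unlabeled_X, oracle, batch_size=2):
    """Alternate training and labeling until the unlabeled pool is empty."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    X, Y = list(labeled_X), list(labeled_Y)
    pool = list(unlabeled_X)
    queries = 0  # number of samples sent to the (simulated) expert
    while pool:
        # Retrain the four classifiers on everything labeled so far.
        models = [NearestCentroid().fit(X, [y[k] for y in Y]) for k in range(4)]
        batch, pool = pool[:batch_size], pool[batch_size:]
        for x in batch:
            pred = tuple(m.predict(x) for m in models)
            if is_consistent(pred):
                Y.append(pred)        # accept the predicted label
            else:
                Y.append(oracle(x))   # ask the expert
                queries += 1
            X.append(x)
    return X, Y, queries


def true_labels(x):
    # Hypothetical ground truth for the demo: artery k is stenotic if x[k] > 0.5.
    arteries = tuple(int(v > 0.5) for v in x)
    return arteries + (int(any(arteries)),)


seed_X = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
seed_Y = [true_labels(x) for x in seed_X]
pool = [(0.9, 0.1, 0.2), (0.1, 0.1, 0.1), (0.2, 0.8, 0.9), (0.4, 0.6, 0.1)]
X, Y, queries = alec_loop(seed_X, seed_Y, pool, oracle=true_labels)
print(len(X), "samples labeled,", queries, "expert queries")
```

The expert is only consulted when the four outputs disagree, which is where the labeling savings would come from in a scheme like this.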

    Sistema de detección y representación de expresiones de tiempo en textos no estructurados [System for detecting and representing time expressions in unstructured text]

    Given the need for tools capable of automating time management, this final-year project proposes a solution to the problem of extracting and representing time in written language. Specifically, the goal is to design an application that uses grammars to recognize temporal expressions in electronic documents and then captures their meaning in a normalized format that makes it easy to insert the corresponding time stamps into a calendar. Doing so first requires a study of the state of the art in systems that process temporal information, as well as of the existing temporal models on which those systems are based. In addition, an interface must be designed that lets the user interact with the system easily, so it should be as friendly and intuitive as possible. Broadly, the tool to be developed should be able to carry out the following tasks: • Recognition phase. The system will detect the most frequent temporal expressions in Spanish by analyzing texts. The application will be able to delimit their extent with easily recognizable, intuitive marks. • Normalization phase. The system will offer the option of resolving and extracting the meaning of the expressions detected in a text, marking them in some normalized format. • Representation phase. The system must also allow the temporal semantics of the previously processed expressions to be represented with some visualization tool, for example a calendar.
Finally, although the system's main application is detecting temporal expressions in electronic texts and then representing them in a calendar, its scope need not end there. It is therefore desirable to make the system easy to integrate into future applications, so that it can be used in other Natural Language Processing tasks such as Question Answering, Information Extraction, etc. Ingeniería Técnica en Informática de Gestió
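The recognition and normalization phases listed above can be illustrated with a small sketch. It is a simplified stand-in rather than the project's system: a single regular expression replaces the grammars the abstract mentions, only full Spanish dates of the form "3 de marzo de 2021" are handled, and the inline `<TIMEX value="...">` mark with an ISO 8601 value is a hypothetical format chosen for the example, since the abstract names neither its marks nor its normalized format.

```python
import re
from datetime import date

# Spanish month names mapped to month numbers.
MONTHS = {
    "enero": 1, "febrero": 2, "marzo": 3, "abril": 4,
    "mayo": 5, "junio": 6, "julio": 7, "agosto": 8,
    "septiembre": 9, "octubre": 10, "noviembre": 11, "diciembre": 12,
}

# Recognition: "<day> de <month> de <year>", case-insensitive.
DATE_RE = re.compile(
    r"\b(\d{1,2}) de (" + "|".join(MONTHS) + r") de (\d{4})\b",
    re.IGNORECASE,
)


def annotate(text):
    """Wrap each recognized date in a mark carrying its normalized value."""

    def mark(match):
        day = int(match.group(1))
        month = MONTHS[match.group(2).lower()]
        year = int(match.group(3))
        try:
            iso = date(year, month, day).isoformat()  # normalization step
        except ValueError:
            return match.group(0)  # impossible date: leave the text untouched
        # Hypothetical mark format, assumed for this example only.
        return f'<TIMEX value="{iso}">{match.group(0)}</TIMEX>'

    return DATE_RE.sub(mark, text)


print(annotate("La reunión es el 3 de marzo de 2021."))
```

The normalized `value` is the piece a calendar front end would consume in the representation phase; relative expressions such as "mañana" would need a reference date and are outside this sketch.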

    Temporal processing of news : annotation of temporal expressions, verbal events and temporal relations

    The ability to capture the temporal dimension of a natural language text is essential to many natural language processing applications, such as Question Answering, Automatic Summarisation, and Information Retrieval. Temporal processing is a field of Computational Linguistics which aims to access this dimension and derive a precise temporal representation of a natural language text by extracting time expressions, events and temporal relations, and then representing them according to a chosen knowledge framework. This thesis focuses on the investigation and understanding of the different ways time is expressed in natural language, on the implementation of a temporal processing system in accordance with the results of this investigation, on the evaluation of the system, and on the extensive analysis of the errors and challenges that appear during system development. The ultimate goal of this research is to develop the ability to automatically annotate temporal expressions, verbal events and temporal relations in a natural language text. Temporal expression annotation involves two stages: temporal expression identification concerned with determining the textual extent of a temporal expression, and temporal expression normalisation which finds the value that the temporal expression designates and represents it using an annotation standard. The research presented in this thesis approaches these tasks with a knowledge-based methodology that tackles temporal expressions according to their semantic classification. Several knowledge sources and normalisation models are experimented with to allow an analysis of their impact on system performance. The annotation of events expressed using either finite or non-finite verbs is addressed with a method that overcomes the drawback of existing methods which associate an event with the class that is most frequently assigned to it in a corpus and are limited in coverage by the small number of events present in the corpus. This limitation is overcome in this research by annotating each WordNet verb with an event class that best characterises that verb. This thesis also describes an original methodology for the identification of temporal relations that hold among events and temporal expressions. The method relies on sentence-level syntactic trees and a propagation of temporal relations between syntactic constituents, by analysing syntactic and lexical properties of the constituents and of the relations between them. The detailed evaluation and error analysis of the methods proposed for solving different temporal processing tasks form an important part of this research. Various corpora widely used by researchers studying different temporal phenomena are employed in the evaluation, thus enabling comparison with state of the art in the field. The detailed error analysis targeting each temporal processing task helps identify not only problems of the implemented methods, but also reliability problems of the annotated resources, and encourages potential reexaminations of some temporal processing tasks. EThOS - Electronic Theses Online Service GB United Kingdo

    Automatic Time Expression Labeling for English and Chinese Text

    Abstract. In this paper, we describe systems for automatic labeling of time expressions occurring in English and Chinese text as specified in the ACE Temporal Expression Recognition and Normalization (TERN) task. We cast the chunking of text into time expressions as a tagging problem using a bracketed representation at token level, which takes into account embedded constructs. We adopted a left-to-right, token-by-token, discriminative, deterministic classification scheme to determine the tags for each token. A number of features are created from a predefined context centered at each token and augmented with decisions from a rule-based time expression tagger and/or a statistical time expression tagger trained on different types of text data, assuming they provide complementary information. We trained one-versus-all multi-class classifiers using support vector machines. We participated in the TERN 2004 recognition task and achieved competitive results.
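To make the idea of a token-level bracketed representation concrete, here is one possible encoding that supports embedded time expressions. The abstract does not give the actual tag inventory, so the scheme below (a `*` per token, prefixed by one `(` for each expression opening at that token and suffixed by one `)` for each expression closing there) is an assumption made for illustration, and `encode` and `decode` are hypothetical helper names. It assumes spans are properly nested and given as inclusive token indices.

```python
def encode(n_tokens, spans):
    """Turn (start, end) token spans into one bracket tag per token."""
    opens = [0] * n_tokens
    closes = [0] * n_tokens
    for start, end in spans:
        opens[start] += 1
        closes[end] += 1
    return ["(" * opens[i] + "*" + ")" * closes[i] for i in range(n_tokens)]


def decode(tags):
    """Recover the spans from bracket tags by matching brackets with a stack."""
    stack, spans = [], []
    for i, tag in enumerate(tags):
        for _ in range(tag.count("(")):
            stack.append(i)                  # an expression opens at token i
        for _ in range(tag.count(")")):
            spans.append((stack.pop(), i))   # innermost open expression closes
    return sorted(spans)


tokens = ["three", "days", "after", "next", "Monday"]
# Outer expression covers all five tokens; "next Monday" is embedded in it.
spans = [(0, 4), (3, 4)]
tags = encode(len(tokens), spans)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

With an encoding like this, each token's tag becomes the class label a left-to-right classifier predicts, and `decode` turns the predicted tag sequence back into (possibly nested) expression spans.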