364 research outputs found

    Improved label noise identification by exploiting unlabeled data

    Get PDF
    © 2017 IEEE. In machine learning, the available training samples are not always perfect and some labels can be corrupted which are called label noises. This may cause the reduction of accuracy. Meanwhile it will also increase the complexity of model. To mitigate the detrimental effect of label noises, noise filtering has been widely used which tries to identify label noises and remove them prior to learning. Almost all existing works only focus on the mislabeled training dataset and ignore the existence of unlabeled data. In fact, unlabeled data are easily accessible in many applications. In this work, we explore how to utilize these unlabeled data to increase the noise filtering effect. To this end, we have proposed a method named MFUDCM (Multiple Filtering with the aid of Unlabeled Data using Confidence Measurement). This method applies the novel multiple soft majority voting idea to make use unlabeled data. In addition, MFUDCM is expected to have a higher accuracy of identifying mislabeled data by using the concept of multiple voting. Finally, the validity of the proposed method MFUDCM is confirmed by experiments and the comparison results with other methods

    BI-LAVA: Biocuration with Hierarchical Image Labeling through Active Learning and Visual Analysis

    Full text link
    In the biomedical domain, taxonomies organize the acquisition modalities of scientific images in hierarchical structures. Such taxonomies leverage large sets of correct image labels and provide essential information about the importance of a scientific publication, which could then be used in biocuration tasks. However, the hierarchical nature of the labels, the overhead of processing images, the absence or incompleteness of labeled data, and the expertise required to label this type of data impede the creation of useful datasets for biocuration. From a multi-year collaboration with biocurators and text-mining researchers, we derive an iterative visual analytics and active learning strategy to address these challenges. We implement this strategy in a system called BI-LAVA Biocuration with Hierarchical Image Labeling through Active Learning and Visual Analysis. BI-LAVA leverages a small set of image labels, a hierarchical set of image classifiers, and active learning to help model builders deal with incomplete ground-truth labels, target a hierarchical taxonomy of image modalities, and classify a large pool of unlabeled images. BI-LAVA's front end uses custom encodings to represent data distributions, taxonomies, image projections, and neighborhoods of image thumbnails, which help model builders explore an unfamiliar image dataset and taxonomy and correct and generate labels. An evaluation with machine learning practitioners shows that our mixed human-machine approach successfully supports domain experts in understanding the characteristics of classes within the taxonomy, as well as validating and improving data quality in labeled and unlabeled collections.Comment: 15 pages, 6 figure

    Patient and Sample Identification. out of the Maze?

    Get PDF
    Background: Patient and sample misidentification may cause significant harm or discomfort to the patients, especially when incorrect data is used for performing specific healthcare activities. It is hence obvious that efficient and quality care can only start from accurate patient identification. There are many opportunities for misidentification in healthcare and laboratory medicine, including homonymy, incorrect patient registration, reliance on wrong patient data, mistakes in order entry, collection of biological specimens from wrong patients, inappropriate sample labeling and inaccurate entry or erroneous transmission of test results through the laboratory information system. Many ongoing efforts are made to prevent this important healthcare problem, entailing streamlined strategies for identifying patients throughout the healthcare industry by means of traditional and innovative identifiers, as well as using technologic tools that may enhance both the quality and efficiency of blood tubes labeling. The aim of this article is to provide an overview about the liability of identification errors in healthcare, thus providing a pragmatic approach for diverging the so-called patient identification crisis

    Enhanced label noise filtering with multiple voting

    Get PDF
    © 2019 by the authors. Label noises exist in many applications, and their presence can degrade learning performance. Researchers usually use filters to identify and eliminate them prior to training. The ensemble learning based filter (EnFilter) is the most widely used filter. According to the voting mechanism, EnFilter is mainly divided into two types: single-voting based (SVFilter) and multiple-voting based (MVFilter). In general, MVFilter is more often preferred because multiple-voting could address the intrinsic limitations of single-voting. However, the most important unsolved issue in MVFilter is how to determine the optimal decision point (ODP). Conceptually, the decision point is a threshold value, which determines the noise detection performance. To maximize the performance of MVFilter, we propose a novel approach to compute the optimal decision point. Our approach is data driven and cost sensitive, which determines the ODP based on the given noisy training dataset and noise misrecognition cost matrix. The core idea of our approach is to estimate the mislabeled data probability distributions, based on which the expected cost of each possible decision point could be inferred. Experimental results on a set of benchmark datasets illustrate the utility of our proposed approach

    Segmentation and Recognition of Eating Gestures from Wrist Motion Using Deep Learning

    Get PDF
    This research considers training a deep learning neural network for segmenting and classifying eating related gestures from recordings of subjects eating unscripted meals in a cafeteria environment. It is inspired by the recent trend of success in deep learning for solving a wide variety of machine related tasks such as image annotation, classification and segmentation. Image segmentation is a particularly important inspiration, and this work proposes a novel deep learning classifier for segmenting time-series data based on the work done in [25] and [30]. While deep learning has established itself as the state-of-the-art approach in image segmentation, particularly in works such as [2],[25] and [31], very little work has been done for segmenting time-series data using deep learning models. Wrist mounted IMU sensors such as accelerometers and gyroscopes can record activity from a subject in a free-living environment, while being encapsulated in a watch-like device and thus being inconspicuous. Such a device can be used to monitor eating related activities as well, and is thought to be useful for monitoring energy intake for healthy individuals as well as those afflicted with conditions such as being overweight or obese. The data set that is used for this research study is known as the Clemson Cafeteria Dataset, available publicly at [14]. It contains data for 276 people eating a meal at the Harcombe Dining Hall at Clemson University, which is a large cafeteria environment. The data includes wrist motion measurements (accelerometer x, y, z; gyroscope yaw, pitch, roll) recorded when the subjects each ate an unscripted meal. Each meal consisted of 1-4 courses, of which 488 were used as part of this research. The ground truth labelings of gestures were created by a set of 18 trained human raters, and consist of labels such as ’bite’ used to indicate when the subject starts to put food in their mouth, and later moves the hand away for more ’bites’ or other activities. Other labels include ’drink’ for liquid intake, ’rest’ for stationary hands and ’utensiling’ for actions such as cutting the food into bite size pieces, stirring a liquid or dipping food in sauce among other things. All other activities are labeled as ’other’ by the human raters. Previous work in our group focused on recognizing these gesture types from manually segmented data using hidden Markov models [24],[27]. This thesis builds on that work, by considering a deep learning classifier for automatically segmenting and recognizing gestures. The neural network classifier proposed as part of this research performs satisfactorily well at recognizing intake gestures, with 79.6% of ’bite’ and 80.7% of ’drink’ gestures being recognized correctly on average per meal. Overall 77.7% of all gestures were recognized correctly on average per meal, indicating that a deep learning classifier can successfully be used to simultaneously segment and identify eating gestures from wrist motion measured through IMU sensors

    Multi-Instance Learning Models for Automated Support of Analysts in Simulated Surveillance Environments

    Get PDF
    New generations of surveillance drones are being outfitted with numerous high definition cameras. The rapid proliferation of fielded sensors and supporting capacity for processing and displaying data will translate into ever more capable platforms, but with increased capability comes increased complexity and scale that may diminish the usefulness of such platforms to human operators. We investigate methods for alleviating strain on analysts by automatically retrieving content specific to their current task using a machine learning technique known as Multi-Instance Learning (MIL). We use MIL to create a real time model of the analysts' task and subsequently use the model to dynamically retrieve relevant content. This paper presents results from a pilot experiment in which a computer agent is assigned analyst tasks such as identifying caravanning vehicles in a simulated vehicle traffic environment. We compare agent performance between MIL aided trials and unaided trials

    Leveraging an Alignment Set in Tackling Instance-Dependent Label Noise

    Full text link
    Noisy training labels can hurt model performance. Most approaches that aim to address label noise assume label noise is independent from the input features. In practice, however, label noise is often feature or \textit{instance-dependent}, and therefore biased (i.e., some instances are more likely to be mislabeled than others). E.g., in clinical care, female patients are more likely to be under-diagnosed for cardiovascular disease compared to male patients. Approaches that ignore this dependence can produce models with poor discriminative performance, and in many healthcare settings, can exacerbate issues around health disparities. In light of these limitations, we propose a two-stage approach to learn in the presence instance-dependent label noise. Our approach utilizes \textit{\anchor points}, a small subset of data for which we know the observed and ground truth labels. On several tasks, our approach leads to consistent improvements over the state-of-the-art in discriminative performance (AUROC) while mitigating bias (area under the equalized odds curve, AUEOC). For example, when predicting acute respiratory failure onset on the MIMIC-III dataset, our approach achieves a harmonic mean (AUROC and AUEOC) of 0.84 (SD [standard deviation] 0.01) while that of the next best baseline is 0.81 (SD 0.01). Overall, our approach improves accuracy while mitigating potential bias compared to existing approaches in the presence of instance-dependent label noise

    SPECIMEN LABELING IMPROVEMENT PROJECT: SLIP

    Get PDF
    Blood specimens are labeled at the time of acquisition in order to identify and match the specimen, label, and order to the patient. While the labeling process is not new, it is frequently laden with errors (Brown, Smith, & Sherfy, 2011). Wrong blood in tube (WBIT) poses significant risk. Multiple factors contribute to mislabeling errors, including lax policies, limited technological solutions, decentralized labeling processes, multi-tasking, distraction from the clinician, and insufficient education and training of staff. To reduce blood specimen labeling errors, a large academic medical center implemented an innovative technological solution for specimen labeling that integrates patient identification, physician order, and laboratory specimen identification through barcode technology that interfaces with the electronic medical record at the point of care. A failure mode, effects and critical analysis (FMECA) were completed to assess for system failure points, and to design workflow prior to training staff. Four failure points were identified and eliminated through workflow adjustments with the new system. Staff training utilizing simulation highlighted system safety points. This quality improvement process applied across adult and pediatric acute and critical care units provided dramatic reductions in blood specimen labeling errors pre/post intervention
    • …
    corecore