
    Interactive reweighting for mitigating label quality issues

    Label quality issues, such as noisy labels and imbalanced class distributions, have negative effects on model performance. Automatic reweighting methods identify problematic samples by recognizing their negative effects on validation samples and assigning lower weights to them. However, these methods fail to achieve satisfactory performance when the validation samples are themselves of low quality. To tackle this, we develop Reweighter, a visual analysis tool for sample reweighting. The reweighting relationships between validation samples and training samples are modeled as a bipartite graph. Based on this graph, a validation sample improvement method is developed to improve the quality of validation samples. Since the automatic improvement may not always be perfect, a co-cluster-based bipartite graph visualization is developed to illustrate the reweighting relationships and to support interactive adjustments to validation samples and reweighting results. The adjustments are converted into constraints of the validation sample improvement method to further improve the validation samples. We demonstrate the effectiveness of Reweighter in improving reweighting results through a quantitative evaluation and two case studies.
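As background for the automatic reweighting the abstract builds on, here is a toy sketch of validation-driven sample reweighting: a training sample earns weight in proportion to how well its loss gradient aligns with the mean validation gradient. This is a generic gradient-alignment scheme under illustrative choices (logistic regression, fixed learning rate), not Reweighter's actual algorithm.

```python
import numpy as np

def reweight_by_validation(X_tr, y_tr, X_val, y_val, lr=0.1, steps=200):
    """Toy validation-driven reweighting for logistic regression.
    A training sample gets a high weight when its loss gradient points
    in the same direction as the mean validation gradient, i.e. when
    learning from it helps the validation set. Generic illustration,
    not Reweighter itself."""
    w = np.zeros(X_tr.shape[1])
    sample_w = np.ones(len(X_tr)) / len(X_tr)

    def per_sample_grads(X, y, w):
        p = 1.0 / (1.0 + np.exp(-X @ w))     # sigmoid predictions
        return (p - y)[:, None] * X          # log-loss gradient, one row per sample

    for _ in range(steps):
        g_tr = per_sample_grads(X_tr, y_tr, w)
        g_val = per_sample_grads(X_val, y_val, w).mean(axis=0)
        sample_w = np.maximum(g_tr @ g_val, 0.0)   # clip negative alignment to 0
        if sample_w.sum() > 0:
            sample_w /= sample_w.sum()
        w -= lr * (sample_w[:, None] * g_tr).sum(axis=0)
    return w, sample_w
```

On data with a flipped label, the mislabeled sample's gradient opposes the validation gradient, so its weight is driven to zero while clean samples keep positive weight; this is exactly the behavior that degrades when the validation set itself is noisy, which motivates the tool described above.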

    24th Nordic Conference on Computational Linguistics (NoDaLiDa)


    Systematic Approaches for Telemedicine and Data Coordination for COVID-19 in Baja California, Mexico

    Conference proceedings info: ICICT 2023: The 6th International Conference on Information and Computer Technologies, Raleigh, HI, United States, March 24-26, 2023, pages 529-542.
    We provide a model for the systematic implementation of telemedicine within a large evaluation center for COVID-19 in the area of Baja California, Mexico. Our model is based on human-centric design factors and cross-disciplinary collaborations for scalable, data-driven enablement of smartphone, cellular, and video teleconsultation technologies to link hospitals, clinics, and emergency medical services for point-of-care COVID-19 testing and for subsequent treatment and quarantine decisions. A multidisciplinary team was rapidly created in cooperation with different institutions, including the Autonomous University of Baja California, the Ministry of Health, the Command, Communication and Computer Control Center of the Ministry of the State of Baja California (C4), Colleges of Medicine, and the College of Psychologists. Our objective is to provide information to the public, to evaluate COVID-19 in real time, and to track regional, municipal, and state-wide data in real time that informs supply chains and resource allocation in anticipation of a surge in COVID-19 cases.
    https://doi.org/10.1007/978-981-99-3236-

    Data efficient deep learning for medical image analysis: A survey

    The rapid evolution of deep learning has significantly advanced the field of medical image analysis. However, despite these achievements, further enhancement of deep learning models for medical image analysis faces a significant challenge: the scarcity of large, well-annotated datasets. To address this issue, recent years have witnessed a growing emphasis on the development of data-efficient deep learning methods. This paper conducts a thorough review of data-efficient deep learning methods for medical image analysis. To this end, we categorize these methods based on the level of supervision they rely on: no supervision, inexact supervision, incomplete supervision, inaccurate supervision, and only limited supervision. We further divide these categories into finer subcategories; for example, we divide inexact supervision into multiple instance learning and learning with weak annotations, and incomplete supervision into semi-supervised learning, active learning, and domain-adaptive learning. Furthermore, we systematically summarize commonly used datasets for data-efficient deep learning in medical image analysis and investigate future research directions to conclude this survey.
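To make one of the categories above concrete, here is a minimal self-training loop, a standard instance of incomplete (semi-supervised) supervision; the nearest-centroid model and the confidence threshold are illustrative choices, not taken from the survey.

```python
import numpy as np

def pseudo_label_self_training(X_lab, y_lab, X_unlab, threshold=0.8, rounds=5):
    """Minimal self-training: each round fits class centroids on the
    labeled pool, pseudo-labels the unlabeled points whose softmax
    confidence (over negative centroid distances) clears the threshold,
    and moves them into the labeled pool."""
    X_lab, y_lab = X_lab.copy(), y_lab.copy()
    classes = np.unique(y_lab)
    for _ in range(rounds):
        if len(X_unlab) == 0:
            break
        centroids = np.stack([X_lab[y_lab == c].mean(axis=0) for c in classes])
        d = np.linalg.norm(X_unlab[:, None, :] - centroids[None], axis=2)
        conf = np.exp(-d) / np.exp(-d).sum(axis=1, keepdims=True)
        keep = conf.max(axis=1) >= threshold   # only confident pseudo-labels
        if not keep.any():
            break
        X_lab = np.vstack([X_lab, X_unlab[keep]])
        y_lab = np.concatenate([y_lab, classes[conf[keep].argmax(axis=1)]])
        X_unlab = X_unlab[~keep]
    return X_lab, y_lab, X_unlab
```

The labeled pool grows only with confident pseudo-labels, so decisions on ambiguous points are postponed to later rounds rather than locked in immediately.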

    Active Learning from Knowledge-Rich Data

    With the ever-increasing demand for the quality and quantity of training samples, it is difficult to replicate the success of modern machine learning models in knowledge-rich domains, where labeled data for training is scarce and labeling new data is expensive. While machine learning and AI have achieved significant progress in many common domains, the lack of large-scale labeled data samples poses a grand challenge for the wide application of advanced statistical learning models in key knowledge-rich domains, such as medicine, biology, and the physical sciences. Active learning (AL) offers a promising and powerful learning paradigm that can significantly reduce the data-annotation burden by allowing the model to sample only informative objects to learn from human experts. Previous AL models leverage simple criteria to explore the data space and achieve fast convergence of AL. However, those active sampling methods are less effective in exploring knowledge-rich data spaces, resulting in slow convergence of AL. In this thesis, we propose novel AL methods to address knowledge-rich data exploration challenges with respect to different types of machine learning tasks. Specifically, for multi-class tasks, we propose three approaches that leverage different types of sparse kernel machines to better capture the data covariance and use them to guide effective data exploration in a complex feature space. For multi-label tasks, it is essential to capture label correlations, and we model them in three different ways to guide effective data exploration in a large and correlated label space. For data exploration in a very high-dimensional feature space, we present novel uncertainty measures to better control the exploration behavior of deep learning models and leverage a uniquely designed regularizer to achieve effective exploration in high-dimensional space.
    Our proposed models not only exhibit good exploration behavior for different types of knowledge-rich data but also achieve an optimal exploration-exploitation balance with strong theoretical underpinnings. In the end, we study active learning in a more realistic scenario where human annotators provide noisy labels. We propose a re-sampling paradigm that leverages the machine's awareness to reduce the noise rate. We theoretically prove the effectiveness of the re-sampling paradigm and design a novel spatial-temporal active re-sampling function by leveraging the critical spatial and temporal properties of maximum-margin kernel classifiers.
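As a concrete, deliberately generic example of the uncertainty-driven sampling that work in this area builds on, margin-based acquisition queries the unlabeled points where the model's top two class probabilities are closest. This is the simple baseline criterion, not the sparse-kernel or spatial-temporal measures the thesis proposes.

```python
import numpy as np

def margin_query(probs, k=1):
    """Margin sampling: return indices of the k unlabeled samples whose
    top-1 and top-2 class probabilities are closest, i.e. where the
    current model is least decided. probs: (n_samples, n_classes)."""
    srt = np.sort(probs, axis=1)
    margin = srt[:, -1] - srt[:, -2]      # top-1 minus top-2 probability
    return np.argsort(margin)[:k]         # smallest margins queried first
```

In a full active-learning loop this function would be called once per round, the queried points sent to an annotator, and the model retrained on the enlarged labeled set.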

    Statistical Data Modeling and Machine Learning with Applications

    The modeling and processing of empirical data is one of the main subjects and goals of statistics. Nowadays, with the development of computer science, the extraction of useful and often hidden information and patterns from data sets of different volumes and from complex data warehouses has been added to these goals. New and powerful statistical techniques combining machine learning (ML) and data mining paradigms have been developed. To one degree or another, all of these techniques and algorithms rest on a rigorous mathematical basis, including probability theory and mathematical statistics, operations research, mathematical analysis, and numerical methods. Popular ML methods, such as artificial neural networks (ANN), support vector machines (SVM), decision trees, and random forests (RF), have produced models that can be considered straightforward applications of optimization theory and statistical estimation. The wide arsenal of classical statistical approaches combined with powerful ML techniques allows many challenging and practical problems to be solved. This Special Issue belongs to the section “Mathematics and Computer Science”. Its aim is to present a brief collection of carefully selected papers on new and original methods, data analyses, case studies, comparative studies, and other research on the topic of statistical data modeling and ML, as well as their applications. Particular attention is given, but not limited, to theories and applications in diverse areas such as computer science, medicine, engineering, banking, education, sociology, and economics. The resulting palette of methods, algorithms, and applications for statistical modeling and ML presented in this Special Issue is expected to contribute to the further development of research in this area.
    We also believe that the new knowledge acquired here, as well as the applied results, will be attractive and useful for young scientists, doctoral students, and researchers from various scientific specialties.

    SIS 2017. Statistics and Data Science: new challenges, new generations

    The 2017 SIS Conference aims to highlight the crucial role of Statistics in Data Science. In this new domain, where ‘meaning’ is extracted from data, the increasing amount of data produced and available in databases has brought new challenges involving different fields: statistics, machine learning, information and computer science, optimization, and pattern recognition. Together, these fields contribute considerably to the analysis of ‘big data’, open data, and relational, complex, structured, and unstructured data. The aim is to collect contributions from the different domains of Statistics on high-dimensional data quality validation, sample extraction, dimension reduction, pattern selection, data modelling, hypothesis testing, and confirming conclusions drawn from the data.

    Semantic knowledge integration for learning from semantically imprecise data

    Low availability of labeled training data often poses a fundamental limit to the accuracy of computer vision applications using machine learning methods. While these methods are improved continuously, e.g., through better neural network architectures, there cannot be a single methodical change that increases the accuracy on all possible tasks. This statement, known as the no free lunch theorem, suggests that we should consider aspects of machine learning other than learning algorithms for opportunities to escape the limits set by the available training data. In this thesis, we focus on two main aspects: the nature of the training data, where we introduce structure into the label set using concept hierarchies, and the learning paradigm, which we change in accordance with the requirements of real-world applications as opposed to more academic setups.
    Concept hierarchies represent semantic relations, which are sets of statements such as "a bird is an animal." We propose a hierarchical classifier to integrate this domain knowledge into a pre-existing task, thereby increasing the information the classifier has access to. While the hierarchy's leaf nodes correspond to the original set of classes, the inner nodes are "new" concepts that do not exist in the original training data. However, we posit that such "imprecise" labels are valuable and should occur naturally, e.g., as an annotator's way of expressing their uncertainty. Furthermore, the increased number of concepts leads to more possible search terms when assembling a web-crawled dataset or using an image search. We propose CHILLAX, a method that learns from semantically imprecise training data while still offering precise predictions, allowing it to integrate seamlessly into a pre-existing application.
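The "imprecise label" idea above can be sketched in a few lines: an inner-node label such as "animal" is treated as "any leaf below this node", so a loss only penalizes probability mass assigned outside the labeled subtree. The tiny hierarchy and the loss here are invented for illustration and are not CHILLAX's actual formulation.

```python
import numpy as np

# Hypothetical concept hierarchy: child -> parent.
PARENT = {"bird": "animal", "cat": "animal", "animal": "entity",
          "car": "entity"}
LEAVES = ["bird", "cat", "car"]   # the original, precise classes

def admissible(label):
    """Indices of leaves consistent with a (possibly imprecise) label."""
    def ancestors(n):
        while n in PARENT:        # walk up to the root, yielding each node
            yield n
            n = PARENT[n]
        yield n
    return [i for i, leaf in enumerate(LEAVES) if label in ancestors(leaf)]

def imprecise_nll(probs, label):
    """Negative log of the probability mass on admissible leaves, so a
    label like 'animal' accepts any mass on 'bird' or 'cat'."""
    return -np.log(probs[admissible(label)].sum())
```

With this loss, labeling an image merely "animal" still trains the leaf classifier: any distribution concentrated under the "animal" subtree incurs low loss, while mass on "car" is penalized.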