
    ImageNet Large Scale Visual Recognition Challenge

    The ImageNet Large Scale Visual Recognition Challenge is a benchmark in object category classification and detection on hundreds of object categories and millions of images. The challenge has been run annually from 2010 to the present, attracting participation from more than fifty institutions. This paper describes the creation of this benchmark dataset and the advances in object recognition that have been made possible as a result. We discuss the challenges of collecting large-scale ground truth annotation, highlight key breakthroughs in categorical object recognition, provide a detailed analysis of the current state of the field of large-scale image classification and object detection, and compare state-of-the-art computer vision accuracy with human accuracy. We conclude with lessons learned in the five years of the challenge, and propose future directions and improvements.
    Comment: 43 pages, 16 figures. v3 includes additional comparisons with PASCAL VOC (per-category comparisons in Table 3, distribution of localization difficulty in Fig 16), a list of queries used for obtaining object detection images (Appendix C), and some additional references.
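
    The human-vs-machine comparison in ILSVRC classification is conventionally reported as top-5 error: a prediction counts as correct if the true label is among the five highest-scoring classes. As a reference point, a minimal sketch of that metric (array shapes and data are illustrative):

```python
import numpy as np

def top5_error(scores: np.ndarray, labels: np.ndarray) -> float:
    """scores: (n_images, n_classes) class scores; labels: (n_images,) true class ids."""
    top5 = np.argsort(scores, axis=1)[:, -5:]       # five highest-scoring classes per image
    hits = (top5 == labels[:, None]).any(axis=1)    # was the true class among them?
    return 1.0 - hits.mean()

# Toy example: 3 images, 10 classes, random scores.
rng = np.random.default_rng(0)
print(top5_error(rng.random((3, 10)), np.array([2, 7, 4])))
```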

    Multimodal Machine Learning for Automated ICD Coding

    This study presents a multimodal machine learning model to predict ICD-10 diagnostic codes. We developed separate machine learning models that can handle data from different modalities, including unstructured text, semi-structured text, and structured tabular data. We further employed an ensemble method to integrate all modality-specific models to generate ICD-10 codes. Key evidence was also extracted to make our predictions more convincing and explainable. We used the Medical Information Mart for Intensive Care III (MIMIC-III) dataset to validate our approach. For ICD code prediction, our best-performing model (micro-F1 = 0.7633, micro-AUC = 0.9541) significantly outperforms baseline models, including TF-IDF (micro-F1 = 0.6721, micro-AUC = 0.7879) and Text-CNN (micro-F1 = 0.6569, micro-AUC = 0.9235). For interpretability, our approach achieves a Jaccard Similarity Coefficient (JSC) of 0.1806 on text data and 0.3105 on tabular data, where well-trained physicians achieve 0.2780 and 0.5002, respectively.
    Comment: Machine Learning for Healthcare 201
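
    For context on the reported numbers: micro-averaging pools true/false positives across all codes before computing the metric, which suits the highly imbalanced ICD label space. A minimal sketch with scikit-learn (the toy arrays are illustrative, not the paper's data):

```python
import numpy as np
from sklearn.metrics import f1_score, roc_auc_score

# Rows are patient encounters, columns are ICD-10 codes (1 = code assigned).
y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])
y_prob = np.array([[0.9, 0.2, 0.7], [0.1, 0.8, 0.3], [0.6, 0.7, 0.2]])
y_pred = (y_prob >= 0.5).astype(int)   # illustrative 0.5 decision threshold

print("micro-F1 :", f1_score(y_true, y_pred, average="micro"))
print("micro-AUC:", roc_auc_score(y_true, y_prob, average="micro"))
```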

    Vision Based Extraction of Nutrition Information from Skewed Nutrition Labels

    An important component of a healthy diet is the comprehension and retention of nutritional information and an understanding of how different food items and nutritional constituents affect our bodies. In the U.S. and many other countries, nutritional information is primarily conveyed to consumers through nutrition labels (NLs), which can be found on all packaged food products. However, even health-conscious consumers can find it challenging to make use of the information in these NLs: they may be unfamiliar with nutritional terms, or they may struggle to integrate nutritional data collection into their daily activities for lack of time, motivation, or training. It is therefore essential to automate this data collection and interpretation process with computer vision algorithms that extract nutritional information from NLs, because automation improves the user's ability to engage in continuous nutritional data collection and analysis. To make nutritional data collection more manageable and enjoyable for users, we present a Proactive NUTrition Management System (PNUTS). PNUTS seeks to shift current research and clinical practices in nutrition management toward persuasion, automated nutritional information processing, and context-sensitive nutrition decision support.
    PNUTS consists of two modules. The first is a barcode scanning module that runs on smartphones and is capable of vision-based localization of one-dimensional (1D) Universal Product Code (UPC) and International Article Number (EAN) barcodes with relaxed pitch, roll, and yaw camera alignment constraints. The algorithm localizes barcodes in images by computing Dominant Orientations of Gradients (DOGs) of image segments and grouping smaller segments with similar DOGs into larger connected components. Connected components that pass given morphological criteria are marked as potential barcodes. The algorithm is implemented in a distributed, cloud-based system. The system's front end is a smartphone application that runs on Android 4.2 or higher; the back end is deployed on a five-node Linux cluster where images are processed. The algorithm was evaluated on a corpus of 7,545 images extracted from 506 videos of bags, bottles, boxes, and cans in a supermarket. The DOG algorithm was coupled to our in-place scanner for 1D UPC and EAN barcodes. The scanner receives from the DOG algorithm the rectangular planar dimensions of a connected component and the component's dominant gradient orientation angle, referred to as the skew angle, and draws several scan lines at that skew angle within the component to recognize the barcode in place, without any rotations. The scanner coupled to the localizer was tested on the same corpus of 7,545 images. Laboratory experiments indicate that the system can localize and scan barcodes of any orientation in the yaw plane, of up to 73.28 degrees in the pitch plane, and of up to 55.5 degrees in the roll plane. The videos have been made public for all interested research communities to replicate our findings or to use them in their own research. The front-end Android application is available for free download on Google Play under the title NutriGlass. This module is also coupled to a comprehensive NL database from which nutritional information can be retrieved on demand; the database currently contains more than 230,000 products.
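
    As a rough illustration of the segment-based idea behind DOG localization, the sketch below computes a dominant gradient orientation per image tile and keeps tiles whose gradients are both strong and strongly oriented, as the stripes of a 1D barcode are. The tile size and thresholds are illustrative guesses, not the values used by PNUTS:

```python
import cv2
import numpy as np

def dominant_orientation_mask(gray: np.ndarray, tile: int = 32,
                              energy_thr: float = 1e4, coherence_thr: float = 0.7):
    """Mark tiles with a strong dominant gradient orientation (barcode candidates)."""
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
    h, w = gray.shape
    mask = np.zeros((h // tile, w // tile), np.uint8)
    angles = np.zeros(mask.shape, np.float64)
    for i in range(mask.shape[0]):
        for j in range(mask.shape[1]):
            sx = gx[i*tile:(i+1)*tile, j*tile:(j+1)*tile].ravel()
            sy = gy[i*tile:(i+1)*tile, j*tile:(j+1)*tile].ravel()
            mag = np.hypot(sx, sy)
            if mag.sum() < energy_thr:               # too little gradient energy
                continue
            theta = np.arctan2(sy, sx) % np.pi       # orientation, mod 180 degrees
            mean_vec = np.average(np.exp(2j * theta), weights=mag)  # doubled-angle mean
            if np.abs(mean_vec) > coherence_thr:     # orientations tightly clustered
                mask[i, j] = 255
                angles[i, j] = np.angle(mean_vec) / 2
    # Adjacent marked tiles with similar angles would then be grouped into
    # connected components (e.g., via cv2.connectedComponents on the mask).
    return mask, angles
```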
    The second module of PNUTS is an algorithm whose objective is to determine the text skew angle of an NL image without constraining the angle's magnitude. The horizontal, vertical, and diagonal detail matrices of the two-dimensional (2D) Haar Wavelet Transform are used to identify 2D points with significant intensity changes. The set of points is bounded with a minimum-area rectangle whose rotation angle is the text's skew. The algorithm's performance is compared with that of five text skew detection algorithms on 1001 U.S. nutrition label images and 2200 single- and multi-column document images in multiple languages. To ensure the reproducibility of the reported results, the source code of the algorithm and the image data have been made publicly available. Once the skew angle is estimated correctly, optical character recognition (OCR) techniques can be used to extract the nutrition information.
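
    A minimal sketch of the described skew-detection pipeline, assuming PyWavelets and OpenCV; the detail-coefficient threshold is an illustrative choice:

```python
import cv2
import numpy as np
import pywt

def estimate_skew(gray: np.ndarray, thr_quantile: float = 0.99) -> float:
    """Estimate text skew from one level of the 2D Haar wavelet transform."""
    # Horizontal, vertical, and diagonal detail matrices of the 2D Haar DWT.
    _, (cH, cV, cD) = pywt.dwt2(gray.astype(np.float64), "haar")
    detail = np.abs(cH) + np.abs(cV) + np.abs(cD)
    # Keep points with significant intensity changes.
    thr = np.quantile(detail, thr_quantile)
    ys, xs = np.nonzero(detail > thr)
    pts = np.column_stack([xs, ys]).astype(np.float32)
    # The rotation angle of the minimum-area bounding rectangle is the skew.
    (_, _, angle) = cv2.minAreaRect(pts)
    return angle
```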

    An IoT based Virtual Coaching System (VSC) for Assisting Activities of Daily Life

    Nowadays, the aging of the population is becoming one of the world's main concerns. It is estimated that the number of people aged over 65 will increase from 461 million to 2 billion by 2050. This substantial increase in the elderly population will have significant consequences for the social and health care system. Therefore, in the context of Ambient Intelligence (AmI), Ambient Assisted Living (AAL) has been emerging as a new research area to address problems related to the aging of the population. AAL technologies based on embedded devices have proven effective in alleviating the social- and health-care issues related to the continuous growth of the average age of the population. Many smart applications, devices, and systems have been developed to monitor the health status of the elderly, substitute for them in the accomplishment of activities of daily life (especially in the presence of some impairment or disability), alert their caregivers in case of necessity, and help them recognize risky situations. Such assistive technologies basically rely on the communication and interaction between body sensors, smart environments, and smart devices. However, in this context, less effort has been spent on designing smart solutions for empowering and supporting the self-efficacy of people with neurodegenerative diseases and the elderly in general.
    This thesis fills the gap by presenting a low-cost, non-intrusive, and ubiquitous Virtual Coaching System (VCS) to support people in the acquisition of new behaviors (e.g., taking pills, drinking water, finding the right key, avoiding motor blocks) necessary to cope with needs derived from a change in their health status and a degradation of their cognitive capabilities as they age. VCS is based on the concept of the extended mind introduced by Clark and Chalmers in 1998, who proposed the idea that objects within the environment function as a part of the mind. In my revisiting of the concept of the extended mind, the VCS is composed of a set of smart objects that exploit Internet of Things (IoT) technology and machine-learning-based algorithms in order to identify the needs of the users and react accordingly. In particular, the system exploits smart tags to transform objects commonly used by people (e.g., a pillbox, a bottle of water, keys) into smart objects, monitors their usage according to the users' needs, and incrementally guides users in the acquisition of new behaviors related to those needs. To implement VCS, this thesis explores different research directions and challenges. First of all, it addresses the definition of a ubiquitous, non-invasive, and low-cost indoor monitoring architecture by exploiting the IoT paradigm. Secondly, it deals with the necessity of developing solutions for implementing coaching actions and consequently monitoring human activities by analyzing the interaction between people and smart objects. Finally, it focuses on the design of low-cost localization systems for indoor environments, since knowing the position of a person provides VCS with essential information for recognizing performed activities and preventing risky situations. In the end, the outcomes of these research directions have been integrated into a healthcare application scenario to implement a wearable system that prevents freezing of gait in people affected by Parkinson's Disease.
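
    As a loose illustration of the smart-object monitoring idea (and not the thesis's actual pipeline), one could classify short motion windows from a tagged object, say a pillbox, as "used" versus "idle" with a standard supervised model; the features and synthetic data below are invented for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def window_features(acc: np.ndarray) -> np.ndarray:
    """acc: (n_samples, 3) accelerometer window -> simple magnitude summary features."""
    mag = np.linalg.norm(acc, axis=1)
    return np.array([mag.mean(), mag.std(), mag.max() - mag.min()])

# Synthetic training data: low-variance "idle" windows, high-variance "used" windows.
rng = np.random.default_rng(1)
idle = [window_features(rng.normal(0, 0.05, (50, 3))) for _ in range(100)]
used = [window_features(rng.normal(0, 0.5, (50, 3))) for _ in range(100)]
X, y = np.vstack([idle, used]), np.array([0] * 100 + [1] * 100)

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(clf.predict([window_features(rng.normal(0, 0.5, (50, 3)))]))  # expected: [1]
```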

    Use of the Smartphone Camera to Monitor Adherence to Inhaled Therapy

    Self-management strategies can lead to improved health outcomes, fewer unscheduled treatments, and improved disease control. Compliance with inhaled controller drugs is essential to achieve good clinical outcomes in patients with chronic respiratory diseases. However, compliance assessment is difficult to make trustworthy, as patients often self-report high compliance rates and such reports are considered unreliable. This thesis aims to enable reliable adherence measurement by developing a mobile application module that objectively verifies inhaler usage from image snapshots of the inhaler's dose counter. To achieve this, a mobile application module featuring pre- and post-processing techniques and a default machine learning framework was built for detecting inhaler and dose counter numbers. In addition, in an effort to improve the app's text-recognition capability on a worst-performing inhaler, a machine learning model was trained on an inhaler image dataset. Some of the features developed during this project were incorporated into the current version of the app InspirerMundi, a medication management mobile application planned to be made available on the Play Store by the end of 2021. The proposed approach was validated on a series of different inhaler image datasets. Tests carried out with the default machine learning configuration showed correct detection of dose counters for 70% of inhaler registration events, and 93% for three inhalers commonly used in Portugal. The trained model, in turn, had an average accuracy of 88% in recognizing the digits on the dose counter of one of the worst-performing inhaler models. These results show the potential of exploiting mobile and embedded capabilities to gain additional evidence of inhaler compliance. Such systems can help bridge the gap between patients and healthcare professionals. By empowering patients with disease self-management and drug-adherence tools and by providing additional relevant data, these systems pave the way for informed disease management decisions.
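
    The thesis used a mobile machine learning framework and a custom-trained model; as a rough stand-in for that step, reading a dose-counter number from a cropped inhaler photo with off-the-shelf OCR might look like this (the preprocessing choices are illustrative):

```python
import cv2
import pytesseract

def read_dose_counter(crop_bgr) -> str:
    """Read the printed number from a cropped dose-counter image."""
    gray = cv2.cvtColor(crop_bgr, cv2.COLOR_BGR2GRAY)
    # Binarize so the printed digits stand out for the OCR engine.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # Treat the crop as a single text line and restrict recognition to digits.
    config = "--psm 7 -c tessedit_char_whitelist=0123456789"
    return pytesseract.image_to_string(binary, config=config).strip()

# Usage (hypothetical file name): read_dose_counter(cv2.imread("counter_crop.jpg"))
```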

    Mobile Wound Assessment and 3D Modeling from a Single Image

    The prevalence of camera-enabled mobile phones has made mobile wound assessment a viable treatment option for millions of previously difficult-to-reach patients. We have designed a complete mobile wound assessment platform to ameliorate the many challenges related to chronic wound care. Chronic wounds and infections are the most severe, costly, and fatal types of wounds, placing them at the center of mobile wound assessment. Wound physicians assess thousands of single-view wound images from all over the world, and it may be difficult to determine the location of a wound on the body, for example when the image is taken at close range. In our solution, end-users capture an image of the wound with their mobile camera. The wound image is segmented and classified using modern convolutional neural networks and stored securely in the cloud for remote tracking. We use an interactive, semi-automated approach to allow users to specify the location of the wound on the body. To accomplish this we have created, to the best of our knowledge, the first 3D human surface anatomy labeling system, based on the current NYU and Anatomy Mapper labeling systems. To interactively view wounds in 3D, we present an efficient projective texture mapping algorithm for texturing wounds onto a 3D human anatomy model. In doing so, we demonstrate an approach to 3D wound reconstruction that works even with a single wound image.
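
    A minimal numpy sketch of projective texture mapping in the sense described: each vertex of the 3D anatomy model is projected through the wound photo's camera, and the normalized image coordinates become that vertex's texture coordinates. The camera intrinsics and extrinsics here are placeholders, not the paper's calibration:

```python
import numpy as np

def projective_uvs(vertices: np.ndarray, K: np.ndarray, Rt: np.ndarray,
                   img_w: int, img_h: int) -> np.ndarray:
    """vertices: (n, 3); K: (3, 3) intrinsics; Rt: (3, 4) extrinsics -> (n, 2) UVs."""
    homo = np.hstack([vertices, np.ones((len(vertices), 1))])  # homogeneous coords
    proj = (K @ Rt @ homo.T).T                                 # project into the image
    xy = proj[:, :2] / proj[:, 2:3]                            # perspective divide
    # Normalize pixel coordinates to [0, 1] UV space (flip v to texture convention).
    return np.column_stack([xy[:, 0] / img_w, 1.0 - xy[:, 1] / img_h])

# Illustrative pinhole camera 2 units in front of the model.
K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])
Rt = np.hstack([np.eye(3), np.array([[0.0], [0.0], [2.0]])])
print(projective_uvs(np.array([[0.0, 0.0, 0.0], [0.1, 0.1, 0.0]]), K, Rt, 640, 480))
```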

    Developing Novel Computer-Aided Detection and Diagnosis Systems of Medical Images

    Reading medical images to detect and diagnose diseases is often difficult and subject to large inter-reader variability. To address this issue, the development of computer-aided detection and diagnosis (CAD) schemes or systems for medical images has attracted broad research interest over the last several decades. Despite great effort and significant progress in previous studies, only a limited number of CAD schemes have been used in clinical practice, so developing new CAD schemes remains a hot research topic in the medical imaging informatics field. In this dissertation, I investigate the feasibility of developing several new, innovative CAD schemes for different application purposes. First, to predict breast tumor response to neoadjuvant chemotherapy and reduce unnecessary aggressive surgery, I developed two CAD schemes for breast magnetic resonance imaging (MRI) that generate quantitative image markers based on quantitative analysis of global kinetic features. Using an image marker computed from breast MRI acquired pre-chemotherapy, the first scheme predicts the radiographic complete response (CR) of breast tumors to neoadjuvant chemotherapy; using an imaging marker based on the fusion of kinetic and texture features extracted from breast MRI performed after neoadjuvant chemotherapy, the second scheme better predicts the pathologic complete response (pCR) of the patients. Second, to more accurately predict the prognosis of stroke patients, quantifying the brain hemorrhage and ventricular cerebrospinal fluid depicted on brain CT images can play an important role. For this purpose, I developed a new interactive CAD tool to segment hemorrhage regions and extract a radiological imaging marker that quantitatively determines the severity of aneurysmal subarachnoid hemorrhage at presentation, correlates the estimate with various homeostatic/metabolic derangements, and predicts clinical outcome. Third, to improve the efficiency of primary antibody screening processes in new cancer drug development, I developed a CAD scheme to automatically identify non-negative tissue slides, which indicate reactive antibodies, in digital pathology images. Last, to improve the efficiency and reliability of storing digital pathology image data, I developed a CAD scheme that uses an optical character recognition algorithm to automatically extract metadata from tissue slide label images, reducing manual entry for slide tracking and archiving in tissue pathology laboratories. In summary, these studies developed and tested several innovative approaches to identify quantitative imaging markers with high discriminatory power. In all of the CAD schemes, graphical user interface-based visual aid tools were also developed and implemented. The results demonstrate the feasibility of applying CAD technology to several new application fields, with the potential to assist radiologists, oncologists, and pathologists in improving the accuracy and consistency of disease diagnosis and prognosis assessment using medical images.
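
    For the slide-label step, a hedged sketch of OCR-based metadata extraction: run OCR over the label image and pull structured fields out of the raw text with patterns. The field layout and regular expressions are invented for illustration; a real laboratory's label format will differ:

```python
import re
import pytesseract
from PIL import Image

def extract_label_metadata(path: str) -> dict:
    """OCR a slide label image and extract hypothetical ID/stain fields."""
    text = pytesseract.image_to_string(Image.open(path))
    slide_id = re.search(r"S-\d{4}-\d{5}", text)    # e.g. "S-2021-00123" (hypothetical format)
    stain = re.search(r"(?i)stain:\s*(\w+)", text)  # e.g. "Stain: HE" (hypothetical format)
    return {
        "slide_id": slide_id.group(0) if slide_id else None,
        "stain": stain.group(1) if stain else None,
        "raw_text": text.strip(),
    }
```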

    A Review on the Applications of Crowdsourcing in Human Pathology

    The advent of digital pathology has introduced new avenues of diagnostic medicine. Among them, crowdsourcing has attracted researchers' attention in recent years, allowing them to engage thousands of untrained individuals in research and diagnosis. While several articles exist in this regard, prior works have not collectively documented them. We therefore aim to review the applications of crowdsourcing in human pathology in a semi-systematic manner. We first introduce a novel method for performing a systematic search of the literature. Using this method, we then collect hundreds of articles and screen them against a pre-defined set of criteria. Furthermore, we crowdsource part of the screening process to examine another potential application of crowdsourcing. Finally, we review the selected articles and characterize the prior uses of crowdsourcing in pathology.
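
    Crowdsourced screening decisions are typically aggregated across raters; the simplest scheme is a per-article majority vote over include/exclude labels, sketched below with illustrative data (the review does not specify its aggregation rule):

```python
from collections import Counter

def majority_vote(votes: list[str]) -> str:
    """votes: e.g. ['include', 'exclude', 'include'] -> the winning label."""
    return Counter(votes).most_common(1)[0][0]

# Three crowd raters per article (illustrative).
screening = {
    "article_17": ["include", "include", "exclude"],
    "article_42": ["exclude", "exclude", "exclude"],
}
print({k: majority_vote(v) for k, v in screening.items()})
# {'article_17': 'include', 'article_42': 'exclude'}
```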

    Imaging: making the invisible visible: proceedings of the symposium, 18 May 2000, Technische Universiteit Eindhoven
