
    Text detection and recognition in natural scene images

    This thesis addresses end-to-end text detection and recognition in natural scene images using deep neural networks. Scene text detection and recognition aim to find the regions in an image that a human would consider text, generate a bounding box for each word, and output the corresponding sequence of characters. As a useful task in image analysis, scene text detection and recognition has attracted much attention in the computer vision field. In this thesis, we tackle the problem by taking advantage of recent advances in deep learning. Car license plates can be viewed as a special case of scene text, since both consist of characters and appear in natural scenes; nevertheless, each has its own specificities. We therefore start from car license plate detection and recognition and then extend the methods to general scene text, proposing additional ideas along the way. For both tasks, we develop two approaches: a stepwise one and an integrated one. Stepwise methods tackle text detection and recognition step by step with separate models, while integrated methods handle both simultaneously with a single model. All approaches are based on deep Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), given the breakthroughs they have brought to the computer vision community. To begin with, a stepwise framework is proposed for text detection and recognition, applied to car license plates and general scene text respectively. A character CNN classifier is trained to detect characters in an image in a sliding-window manner. The detected characters are then grouped into license plates or text lines according to heuristic rules. A sequence-labeling-based method is proposed to recognize a whole license plate or text line without character-level segmentation. Building on this recognition method, and to accelerate processing, an integrated deep neural network is then proposed to address car license plate detection and recognition concurrently. It combines CNNs and RNNs in one network and can be trained end-to-end. Both car license plate bounding boxes and their labels are generated in a single forward pass of the network. The whole process involves no heuristic rules and avoids intermediate steps such as image cropping or feature recalculation, which not only prevents error accumulation but also reduces the computational burden. Lastly, the unified network is extended to simultaneous general text detection and recognition in natural scenes. In contrast to the network for car license plates, several innovations are proposed to accommodate the particular characteristics of general text. A varying-size RoI encoding method is proposed to handle the varied aspect ratios of general text, and an attention-based sequence-to-sequence learning structure is adopted for word recognition, with the expectation that a character-level language model can be learnt in this manner. The whole framework can be trained end-to-end, requiring only images, ground-truth bounding boxes and text labels. Through end-to-end training, the learned features become more discriminative, which improves the overall performance, and the convolutional features are computed only once and shared by both detection and recognition, which saves processing time.
The proposed method achieves state-of-the-art performance on several standard benchmark datasets. Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 201
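
    The sequence-labeling recognition idea described above (reading a whole license plate or text line without character-level segmentation) is in the same spirit as a CRNN trained with a CTC loss. Below is a minimal, hypothetical PyTorch sketch of that general idea rather than the author's actual model; the layer sizes, character-set size and input resolution are assumptions made purely for illustration.

# Hypothetical CRNN-style sketch: CNN feature columns -> BiLSTM -> per-step
# character logits, trained with CTC so no character segmentation is needed.
import torch
import torch.nn as nn

NUM_CLASSES = 37  # assumed: 26 letters + 10 digits + 1 CTC blank

class CRNNSketch(nn.Module):
    def __init__(self, num_classes=NUM_CLASSES):
        super().__init__()
        # Small convolutional backbone; height is pooled down to 1 so each
        # remaining width position becomes one time step of the sequence.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),  # collapse height, keep width
        )
        self.rnn = nn.LSTM(256, 128, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(256, num_classes)

    def forward(self, images):                     # images: (B, 1, H, W)
        feats = self.cnn(images)                   # (B, 256, 1, W')
        feats = feats.squeeze(2).permute(0, 2, 1)  # (B, W', 256)
        seq, _ = self.rnn(feats)                   # (B, W', 256)
        return self.fc(seq)                        # (B, W', num_classes)

# CTC training step: labels are plain character-index sequences, no alignment needed.
model = CRNNSketch()
ctc = nn.CTCLoss(blank=0)
images = torch.randn(4, 1, 32, 128)                # dummy batch of cropped plates
logits = model(images).log_softmax(2)              # (B, T, C)
targets = torch.randint(1, NUM_CLASSES, (4, 7))    # dummy 7-character labels
input_lens = torch.full((4,), logits.size(1), dtype=torch.long)
target_lens = torch.full((4,), 7, dtype=torch.long)
loss = ctc(logits.permute(1, 0, 2), targets, input_lens, target_lens)
loss.backward()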

    A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community

    In recent years, deep learning (DL), a re-branding of neural networks (NNs), has risen to the top in numerous areas, including computer vision (CV), speech recognition, and natural language processing. Whereas remote sensing (RS) possesses a number of unique challenges, primarily related to sensors and applications, RS inevitably draws from many of the same theories as CV, e.g., statistics, fusion, and machine learning, to name a few. This means that the RS community should be aware of, if not at the leading edge of, advancements like DL. Herein, we provide the most comprehensive survey of state-of-the-art RS DL research. We also review recent new developments in the DL field that can be used in DL for RS. Namely, we focus on theories, tools and challenges for the RS community. Specifically, we focus on unsolved challenges and opportunities as they relate to (i) inadequate data sets, (ii) human-understandable solutions for modelling physical phenomena, (iii) Big Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and learning algorithms for spectral, spatial and temporal data, (vi) transfer learning, (vii) an improved theoretical understanding of DL systems, (viii) high barriers to entry, and (ix) training and optimizing DL systems. Comment: 64 pages, 411 references. To appear in Journal of Applied Remote Sensing

    Deep Learning Techniques for Multi-Dimensional Medical Image Analysis

    Wearable Sensor Gait Analysis for Fall Detection Using Deep Learning Methods

    World Health Organization (WHO) data show that around 684,000 people die from falls yearly, making falls the second leading cause of injury-related death after traffic accidents [1]. Early detection of falls, followed by pneumatic protection, is one of the most effective means of ensuring the safety of the elderly. In light of the recent widespread adoption of wearable sensors, it has become increasingly important to develop fall detection models that can effectively process large, sequential sensor signal data. Several researchers have recently developed fall detection algorithms based on wearable sensor data. However, real-time fall detection remains challenging because of the wide range of gait variations in older adults. Choosing the appropriate sensor and placing it in the most suitable location are essential components of a robust real-time fall detection system. This dissertation implements various detection models to analyze and mitigate injuries due to falls in the senior community. It presents different methods for detecting falls in real time using deep learning networks. Several sliding-window segmentation techniques are developed and compared in the first study. Next, various methods are implemented and applied to mitigate the sampling imbalance caused by real-world collection of fall data. A study is also conducted to determine whether accelerometers and gyroscopes can distinguish between falls and near-falls. According to the literature survey, machine learning algorithms produce varying degrees of accuracy when applied to different datasets. An algorithm's performance depends on several factors, including the type and location of the sensors, the fall pattern, the dataset's characteristics, and the methods used for preprocessing and sliding-window segmentation. Other challenges associated with fall detection include the need for centralized datasets for comparing the results of different algorithms. This dissertation compares the performance of various fall detection methods using deep learning algorithms across multiple datasets. Furthermore, deep learning is explored in a second application: an ECG-based virtual pathology stethoscope (VPS) detection system. A novel real-time VPS detection method has been developed, and several deep learning methods are evaluated for classifying the location of the stethoscope by taking advantage of subtle differences in the ECG signals. This work could significantly extend the simulation capabilities of standardized patients by allowing medical students and trainees to perform realistic cardiac auscultation in a clinical environment.
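
    As a rough illustration of the sliding-window segmentation step compared in the first study, the sketch below splits a tri-axial accelerometer stream into fixed-length, overlapping windows in Python. The window length, sampling rate and 50% overlap are assumptions for illustration only and do not reflect the settings evaluated in the dissertation.

# Minimal sketch of fixed-length, overlapping sliding-window segmentation for
# a tri-axial accelerometer stream; window size and 50% overlap are assumptions.
import numpy as np

def sliding_windows(signal, window_size=200, overlap=0.5):
    """Split a (T, 3) accelerometer stream into (N, window_size, 3) windows."""
    step = int(window_size * (1.0 - overlap))
    windows = [
        signal[start:start + window_size]
        for start in range(0, len(signal) - window_size + 1, step)
    ]
    if not windows:
        return np.empty((0, window_size, signal.shape[1]))
    return np.stack(windows)

# Example: 10 s of dummy data at an assumed 100 Hz -> 2 s windows with a 1 s stride.
stream = np.random.randn(1000, 3)
segments = sliding_windows(stream)
print(segments.shape)  # (9, 200, 3)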

    Deep Learning for Learning Representation and Its Application to Natural Language Processing

    As the web evolves even faster than expected, the exponential growth of data becomes overwhelming. Textual data is being generated at an ever-increasing pace via emails, documents on the web, tweets, online user reviews, blogs, and so on. As the amount of unstructured text data grows, so does the need to intelligently process and understand it. The focus of this dissertation is on developing learning models that automatically induce representations of human language to solve higher-level language tasks. In contrast to most conventional learning techniques, which employ shallow-structured learning architectures, deep learning is a more recently developed machine learning technique that uses supervised and/or unsupervised strategies to automatically learn hierarchical representations in deep architectures, and it has been employed in varied tasks such as classification and regression. Deep learning was inspired by biological observations of human brain mechanisms for processing natural signals and has attracted tremendous attention from both academia and industry in recent years due to its state-of-the-art performance in many research domains such as computer vision, speech recognition, and natural language processing. This dissertation focuses on how to represent unstructured text data and how to model it with deep learning models in different natural language processing applications such as sequence tagging, sentiment analysis, and semantic similarity. Specifically, my dissertation addresses the following research topics: in Chapter 3, we examine one of the fundamental problems in NLP, text classification, by leveraging contextual information [MLX18a]; in Chapter 4, we propose a unified framework for generating an informative map from a review corpus [MLX18b]; Chapter 5 discusses tagging address queries in map search [Mok18], research performed in collaboration with Microsoft; and in Chapter 6, we discuss ongoing research on the neural language sentence matching problem, which we are working to extend to a recommendation system.
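
    To make the representation-learning setup described above more concrete, the following is a minimal, hypothetical PyTorch sketch of a deep text classifier that learns word representations end-to-end: an embedding layer feeds a bidirectional LSTM encoder whose contextualized states are pooled into class logits. The vocabulary size, embedding dimension and number of classes are placeholder assumptions, and this is not the architecture used in the dissertation.

# Hypothetical sketch: learned word embeddings -> BiLSTM contextual encoder -> classifier.
import torch
import torch.nn as nn

class TextClassifierSketch(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=100, hidden=128, num_classes=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.encoder = nn.LSTM(embed_dim, hidden, bidirectional=True, batch_first=True)
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def forward(self, token_ids):            # (B, L) integer token ids
        emb = self.embed(token_ids)          # (B, L, E) learned word representations
        encoded, _ = self.encoder(emb)       # (B, L, 2H) contextualized states
        pooled = encoded.mean(dim=1)         # simple average pooling over tokens
        return self.classifier(pooled)       # (B, num_classes) logits

model = TextClassifierSketch()
batch = torch.randint(1, 10000, (8, 40))     # dummy batch of tokenized texts
loss = nn.CrossEntropyLoss()(model(batch), torch.randint(0, 5, (8,)))
loss.backward()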

    Uncertainty, interpretability and dataset limitations in Deep Learning

    [eng] Deep Learning (DL) has gained traction in recent years thanks to the exponential increase in compute power. New techniques and methods are published on a daily basis, and records are being set across multiple disciplines. Undeniably, DL has brought a revolution to the machine learning field and to our lives. However, not everything has been resolved and some considerations must be taken into account. For instance, obtaining uncertainty measures and bounds is still an open problem. Models should be able to capture and express the confidence they have in their decisions, and Artificial Neural Networks (ANNs) are known to be lacking in this regard. Be it through out-of-distribution samples, adversarial attacks, or simply unrelated or nonsensical inputs, ANN models demonstrate an unfounded and incorrect tendency to still output high probabilities. Likewise, interpretability remains an unresolved question. Some fields not only need but rely on being able to provide human interpretations of the reasoning process of models. ANNs, and especially deep models trained with DL, are hard to reason about. Last but not least, models are becoming deeper and more complex, and to cope with the increasing number of parameters, datasets are required to be of higher quality and, usually, larger. Not all research, let alone real-world applications, can keep up with these increasing demands. Therefore, taking the previous issues into account, the main aim of this thesis is to provide methods and frameworks to tackle each of them. These approaches should be applicable to any suitable field and dataset, and they are employed on real-world datasets as proof of concept. First, we propose a method that provides interpretability of the results through uncertainty measures. The model in question is capable of reasoning about the uncertainty inherent in data and leverages that information to progressively refine its outputs. In particular, the method is applied to land cover segmentation, a classification task that aims to assign a type of land to each pixel in satellite images. The dataset and application serve to prove that the final uncertainty bound enables the end-user to reason about the possible errors in the segmentation result. Second, Recurrent Neural Networks (RNNs) are used to create models that are robust to limited datasets, both in terms of size and class balance. We apply them to two different fields: road extraction in satellite images and Wireless Capsule Endoscopy (WCE). The former demonstrates that contextual information along the temporal axis of data can be used to create models that achieve results comparable to the state of the art while being less complex. The latter, in turn, proves that contextual information for polyp detection can be crucial to obtaining models that generalize better and achieve higher performance. Last, we propose two methods to leverage unlabeled data in the model creation process. Datasets are often easier to obtain than to label, which results in many wasted opportunities with traditional classification approaches. Our approaches, based on self-supervised learning, result in a novel contrastive loss that is capable of extracting meaningful information from pseudo-labeled data. Applying both methods to WCE data proves that the extracted inherent knowledge yields models that perform better on extremely unbalanced datasets and under data scarcity.
To summarize, this thesis demonstrates potential solutions to obtain uncertainty bounds, provide reasonable explanations of the outputs, and combat the lack of data or unbalanced datasets. Overall, the presented methods have a positive impact on the DL field and could have a real and tangible effect on society. [cat] It is undeniable that Deep Learning has caused a revolution in many aspects not only of machine learning but also of our daily lives. Even so, there are still aspects to improve. Neural networks have trouble estimating their confidence in their predictions, and often report high probabilities in cases that bear no relation to the model or that simply make no sense. Likewise, interpreting the results of a deep and complex model is an extremely difficult task. These same models, increasingly powerful and with ever more parameters, also require better-labeled and more complete data. Taking these limitations into account, the main objective is to find methods and algorithms that address them. First, we propose a method capable of obtaining uncertainty estimates in satellite images and of using them to create more robust models and interpretable results. Second, Recurrent Neural Networks (RNNs) are used to combat the lack of data by extracting contextual information from temporal data. These are applied to road extraction from satellite images and to polyp classification in images obtained with Wireless Capsule Endoscopy (WCE). Finally, two methods are proposed to deal with the lack of labeled data and class imbalance using Self-supervised Learning (SSL). Unlabeled sequences of intestinal images are incorporated into the models in a stage prior to traditional classification. This thesis demonstrates that the proposed solutions for obtaining uncertainty measures are effective at giving reasonable and interpretable explanations of the results. Likewise, it shows that context in temporal data, obtained with RNNs, yields simpler models that can overcome the problems stemming from a lack of data. Lastly, it shows that SSL effectively combats the generalization problems caused by unbalanced data in several WCE domains. We conclude that this thesis presents methods with a real impact on several aspects of DL while demonstrating the capacity to have a positive impact on society.
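
    The thesis's novel contrastive loss is not reproduced here; as a generic point of reference for contrastive self-supervised learning, the sketch below implements a standard NT-Xent (SimCLR-style) loss over two augmented views of the same images in PyTorch. The temperature and embedding dimension are assumptions, and this is not the loss proposed in the thesis.

# Generic NT-Xent contrastive loss sketch (not the thesis's novel loss).
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.5):
    """z1, z2: (B, D) embeddings of two augmented views of the same samples."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2B, D) unit vectors
    sim = z @ z.t() / temperature                        # scaled cosine similarities
    sim.fill_diagonal_(float('-inf'))                    # exclude self-similarity
    batch = z1.size(0)
    # The positive for sample i is its other view, located at index (i + B) mod 2B.
    targets = (torch.arange(2 * batch, device=z.device) + batch) % (2 * batch)
    return F.cross_entropy(sim, targets)

# Dummy usage with random 128-dimensional embeddings for 16 image pairs.
loss = nt_xent(torch.randn(16, 128), torch.randn(16, 128))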