
    Bernoulli HMMs for Handwritten Text Recognition

    In recent years, Hidden Markov Models (HMMs) have received significant attention in the task of off-line handwritten text recognition (HTR). As in automatic speech recognition (ASR), HMMs are used to model the probability of an observation sequence given its corresponding text transcription. However, in contrast to ASR, in HTR there is no standard set of local features used by most of the proposed systems. In this thesis we propose the use of raw binary pixels as features, in conjunction with models that deal more directly with binary data. In particular, we propose the use of Bernoulli HMMs (BHMMs), that is, conventional HMMs in which the Gaussian (mixture) distributions have been replaced by Bernoulli (mixture) probability functions. The objective is twofold: on the one hand, this allows us to better model the binary nature of text images (foreground/background) using BHMMs; on the other hand, it guarantees that no discriminative information is filtered out during feature extraction (most available HTR datasets can be easily binarized without a relevant loss of information). In this thesis, all the HMM theory required to develop an HMM-based HTR toolkit is reviewed and adapted to the case of BHMMs. Specifically, we begin by defining a simple classifier based on BHMMs with Bernoulli probability functions at the states, and we end with an embedded Bernoulli mixture HMM recognizer for continuous HTR. Regarding the binary features, we propose a simple binary feature extraction process without significant loss of information. All input images are scaled and binarized so that they can easily be reinterpreted as sequences of binary feature vectors. Two extensions to this basic feature extraction method are proposed: the use of a sliding window to better capture the context, and a repositioning method to better deal with vertical distortions. Competitive results were obtained when BHMMs and the proposed methods were applied to well-known HTR databases. In particular, we ranked first at the Arabic Handwriting Recognition Competition organized during the 12th International Conference on Frontiers in Handwriting Recognition (ICFHR 2010), and at the Arabic Recognition Competition: Multi-font Multi-size Digitally Represented Text organized during the 11th International Conference on Document Analysis and Recognition (ICDAR 2011). In the last part of this thesis we propose a method for training BHMM classifiers using discriminative training criteria instead of conventional Maximum Likelihood Estimation (MLE). Specifically, we propose a log-linear classifier for binary data based on the BHMM classifier. Parameter estimation of this model can be carried out using discriminative training criteria for log-linear models; in particular, we give the formulae for several MMI-based criteria. Finally, we prove the equivalence between both classifiers, so that discriminative training of a BHMM classifier can be carried out by obtaining its equivalent log-linear classifier. Reported results show that discriminative BHMMs clearly outperform conventional generative BHMMs.
    Giménez Pastor, A. (2014). Bernoulli HMMs for Handwritten Text Recognition [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/37978
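    As a rough illustration of the emission model described in this abstract (this is not the thesis toolkit; the function and variable names bernoulli_mixture_logprob, weights, and protos are ours), the Python sketch below computes the log-probability of one binary feature vector under a Bernoulli mixture, the quantity that plays the role of a Gaussian-mixture log-likelihood at each BHMM state:

        # Minimal sketch: log-probability of a binary feature vector under a
        # Bernoulli mixture, the state emission model BHMMs use in place of
        # Gaussian mixtures. All names here are illustrative.
        import numpy as np

        def bernoulli_mixture_logprob(o, weights, protos, eps=1e-10):
            """o: (D,) binary vector; weights: (K,) mixture priors; protos: (K, D) Bernoulli parameters."""
            protos = np.clip(protos, eps, 1.0 - eps)
            # log p_k(o) = sum_d [o_d * log p_kd + (1 - o_d) * log(1 - p_kd)]
            comp_logp = o @ np.log(protos).T + (1 - o) @ np.log(1.0 - protos).T
            weighted = comp_logp + np.log(weights)
            m = weighted.max()                      # log-sum-exp over components
            return m + np.log(np.exp(weighted - m).sum())

        # Example: a 3-pixel binary column scored against a 2-component mixture.
        o = np.array([1.0, 0.0, 1.0])
        weights = np.array([0.6, 0.4])
        protos = np.array([[0.9, 0.2, 0.8],
                           [0.3, 0.7, 0.4]])
        print(bernoulli_mixture_logprob(o, weights, protos))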

    Pattern Recognition

    A wealth of advanced pattern recognition algorithms is emerging from the interplay between technologies for effective visual features and the human brain's cognition process. Effective visual features are made possible by rapid developments in sensor equipment, novel filter designs, and viable information-processing architectures, while a better understanding of the human cognition process broadens the ways in which computers can perform pattern recognition tasks. The present book collects representative research from around the globe focusing on low-level vision, filter design, features and image descriptors, data mining and analysis, and biologically inspired algorithms. The 27 chapters covered in this book present recent advances and new ideas in the techniques, technology, and applications of pattern recognition.

    Patterns of connectome variability in autism across five functional activation tasks: findings from the LEAP project

    Background: Autism spectrum disorder (autism) is a complex neurodevelopmental condition with pronounced behavioral, cognitive, and neural heterogeneity across individuals. Here, our goal was to characterize heterogeneity in autism by identifying patterns of neural diversity, as reflected in BOLD fMRI, in the way individuals with autism engage with a varied array of cognitive tasks. Methods: All analyses were based on the EU-AIMS/AIMS-2-TRIALS multisite Longitudinal European Autism Project (LEAP), with participants with autism (n = 282) and typically developing (TD) controls (n = 221) between 6 and 30 years of age. We employed a novel task potency approach, which combines the unique aspects of both resting-state fMRI and task fMRI to quantify task-induced variations in the functional connectome. Normative modelling was used to map the atypicality of features on an individual basis with respect to their distribution in neurotypical control participants. We applied robust out-of-sample canonical correlation analysis (CCA) to relate connectome data to behavioral data. Results: Deviation from the normative ranges of global functional connectivity was greater for individuals with autism compared to TD in each fMRI task paradigm (all tasks p < 0.001). The similarity across individuals of the deviation pattern was significantly increased in autistic relative to TD individuals (p < 0.002). The CCA identified significant and robust brain-behavior covariation between functional connectivity atypicality and autism-related behavioral features. Conclusions: Individuals with autism engage with tasks in a globally atypical way, but the particular spatial pattern of this atypicality is nevertheless similar across tasks. Atypicalities in the tasks originate mostly from prefrontal cortex and default mode network regions, but also from speech and auditory networks. We show how sophisticated modelling methods such as task potency and normative modelling can be used toward unravelling complex heterogeneous conditions like autism.
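    As a simplified, hypothetical sketch of the two ingredients named above (the actual LEAP pipeline is more sophisticated, e.g. its normative model accounts for covariates such as age; all function names here are ours), the Python snippet below computes a task-potency-style connectome change and a plain z-score deviation against a TD reference group. The CCA step would then relate such deviation vectors to behavioral measures across participants.

        # Illustrative sketch only: "task potency" as the element-wise difference
        # between task and rest connectomes, and a normative z-score computed
        # against the typically developing (TD) control distribution.
        import numpy as np

        def task_potency(task_conn, rest_conn):
            """Both inputs: (n_edges,) functional connectivity vectors; returns task-induced change."""
            return task_conn - rest_conn

        def normative_zscores(subject_features, td_features):
            """subject_features: (n_edges,); td_features: (n_td, n_edges) control reference."""
            mu = td_features.mean(axis=0)
            sd = td_features.std(axis=0) + 1e-8
            return (subject_features - mu) / sd

        # Toy example with random connectomes (100 edges, 50 TD controls).
        rng = np.random.default_rng(0)
        td = rng.normal(size=(50, 100))
        subj = task_potency(rng.normal(size=100), rng.normal(size=100))
        z = normative_zscores(subj, td)
        print("mean |deviation|:", np.abs(z).mean())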

    Soft computing applied to optimization, computer vision and medicine

    Artificial intelligence has permeated almost every area of life in modern society, and its significance continues to grow. As a result, in recent years, Soft Computing has emerged as a powerful set of methodologies that propose innovative and robust solutions to a variety of complex problems. Soft Computing methods, because of their broad range of application, have the potential to significantly improve human living conditions. The motivation for the present research emerged from this background. This research pursues two main objectives: on the one hand, it endeavors to bridge the gap between Soft Computing techniques and their application to intricate problems; on the other hand, it explores the potential benefits of Soft Computing methodologies as novel, effective tools for such problems. This thesis synthesizes the results of extensive research on Soft Computing methods and their applications to optimization, Computer Vision, and medicine. The work is composed of several individual projects, which employ classical and new optimization algorithms. The manuscript presented here is intended to provide an overview of the different aspects of Soft Computing methods and to enable the reader to reach a global understanding of the field. Therefore, this document is assembled as a monograph that summarizes the outcomes of these projects across 12 chapters; the chapters are structured so that they can be read independently. The key focus of this work is the application and design of Soft Computing approaches for solving problems in the following areas: Block Matching, Pattern Detection, Thresholding, Corner Detection, Template Matching, Circle Detection, Color Segmentation, Leukocyte Detection, and Breast Thermogram Analysis. One of the outcomes presented in this thesis is the development of two evolutionary approaches for global optimization. These were tested on complex benchmark datasets and showed promising results, thus opening the debate for future applications. Moreover, the applications to Computer Vision and medicine presented in this work highlight the utility of different Soft Computing methodologies for solving problems in these fields. A milestone in this regard is the translation of Computer Vision and medical issues into optimization problems, as sketched below. Additionally, this work strives to provide tools for combating public health issues by extending these concepts to automated detection and diagnosis aids for pathologies such as Leukemia and breast cancer. The application of Soft Computing techniques in this field has attracted great interest worldwide due to the exponential growth of these diseases. Lastly, the use of Fuzzy Logic, Artificial Neural Networks, and Expert Systems in many everyday domestic appliances, such as washing machines, cookers, and refrigerators, is now a reality, and many other industrial and commercial applications of Soft Computing have been integrated into everyday use, a trend that is expected to increase within the next decade. The research conducted here therefore contributes an important piece toward expanding these developments. The applications presented in this work are intended to serve as technological tools that can then be used in the development of new devices.
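    As a toy illustration of casting a Computer Vision task as an optimization problem and handing it to an evolutionary search (this is not one of the thesis's own algorithms; the fitness function, population scheme, and all names are our assumptions), the sketch below evolves a gray-level threshold that maximizes an Otsu-style between-class variance:

        # Minimal sketch: image thresholding posed as an optimization problem and
        # solved with a tiny (mu + lambda)-style evolutionary loop.
        import numpy as np

        def between_class_variance(image, t):
            fg, bg = image[image > t], image[image <= t]
            if fg.size == 0 or bg.size == 0:
                return 0.0
            w_fg, w_bg = fg.size / image.size, bg.size / image.size
            return w_fg * w_bg * (fg.mean() - bg.mean()) ** 2

        def evolve_threshold(image, pop_size=10, generations=30, sigma=10.0, seed=0):
            rng = np.random.default_rng(seed)
            pop = rng.uniform(0, 255, size=pop_size)          # candidate thresholds
            for _ in range(generations):
                children = np.clip(pop + rng.normal(0.0, sigma, pop_size), 0, 255)
                union = np.concatenate([pop, children])
                fitness = np.array([between_class_variance(image, t) for t in union])
                pop = union[np.argsort(fitness)[-pop_size:]]  # keep the fittest survivors
            return max(pop, key=lambda t: between_class_variance(image, t))

        # Toy bimodal "image": dark background plus brighter foreground pixels.
        rng = np.random.default_rng(1)
        img = np.concatenate([rng.normal(60, 10, 5000), rng.normal(180, 10, 2000)]).clip(0, 255)
        print("evolved threshold:", evolve_threshold(img))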

    Image Processing and Simulation Toolboxes of Microscopy Images of Bacterial Cells

    Recent advances in microscopy imaging technology have allowed the characterization of the dynamics of cellular processes at the single-cell and single-molecule level. Particularly in bacterial cell studies, using E. coli as a case study, these techniques have been used to detect and track internal cell structures such as the nucleoid and the cell wall, and fluorescently tagged molecular aggregates such as FtsZ proteins, Min system proteins, inclusion bodies, and the different types of RNA molecules. These studies have been performed using multi-modal, multi-process, time-lapse microscopy, producing both morphological and functional images. To facilitate the finding of relationships between cellular processes, from small scale, such as gene expression, to large scale, such as cell division, an image processing toolbox was implemented with several automatic and/or manual features, such as cell segmentation and tracking, intra-modal and inter-modal image registration, and the detection, counting, and characterization of several cellular components. Two segmentation algorithms for cellular components were implemented: the first based on the Gaussian distribution and the second on thresholding and morphological structuring functions. These algorithms were used to segment nucleoids and to identify the different stages of FtsZ ring formation (together with machine learning algorithms), which made it possible to understand how temperature influences the physical properties of the nucleoid and to correlate those properties with the exclusion of protein aggregates from the center of the cell. Another study used the segmentation algorithms to examine how temperature affects the formation of the FtsZ ring. The validation of the developed image processing methods and techniques has been based on benchmark databases manually produced and curated by experts. When dealing with thousands of cells and hundreds of images, these manually generated datasets can become the biggest cost in a research project. To expedite these studies and lower the cost of manual labour, an image simulation toolbox was implemented to generate realistic artificial images. The proposed toolbox can generate biologically inspired objects that mimic the spatial and temporal organization of bacterial cells and their processes, such as cell growth and division, cell motility, and cell morphology (shape, size, and cluster organization). The image simulation toolbox was shown to be useful in the validation of three cell tracking algorithms: Simple Nearest-Neighbour, Nearest-Neighbour with Morphology, and a DBSCAN cluster identification algorithm. It was shown that the Simple Nearest-Neighbour algorithm still performed with great reliability when simulating objects with small velocities, while the other algorithms performed better for higher velocities and when larger clusters were present.
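    As a minimal sketch of the Simple Nearest-Neighbour idea named above (not the toolbox implementation; the greedy matching scheme and all names are our own simplification), each cell centroid in frame t is matched to the closest still-unassigned centroid in frame t+1:

        # Simplified nearest-neighbour cell tracking between two consecutive frames.
        import numpy as np

        def nearest_neighbour_tracking(centroids_t, centroids_t1):
            """centroids_t: (N, 2) and centroids_t1: (M, 2) cell centers.
            Returns a list of (i, j) matches from frame t to frame t+1."""
            matches, taken = [], set()
            for i, c in enumerate(centroids_t):
                dists = np.linalg.norm(centroids_t1 - c, axis=1)
                for j in np.argsort(dists):          # closest still-unassigned cell wins
                    if int(j) not in taken:
                        matches.append((i, int(j)))
                        taken.add(int(j))
                        break
            return matches

        # Toy example: three cells moving slightly between frames.
        frame_t  = np.array([[10.0, 10.0], [50.0, 20.0], [30.0, 40.0]])
        frame_t1 = np.array([[12.0, 11.0], [31.0, 42.0], [52.0, 19.0]])
        print(nearest_neighbour_tracking(frame_t, frame_t1))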

    Neural Networks for Document Image and Text Processing

    Nowadays, the main libraries and document archives are investing considerable effort in digitizing their collections. Indeed, most of them are scanning the documents and publishing the resulting images without their corresponding transcriptions, which seriously limits the possibilities of exploiting these documents. When a transcription is necessary, it is performed manually by human experts, which is a very expensive and error-prone task. Obtaining transcriptions of the required quality demands the intervention of human experts to review and correct the output of the recognition engines. To this end, it is extremely useful to provide interactive tools to obtain and edit the transcription. Although text recognition is the final goal, several previous steps (known as preprocessing) are necessary in order to obtain an accurate transcription from a digitized image. Document cleaning, enhancement, and binarization (if needed) are the first stages of the recognition pipeline. In addition, historical handwritten documents show several kinds of degradation: stains, ink bleed-through, and other artifacts. Therefore, more sophisticated and elaborate methods are required when dealing with this kind of document, and in some cases even expert supervision is needed. Once the images have been cleaned, the main zones of the image have to be detected: those that contain text, and other parts such as images, decorations, and versal letters. Moreover, the relations among these entities and the final text have to be detected. These preprocessing steps are critical for the final performance of the system, since an error at this point will be propagated through the rest of the transcription process. The ultimate goal of the Document Image Analysis pipeline is to obtain the transcription of the text (Optical Character Recognition and Handwritten Text Recognition). In this thesis we aimed to improve the main stages of the recognition pipeline, from the scanned documents as input to the final transcription. We focused our effort on applying Neural Networks and deep learning techniques directly to the document images, extracting suitable features for the different tasks addressed in this work: Image Cleaning and Enhancement (Document Image Binarization), Layout Extraction, Text Line Extraction, Text Line Normalization, and finally decoding (text line recognition). The work thus focuses on incremental improvements across the several Document Image Analysis stages, but it also deals with some real challenges: historical manuscripts, documents without a clear layout, and very degraded documents. Neural Networks are a central topic of the work collected in this document. Different convolutional models have been applied for document image cleaning and enhancement. Connectionist models have also been used for text line extraction: first, for detecting interest points, combining them into text segments, and finally extracting the lines by means of aggregation techniques; and second, for pixel labeling to extract the main body area of the text and then the limits of the lines. For text line preprocessing, i.e., to normalize the text lines before recognizing them, similar models have been used to detect the main body area and then to height-normalize the images, giving more importance to the central area of the text. Finally, Convolutional Neural Networks and deep multilayer perceptrons have been combined with hidden Markov models to significantly improve our transcription engine.
    The suitability of all these approaches has been tested on different corpora for each of the stages addressed, giving competitive results for most of the methodologies presented.
    Pastor Pellicer, J. (2017). Neural Networks for Document Image and Text Processing [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/90443
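    As a small, hypothetical sketch of the pixel-labeling idea used above for cleaning/binarization and main-body detection (the architecture, layer sizes, and names are our assumptions, not the thesis models), a tiny fully convolutional network in PyTorch can emit one foreground logit per pixel; a height-normalization step would then rescale each text line around the detected main-body area before feeding it to the recognizer.

        # Minimal fully convolutional pixel labeler for document images (illustrative only).
        import torch
        import torch.nn as nn

        class TinyPixelLabeler(nn.Module):
            def __init__(self):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
                    nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
                    nn.Conv2d(16, 1, kernel_size=1),            # one logit per pixel
                )

            def forward(self, x):                               # x: (B, 1, H, W) grayscale page
                return self.net(x)                              # (B, 1, H, W) foreground logits

        model = TinyPixelLabeler()
        criterion = nn.BCEWithLogitsLoss()                      # binary label per pixel
        page = torch.rand(1, 1, 64, 256)                        # toy grayscale image
        target = (torch.rand(1, 1, 64, 256) > 0.8).float()      # toy ground-truth mask
        loss = criterion(model(page), target)
        loss.backward()
        print(float(loss))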