    Adaptive Window Selection for Non-uniform Lighting Image Thresholding

    Selecting an appropriate window (subimage) size is the most important step in thresholding images with non-uniform lighting. In this paper, a novel criterion function is developed to partition an image into subimages of different sizes suited to thresholding. After partitioning, each subimage is segmented with Otsu's thresholding method. The performance of the proposed method is validated on benchmark test images with different degrees of uneven lighting. Qualitative and quantitative measures indicate that the proposed method is fully automatic, fast and efficient in comparison with many landmark approaches.
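    The per-subimage step can be illustrated with a minimal sketch. It assumes a fixed block size (the hypothetical `block` parameter) in place of the paper's adaptive criterion function, which the abstract does not specify, and an 8-bit grayscale image stored as a list of rows. Only the idea of applying Otsu's threshold independently to each subimage is taken from the abstract.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes Otsu's between-class variance."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # pixel count of the class with level <= t
    sum0 = 0  # sum of gray levels in that class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def threshold_blockwise(image, block):
    """Binarize each block x block tile with its own Otsu threshold.

    The fixed tile size is an assumption made for illustration only.
    """
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for r0 in range(0, h, block):
        for c0 in range(0, w, block):
            rows = range(r0, min(r0 + block, h))
            cols = range(c0, min(c0 + block, w))
            t = otsu_threshold([image[r][c] for r in rows for c in cols])
            for r in rows:
                for c in cols:
                    out[r][c] = 1 if image[r][c] > t else 0
    return out


# Left half dimly lit, right half brightly lit.
image = [[10, 50, 150, 250],
         [50, 10, 250, 150]]
local = threshold_blockwise(image, 2)
```

    In this toy image a single global threshold would separate the dim half from the bright half, whereas the per-block thresholds recover the pattern inside each half. A block with a single gray level has no meaningful Otsu threshold; this sketch then falls back to a threshold of 0.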

    Arabic Text Recognition and Machine Translation

    Research on Arabic Handwritten Text Recognition (HTR) and Arabic-English Machine Translation (MT) has usually been approached as two independent areas of study. Creating a single system that combines both areas, so as to generate an English translation from images containing Arabic text, remains a very challenging task. This process can be interpreted as the translation of Arabic images. In this thesis, we propose a system that recognizes images of Arabic handwritten text and translates the recognized text into English. The system is built by combining an HTR system and an MT system. For the HTR system, our work focuses on Bernoulli Hidden Markov Models (BHMMs). BHMMs have proven to work very well with Latin script; empirical results have been reported on well-known corpora such as IAM and RIMES. In this thesis, these results are extended to Arabic script, in particular to the well-known IfN/ENIT and NIST OpenHaRT databases of Arabic handwritten text. The need to transcribe Arabic text is not limited to handwritten text; it also applies to printed text. Printed Arabic text can be regarded as a simpler form of handwritten text, so we also propose Bernoulli HMMs for it. In addition, we compare BHMMs with state-of-the-art technology based on neural networks. A key idea that has proven very effective in this application of Bernoulli HMMs is the use of a sliding window of adequate width for feature extraction. This idea has allowed us to obtain very competitive results in the recognition of both handwritten and printed Arabic text. A system based on it ranked first at the ICDAR 2011 Arabic recognition competition on the Arabic Printed Text Image (APTI) database. The idea has been refined further by repositioning the extracted windows, leading to additional improvements in Arabic text recognition. For handwritten text, this refinement improved our system that ranked first at the ICFHR 2010 Arabic handwriting recognition competition on IfN/ENIT. For printed text, it led to an improved system that ranked second at the ICDAR 2013 Competition on Multi-font and Multi-size Digitally Represented Arabic Text on APTI. The refinement was also used with neural-network-based technology, which led to state-of-the-art results. For machine translation, the system combines three state-of-the-art statistical models: standard phrase-based models, hierarchical phrase-based models, and N-gram phrase-based models. The combination was done using the Recognizer Output Voting Error Reduction (ROVER) method. Finally, we propose three methods of combining HTR and MT to develop an Arabic image translation system. The system was evaluated on the NIST OpenHaRT database, where competitive results were obtained.
    Alkhoury, I. (2015). Arabic Text Recognition and Machine Translation [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/53029
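    The sliding-window feature extraction mentioned in the abstract can be sketched as follows. This is an illustrative assumption, not the thesis implementation: the function name, the zero padding at the image borders and the odd window width are hypothetical choices, and the window repositioning refinement is not included. Each image column yields one binary vector made of the pixels in a window centred on that column, which is the kind of observation a Bernoulli HMM models.

```python
def sliding_window_features(image, width):
    """Return one binary feature vector per column of a binarized text image.

    image: list of rows, each a list of 0/1 pixels.
    width: odd window width in columns (assumed; even widths are not handled).
    Columns outside the image are treated as background (0).
    """
    h, w = len(image), len(image[0])
    half = width // 2
    features = []
    for c in range(w):
        vec = []
        for dc in range(-half, half + 1):
            cc = c + dc
            for r in range(h):
                vec.append(image[r][cc] if 0 <= cc < w else 0)
        features.append(vec)
    return features


image = [[1, 0, 1],
         [0, 1, 0]]
feats = sliding_window_features(image, 3)
```

    With `width=1` each vector reduces to a single image column; wider windows give each observation more horizontal context at the cost of a higher-dimensional vector.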

    On motion in dynamic magnetic resonance imaging: Applications in cardiac function and abdominal diffusion

    Magnetic resonance imaging (MRI) is today a powerful tool for clinical diagnosis thanks to its flexibility and its sensitivity to a wide range of tissue properties. Its main advantages are its outstanding versatility and its ability to provide high contrast between soft tissues. Thanks to this versatility, MRI can be used to observe different physical phenomena inside the human body by combining different types of pulses within the sequence. This has made it possible to create distinct modalities with multiple biological and clinical applications. MR acquisition is, however, a slow process, which forces a trade-off between resolution and acquisition time (Lima da Cruz, 2016; Royuela-del Val, 2017). As a result, physiological motion during acquisition can severely degrade image quality and lengthen the acquisition, which in turn increases patient discomfort. This practical limitation is a major obstacle to the clinical viability of MRI. This doctoral thesis addresses two problems of interest in MRI in which physiological motion plays a leading role. The first is the robust estimation of myocardial rotation and strain parameters from dynamic MR-Tagging images for the diagnosis and classification of cardiomyopathies. The second is the reconstruction of apparent diffusion coefficient (ADC) maps at high resolution and with high signal-to-noise ratio (SNR) from multiparametric diffusion-weighted imaging (DWI) acquisitions of the liver.
    Departamento de Teoría de la Señal y Comunicaciones e Ingeniería Telemática. Doctorado en Tecnologías de la Información y las Telecomunicaciones.

    A Drift-Resilient and Degeneracy-Aware Loop Closure Detection Method for Localization and Mapping In Perceptually-Degraded Environments

    Enabling fully autonomous robots capable of navigating and exploring unknown and complex environments has been at the core of robotics research for several decades. Mobile robots rely on a model of the environment for functions such as manipulation, collision avoidance and path planning. In GPS-denied and unknown environments where no prior map is available, robots must rely on onboard sensing to obtain locally accurate maps in order to operate in their local environment. A global map of an unknown environment can be constructed by fusing the local maps of mobile robots that are temporally or spatially distributed in the environment. Loop closure detection, the ability to assert that a robot has returned to a previously visited location, is crucial for consistent mapping because it reduces the drift caused by error accumulation in the estimated robot trajectory. Moreover, in multi-robot systems, loop closure detection makes it possible to find correspondences between the local maps obtained by individual robots and to merge them into a consistent global map. In ambiguous and perceptually-degraded environments, robust detection of intra- and inter-robot loop closures is especially challenging. This is due to poor or absent illumination, self-similarity, and the sparsity of distinctive perceptual landmarks and features needed to establish global position. Overcoming these challenges enables a wide range of terrestrial and planetary applications, from search and rescue and disaster relief in hostile environments to robotic exploration of lunar and Martian surfaces, caves and lava tubes, which are of particular interest as potential habitats for future manned space missions. In this dissertation, methods and metrics are developed for resolving location ambiguities to significantly improve loop closures in perceptually-degraded environments with sparse or undifferentiated features. The first contribution of this dissertation is a degeneracy-aware SLAM front-end that determines the level of geometric degeneracy in an unknown environment by computing the Hessian associated with the optimal transformation obtained from lidar scan matching. With this capability, featureless areas that could lead to data association ambiguity and spurious loop closures are identified and excluded from the search for loop closures. This significantly improves the quality and accuracy of localization and mapping, because the search space for loop closures can be expanded as needed to account for drift while decreasing, rather than increasing, the probability of false loop closure detections. The second contribution is a drift-resilient loop closure detection method that relies on 2D semantic and 3D geometric features extracted from lidar point cloud data to detect loop closures with greater robustness and accuracy than traditional geometric methods. The proposed method achieves higher performance by exploiting the spatial configuration of the local scenes embedded in the 2D occupancy grid maps commonly used in robot navigation, searching for putative loop closures in a pre-matching step before geometric verification. The third contribution is an extensive evaluation and analysis of performance, with comparison against state-of-the-art methods in simulation and in the real world, including six challenging underground mines across the United States.
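    One common way to read degeneracy from a scan-matching Hessian is to inspect its smallest eigenvalue. The sketch below illustrates that idea under simplifying assumptions that are not taken from the dissertation: it works in 2D, considers translation only, uses the Gauss-Newton approximation of the Hessian for point-to-line residuals (the sum of outer products of the surface normals), and applies a hypothetical threshold `min_eig_threshold`.

```python
import math


def translation_hessian(normals):
    """Gauss-Newton Hessian sum(n n^T) of point-to-line residuals
    with respect to a 2D translation, returned as (hxx, hxy, hyy)."""
    hxx = sum(nx * nx for nx, ny in normals)
    hxy = sum(nx * ny for nx, ny in normals)
    hyy = sum(ny * ny for nx, ny in normals)
    return hxx, hxy, hyy


def min_eigenvalue(hxx, hxy, hyy):
    """Smallest eigenvalue of the symmetric matrix [[hxx, hxy], [hxy, hyy]]."""
    mean = 0.5 * (hxx + hyy)
    diff = 0.5 * (hxx - hyy)
    return mean - math.sqrt(diff * diff + hxy * hxy)


def is_degenerate(normals, min_eig_threshold=1.0):
    """Flag a scan whose geometry leaves some translation direction unconstrained.

    The threshold value is a placeholder chosen for this example.
    """
    return min_eigenvalue(*translation_hessian(normals)) < min_eig_threshold


# Corridor: every wall normal points across the corridor, so motion along it is unobservable.
corridor = [(0.0, 1.0)] * 50 + [(0.0, -1.0)] * 50
# Room corner: two perpendicular walls constrain both directions.
corner = [(0.0, 1.0)] * 50 + [(1.0, 0.0)] * 50
```

    Here `is_degenerate(corridor)` is true and `is_degenerate(corner)` is false, which matches the intuition that a featureless corridor is a poor place to search for loop closures.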

    Mobility increases localizability: A survey on wireless indoor localization using inertial sensors

    Wireless indoor positioning has been studied extensively for the past two decades and continues to attract growing research effort in mobile computing. With the integration of multiple inertial sensors (e.g., accelerometer, gyroscope, and magnetometer) into smartphones in recent years, human-centric mobility sensing has emerged and come into vogue. Mobility information, as a new dimension in addition to wireless signals, can benefit localization in a number of ways, since location and mobility are by nature related in the physical world. In this article, we survey this new trend of mobility-enhanced smartphone-based indoor localization. Specifically, we first study how to measure human mobility: what types of sensors can be used and what types of mobility information can be acquired. Next, we discuss how mobility assists localization by enhancing location accuracy, decreasing deployment cost, and enriching location context. Moreover, given the quality and cost of smartphone built-in sensors, handling measurement errors is essential and is investigated accordingly. Combining existing work with our own experience, we emphasize the principles and conduct a comparative study of the mainstream technologies. Finally, we conclude the survey by addressing future research directions and opportunities in this new and largely open area.

    Time-encoded pseudo-continuous arterial spin labeling: Increasing SNR in ASL dynamic angiography

    Purpose: Dynamic angiography using arterial spin labeling (ASL) can provide detailed hemodynamic information. However, the long time-resolved readouts require small flip angles to preserve ASL signal for later timepoints, limiting SNR. By using time-encoded ASL to generate temporal information, the readout can be shortened. Here, the SNR improvements from using larger flip angles, made possible by the shorter readout, are quantitatively investigated. Methods: The SNR of a conventional protocol with nine Look-Locker readouts and a 4 × 3 time-encoded protocol with three Look-Locker readouts (giving nine matched timepoints) were compared using simulations and in vivo data. Both protocols were compared using readouts with constant flip angles (CFAs) and variable flip angles (VFAs), where the VFA scheme was designed to produce a consistent ASL signal across readouts. Optimization of the background suppression to minimize physiological noise across readouts was also explored. Results: The time-encoded protocol increased in vivo SNR by 103% and 96% when using CFAs or VFAs, respectively. Use of VFAs improved SNR compared with CFAs by 25% and 21% for the conventional and time-encoded protocols, respectively. The VFA scheme also removed signal discontinuities in the time-encoded data. Preliminary data suggest that optimizing the background suppression could improve in vivo SNR by a further 16%. Conclusions: Time encoding can be used to generate additional temporal information in ASL angiography. This enables the use of larger flip angles, which can double the SNR compared with a non-time-encoded protocol. The shortened time-encoded readout can also lead to improved background suppression, reducing physiological noise and further improving SNR.
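    The trade-off between flip angle and remaining signal can be illustrated with a textbook idealization. The sketch below is not the VFA scheme of the paper, whose details the abstract does not give. It assumes a train of `n_pulses` excitations acting on a fixed pool of longitudinal magnetization, with relaxation and inflow of fresh labeled blood ignored. Under those assumptions the flip angles arctan(1/sqrt(N - k)), ending with 90 degrees, give the same signal 1/sqrt(N) at every pulse, whereas a constant flip angle gives a decaying signal.

```python
import math


def constant_signal_flip_angles(n_pulses):
    """Flip angles in radians that equalize the signal over n_pulses excitations.

    Idealized: no relaxation and no inflow (assumptions of this sketch).
    """
    ramp = [math.atan(1.0 / math.sqrt(n_pulses - k)) for k in range(1, n_pulses)]
    return ramp + [math.pi / 2]


def pulse_signals(flip_angles, m0=1.0):
    """Transverse signal produced by each pulse as longitudinal magnetization is used up."""
    mz, signals = m0, []
    for angle in flip_angles:
        signals.append(mz * math.sin(angle))
        mz *= math.cos(angle)
    return signals


vfa = pulse_signals(constant_signal_flip_angles(9))
cfa = pulse_signals([math.radians(10.0)] * 9)
```

    With nine pulses every entry of `vfa` equals 1/3, while the entries of `cfa` shrink from pulse to pulse. A shorter pulse train leaves more magnetization per pulse, which is the intuition behind using larger flip angles with a shortened readout.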

    A theoretical eye model for uncalibrated real-time eye gaze estimation

    Computer vision systems that monitor human activity can be utilized for many diverse applications. Some general applications stemming from such activity monitoring are surveillance, human-computer interfaces, aids for the handicapped, and virtual reality environments. For most of these applications, a non-intrusive system is desirable, either for reasons of covertness or comfort. Also desirable is generality across users, especially for human-computer interfaces and surveillance. This thesis presents a method of gaze estimation that, without calibration, determines a relatively unconstrained user’s overall horizontal eye gaze. Utilizing anthropometric data and physiological models, a simple yet general eye model is presented, together with the equations that describe the gaze angle of the eye in this model. The procedure for choosing the proper features for gaze estimation is detailed, and the algorithms used to find these points are described. Results from manual and automatic feature extraction are presented and analyzed. The error observed from this model is around 3° and the error observed from the implementation is around 6°. This amount of error is comparable to previous eye gaze estimation algorithms and validates the model. The results across a set of subjects are consistent, which supports the generality of the model. A real-time implementation that operates at around 17 frames per second demonstrates the efficiency of the algorithms. While there are many interesting directions for future work, the goals of this thesis were achieved.

    Mathematische morfologie in de beeldverwerking (Mathematical Morphology in Image Processing)

    Processing an image by computer allows us to improve its quality, to segment specific objects from it, or to bring out additional information. Mathematical morphology is a set of mathematical techniques from image processing that allows us to analyse images and the shapes in them. This dissertation uses mathematical morphology to solve a number of image processing problems. Applying morphology to binary or grayscale images is relatively straightforward, but extending the theory to colour images raises several problems. Since a colour image can contain much more useful information than a grayscale image, such an extension is desirable. We propose the majority sorting scheme (MSS), which allows colours to be ordered with respect to one another in a logical way. Morphological image processing with colours then becomes possible. A second study concerns polymers and composites. These materials are used as sliding bearings in all kinds of objects, such as household appliances, locks, gates, etc., which makes the study of their wear important. We investigate whether the morphological pattern spectrum, as well as comparable techniques, can contribute to friction research on such materials. This could improve the speed and efficiency of the analyses. We observe that the spectral parameters show interesting relationships with the parameters of the test setup. The third part of the thesis concerns the development of an interpolation technique for binary images based on mathematical morphology, called mmINT. Interpolation is needed when we wish to zoom in on an image or increase its resolution. This can be useful when we want to improve scanned or downloaded drawings of poor quality (too low a resolution). mmINT performs considerably better than existing methods. We also developed a fast variant, mmINTone, and an extension for grayscale images, mmINTg.
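    The morphological pattern spectrum mentioned in the abstract can be illustrated on a binary image. The sketch below is a generic textbook construction, not the thesis code: the image is represented as a set of foreground pixel coordinates, the structuring elements are centred squares of growing size (an assumption of this example), and the spectrum value at size n is the area removed when the opening size grows from n to n + 1.

```python
def square(n):
    """(2n+1) x (2n+1) square structuring element centred on the origin."""
    return [(dr, dc) for dr in range(-n, n + 1) for dc in range(-n, n + 1)]


def erode(pixels, se):
    """Binary erosion: keep p only if p + s is foreground for every s in se.

    Restricting candidates to foreground pixels is valid because se contains the origin.
    """
    return {(r, c) for r, c in pixels
            if all((r + dr, c + dc) in pixels for dr, dc in se)}


def dilate(pixels, se):
    """Binary dilation: union of the foreground translated by every s in se."""
    return {(r + dr, c + dc) for r, c in pixels for dr, dc in se}


def opening(pixels, se):
    """Erosion followed by dilation (se is symmetric, so no reflection is needed)."""
    return dilate(erode(pixels, se), se)


def pattern_spectrum(pixels, max_size):
    """Area lost between openings of consecutive sizes 0..max_size+1."""
    areas = [len(opening(pixels, square(n))) for n in range(max_size + 2)]
    return [areas[n] - areas[n + 1] for n in range(max_size + 1)]


# A 3x3 blob plus one isolated pixel.
blob = {(r, c) for r in range(3) for c in range(3)} | {(10, 10)}
spectrum = pattern_spectrum(blob, 1)
```

    For this image the spectrum is `[1, 9]`: the isolated pixel disappears under the 3 x 3 opening, and the blob disappears under the 5 x 5 opening. The spectrum therefore acts as a size histogram of the shapes in the image.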