
    Real-Time Work Zone Traffic Management via Unmanned Air Vehicles

    Highway work zones are prone to traffic accidents when congestion and queues develop. Vehicle queues can expand at a rate of 1 mile every 2 minutes. Back-of-queue, rear-end crashes are the most common work zone crash, endangering the safety of motorists, passengers, and construction workers. The dynamic nature of queuing in the proximity of highway work zones necessitates traffic management solutions that can monitor and intervene in real time. Fortunately, recent progress in sensor technology, embedded systems, and wireless communication, coupled with lower costs, is enabling the development of real-time, automated, “intelligent” traffic management systems that address this problem. The goal of this project was to perform preliminary research and proof-of-concept development work on the use of UAS for real-time traffic monitoring of highway construction zones, in order to create real-time alerts for motorists, construction workers, and first responders. The main tasks of the proposed system were to collect traffic data via the UAV camera, to analyze those data and demonstrate that a UAV-based highway construction zone monitoring system can detect congestion and back-of-queue information, and to alert motorists of stopped traffic conditions, delay times, and alternate route options. Experiments were conducted using UAS to monitor traffic and collect traffic videos for processing. Prototype software was created to analyze these data. The software successfully measured vehicle speeds ranging from zero mph to highway speeds. A review of available mobile traffic apps was conducted for future integration with advanced iterations of the UAV and software system created by this research. This project has shown that UAS monitoring of highway construction zones, with real-time alerts to motorists, construction crews, and first responders, is feasible in the near term; further research is needed to fully develop and implement the innovative UAS traffic monitoring system produced by this research.
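    The report's prototype code is not included in this listing, but the speed measurement it describes reduces to a simple calculation once vehicles are tracked in georeferenced aerial video. The sketch below is a minimal illustration under stated assumptions (a known ground sampling distance and per-frame vehicle centroids from some detector), not the project's actual software:

```python
# Illustrative sketch: estimating vehicle speed from aerial video, assuming
# vehicle centroids have already been detected in two frames and the ground
# sampling distance (GSD) of the UAV camera is known.
import math

def vehicle_speed_mph(p1, p2, gsd_m_per_px, dt_s):
    """Speed from two pixel positions observed dt_s seconds apart.

    p1, p2       -- (x, y) vehicle centroids in pixels
    gsd_m_per_px -- ground sampling distance, metres per pixel
    dt_s         -- elapsed time between the two frames, seconds
    """
    dx = (p2[0] - p1[0]) * gsd_m_per_px   # east-west displacement in metres
    dy = (p2[1] - p1[1]) * gsd_m_per_px   # north-south displacement in metres
    speed_m_s = math.hypot(dx, dy) / dt_s
    return speed_m_s * 2.23694            # metres/second -> miles/hour

# Example: a car moves 40 px between frames 1 s apart at 0.10 m/px GSD.
print(f"{vehicle_speed_mph((100, 50), (140, 50), 0.10, 1.0):.1f} mph")
```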

    Automatic Detection and Rectification of Paper Receipts on Smartphones

    We describe the development of a real-time smartphone app that allows the user to digitize paper receipts in a novel way by "waving" their phone over the receipts and letting the app automatically detect and rectify the receipts for subsequent text recognition. We show that traditional computer vision algorithms for edge and corner detection do not robustly detect the non-linear and discontinuous edges and corners of a typical paper receipt in real-world settings, particularly when the colors of the receipt and background are similar or when other interfering rectangular objects are present. Inaccurate detection of a receipt's corner positions then results in distorted images when a projective transformation is used to rectify the perspective. We propose an innovative solution to receipt corner detection by treating each of the four corners as a unique "object" and training a Single Shot Detector (SSD) MobileNet object detection model. We use a small amount of real data and a large amount of automatically generated synthetic data designed to resemble real-world imaging scenarios. We show that our proposed method robustly detects the four corners of a receipt, giving a receipt detection accuracy of 85.3% on real-world data, compared to only 36.9% with a traditional edge detection-based approach. Our method works even when the color of the receipt is virtually indistinguishable from the background. Moreover, our method is trained to detect only the corners of the central target receipt and implicitly learns to ignore other receipts and other rectangular objects, and including synthetic data allows us to train an even better model. These factors are a major advantage over traditional edge detection-based approaches, allowing us to deliver a much better experience to the user.
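    The rectification step the abstract refers to is standard once the four corners are known. Below is a minimal sketch using OpenCV; the corner coordinates are hard-coded stand-ins for the output of the corner-detection model, and the output size and corner ordering are assumptions:

```python
# Given the four receipt corners, warp the receipt to a fronto-parallel
# view with a projective (perspective) transform.
import cv2
import numpy as np

def rectify_receipt(image, corners, out_w=600, out_h=1200):
    """corners: 4x2 array ordered top-left, top-right,
    bottom-right, bottom-left (ordering is an assumption here)."""
    src = np.asarray(corners, dtype=np.float32)
    dst = np.float32([[0, 0], [out_w - 1, 0],
                      [out_w - 1, out_h - 1], [0, out_h - 1]])
    H = cv2.getPerspectiveTransform(src, dst)  # 3x3 homography
    return cv2.warpPerspective(image, H, (out_w, out_h))

img = cv2.imread("receipt.jpg")
corners = [(120, 80), (480, 95), (510, 900), (90, 880)]  # detector output
flat = rectify_receipt(img, corners)
cv2.imwrite("receipt_rectified.jpg", flat)
```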

    A model for automated support for the recognition, extraction, customization, and reconstruction of static charts

    Data charts are widely used in our daily lives, being present in regular media such as newspapers, magazines, web pages, books, and many others. A well-constructed data chart leads to an intuitive understanding of its underlying data; conversely, when a chart embodies poor design choices, a redesign of the representation may be needed. However, in most cases these charts are shown as a static image, which means that the original data are not usually available. Automatic methods could therefore be applied to extract the underlying data from the chart images to allow these changes. The task of recognizing charts and extracting data from them is complex, largely due to the variety of chart types and their visual characteristics. Computer vision techniques for image classification and object detection are widely used for the problem of recognizing charts, but mostly on clean images without any disturbance. Most works in the literature do not handle the characteristics of real-world images that make this task harder, such as photographic distortion, noise, and misalignment. Two computer vision techniques that can assist this task and have been little explored in this context are perspective detection and correction. These methods transform a distorted, noisy chart into a clean one, ready for type recognition, data extraction, or other uses. Reconstructing a visualization is straightforward as long as the data are available, but reconstructing it in its original context is complex. Using a visualization grammar for this scenario is a key component, as these grammars usually have extensions for interaction, chart layers, and multiple views without requiring extra development effort. This work presents a model for automated support for custom recognition and reconstruction of charts in images. The model automatically performs the process steps, such as reverse engineering, turning a static chart back into its data table for later reconstruction, while allowing the user to make modifications in case of uncertainties. This work also features a model-based architecture along with prototypes for various use cases. Validation is performed step by step, with methods inspired by the literature. Three use cases provide proof of concept and validation of the model. The first use case features chart recognition methods focused on documents in the real world; the second focuses on vocalization of charts, using a visualization grammar to reconstruct a chart in audio format; and the third presents an Augmented Reality application that recognizes and reconstructs charts in the same context (a piece of paper), overlaying the new chart and interaction widgets. The results show that, with slight changes, chart recognition and reconstruction methods are ready for real-world charts in terms of time, accuracy, and precision.
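    As an illustration of the reconstruction step, a visualization grammar such as Vega-Lite can regenerate an interactive chart from an extracted data table. This is a minimal sketch with placeholder values, not the thesis's actual pipeline:

```python
# Rebuild a bar chart from a reverse-engineered data table by emitting a
# Vega-Lite spec. The extracted values below are placeholders.
import json

extracted = [                       # stand-in for the extracted table
    {"label": "A", "value": 28},
    {"label": "B", "value": 55},
    {"label": "C", "value": 43},
]

spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"values": extracted},
    "mark": "bar",
    "encoding": {
        "x": {"field": "label", "type": "nominal"},
        "y": {"field": "value", "type": "quantitative"},
    },
}

with open("reconstructed_chart.vl.json", "w") as f:
    json.dump(spec, f, indent=2)    # render with any Vega-Lite viewer
```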

    Extraction of textual information from image for information retrieval

    Ph.D. (Doctor of Philosophy)

    Computer Vision for Scene Text Analysis

    The motivation of this dissertation is to develop a 'Seeing-Eye', video-based interface for the visually impaired to access environmental text information. We are concerned with the daily activities of low-vision people that involve interpreting 'environmental text' or 'scene text', e.g., reading a newspaper, can labels, or street signs. First, we discuss the development of such a video-based interface. In this interface, the processed image of a scene text is read by off-the-shelf OCR software and converted to speech by Text-to-Speech (TTS) software. Our challenge is to feed a high-quality image of a scene text to off-the-shelf OCR software under a general pose of the surface on which the text is printed. To achieve this, various problems related to feature detection, mosaicing, auto-focus, zoom, and systems integration were solved in the development of the system, and these are described. We employ the video-based interface for the analysis of video of lectures and posters. In this application, the text is assumed to lie on a plane. Automatic analysis of video content requires additional modules such as enhancement, text segmentation, preprocessing of video content, metric rectification, etc. We provide qualitative results to justify the algorithm and the system integration. For more general classes of surfaces on which text is printed, such as bent or warped paper, we develop a novel method for 3D structure recovery and unwarping. Deformed paper is isometric with a plane, and the Gaussian curvature vanishes at every point on the surface. We show that these constraints lead to a closed set of equations that allow the recovery of the full geometric structure from a single image. We prove that these partial differential equations can be reduced to the Hopf equation that arises in non-linear wave propagation, and that deformations of the paper can be interpreted in terms of the characteristics of this equation. A new exact integration of these equations relates the 3D structure of the surface to an image of the paper. In addition, we can generate such surfaces using the underlying equations. This method uses only information derived from the image of the boundary. Furthermore, we employ the shape-from-texture method as an alternative way to infer the 3D structure. We show that, for consistency of the normal vector field, extra conditions based on the surface model must be added, namely isometry and zero Gaussian curvature of the surface. The theory underlying the method is novel and raises new open research issues in the area of 3D reconstruction from single views. The novel contributions are: first, it is shown that certain linear and non-linear cues (contour information) are sufficient to recover the 3D structure of scene text; second, that with a priori page layout information, we can reconstruct a fronto-parallel view of a deformed page from differential geometric properties of the surface; third, that with a known camera model we can recover the 3D structure of a bent surface; fourth, we present an integrated framework for analysis and rectification of scene text from single views in general format; fifth, we provide a comparison with the shape-from-texture approach; and finally, this work can be integrated into a visual prosthesis for the visually impaired. Our work has many applications in computer vision and computer graphics, e.g., a generalized scanning device, digital flattening of creased documents, 3D reconstruction when correspondence fails, 3D reconstruction from single old photos, bending and creasing virtual paper, object classification, semantic extraction, and scene description.
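    For reference, the surface constraints and the equation named above are commonly written as follows; this is a sketch of the standard notation, and the dissertation's exact formulation may differ:

```latex
% For a surface patch r(u,v), developability and isometry with the plane
% are usually stated as:
\begin{align}
  K &= \frac{\det \mathrm{II}}{\det \mathrm{I}} = 0
      && \text{(zero Gaussian curvature)} \\
  \mathrm{I} &= \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
      && \text{(isometry with the plane)}
\end{align}
% and the Hopf (inviscid Burgers) equation from non-linear wave
% propagation, to which the PDEs are reduced, is
\begin{equation}
  u_t + u\,u_x = 0 .
\end{equation}
```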

    Gait analysis, modelling, and comparison from unconstrained walks and viewpoints: view-rectification of body-part trajectories from monocular video sequences

    Gait analysis, modelling, and comparison using computer vision algorithms has recently attracted much attention for medical and surveillance applications. Analyzing and modelling a person's gait with computer vision algorithms has some interesting advantages over more traditional biometrics. For instance, gait can be analyzed and modelled at a distance by observing the person with a camera, which means that no markers or sensors have to be worn by the person. Moreover, gait analysis and modelling using computer vision algorithms does not require the cooperation of the observed people, which allows gait to be used as a biometric in surveillance applications. Current gait analysis and modelling approaches, however, have severe limitations. For instance, several approaches require a side view of the walks, since this viewpoint is optimal for gait analysis and modelling. Most approaches also require the walks to be observed far enough from the camera to avoid perspective distortion effects that would badly affect the resulting gait analyses and models. Moreover, current approaches do not allow for changes in walk direction or walking speed, which greatly constrains the walks that can be analyzed and modelled in medical and surveillance applications. The approach proposed in this thesis performs gait analysis, modelling, and comparison from unconstrained walks and viewpoints in medical and surveillance applications. The proposed approach mainly consists of a novel view-rectification method that generates a fronto-parallel viewpoint (side view) of the imaged trajectories of body parts. The view-rectification method is based on a novel walk model that uses projective geometry to provide the spatio-temporal links between the body-part positions in the scene and their corresponding positions in the images. The head and the feet are the only body parts relevant to the proposed approach; they are automatically localized and tracked in monocular video sequences using a novel body-part tracking algorithm. Gait analysis is performed by a novel method that extracts standard gait measurements from the view-rectified body-part trajectories. A novel gait model based on body-part trajectories is also proposed in order to perform gait modelling and comparison using the dynamics of the gait. The proposed approach is first validated using synthetic walks comprising different viewpoints and changes in walk direction. The validation results show that the proposed view-rectification method works well, that is, valid gait measurements can be extracted from the view-rectified body-part trajectories. Next, gait analysis, modelling, and comparison are performed on real walks acquired as part of this thesis. These walks are challenging since they were performed close to the camera and contain changes in walk direction and walking speed. The results first show that the obtained gait measurements are realistic and correspond to the gait measurements found in references on clinical gait analysis. The gait comparison results then show that the proposed approach can be used to perform gait modelling and comparison in the context of surveillance applications by recognizing people by their gait. The computed recognition rates are quite good considering the challenging walks used in this thesis.
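    The view-rectification idea can be illustrated compactly: once the walk model yields a planar projective mapping (a homography) between the imaged walking plane and the fronto-parallel view, tracked body-part positions are rectified point by point. The sketch below assumes an arbitrary homography for illustration and is not the thesis's actual walk model:

```python
# Rectify an imaged head trajectory to a fronto-parallel (side) view,
# given a 3x3 homography H for the walking plane (assumed here; in the
# thesis it is derived from the projective walk model).
import cv2
import numpy as np

H = np.array([[1.2, 0.1,    -30.0],   # illustrative values only
              [0.0, 1.1,     10.0],
              [0.0, 0.0005,   1.0]])

# Imaged head positions over a few frames (pixels), e.g. from a tracker;
# cv2.perspectiveTransform expects shape (N, 1, 2).
traj = np.array([[[100.0, 50.0]], [[130.0, 52.0]], [[161.0, 55.0]]])

rectified = cv2.perspectiveTransform(traj, H)   # side-view coordinates
print(rectified.reshape(-1, 2))
```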

    FotoFacesUA: photographic management system of the University of Aveiro

    Nowadays, automation is present in basically every computational system. With the rise of machine learning algorithms over the years, the need for human intervention in such systems has dropped considerably. Nevertheless, in universities, companies, and even governmental institutions, some systems have still not been automated. One such case is profile photo management, which still requires human intervention to check whether a submitted image follows the institution's mandatory criteria for a new photo. FotoFaces is a system for updating the profile photos of collaborators at the University of Aveiro that allows the collaborator to submit a new photo and, automatically, through a set of image processing algorithms, decides whether the photo meets a set of predefined criteria. One of the main advantages of this system is that it can be used in any institution and can be adapted to different needs by simply changing the algorithms or the criteria considered. This dissertation describes some improvements implemented in the existing system, as well as some new features in terms of the available algorithms. The main contributions to the system are the following: sunglasses detection, hat detection, and background analysis. For the first two, it was necessary to create and label a new database in order to train, validate, and test a deep transfer learning network used to detect sunglasses and hats. In addition, several tests were performed varying the parameters of the network and applying machine learning and pre-processing techniques to the input images. Finally, the background analysis consists of the implementation and testing of two existing algorithms from the literature, one low-level and one based on deep learning. Overall, the results obtained in improving the existing algorithms, as well as the performance of the new image processing modules, allowed the creation of a more robust (improved production algorithms) and more versatile (new algorithms added to the system) profile photo update system.
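    The abstract does not spell out the network configuration; the sketch below shows one plausible transfer-learning setup for such a binary sunglasses or hat classifier (a frozen ImageNet backbone with a small new head), with the architecture and hyperparameters as assumptions:

```python
# One plausible deep transfer-learning setup for a binary attribute
# classifier (sunglasses vs. none); not the dissertation's exact network.
import tensorflow as tf

base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False                      # freeze pretrained features

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # attribute present?
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=10)  # labelled data
```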

    A Web-Based Index of Historical Valuation Maps for the Erie Railroad

    The purpose of this project was to develop a web-based index of historical railroad valuation maps for the Erie Lackawanna Historical Society (ELHS). The ELHS was in possession of a complete collection of over 3,600 scanned 1918 railroad valuation maps for the Erie Railroad; however, these scanned maps lacked spatial reference. The first step in making these maps usable in a modern GIS was to georeference them using Esri's ArcGIS. Once the maps were georeferenced, they were organized into a geodatabase, along with additional supporting layers and geotagged historical photos relating to the railroads. To make these maps and data available to a wider audience, an interactive web application was developed using HTML, CSS, and Esri's ArcGIS API for JavaScript, which allows users to view the georeferenced maps as a fully mosaicked map layer, or to access the original maps and photographs individually or in bulk.
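    The project did its georeferencing in Esri's ArcGIS desktop tools; as a conceptual illustration of the same step, the sketch below attaches hypothetical ground control points (GCPs) to a scanned map using GDAL instead. Coordinates and file names are made up:

```python
# Georeference a scanned map: GCPs tie pixel positions on the scan to
# known map coordinates, then the image is warped onto that grid.
from osgeo import gdal

src = gdal.Open("valuation_map_scan.tif")

# Hypothetical GCPs: gdal.GCP(map_x, map_y, elevation, pixel_col, pixel_row)
gcps = [
    gdal.GCP(-74.980, 41.370, 0, 150,  200),
    gdal.GCP(-74.940, 41.372, 0, 4800, 210),
    gdal.GCP(-74.942, 41.350, 0, 4790, 3600),
    gdal.GCP(-74.982, 41.348, 0, 160,  3590),
]

gdal.Translate("with_gcps.tif", src, GCPs=gcps,
               outputSRS="EPSG:4326")               # attach the GCPs
gdal.Warp("valuation_map_georef.tif", "with_gcps.tif",
          dstSRS="EPSG:4326", resampleAlg="bilinear")
```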

    DocMIR: An automatic document-based indexing system for meeting retrieval

    This paper describes the DocMIR system, which automatically captures, analyzes, and indexes meetings, conferences, lectures, etc. by taking advantage of the documents projected (e.g. slideshows, budget tables, figures) during the events. For instance, the system can apply the above-mentioned procedures to a lecture and automatically index the event according to the presented slides and their contents. For indexing, the system requires neither specific software installed on the presenter's computer nor any conscious intervention by the speaker throughout the presentation. The only material required by the system is the speaker's electronic presentation file; even if it is not provided, the system temporally segments the presentation and offers a simple storyboard-like browsing interface. The system runs on several capture boxes connected to cameras and microphones that record events synchronously. Once the recording is over, indexing is performed automatically by analyzing the captured video of the projected documents: the system detects scene changes, identifies the documents, computes their display durations, and extracts their textual content. Each captured image is identified against a repository containing all original electronic documents, captured audio-visual data, and metadata created during post-production. The identification is based on document signatures, which hierarchically combine features from both the layout structure and the color distribution of the document images. Video segments are finally enriched with the textual content of the identified original documents, which facilitates query and retrieval without using OCR. The signature-based indexing method proposed in this article is robust, works with low-resolution images, and can be applied to several other applications, including real-time document recognition, multimedia IR, and augmented reality systems.
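    DocMIR's signatures hierarchically combine layout and color features; the sketch below illustrates only the color-distribution half of that idea, matching a captured frame to the repository slide with the most similar coarse color histogram. File names and bin counts are assumptions:

```python
# Match a captured video frame to a repository slide by comparing coarse
# HSV color histograms (a simplified stand-in for DocMIR's signatures).
import cv2

def color_signature(img):
    """Coarse HSV color histogram, normalized, as a 1-D signature."""
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1], None, [16, 8], [0, 180, 0, 256])
    return cv2.normalize(hist, hist).flatten()

def identify(frame, repository):
    """Return the repository key with the most similar signature."""
    sig = color_signature(frame)
    return max(repository,
               key=lambda k: cv2.compareHist(repository[k], sig,
                                             cv2.HISTCMP_CORREL))

slides = {f"slide_{i:02d}": color_signature(cv2.imread(f"slide_{i:02d}.png"))
          for i in range(1, 11)}
frame = cv2.imread("captured_frame.png")
print(identify(frame, slides))
```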

    A Comparison of Change Detection Methods in an Urban Environment Using LANDSAT TM and ETM+ Satellite Imagery: A Multi-Temporal, Multi-Spectral Analysis of Gwinnett County, GA 1991-2000

    Land cover change detection in urban areas provides valuable data on the loss of forest and agricultural land to residential and commercial development. Using Landsat 5 Thematic Mapper (1991) and Landsat 7 ETM+ (2000) imagery of Gwinnett County, GA, change images were obtained using image differencing of Normalized Difference Vegetation Index (NDVI), principal components analysis (PCA), and Tasseled Cap-transformed images. Ground truthing and accuracy assessment determined that the NDVI and Tasseled Cap image transformation methods performed best in the study area, while PCA performed worst of the three methods assessed. Analyses of changes from 1991 to 2000 revealed that these methods perform well for detecting changes in vegetation and vegetative characteristics, but that such changes do not always correspond to changes in land use. Gwinnett County lost an estimated 13,500 hectares of vegetation cover to urban sprawl during the study period, with the majority of the loss coming from forested areas.
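    The NDVI differencing method is straightforward to reproduce. Below is a minimal sketch under stated assumptions: Landsat TM/ETM+ band 3 is red and band 4 is near-infrared, file names are placeholders, and the change threshold is illustrative rather than the study's calibrated value:

```python
# NDVI image differencing between two dates: NDVI = (NIR - Red)/(NIR + Red);
# a large NDVI decrease flags probable vegetation loss.
import numpy as np
import rasterio

def ndvi(red_path, nir_path):
    with rasterio.open(red_path) as r, rasterio.open(nir_path) as n:
        red = r.read(1).astype("float32")
        nir = n.read(1).astype("float32")
    return (nir - red) / np.maximum(nir + red, 1e-6)  # avoid divide-by-zero

diff = ndvi("tm1991_b3.tif", "tm1991_b4.tif") \
     - ndvi("etm2000_b3.tif", "etm2000_b4.tif")

change_mask = diff > 0.2   # illustrative threshold, not the study's
print(f"{change_mask.mean():.1%} of pixels flagged as vegetation loss")
```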