Real-Time Work Zone Traffic Management via Unmanned Air Vehicles
Highway work zones are prone to traffic accidents when congestion and queues develop. Vehicle queues can expand at a rate of 1 mile every 2 minutes. Back-of-queue, rear-end crashes are the most common work zone crash type, endangering the safety of motorists, passengers, and construction workers. The dynamic nature of queuing in the proximity of highway work zones necessitates traffic management solutions that can monitor and intervene in real time. Fortunately, recent progress in sensor technology, embedded systems, and wireless communication, coupled with lower costs, is now enabling the development of real-time, automated, "intelligent" traffic management systems that address this problem. The goal of this project was to perform preliminary research and proof-of-concept development work for the use of UAS in real-time traffic monitoring of highway construction zones, in order to create real-time alerts for motorists, construction workers, and first responders. The main tasks of the proposed system were to collect traffic data via the UAV camera, analyze those data to detect congestion and back-of-queue information, and alert motorists of stopped traffic conditions, delay times, and alternate route options. Experiments were conducted using UAS to monitor traffic and collect traffic videos for processing. Prototype software was created to analyze these data and was successful in detecting vehicle speeds from zero mph to highway speeds. A review of available mobile traffic apps was conducted for future integration with advanced iterations of the UAV and software system created by this research.
This project has shown that UAS monitoring of highway construction zones, with real-time alerts to motorists, construction crews, and first responders, is possible in the near term; further research is needed to develop and implement the innovative UAS traffic monitoring system created by this research.
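As a toy illustration of the alerting arithmetic, using the 1-mile-per-2-minutes growth rate cited above (the function names and the safety buffer are illustrative assumptions, not part of the project's software):

```python
def queue_length_miles(minutes_since_onset, growth_rate_mpm=0.5):
    """Estimated queue length given the ~1 mile per 2 minutes growth rate."""
    return minutes_since_onset * growth_rate_mpm

def warning_distance_miles(minutes_since_onset, buffer_miles=1.0):
    """Distance upstream of the work zone at which to alert motorists:
    the current back-of-queue position plus a safety buffer."""
    return queue_length_miles(minutes_since_onset) + buffer_miles

# After 10 minutes of congestion the queue is ~5 miles long,
# so alerts should reach motorists at least 6 miles upstream.
print(queue_length_miles(10))      # 5.0
print(warning_distance_miles(10))  # 6.0
```

This kind of estimate is what would let a monitoring system push "stopped traffic ahead" alerts before a driver reaches the back of the queue.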
Automatic Detection and Rectification of Paper Receipts on Smartphones
We describe the development of a real-time smartphone app that allows the
user to digitize paper receipts in a novel way by "waving" their phone over the
receipts and letting the app automatically detect and rectify the receipts for
subsequent text recognition.
We show that traditional computer vision algorithms for edge and corner
detection do not robustly detect the non-linear and discontinuous edges and
corners of a typical paper receipt in real-world settings. This is particularly
the case when the colors of the receipt and background are similar, or where
other interfering rectangular objects are present. Inaccurate detection of a
receipt's corner positions then results in distorted images when a
projective transformation is used to rectify the perspective.
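Assuming the four corner positions are already known, the rectification step can be sketched as a direct linear solve for the eight parameters of a perspective (projective) transform mapping the detected corners onto an upright rectangle; numpy only, with illustrative corner values:

```python
import numpy as np

def homography_from_corners(src, dst):
    """Solve for the 3x3 projective transform mapping the 4 detected
    receipt corners (src) onto an upright rectangle (dst): two linear
    equations per point correspondence, 8 unknowns in total."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def warp_point(H, x, y):
    """Apply the homography to one point (homogeneous divide)."""
    p = H @ np.array([x, y, 1.0])
    return p[0] / p[2], p[1] / p[2]

# Detected (skewed) corners -> a 200x400 upright receipt image frame.
corners = [(30, 40), (180, 60), (170, 420), (20, 380)]
target = [(0, 0), (200, 0), (200, 400), (0, 400)]
H = homography_from_corners(corners, target)
```

In a production app the same transform would be applied to every pixel (e.g. via an image-warping routine) rather than to single points.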
We propose an innovative solution to receipt corner detection by treating
each of the four corners as a unique "object", and training a Single Shot
Detection MobileNet object detection model. We use a small amount of real data
and a large amount of automatically generated synthetic data that is designed
to be similar to real-world imaging scenarios.
We show that our proposed method robustly detects the four corners of a
receipt, giving a receipt detection accuracy of 85.3% on real-world data,
compared to only 36.9% with a traditional edge detection-based approach. Our
method works even when the color of the receipt is virtually indistinguishable
from the background.
Moreover, our method is trained to detect only the corners of the central
target receipt and implicitly learns to ignore other receipts, and other
rectangular objects. Including synthetic data allows us to train an even better
model. These factors are a major advantage over traditional edge
detection-based approaches, allowing us to deliver a much better experience to
the user.
A model for automated support of the recognition, extraction, customization, and reconstruction of static charts
Data charts are widely used in our daily lives, being present in regular media
such as newspapers, magazines, web pages, books, and many others. A well-constructed
data chart leads to an intuitive understanding of its underlying data;
conversely, when data charts embody poor design choices, a redesign
of these representations may be needed. However, in most cases these
charts are shown as a static image, which means that the original data are not
usually available. Therefore, automatic methods could be applied to extract the
underlying data from the chart images to allow these changes. The task of
recognizing charts and extracting data from them is complex, largely due to the
variety of chart types and their visual characteristics.
Computer Vision techniques for image classification and object detection are
widely used for the problem of recognizing charts, but only on images without
any disturbance. Other characteristics of real-world images that can make this
task difficult, such as photographic distortion, noise, and misalignment, are
rarely addressed in the literature. Two computer vision techniques that can
assist this task and have been little explored in this context are perspective
detection and correction. These methods transform a distorted and noisy chart
into a clean chart, ready for data extraction or other uses. The task of
reconstructing data is straightforward: as long as the data are available, the
visualization can be reconstructed. Reconstructing it in the same context,
however, is complex.
Using a Visualization Grammar for this scenario is a key component, as these
grammars usually have extensions for interaction, chart layers, and multiple
views without requiring extra development effort.
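As a minimal sketch of this idea, a recovered data table can be dropped into a Vega-Lite spec (one such visualization grammar); the table, field names, and chart type here are hypothetical, not taken from the thesis:

```python
import json

def rebuild_bar_chart(table, x_field, y_field):
    """Assemble a Vega-Lite spec from a data table recovered by chart
    data extraction; interaction, layers, or extra views can later be
    added to the spec without re-implementing any rendering."""
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"values": table},
        "mark": "bar",
        "encoding": {
            "x": {"field": x_field, "type": "nominal"},
            "y": {"field": y_field, "type": "quantitative"},
        },
    }

# Hypothetical table recovered from a static bar-chart image.
table = [{"label": "A", "value": 28}, {"label": "B", "value": 55}]
spec = rebuild_bar_chart(table, "label", "value")
print(json.dumps(spec, indent=2))
```

Because the spec is declarative, the same extracted table can be re-rendered as an interactive chart, vocalized, or overlaid in Augmented Reality by different front ends.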
This work presents a model for automated support of customized recognition and
reconstruction of charts in images. The model automatically performs the
process steps, such as reverse engineering, turning a static chart back into its
data table for later reconstruction, while allowing the user to make modifications
in case of uncertainties. This work also features a model-based architecture
along with prototypes for various use cases. Validation is performed step by
step, with methods inspired by the literature. This work features three use
cases providing proof of concept and validation of the model.
The first use case features the usage of chart recognition methods focused on
documents in the real world; the second use case focuses on the vocalization of
charts, using a visualization grammar to reconstruct a chart in audio format;
and the third use case presents an Augmented Reality application that
recognizes and reconstructs charts in the same context (a piece of paper),
overlaying the new chart and interaction widgets. The results showed that, with
slight changes, chart recognition and reconstruction methods are now ready for
real-world charts when taking time, accuracy, and precision into consideration.
Doctoral Program in Informatics Engineering
Extraction of textual information from image for information retrieval
Ph.D. (Doctor of Philosophy)
Computer Vision for Scene Text Analysis
The motivation of this dissertation is to develop a 'Seeing-Eye' video-based
interface for the visually impaired to access environmental text information. We
are concerned with those daily activities of the low-vision people involved with
interpreting 'environmental text' or 'scene text' e.g., reading a newspaper, can labels
and street signs.
First, we discuss the development of such a video-based interface. In this
interface, the processed image of a scene text is read by off-the-shelf OCR and
converted to speech by Text-to-Speech (TTS) software. Our challenge is to feed
a high-quality image of a scene text to off-the-shelf OCR software under a general
pose of the surface on which the text is printed. To achieve this, various problems
related to feature detection, mosaicing, auto-focus, zoom, and systems integration
were solved in the development of the system, and these are described.
We employ the video-based interface for the analysis of video of lectures/posters.
In this application, the text is assumed to be on a plane. It is necessary for automatic
analysis of video content to add modules such as enhancement, text segmentation,
preprocessing video content, metric rectification, etc. We provide qualitative results
to justify the algorithm and system integration.
For more general classes of surfaces on which the text is printed, such as bent
or warped paper, we develop a novel method for 3D structure recovery and
unwarping. Deformed paper is isometric with a plane and the Gaussian curvature
vanishes at every point on the surface. We show that these constraints lead to a
closed set of equations that allow the recovery of the full geometric structure from a
single image. We prove that these partial differential equations can be reduced to the
Hopf equation that arises in non-linear wave propagation, and that deformations of the
paper can be interpreted in terms of the characteristics of this equation. A new exact
integration of these equations relates the 3D structure of the surface to an image of
the paper. In addition, we can generate such surfaces using the underlying equations.
This method only uses information derived from the image of the boundary.
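In standard differential-geometry notation (generic symbols, not necessarily the thesis's own), the two constraints above, isometry with a plane and vanishing Gaussian curvature, read:

```latex
% Isometric parametrization \mathbf{x}(u,v): the first fundamental form
% is that of the plane
E = \mathbf{x}_u \cdot \mathbf{x}_u = 1, \qquad
F = \mathbf{x}_u \cdot \mathbf{x}_v = 0, \qquad
G = \mathbf{x}_v \cdot \mathbf{x}_v = 1,

% Developability: the Gaussian curvature vanishes everywhere
K = \frac{eg - f^2}{EG - F^2} = 0 .
```

Together these make the page a developable surface, ruled by straight generators, which is what allows the full 3D structure to be pinned down from a single image of the boundary.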
Furthermore, we employ the shape-from-texture method as an alternative to
the method above to infer the 3D structure. We show that, for consistency of
the normal vector field, extra conditions based on the surface model must be
added. These conditions are isometry and zero Gaussian curvature of the surface.
The theory underlying the method is novel and raises new open research
issues in the area of 3D reconstruction from single views. The novel contributions
are: first, it is shown that certain linear and non-linear cues (contour knowledge
information) are sufficient to recover the 3D structure of scene text; second, that
with a priori page layout information, we can reconstruct a fronto-parallel view
of a deformed page from differential geometric properties of a surface; third, that
with a known camera model we can recover the 3D structure of a bent surface; fourth,
we present an integrated framework for analysis and rectification of scene texts from
single views in general format; fifth, we provide a comparison with the
shape-from-texture approach; and finally, this work can be integrated as a visual
prosthesis for the visually impaired.
Our work has many applications in computer vision and computer graphics.
The applications are diverse, e.g., a generalized scanning device, digital flattening
of creased documents, 3D reconstruction when correspondence fails, 3D
reconstruction from single old photos, bending and creasing virtual paper, object
classification, semantic extraction, scene description, and so on.
Gait analysis, modelling, and comparison from unconstrained walks and viewpoints : view-rectification of body-part trajectories from monocular video sequences
Gait analysis, modelling, and comparison using computer vision algorithms has recently attracted much attention for medical and surveillance applications. Analyzing and modelling a person's gait with computer vision algorithms has some interesting advantages over more traditional biometrics. For instance, gait can be analyzed and modelled at a distance by observing the person with a camera, which means that no markers or sensors have to be worn by the person. Moreover, gait analysis and modelling using computer vision algorithms does not require the cooperation of the observed people, which allows for using gait as a biometric in surveillance applications. Current gait analysis and modelling approaches, however, have severe limitations. For instance, several approaches require a side view of the walks, since this viewpoint is optimal for gait analysis and modelling. Most approaches also require the walks to be observed far enough from the camera to avoid perspective distortion effects that would badly affect the resulting gait analyses and models. Moreover, current approaches do not allow for changes in walk direction and walking speed, which greatly constrains the walks that can be analyzed and modelled in medical and surveillance applications. The approach proposed in this thesis aims at performing gait analysis, modelling, and comparison from unconstrained walks and viewpoints in medical and surveillance applications. The proposed approach mainly consists of a novel view-rectification method that generates a fronto-parallel viewpoint (side view) of the imaged trajectories of body parts.
The view-rectification method is based on a novel walk model that uses projective geometry to provide the spatio-temporal links between the body-part positions in the scene and their corresponding positions in the images. The head and the feet are the only body parts that are relevant for the proposed approach. They are automatically localized and tracked in monocular video sequences using a novel body-part tracking algorithm. Gait analysis is performed by a novel method that extracts standard gait measurements from the view-rectified body-part trajectories. A novel gait model based on body-part trajectories is also proposed in order to perform gait modelling and comparison using the dynamics of the gait. The proposed approach is first validated using synthetic walks comprising different viewpoints and changes in walk direction. The validation results show that the proposed view-rectification method works well, that is, valid gait measurements can be extracted from the view-rectified body-part trajectories. Next, gait analysis, modelling, and comparison are performed on real walks acquired as part of this thesis. These walks are challenging since they were performed close to the camera and contain changes in walk direction and walking speed. The results first show that the obtained gait measurements are realistic and correspond to the gait measurements found in references on clinical gait analysis. The gait comparison results then show that the proposed approach can be used to perform gait modelling and comparison in the context of surveillance applications by recognizing people by their gait. The computed recognition rates are quite good considering the challenging walks used in this thesis.
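A toy sketch of the measurement-extraction idea, assuming heel-strike positions have already been obtained from the view-rectified (side view) foot trajectory; the input values and function names are illustrative, not the thesis's own:

```python
def stride_lengths(heel_strike_x):
    """Stride length = distance between successive heel strikes of the
    same foot, measured along the view-rectified walking direction."""
    return [b - a for a, b in zip(heel_strike_x, heel_strike_x[1:])]

def cadence_steps_per_min(n_steps, duration_s):
    """Cadence as used in clinical gait analysis: steps per minute."""
    return 60.0 * n_steps / duration_s

# Hypothetical rectified heel-strike positions (metres) of one foot.
strikes_x = [0.0, 1.4, 2.9, 4.3]
print(stride_lengths(strikes_x))
print(cadence_steps_per_min(8, 4.0))  # 120.0
```

Measurements like these only become meaningful after view-rectification, since raw image trajectories from an oblique camera mix perspective foreshortening into the distances.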
FotoFacesUA: photographic management system of the University of Aveiro
Nowadays, automation is present in essentially every computational system. With
the rise of Machine Learning algorithms over the years, the need for a human
to intervene in a system has dropped considerably. However, in universities,
companies, and even governmental institutions there are still systems that
have not been automated. One such case is profile photo management,
which still requires human intervention to check whether an image follows the
institution's set of criteria that are mandatory for submitting a new photo.
FotoFaces is a system for updating the profile photos of collaborators at the
University of Aveiro that allows the collaborator to submit a new photo and,
automatically, through a set of image processing algorithms, decide whether the
photo meets a set of predefined criteria. One of the main advantages of this
system is that it can be used in any institution and can be adapted to different
needs by simply changing the algorithms or criteria considered. This Dissertation
describes some improvements implemented in the existing system, as well as some
new features in terms of the available algorithms.
The main contributions to the system are the following: sunglasses detection, hat
detection and background analysis. For the first two, it was necessary to create
a new database and label it to train, validate and test a deep transfer learning
network, used to detect sunglasses and hats. In addition, several tests were performed
varying the parameters of the network and using some machine learning and
pre-processing techniques on the input images. Finally, the background analysis
consists of the implementation and testing of two existing algorithms from the
literature, one low-level and the other based on deep learning.
Overall, the results obtained in the improvement of the existing algorithms, as
well as the performance of the new image processing modules, allowed the creation
of a more robust (improved production-version algorithms) and versatile (addition
of new algorithms to the system) profile photo update system.
Master's in Electronics and Telecommunications Engineering
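The adaptable-criteria design described above can be sketched as a simple check pipeline; the criteria names and thresholds below are hypothetical stand-ins for the real image-processing modules:

```python
def check_photo(photo, checks):
    """Run each institutional criterion over the submitted photo; the
    photo is accepted only if every check passes. Swapping the list of
    checks adapts the system to another institution's rules."""
    failures = [name for name, check in checks if not check(photo)]
    return len(failures) == 0, failures

# Hypothetical criteria; real ones would wrap sunglasses/hat detectors
# and the background-analysis module.
checks = [
    ("no_sunglasses", lambda p: not p.get("sunglasses", False)),
    ("no_hat", lambda p: not p.get("hat", False)),
    ("plain_background", lambda p: p.get("background_uniformity", 0) > 0.8),
]
ok, failed = check_photo({"sunglasses": False, "background_uniformity": 0.9}, checks)
print(ok, failed)  # True []
```

Because each criterion is an independent function, adding a new algorithm to the system amounts to appending one more entry to the list.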
A Web-Based Index of Historical Valuation Maps for the Erie Railroad
The purpose of this project was to develop a web-based index of historical railroad valuation maps for the Erie Lackawanna Historical Society (ELHS). The ELHS was in possession of a complete collection of over 3600 scanned 1918 railroad valuation maps for the Erie Railroad; however, these scanned maps lacked spatial reference. The first step in making these maps usable in modern GIS was to georeference them using Esri's ArcGIS. Once the maps were georeferenced, they were organized into a geodatabase, along with additional supporting layers, as well as geotagged historical photos relating to the railroads. In order to make these maps and data available to a wider audience, an interactive web application was developed using HTML, CSS, and Esri's ArcGIS API for JavaScript, which allows users to view the georeferenced maps as a fully mosaicked map layer, or access the original maps and photographs individually or in bulk.
DocMIR: An automatic document-based indexing system for meeting retrieval
This paper describes the DocMIR system, which automatically captures, analyzes, and indexes meetings, conferences, lectures, etc. by taking advantage of the documents projected (e.g. slideshows, budget tables, figures, etc.) during the events. For instance, the system can automatically apply the above-mentioned procedures to a lecture and index the event according to the presented slides and their contents. For indexing, the system requires neither specific software installed on the presenter's computer nor any conscious intervention of the speaker throughout the presentation. The only material required by the system is the speaker's electronic presentation file. Even if it is not provided, the system temporally segments the presentation and offers a simple storyboard-like browsing interface. The system runs on several capture boxes connected to cameras and microphones that record events synchronously. Once the recording is over, indexing is automatically performed by analyzing the content of the captured video containing projected documents: the system detects scene changes, identifies the documents, computes their duration, and extracts their textual content. Each of the captured images is identified from a repository containing all original electronic documents, captured audio-visual data, and metadata created during post-production. The identification is based on document signatures, which hierarchically structure features from both the layout structure and the color distributions of the document images. Video segments are finally enriched with the textual content of the identified original documents, which further facilitates query and retrieval without using OCR. The signature-based indexing method proposed in this article is robust, works with low-resolution images, and can be applied to several other applications including real-time document recognition, multimedia IR, and augmented reality systems.
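A minimal sketch of the color half of such a signature (the real system also hierarchically encodes layout structure; the function names and the histogram-intersection similarity used here are assumptions, not DocMIR's exact method):

```python
import numpy as np

def color_signature(img, bins=8):
    """Global intensity histogram, normalized to sum to 1: the color
    half of a document signature, robust at low resolution."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    return hist / hist.sum()

def identify(capture_sig, repo_sigs):
    """Return the repository document whose signature is closest
    (histogram-intersection similarity) to the captured frame."""
    scores = {doc: np.minimum(capture_sig, sig).sum()
              for doc, sig in repo_sigs.items()}
    return max(scores, key=scores.get)

# Two hypothetical slides with different tonal content.
rng = np.random.default_rng(0)
slide_dark = rng.integers(0, 100, (48, 64))
slide_light = rng.integers(150, 256, (48, 64))
repo = {"slide1": color_signature(slide_dark),
        "slide2": color_signature(slide_light)}
print(identify(color_signature(slide_dark), repo))  # slide1
```

Matching against a repository of original electronic documents this way is what lets the system enrich video segments with clean text instead of running OCR on the captured frames.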
A Comparison of Change Detection Methods in an Urban Environment Using LANDSAT TM and ETM+ Satellite Imagery: A Multi-Temporal, Multi-Spectral Analysis of Gwinnett County, GA 1991-2000
Land cover change detection in urban areas provides valuable data on the loss of forest and agricultural land to residential and commercial development. Using Landsat 5 Thematic Mapper (1991) and Landsat 7 ETM+ (2000) imagery of Gwinnett County, GA, change images were obtained using image differencing of the Normalized Difference Vegetation Index (NDVI), principal components analysis (PCA), and Tasseled Cap-transformed images. Ground truthing and accuracy assessment determined that land cover change detection using the NDVI and Tasseled Cap image transformation methods performed best in the study area, while PCA performed the worst of the three methods assessed. Analyses of vegetation changes from 1991-2000 revealed that these methods perform well for detecting changes in vegetation and/or vegetative characteristics but do not always correspond with changes in land use. Gwinnett County lost an estimated 13,500 hectares of vegetation cover during the study period to urban sprawl, with the majority of the loss coming from forested areas.
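The NDVI image-differencing method can be sketched as follows; the band values and the change threshold are illustrative, not the study's own parameters:

```python
import numpy as np

def ndvi(nir, red, eps=1e-9):
    """Normalized Difference Vegetation Index from NIR and red bands."""
    return (nir - red) / (nir + red + eps)

def vegetation_loss_mask(nir_t1, red_t1, nir_t2, red_t2, threshold=0.2):
    """Image differencing of NDVI between two dates; a drop larger
    than the threshold flags likely loss of vegetation cover."""
    diff = ndvi(nir_t2, red_t2) - ndvi(nir_t1, red_t1)
    return diff < -threshold

# Toy 1991/2000 pixels: the first stays vegetated, the second is cleared.
nir_91, red_91 = np.array([0.5, 0.5]), np.array([0.1, 0.1])
nir_00, red_00 = np.array([0.5, 0.2]), np.array([0.1, 0.3])
print(vegetation_loss_mask(nir_91, red_91, nir_00, red_00))  # [False  True]
```

The Tasseled Cap and PCA approaches compared in the study differ only in which transformed band is differenced; the thresholding step is analogous.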