3 research outputs found
Autonomous Cleaning of Corrupted Scanned Documents - A Generative Modeling Approach
We study the task of cleaning scanned text documents that are strongly
corrupted by dirt such as manual line strokes, spilled ink etc. We aim at
autonomously removing dirt from a single letter-size page based only on the
information the page contains. Our approach, therefore, has to learn character
representations without supervision and requires a mechanism to distinguish
learned representations from irregular patterns. To learn character
representations, we use a probabilistic generative model parameterizing pattern
features, feature variances, the features' planar arrangements, and pattern
frequencies. The latent variables of the model describe pattern class, pattern
position, and the presence or absence of individual pattern features. The model
parameters are optimized using a novel variational EM approximation. After
learning, the parameters represent, independently of their absolute position,
planar feature arrangements and their variances. A quality measure defined
based on the learned representation then allows for an autonomous
discrimination between regular character patterns and the irregular patterns
making up the dirt. The irregular patterns can thus be removed to clean the
document. For a full Latin alphabet we found that a single page does not
contain sufficiently many character examples. However, even if heavily
corrupted by dirt, we show that a page containing a lower number of character
types can efficiently and autonomously be cleaned solely based on the
structural regularity of the characters it contains. In different examples
using characters from different alphabets, we demonstrate generality of the
approach and discuss its implications for future developments.Comment: oral presentation and Google Student Travel Award; IEEE conference on
Computer Vision and Pattern Recognition 201
HandSight: A Touch-Based Wearable System to Increase Information Accessibility for People with Visual Impairments
Many activities of daily living such as getting dressed, preparing food, wayfinding, or shopping rely heavily on visual information, and the inability to access that information can negatively impact the quality of life for people with vision impairments. While numerous researchers have explored solutions for assisting with visual tasks that can be performed at a distance, such as identifying landmarks for navigation or recognizing people and objects, few have attempted to provide access to nearby visual information through touch. Touch is a highly attuned means of acquiring tactile and spatial information, especially for people with vision impairments. By supporting touch-based access to information, we may help users to better understand how a surface appears (e.g., document layout, clothing patterns), thereby improving the quality of life.
To address this gap in research, this dissertation explores methods to augment a visually impaired user’s sense of touch with interactive, real-time computer vision to access information about the physical world. These explorations span three application areas: reading and exploring printed documents, controlling mobile devices, and identifying colors and visual textures. At the core of each application is a system called HandSight that uses wearable cameras and other sensors to detect touch events and identify surface content beneath the user’s finger. To create HandSight, we designed and implemented the physical hardware, developed signal processing and computer vision algorithms, and designed real-time feedback that enables users to interpret visual or digital content. We involve visually impaired users throughout the design and development process, conducting several user studies to assess usability and robustness and to improve our prototype designs.
The contributions of this dissertation include: (i) developing and iteratively refining HandSight, a novel wearable system to assist visually impaired users in their daily lives; (ii) evaluating HandSight across a diverse set of tasks, and identifying tradeoffs of a finger-worn approach in terms of physical design, algorithmic complexity and robustness, and usability; and (iii) identifying broader design implications for future wearable systems and for the fields of accessibility, computer vision, augmented and virtual reality, and human-computer interaction