Real-time content-aware video retargeting on the Android platform for tunnel vision assistance
As mobile devices continue to rise in popularity, advances in their processing power further expand their capabilities. This, coupled with the fact that many people suffer from low vision, leaves substantial room for mobile development aimed at low vision assistance. Computer vision can assist and accommodate individuals with blind spots or tunnel vision by extracting the necessary information and presenting it to the user in a form they can perceive. Such a system would enable individuals with low vision to function with greater ease, and offering assistance on a mobile platform broadens access. The objective of this thesis is to develop a computer vision application for low vision assistance on the Android mobile platform. Specifically, the application aims to reduce the effects that tunnel vision inflicts on individuals. This is accomplished through an in-depth real-time video retargeting model that builds upon previous work and applications. Seam carving is a content-aware retargeting operator that defines 8-connected paths of pixels, or seams, whose optimality is determined by a specific energy function. Discrete removal of these seams changes the aspect ratio while preserving important regions. The video retargeting model incorporates spatial and temporal considerations to provide effective image and video retargeting, and data reduction techniques are employed to keep the model efficient. Additionally, a minimalistic multi-operator approach is constructed to diminish the disadvantages experienced by individual operators. In the event that automated techniques fail, interactive options allow for user intervention. The application and its video retargeting model are evaluated by comparison to existing standard algorithms and by their ability to extend to real-time use. Performance metrics are obtained for both PC environments and mobile device platforms for comparison.
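A minimal single-image sketch of the seam-carving step described above, assuming a gradient-magnitude energy function and NumPy (the thesis's exact energy term and its real-time and temporal extensions are not reproduced here):

```python
import numpy as np

def energy_map(gray):
    # Simple L1 gradient-magnitude energy; a common choice for seam
    # carving, though the thesis's specific energy function may differ.
    gy, gx = np.gradient(gray.astype(float))
    return np.abs(gx) + np.abs(gy)

def find_vertical_seam(energy):
    # Dynamic programming over the cumulative minimum energy of
    # 8-connected vertical paths (each row steps at most one column).
    h, w = energy.shape
    M = energy.copy()
    for i in range(1, h):
        left = np.r_[np.inf, M[i - 1, :-1]]
        up = M[i - 1]
        right = np.r_[M[i - 1, 1:], np.inf]
        M[i] += np.minimum(np.minimum(left, up), right)
    # Backtrack from the minimum of the bottom row.
    seam = np.empty(h, dtype=int)
    seam[-1] = int(np.argmin(M[-1]))
    for i in range(h - 2, -1, -1):
        j = seam[i + 1]
        lo, hi = max(j - 1, 0), min(j + 2, w)
        seam[i] = lo + int(np.argmin(M[i, lo:hi]))
    return seam

def remove_vertical_seam(img, seam):
    # Drop one pixel per row, narrowing the image by one column.
    h, w = img.shape[:2]
    mask = np.ones((h, w), dtype=bool)
    mask[np.arange(h), seam] = False
    return img[mask].reshape(h, w - 1, *img.shape[2:])
```

Repeated seam removal changes the aspect ratio one column at a time while low-energy (unimportant) regions absorb the reduction.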
Towards Data-Driven Large Scale Scientific Visualization and Exploration
Technological advances have enabled us to acquire extremely large
datasets, but it remains a challenge to store, process, and extract
information from them. This dissertation builds upon recent advances
in machine learning, visualization, and user interactions to
facilitate exploration of large-scale scientific datasets. First, we
use data-driven approaches to computationally identify regions of
interest in the datasets. Second, we use visual presentation for
effective user comprehension. Third, we provide interactions for
human users to integrate domain knowledge and semantic information
into this exploration process.
Our research shows how to extract, visualize, and explore informative
regions on very large 2D landscape images, 3D volumetric datasets,
high-dimensional volumetric mouse brain datasets with thousands of
spatially-mapped gene expression profiles, and geospatial trajectories
that evolve over time. The contributions of this dissertation include:
(1) We introduce a sliding-window saliency model that discovers
regions of user interest in very large images; (2) We develop visual
segmentation of intensity-gradient histograms to identify meaningful
components from volumetric datasets; (3) We extract boundary surfaces
from a wealth of volumetric gene expression mouse brain profiles to
personalize the reference brain atlas; (4) We show how to efficiently
cluster geospatial trajectories by mapping each sequence of locations
to a high-dimensional point with the kernel distance framework.
We aim to discover patterns, relationships, and anomalies that would
lead to new scientific, engineering, and medical advances. This work
represents one of the first steps toward better visual understanding
of large-scale scientific data by combining machine learning and human
intelligence.
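Contribution (4), mapping each variable-length trajectory to a single high-dimensional point for clustering, can be sketched with a kernel mean embedding over landmark locations; the landmark grid, Gaussian kernel width, and plain k-means below are illustrative choices, not the dissertation's exact kernel-distance construction:

```python
import numpy as np

def trajectory_embedding(traj, landmarks, sigma=1.0):
    # Map a trajectory (n x 2 array of locations) to a fixed-length
    # vector: the mean Gaussian-kernel similarity to each landmark.
    # Euclidean distance between two such embeddings approximates a
    # kernel distance between the underlying trajectories.
    d2 = ((traj[:, None, :] - landmarks[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2)).mean(axis=0)

def cluster_trajectories(trajs, landmarks, k=2, iters=20):
    X = np.stack([trajectory_embedding(t, landmarks) for t in trajs])
    # Greedy farthest-point initialization keeps the sketch deterministic.
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(-1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.stack(centers)
    # Plain k-means in the embedded space.
    for _ in range(iters):
        labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(0)
    return labels
```

Because every trajectory becomes a point of the same dimension, standard vector clustering applies regardless of how many locations each trajectory contains.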
Understanding Visual Feedback in Large-Display Touchless Interactions: An Exploratory Study
Touchless interactions synthesize input and output from physically disconnected motor and display spaces without any haptic feedback. In the absence of haptic feedback, touchless interactions rely primarily on visual cues, but the properties of visual feedback remain unexplored. This paper systematically investigates how large-display touchless interactions are affected by (1) types of visual feedback—discrete, partial, and continuous; (2) alternative forms of touchless cursors; (3) approaches to visualizing target selection; and (4) persistent visual cues to support out-of-range and drag-and-drop gestures. Results suggest that continuous visual feedback was more effective than partial feedback; users disliked opaque cursors, and efficiency did not increase when cursors were larger than display artifacts' size. Semantic visual feedback located at the display border improved users' efficiency in returning to the display range; however, echoing the path of movement in drag-and-drop operations decreased efficiency. Our findings contribute key ingredients for designing suitable visual feedback for large-display touchless environments. This work was partially supported by an IUPUI Research Support Funds Grant (RSFG).
Physical Interaction Concepts for Knowledge Work Practices
The majority of workplaces in developed countries involve knowledge work. Accordingly, the IT industry and research community have made great efforts for many years to support knowledge workers -- and indeed, computer-based information workplaces have come of age. Nevertheless, knowledge work in the physical world still has quite a number of unique advantages, and the integration of physical and digital knowledge work leaves a lot to be desired. The present thesis aims at reducing these deficiencies; in doing so, it leverages recent technology trends, in particular interactive tabletops and resizable hand-held displays.
We start from the observation that knowledge workers develop highly efficient practices, skills, and dexterity in working with physical objects in the real world, whether content-unrelated (coffee mugs, stationery, etc.) or content-related (books, notepads, etc.). Among the latter, paper-based objects -- the notorious analog information bearers -- represent by far the most relevant (super-)category. We discern two kinds of practices: collective practices concern the arrangement of objects with respect to other objects and the desk, while specific practices operate on individual objects and usually alter them. The former are mainly employed for effective management of the physical desktop workspace -- e.g., everyday objects are frequently moved on tables to optimize the desk as a workplace -- or for effective organization of paper-based documents on the desktop -- e.g., stacking, fanning out, sorting, etc. The latter concern the specific manipulation of physical objects related to the task at hand, i.e., knowledge work. Widely assimilated practices include not only writing on, annotating, or spatially arranging paper documents but also sophisticated manipulations such as flipping, folding, and bending.
Compared to the wealth of such well-established practices in the real world, those for digital knowledge work are bound by the indirection imposed by mouse and keyboard input -- where the mouse provided such a great advancement that researchers were seduced into calling its use "direct manipulation". In this light, the goal of this thesis can be rephrased as exploring novel interaction concepts for knowledge workers that i) exploit the flexible and direct manipulation potential of physical objects (as present in the real world) for more intuitive and expressive interaction with digital content, and ii) improve the integration of the physical and digital knowledge workplace. Two directions of research are pursued. Firstly, the thesis investigates the collective practices executed on the desks of knowledge workers, discerning content-related objects (more precisely, paper-based documents) from content-unrelated ones -- this part is coined as table-centric approaches and leverages the technology of interactive tabletops. Secondly, the thesis looks at specific practices executed on paper, concentrating on knowledge-related tasks due to the specific role of paper -- this part is coined as paper-centric approaches and leverages the affordances of paper-like displays, more precisely of resizable, i.e., rollable and foldable, displays.
The table-centric approach leads to the challenge of blending interactive tabletop technology with the established use of physical desktop workspaces. We first conduct an exploratory user study to investigate behavioral and usage patterns of interaction with both physical and digital documents on tabletop surfaces while performing tasks such as grouping and browsing. Based on the results of the study, we contribute two sets of interaction and visualization concepts -- coined as PaperTop and ObjecTop -- that concern specific paper-based practices and collective practices, respectively. Their efficiency and effectiveness are evaluated in a series of user studies.
As mentioned, the paper-centric perspective leverages recent ultra-thin resizable display technology. We again contribute two sets of novel interaction concepts -- coined as FoldMe and Xpaaand -- that respond to the design spaces of dual-sided foldable and of rollout displays, respectively. In their design, we leverage the physical act of resizing not "just" for adjusting the screen real estate but also for interactively performing operations. Initial user studies show great potential for interaction with digital content, i.e., for knowledge work.
Learning human activities and poses with interconnected data sources
Understanding human actions and poses in images or videos is a challenging problem in computer vision. There are different topics related to this problem, such as action recognition, pose estimation, human-object interaction, and activity detection. Knowledge of actions and poses could benefit many applications, including video search, surveillance, auto-tagging, event detection, and human-computer interfaces. To understand humans' actions and poses, we need to address several challenges. First, humans can perform an enormous number of poses. For example, simply to move forward, we can crawl, walk, run, or sprint. These poses all look different, and many examples are required to cover the variations. Second, the appearance of a person's pose changes when viewed from different angles, so the learned action model needs to cover the variations across views. Third, many actions involve interactions between people and other objects, so we need to consider the appearance change corresponding to those objects as well. Fourth, collecting such data for learning is difficult and expensive. Last, even if we can learn a good model for an action, localizing when and where the action happens in a long video remains difficult due to the large search space. My key idea for alleviating these obstacles is to discover the underlying patterns that connect information from different data sources. Why should there be underlying patterns? The intuition is that all people share the same articulated physical structure. Though we can change our pose, common constraints limit what our poses can be and how they can move over time. Therefore, all types of human data follow these rules, which can serve as prior knowledge or regularization in our learning framework.
If we can exploit these tendencies, we can extract additional information from the data and use it to improve learning of humans' actions and poses. In particular, we can find patterns for how our pose varies over time, how our appearance looks from a specific view, what our pose is when we interact with objects with certain properties, and how parts of our body configuration are shared across different poses. Once learned, these patterns can be used to interconnect and extrapolate knowledge between different data sources. To this end, I propose several new ways to connect human activity data. First, I show how to connect snapshot images and videos by exploring the patterns of how our pose can change over time. Building on this idea, I explore how to connect humans' poses across multiple views by discovering the correlations between different poses and the latent factors that affect viewpoint variations. In addition, I consider whether there are also patterns connecting our poses and nearby objects when we interact with them. Furthermore, I explore how the predicted interaction can serve as a cue to better address existing recognition problems, including image retargeting and image description generation. Finally, after learning models that effectively incorporate these patterns, I propose a robust approach to efficiently localize when and where a complex action happens in a video sequence. The variants of my proposed approaches offer a good trade-off between computational cost and detection accuracy. My thesis exploits various types of underlying patterns in human data, and the discovered structure is used to enhance the understanding of humans' actions and poses.
With my proposed methods, we are able to 1) learn an action from very few snapshots by connecting them to a pool of label-free videos, 2) infer the pose for some views, even without any examples, by connecting the latent factors between different views, 3) predict the location of an object that a person is interacting with, independent of the type and appearance of that object, and then use the inferred interaction as a cue to improve recognition, and 4) localize an action in a complex long video. These approaches improve existing frameworks for understanding humans' actions and poses without extra data collection cost and broaden the range of problems that we can tackle.
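The temporal localization step, finding when an action occurs in a long video, can be illustrated with a linear-time maximum-sum interval search over per-frame detection scores; the fixed score threshold and Kadane-style search below are an illustrative simplification, not the thesis's exact method:

```python
def localize_action(frame_scores, prior=0.5):
    # Find the contiguous frame interval with the highest total
    # centered score (Kadane's maximum-subarray algorithm).
    # frame_scores are per-frame classifier confidences in [0, 1];
    # `prior` is an illustrative threshold subtracted so that weak
    # frames penalize an interval instead of extending it.
    best_sum, best = float("-inf"), (0, 0)
    cur_sum, cur_start = 0.0, 0
    for i, s in enumerate(frame_scores):
        if cur_sum <= 0:
            # Restarting here is what keeps the search linear time,
            # versus scoring all O(n^2) candidate intervals.
            cur_sum, cur_start = 0.0, i
        cur_sum += s - prior
        if cur_sum > best_sum:
            best_sum, best = cur_sum, (cur_start, i)
    return best  # (first_frame, last_frame) of the detected interval
```

A single pass over the scores thus replaces an exhaustive scan of all candidate start/end pairs, which is the kind of cost/accuracy trade-off the abstract alludes to.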