Crowdsourcing design guidance for contextual adaptation of text content in augmented reality
Funding Information: This work was supported by EPSRC (grants EP/R004471/1 and EP/S027432/1). Supporting data for this publication is available at https://doi.org/10.17863/CAM.62931.
Augmented Reality (AR) can deliver engaging user experiences that seamlessly meld virtual content with the physical environment. However, building such experiences is challenging because developers cannot assess how uncontrolled deployment contexts may influence the user experience. To address this issue, we demonstrate a method for rapidly conducting AR experiments and real-world data collection in the user's own physical environment using a privacy-conscious mobile web application. The approach leverages the large number of distinct user contexts accessible through crowdsourcing to efficiently source diverse context and perceptual preference data. The insights gathered through this method complement emerging design guidance and sample-limited lab-based studies. The utility of the method is illustrated by re-examining the design challenge of adapting AR text content to the user's environment. Finally, we demonstrate how the gathered design insight can be operationalized to provide adaptive text content functionality in an AR headset.
An Examination of Presentation Strategies for Textual Data in Augmented Reality
Videos with embedded text are widely used, and the text in them often contains valuable information. However, it can be difficult to fully understand text in videos displayed on smartphones due to obstructions such as color conflicts between letters and the moving background. Adjustments that support the human visual system, such as changes to brightness and color contrast, increased legibility of text, and exploitation of the phantom illumination (PI) illusion (an optical illusion that increases the perceived brightness of a certain area), should be able to improve people's ability to read text in augmented reality (AR) applications on smartphones. The researcher created a text presentation style implementing the PI illusion, using solid white text on a 50% transparent black billboard with a black-white shading PI illusion at the internal edge. An experiment was conducted to verify whether this text presentation style could improve reading performance. The experiment showed that the PI illusion was unable to improve the legibility of text in AR applications on smartphones. However, the data suggested that, in some cases, certain participants, especially those from specific major groups, had difficulty reading text when it was presented in the standard style without the PI illusion enhancement.
Toward Robust Video Event Detection and Retrieval Under Adversarial Constraints
The continuous stream of videos that are uploaded and shared on the Internet has been leveraged by computer vision researchers for a myriad of detection and retrieval tasks, including gesture detection, copy detection, face authentication, etc. However, the existing state-of-the-art event detection and retrieval techniques fail to deal with several real-world challenges (e.g., low resolution, low brightness, and noise) under adversarial constraints. This dissertation focuses on these challenges in realistic scenarios and demonstrates practical methods to address the problem of robustness and efficiency within video event detection and retrieval systems in five application settings (namely, CAPTCHA decoding, face liveness detection, reconstructing typed input on mobile devices, video confirmation attack, and content-based copy detection). Specifically, for CAPTCHA decoding, I propose an automated approach which can decode moving-image object recognition (MIOR) CAPTCHAs faster than humans. I show that not only are there inherent weaknesses in current MIOR CAPTCHA designs, but that several obvious countermeasures (e.g., extending the length of the codeword) are not viable. More importantly, my work highlights the fact that the choice of underlying hard problem selected by the designers of a leading commercial solution falls into a solvable subclass of computer vision problems. For face liveness detection, I introduce a novel approach to bypass modern face authentication systems. More specifically, by leveraging a handful of pictures of the target user taken from social media, I show how to create realistic, textured, 3D facial models that undermine the security of widely used face authentication solutions.
My framework makes use of virtual reality (VR) systems, incorporating along the way the ability to perform animations (e.g., raising an eyebrow or smiling) of the facial model, in order to trick liveness detectors into believing that the 3D model is a real human face. I demonstrate that such VR-based spoofing attacks constitute a fundamentally new class of attacks that point to serious weaknesses in camera-based authentication systems. For reconstructing typed input on mobile devices, I propose a method that successfully transcribes the text typed on a keyboard by exploiting video of the user typing, even from significant distances and from repeated reflections. This feat allows us to reconstruct typed input from the image of a mobile phone's screen on a user's eyeball as reflected through a nearby mirror, extending the privacy threat to include situations where the adversary is located around a corner from the user. To assess the viability of a video confirmation attack, I explored a technique that exploits the emanations of changes in light to reveal the programs being watched. I leverage the key insight that the observable emanations of a display (e.g., a TV or monitor) during presentation of the viewing content induce a distinctive flicker pattern that can be exploited by an adversary. My proposed approach works successfully in a number of practical scenarios, including (but not limited to) observations of light effusions through the windows, on the back wall, or off the victim's face. My empirical results show that I can successfully confirm hypotheses while capturing short recordings (typically less than 4 minutes long) of the changes in brightness from the victim's display from a distance of 70 meters. Lastly, for content-based copy detection, I take advantage of a new temporal feature to index a reference library in a manner that is robust to the popular spatial and temporal transformations in pirated videos.
My technique narrows the detection gap in the important area of temporal transformations applied by would-be pirates. My large-scale evaluation on real-world data shows that I can successfully detect infringing content from movies and sports clips with 90.0% precision at a 71.1% recall rate, and can achieve that accuracy at an average time expense of merely 5.3 seconds, outperforming the state of the art by an order of magnitude.
Probabilistic User Interface Design for Virtual and Augmented Reality Applications
The central hypothesis of this thesis is that probabilistic user interface design provides an effective methodology for delivering productive and enjoyable applications in virtual reality (VR) and augmented reality (AR). This investigation is timely given the recent emergence of mass-market virtual and augmented reality head-mounted displays and growing demand for tailored applications and content. The design guidance for building compelling and productive applications for these environments is, however, currently lagging the pace at which the underlying technology is maturing. This is problematic given important differences between designing conventional 2D interfaces and interactions and their embodied 3D counterparts. This dissertation investigates probabilistic user interface design as a method for solving many of the novel challenges encountered when developing applications for VR and AR.
Probabilistic user interface design seeks to model the uncertain events in a system and identify, implement and validate strategies that drive improved system performance. This thesis addresses four research questions by applying a probabilistic treatment in four distinct but closely related case studies. These four case studies are selected to illustrate the flexibility and unique benefits offered by this method.
Research Question 1 asks how the probabilistic qualities of an interface can be determined and how this can inform design. This question is investigated in the context of text entry in VR with a probabilistic characterisation performed on two fundamental design choices. Research Question 2 relates to the challenge of adapting AR applications to deployment contexts not knowable at design time. A study in which crowdworkers are employed to build a probabilistic understanding of the requirements for contextually adaptive AR answers this question. The text entry theme is revisited in answering Research Question 3, which asks how high levels of input noise can be mitigated through inference. A probabilistic text entry method specifically tailored for use in AR is implemented and evaluated. Finally, Research Question 4 asks how the high-dimensional design space of AR and VR applications can be efficiently explored to support ideal design choices. Interface refinement through probabilistic optimisation and crowdsourcing is shown to be highly efficient and effective for this purpose.
A probabilistic treatment in the design process has many potential benefits, principal among which is increased robustness to circumstances unanticipated at design time. This thesis contributes to the toolset and guidance available to designers and supports the development of next-generation user interfaces specifically tailored to virtual and augmented reality.
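The noisy-input mitigation explored in Research Question 3 follows the classic noisy-channel pattern: combine a prior over intended words with a likelihood of the observed touch points. The sketch below is a minimal illustration of that pattern only; the key layout, lexicon, priors, and Gaussian noise model are all hypothetical and are not the thesis's actual implementation.

```python
import math

# Hypothetical key centres on a 1D keyboard row (illustrative, not a real layout).
KEYS = {"a": 0.0, "b": 1.0, "c": 2.0, "d": 3.0}
# Hypothetical lexicon with word priors.
LEXICON = {"bad": 0.5, "cab": 0.3, "dab": 0.2}

def gaussian_log_likelihood(x, mu, sigma=0.6):
    """Log-probability of observing tap position x when the intended key centre is mu."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

def decode(taps):
    """Return the lexicon word maximising prior times per-tap likelihoods."""
    best_word, best_score = None, -math.inf
    for word, prior in LEXICON.items():
        if len(word) != len(taps):
            continue
        score = math.log(prior) + sum(
            gaussian_log_likelihood(x, KEYS[ch]) for x, ch in zip(taps, word))
        if score > best_score:
            best_word, best_score = word, score
    return best_word

# Taps landing near keys b, a, d decode to "bad" despite the positional noise.
print(decode([1.2, 0.1, 2.8]))
```

Because the decoder scores whole words rather than individual keys, moderate per-tap noise is absorbed as long as the lexicon disambiguates the sequence.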
Technological framework for ubiquitous interactions using context–aware mobile devices
This report presents research and development of a dedicated system architecture, designed to enable its users to interact with each other and to access information on Points of Interest in their immediate environment. This is accomplished by managing personal preferences and contextual information in a distributed manner and in real time. The advantage of this system architecture is that it uses mobile devices, heterogeneous sensors and a selection of user interface paradigms to produce a sociotechnical framework that enhances perception of the environment and promotes intuitive interactions. The thrust of the work has been on software development and component integration. Iterative prototyping was adopted as a development method in order to effectively incorporate the users' feedback and establish a platform for collaboration that closely meets the requirements and aids their decision-making process. The requirement acquisition was followed by a system-modelling phase in order to produce a robust software prototype. The implementation includes component-based development and extensive use of design patterns over native programming. Ultimately, the software product became the means to evaluate differences in the use of mixed reality technologies in a ubiquitous scenario.
The prototype can query a number of context sources, such as sensors or details of the personal profile, to acquire relevant data. The data (and metadata) are stored in open-source structures, so that they are accessible at every layer of the system architecture and at any time. By proactively processing the acquired context, the system can assist the users in their tasks (e.g. navigation) without explicit input, for example by simply creating a gesture with the device. However, advanced interaction with the application via the user interface is available for requests that are more complex.
Representations of the real-world objects, their spatial relations and other captured features of interest are visualised on scalable interfaces, ranging from 2D to 3D models and from photorealism to stylised clues and symbols. Two principal modes of operation have been implemented: the first uses geo-referenced virtual reality models of the environment, updated in real time; the second overlays descriptive annotations and graphics on the video images of the surroundings, captured by a video camera. The latter is referred to as augmented reality.
The continuous feed of the device position and orientation data, from the GPS receiver and the digital compass, into the application makes the framework fit for use in unknown environments and therefore suitable for ubiquitous operation. This is one of the novelties of the proposed framework, because it enables a whole range of social, peer-to-peer interactions to take place. The scenarios of how the system could be employed to pursue these remote interactions and collaborative efforts on mobile devices are addressed in the context of urban navigation. The conceptual design and implementation of the novel location- and orientation-based algorithm for mobile AR are presented in detail. The system is, however, multifaceted and capable of supporting peer-to-peer exchange of information in a pervasive fashion, usable in various contexts. The modalities of these interactions are explored and laid out in several scenarios, particularly in the context of user adoption. Two evaluation tasks took place: a preliminary evaluation examined certain aspects that influence user interaction while immersed in a virtual environment, whereas a second, summative evaluation compared the utility and certain usability aspects of the AR and VR interfaces.
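A location- and orientation-based registration of the kind described above can be sketched in a few lines: compute the great-circle bearing from the device to a Point of Interest, subtract the compass heading, and map the relative angle into the camera's horizontal field of view. This is a simplified sketch under assumed parameters (field of view, screen width), not the framework's actual algorithm.

```python
import math

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from (lat1, lon1) to (lat2, lon2), in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return math.degrees(math.atan2(y, x)) % 360.0

def annotation_x(user_lat, user_lon, heading_deg, poi_lat, poi_lon,
                 fov_deg=60.0, screen_w=800):
    """Horizontal pixel position of a POI label, or None if outside the camera FOV.

    fov_deg and screen_w are illustrative assumptions.
    """
    rel = (bearing_deg(user_lat, user_lon, poi_lat, poi_lon)
           - heading_deg + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)
    if abs(rel) > fov_deg / 2:
        return None  # POI is off-screen at this heading
    return (rel / fov_deg + 0.5) * screen_w
```

For example, a POI due north of a north-facing user lands at the horizontal centre of the screen, while a POI at a relative bearing beyond half the field of view is culled.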
Understanding, Modeling, and Simulating the Discrepancy Between Intended and Perceived Image Appearance on Optical See-Through Augmented Reality Displays
Augmented reality (AR) displays are transitioning from being primarily used in research and development settings, to being used by the general public. With this transition, these displays will be used by more people, in many different environments, and in many different contexts. Like other displays, the user's perception of virtual imagery is influenced by the characteristics of the user's environment, creating a discrepancy between the intended appearance and the perceived appearance of virtual imagery shown on the display. However, this problem is much more apparent for optical see-through AR displays, such as the HoloLens. For these displays, imagery is superimposed onto the user's view of their environment, which can cause the imagery to become transparent and washed out in appearance from the user's perspective. Any change in the user's environment conditions or in the user's position introduces changes to the perceived appearance of the AR imagery, and current AR displays do not adapt to maintain a consistent perceived appearance of the imagery being displayed. Because of this, in many environments the user may misinterpret or fail to notice information shown on the display. In this dissertation, I investigate the factors that influence user perception of AR imagery and demonstrate examples of how the user's perception is affected for applications involving user interfaces, attention cues, and virtual humans. I establish a mathematical model that relates the user, their environment, their AR display, and AR imagery in terms of luminance or illuminance contrast. I demonstrate how this model can be used to classify the user's viewing conditions and identify problems the user is prone to experience when in these conditions. I demonstrate how the model can be used to simulate changes in the user's viewing conditions and to identify methods to maintain the perceived appearance of the AR imagery in changing conditions.
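The additive-blending behaviour of optical see-through displays that underlies such a contrast model can be sketched briefly: virtual luminance adds to the environment light transmitted through the optical combiner, so the same imagery yields very different Weber contrast in dim and bright surroundings. The transmittance value and luminance figures below are illustrative assumptions, not parameters from the dissertation.

```python
def perceived_luminance(display_lum, env_lum, transmittance=0.85):
    """On an optical see-through display, virtual imagery adds to the environment
    light transmitted through the combiner (transmittance is an assumed value)."""
    return display_lum + transmittance * env_lum

def weber_contrast(display_lum, env_lum, transmittance=0.85):
    """Weber contrast of virtual imagery against the transmitted background."""
    background = transmittance * env_lum
    target = perceived_luminance(display_lum, env_lum, transmittance)
    return (target - background) / background

# The same 200 cd/m^2 imagery: high contrast against a dim indoor background,
# nearly washed out against sunlit surroundings.
indoor = weber_contrast(200.0, 100.0)
outdoor = weber_contrast(200.0, 5000.0)
```

This captures, in miniature, why imagery that reads clearly indoors can become transparent and washed out outdoors unless the display adapts.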