
    Practicable Soundmapping: JavaScript enabled Edge Compute

    In this paper, we report on developments in Citygram, a comprehensive sensor network platform for capturing, streaming, analyzing, mapping, visualizing, and providing easy access to spatiotemporal soundscape data. Launched in 2011, Citygram recently made the strategic decision to redesign the system and migrate from a codebase built on multiple programming languages to a single cross-platform JavaScript codebase. Citygram now runs on V8 engines, including standard web browsers and the Node.js environment. The Citygram sensor network significantly alleviates common soundmapping complexities, including operating-system limitations, hardware dependency, software update and dissemination issues, data access mechanisms, data visualization, and cost. This strategy has made practicable a key design philosophy for soundmapping: rapid sensor network growth for capturing spatiotemporally granular soundscapes. We summarize research and development for the following Citygram modules: (1) a cost-effective sensor module allowing high-value data transmission through edge-compute paradigms, (2) a machine learning module focusing on environmental sound classification, and (3) visualization and data access prototypes.
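    The abstract describes the edge-compute design only in prose; below is a minimal Python sketch of the idea, assuming a sensor that reduces each raw audio frame to a compact feature vector (RMS energy and spectral centroid are illustrative choices, not Citygram's documented feature set) and transmits only those features upstream instead of raw audio.

```python
import json
import numpy as np

SAMPLE_RATE = 44100   # assumed sensor sample rate
FRAME_SIZE = 2048     # samples per analysis frame

def extract_features(frame: np.ndarray) -> dict:
    """Reduce a raw audio frame to a compact feature vector on the edge."""
    # Root-mean-square energy: a cheap loudness proxy.
    rms = float(np.sqrt(np.mean(frame ** 2)))
    # Spectral centroid: rough "brightness" of the frame.
    spectrum = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / SAMPLE_RATE)
    centroid = float(np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-12))
    return {"rms": rms, "centroid_hz": centroid}

def stream_features(frames, send):
    """Send only features upstream -- raw audio never leaves the sensor."""
    for timestamp, frame in frames:
        payload = {"t": timestamp, **extract_features(frame)}
        send(json.dumps(payload))

if __name__ == "__main__":
    # Example: one frame of low-level noise, "sent" to stdout.
    frame = np.random.randn(FRAME_SIZE) * 0.01
    stream_features([(0.0, frame)], print)
```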

    Toward Robust Video Event Detection and Retrieval Under Adversarial Constraints

    The continuous stream of videos uploaded and shared on the Internet has been leveraged by computer vision researchers for a myriad of detection and retrieval tasks, including gesture detection, copy detection, and face authentication. However, existing state-of-the-art event detection and retrieval techniques fail to deal with several real-world challenges (e.g., low resolution, low brightness, and noise) under adversarial constraints. This dissertation focuses on these challenges in realistic scenarios and demonstrates practical methods to address robustness and efficiency within video event detection and retrieval systems in five application settings: CAPTCHA decoding, face liveness detection, reconstructing typed input on mobile devices, video confirmation attacks, and content-based copy detection.

    For CAPTCHA decoding, I propose an automated approach that can decode moving-image object recognition (MIOR) CAPTCHAs faster than humans. I show not only that there are inherent weaknesses in current MIOR CAPTCHA designs, but also that several obvious countermeasures (e.g., extending the length of the codeword) are not viable. More importantly, my work highlights that the underlying hard problem selected by the designers of a leading commercial solution falls into a solvable subclass of computer vision problems.

    For face liveness detection, I introduce a novel approach to bypass modern face authentication systems. By leveraging a handful of pictures of the target user taken from social media, I show how to create realistic, textured 3D facial models that undermine the security of widely used face authentication solutions. My framework uses virtual reality (VR) systems, incorporating the ability to animate the facial model (e.g., raising an eyebrow or smiling), in order to trick liveness detectors into believing that the 3D model is a real human face. I demonstrate that such VR-based spoofing attacks constitute a fundamentally new class of attacks that points to serious weaknesses in camera-based authentication systems.

    For reconstructing typed input on mobile devices, I propose a method that successfully transcribes the text typed on a keyboard by exploiting video of the user typing, even from significant distances and from repeated reflections. This allows typed input to be reconstructed from the image of a mobile phone's screen on a user's eyeball as reflected through a nearby mirror, extending the privacy threat to situations where the adversary is located around a corner from the user.

    To assess the viability of a video confirmation attack, I explore a technique that exploits the emanations of changes in light to reveal the programs being watched. I leverage the key insight that the observable emanations of a display (e.g., a TV or monitor) during presentation of the viewing content induce a distinctive flicker pattern that an adversary can exploit. My approach works successfully in a number of practical scenarios, including (but not limited to) observations of light effusions through windows, on a back wall, or off the victim's face. My empirical results show that I can successfully confirm hypotheses from short recordings (typically less than 4 minutes long) of the changes in brightness from the victim's display at a distance of 70 meters.

    Lastly, for content-based copy detection, I take advantage of a new temporal feature to index a reference library in a manner that is robust to the spatial and temporal transformations popular in pirated videos. My technique narrows the detection gap in the important area of temporal transformations applied by would-be pirates. My large-scale evaluation on real-world data shows that I can successfully detect infringing content from movies and sports clips with 90.0% precision at a 71.1% recall rate, at an average time expense of merely 5.3 seconds, outperforming the state of the art by an order of magnitude.
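    The flicker-based video confirmation attack lends itself to a short illustration. The sketch below is not the dissertation's method, only a hedged approximation of the core idea: collapse a recorded video to a per-frame brightness trace and cross-correlate it against a candidate program's reference trace; a strong correlation peak supports the hypothesis that the victim watched that content. All names and the example data are illustrative.

```python
import numpy as np

def brightness_trace(frames: np.ndarray) -> np.ndarray:
    """Collapse a video (frames x H x W) to a per-frame mean-brightness signal."""
    trace = frames.reshape(len(frames), -1).mean(axis=1)
    # Zero-mean, unit-variance so correlation ignores absolute light level.
    return (trace - trace.mean()) / (trace.std() + 1e-12)

def confirmation_score(observed: np.ndarray, reference: np.ndarray) -> float:
    """Peak normalized cross-correlation between the observed flicker and a
    candidate program's reference flicker; a high score suggests a match."""
    corr = np.correlate(observed, reference, mode="valid")
    return float(corr.max() / len(reference))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = rng.standard_normal(1000)            # candidate program's flicker
    observed = np.concatenate([rng.standard_normal(200),
                               reference[300:800],   # victim watched this segment
                               rng.standard_normal(200)])
    observed = (observed - observed.mean()) / observed.std()
    print(f"match score: {confirmation_score(observed, reference):.2f}")
```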

    A Novel Design of Audio CAPTCHA for Visually Impaired Users

    CAPTCHAs are widely used by web applications for security and privacy. However, traditional text-based CAPTCHAs are difficult even for sighted users, much less for users with visual impairments. To address this issue, this paper proposes a new CAPTCHA mechanism called HearAct, a real-time audio-based CAPTCHA that enables easy access for users with visual impairments. The user listens to the sound of something (the "sound-maker") and must identify what the sound-maker is. HearAct then presents a word and a stated letter, and requires the user to determine whether the word contains that letter: if it does, the user taps; if not, the user swipes. This paper presents our HearAct pilot study conducted with thirteen blind users. The preliminary results suggest that this new form of CAPTCHA has considerable potential for both blind and sighted users. The results also show that HearAct can be solved in a shorter time than text-based CAPTCHAs, because HearAct allows users to solve the CAPTCHA using gestures instead of typing; accordingly, participants preferred HearAct over audio-based CAPTCHAs. The success rate of solving HearAct was 82.05%, compared to 43.58% for the audio CAPTCHA, and the System Usability Scale score for HearAct was 88.07, compared to 52.11 for the audio CAPTCHA, a significant usability difference. Using gestures to solve the CAPTCHA challenge was the most preferred feature of the HearAct solution. To increase the security of HearAct, it is necessary to increase the number of sounds in the CAPTCHA. The solution could also be improved to cover a wider range of users by adding a corresponding image to each sound to meet deaf users' needs; such users would then identify the spelling of the sound-maker's word.
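    The abstract fully specifies the tap/swipe rule, so a tiny sketch can make the challenge logic concrete. Everything below (the word list, the function names, the use of random) is illustrative rather than taken from the paper, which plays recorded audio rather than displaying words.

```python
import random

# Illustrative sound-maker inventory; the real system plays recorded audio.
SOUND_MAKERS = ["dog", "cat", "car", "bell", "train"]

def make_challenge(rng: random.Random) -> tuple[str, str, str]:
    """Pick a sound-maker and a probe letter; return (word, letter, expected gesture)."""
    word = rng.choice(SOUND_MAKERS)
    letter = rng.choice("abcdefghijklmnopqrstuvwxyz")
    expected = "tap" if letter in word else "swipe"
    return word, letter, expected

def verify(expected_gesture: str, user_gesture: str) -> bool:
    """A response passes only if the gesture matches the letter-membership rule."""
    return user_gesture == expected_gesture

if __name__ == "__main__":
    rng = random.Random(7)
    word, letter, expected = make_challenge(rng)
    print(f"Does '{word}' contain '{letter}'? expected gesture: {expected}")
    print("pass" if verify(expected, "tap") else "fail")
```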

    From Understanding Telephone Scams to Implementing Authenticated Caller ID Transmission

    The telephone network is used by almost every person in the modern world. With the rise of Internet access to the PSTN, the telephone network today is rife with telephone spam and scams. Unlike email spam, spam calls demand immediate attention; they are not only significant annoyances but also cause significant financial losses in the economy. According to complaint data from the FTC, complaints about illegal calls have reached record numbers in recent years, and Americans lose billions to fraud via malicious telephone communication, despite various efforts to subdue telephone spam, scams, and robocalls. This dissertation presents a study of what causes users to fall victim to telephone scams, and demonstrates that impersonation is at the heart of the problem. Most solutions today rely primarily on gathering offending caller IDs; however, they do not work effectively when the caller ID has been spoofed. Because the PSTN caller ID transmission scheme lacks authentication, fraudsters can manipulate the caller ID to impersonate a trusted entity and further a variety of scams. To address this fundamental problem, a novel architecture and method to authenticate the transmission of the caller ID is proposed. The solution enables a security indicator that can provide an early warning to help users stay vigilant against telephone impersonation scams, as well as a foundation for existing and future defenses that stop unwanted telephone communication based on caller ID information.
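    The abstract does not detail the proposed authentication architecture, so the following is only a hedged Python sketch of the general principle: the originating side signs the claimed caller ID (with a timestamp for freshness), and the terminating side verifies the signature before displaying a trusted indicator. The message layout, the choice of Ed25519, and the 30-second freshness window are assumptions, not the author's design.

```python
import time
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey, Ed25519PublicKey)
from cryptography.exceptions import InvalidSignature

FRESHNESS_WINDOW = 30  # seconds; assumed replay-protection window

def sign_caller_id(key: Ed25519PrivateKey, caller: str, callee: str):
    """Originating side: sign (caller, callee, timestamp) to bind the ID."""
    ts = time.time()
    message = f"{caller}|{callee}|{ts:.0f}".encode()
    return key.sign(message), ts

def verify_caller_id(pub: Ed25519PublicKey, caller: str, callee: str,
                     ts: float, signature: bytes) -> bool:
    """Terminating side: accept only a fresh, valid signature over the IDs."""
    if abs(time.time() - ts) > FRESHNESS_WINDOW:
        return False  # stale timestamp: possible replay
    message = f"{caller}|{callee}|{ts:.0f}".encode()
    try:
        pub.verify(signature, message)
        return True
    except InvalidSignature:
        return False  # signature does not match the claimed caller ID

if __name__ == "__main__":
    key = Ed25519PrivateKey.generate()
    sig, ts = sign_caller_id(key, "+15551234567", "+15557654321")
    ok = verify_caller_id(key.public_key(), "+15551234567", "+15557654321", ts, sig)
    spoofed = verify_caller_id(key.public_key(), "+15550000000", "+15557654321", ts, sig)
    print(f"legitimate: {ok}, spoofed: {spoofed}")
```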

    When keystroke meets password: Attacks and defenses


    Acoustic-channel attack and defence methods for personal voice assistants

    Personal Voice Assistants (PVAs) are increasingly used as an interface to digital environments. Voice commands are used to interact with phones, smart homes, or cars. In the US alone, the number of smart speakers such as Amazon's Echo and Google Home has grown by 78% to 118.5 million, and 21% of the US population owns at least one device. Given the increasing dependency of society on PVAs, their security and privacy have become a major concern of users, manufacturers, and policy makers. Consequently, a steep increase in research efforts addressing the security and privacy of PVAs can be observed in recent years. While some security and privacy research applicable to the PVA domain predates their recent rise in popularity, and many new research strands have emerged, research dedicated specifically to PVA security and privacy is still lacking. The most important interaction interface between users and a PVA is the acoustic channel, so acoustic-channel security and privacy studies are desirable and required. The aim of the work presented in this thesis is to deepen the understanding of security and privacy issues of PVA usage related to the acoustic channel, to propose principles and solutions for key usage scenarios to mitigate potential security threats, and to present a novel type of dangerous attack which can be launched using a PVA alone. The five core contributions of this thesis are:

    (i) A taxonomy is built for the research domain of PVA security and privacy issues related to the acoustic channel. An extensive overview of the state of the art is provided, describing a comprehensive research map for PVA security and privacy; the taxonomy also shows where the contributions of this thesis lie.

    (ii) Work has emerged aiming to generate adversarial audio inputs which sound harmless to humans but can trick a PVA into recognising harmful commands. The majority of this work has focused on the attack side; work on how to defend against this type of attack is rare. A defence method against white-box adversarial commands is proposed and implemented as a prototype. It is shown that a defence Automatic Speech Recognition (ASR) system can run in parallel with the PVA's main one, and adversarial audio input is detected if the difference in the speech decoding results between the two ASRs surpasses a threshold. It is demonstrated that an ASR that differs in architecture and/or training data from the PVA's main ASR is usable as the protection ASR.

    (iii) PVAs continuously monitor conversations, which may be transported to a cloud back end where they are stored, processed, and perhaps even passed on to other service providers. A user has limited control over this process when a PVA is triggered without the user's intent or when the PVA belongs to someone else: the user is unable to control the recording behaviour of surrounding PVAs, to signal privacy requirements, or to track conversation recordings. An acoustic tagging solution is proposed that embeds additional information into acoustic signals processed by PVAs. The user employs a tagging device which emits an acoustic signal when PVA activity is assumed, and any active PVA embeds this tag into its recorded audio stream. The tag may signal to a cooperating PVA or back-end system that the user has not given recording consent, and may also be used to trace when and where a recording was taken. A prototype tagging device based on PocketSphinx is implemented. Using a Google Home Mini as the PVA, it is demonstrated that the device can tag conversations and that the tagging signal can be retrieved from conversations stored in the Google back-end system.

    (iv) While acoustic tagging lets users signal their consent to the back-end PVA service, a complementary solution inspired by Denial of Service (DoS) is proposed for protecting user privacy. Although PVAs are very helpful, they also continuously monitor conversations: when a PVA detects a wake word, the immediately following conversation is recorded and transported to a cloud system for further analysis. An active protection mechanism is proposed: reactive jamming. A Protection Jamming Device (PJD) observes conversations, and upon detection of a PVA wake word it emits an acoustic jamming signal. The PJD must detect the wake word faster than the PVA, so that the jamming signal still prevents wake word detection by the PVA. An evaluation of the effectiveness of different jamming signals and of the overlap between wake words and jamming signals is carried out; 100% jamming success is achieved with an overlap of at least 60%, at a negligible false positive rate.

    (v) Acoustic components (speakers and microphones) on a PVA can potentially be re-purposed for acoustic sensing, which has major security and privacy implications given the key role of PVAs in digital environments. The first active acoustic side-channel attack is proposed: speakers emit human-inaudible acoustic signals and the echo is recorded via microphones, turning the acoustic system of a smartphone into a sonar system. The echo signal can be used to profile user interaction with the device; for example, a victim's finger movement can be monitored to steal Android unlock patterns. Using this novel, unnoticeable acoustic side-channel, the number of candidate unlock patterns that an attacker must try to authenticate herself to a Samsung S4 phone can be reduced by up to 70%.
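    Contribution (ii) describes the dual-ASR defence concretely enough for a sketch. The code below stubs out the two ASR engines and shows only the comparison step, assuming a normalised word-level edit distance with an illustrative 0.3 threshold; the thesis prototype may differ in both metric and threshold.

```python
def word_edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance over words (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        curr = [i]
        for j, wb in enumerate(b, 1):
            cost = 0 if wa == wb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def is_adversarial(main_transcript: str, defence_transcript: str,
                   threshold: float = 0.3) -> bool:
    """Flag input when the two ASRs disagree more than `threshold`.
    A white-box adversarial sample is tuned against the main ASR only,
    so a differently trained defence ASR tends to decode it differently."""
    a, b = main_transcript.lower().split(), defence_transcript.lower().split()
    if not a and not b:
        return False
    return word_edit_distance(a, b) / max(len(a), len(b)) > threshold

if __name__ == "__main__":
    # Stubbed ASR outputs; a real deployment would call two ASR engines.
    print(is_adversarial("turn on the lights", "turn on the lights"))    # False
    print(is_adversarial("open the front door", "o can the fun store"))  # True
```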