16,228 research outputs found

    Security and Privacy Problems in Voice Assistant Applications: A Survey

    Full text link
    Voice assistant applications have become omniscient nowadays. Two models that provide the two most important functions for real-life applications (i.e., Google Home, Amazon Alexa, Siri, etc.) are Automatic Speech Recognition (ASR) models and Speaker Identification (SI) models. According to recent studies, security and privacy threats have also emerged with the rapid development of the Internet of Things (IoT). The security issues researched include attack techniques toward machine learning models and other hardware components widely used in voice assistant applications. The privacy issues include technical-wise information stealing and policy-wise privacy breaches. The voice assistant application takes a steadily growing market share every year, but their privacy and security issues never stopped causing huge economic losses and endangering users' personal sensitive information. Thus, it is important to have a comprehensive survey to outline the categorization of the current research regarding the security and privacy problems of voice assistant applications. This paper concludes and assesses five kinds of security attacks and three types of privacy threats in the papers published in the top-tier conferences of cyber security and voice domain.Comment: 5 figure

    The Metaverse: Survey, Trends, Novel Pipeline Ecosystem & Future Directions

    Full text link
    The Metaverse offers a second world beyond reality, where boundaries are non-existent, and possibilities are endless through engagement and immersive experiences using the virtual reality (VR) technology. Many disciplines can benefit from the advancement of the Metaverse when accurately developed, including the fields of technology, gaming, education, art, and culture. Nevertheless, developing the Metaverse environment to its full potential is an ambiguous task that needs proper guidance and directions. Existing surveys on the Metaverse focus only on a specific aspect and discipline of the Metaverse and lack a holistic view of the entire process. To this end, a more holistic, multi-disciplinary, in-depth, and academic and industry-oriented review is required to provide a thorough study of the Metaverse development pipeline. To address these issues, we present in this survey a novel multi-layered pipeline ecosystem composed of (1) the Metaverse computing, networking, communications and hardware infrastructure, (2) environment digitization, and (3) user interactions. For every layer, we discuss the components that detail the steps of its development. Also, for each of these components, we examine the impact of a set of enabling technologies and empowering domains (e.g., Artificial Intelligence, Security & Privacy, Blockchain, Business, Ethics, and Social) on its advancement. In addition, we explain the importance of these technologies to support decentralization, interoperability, user experiences, interactions, and monetization. Our presented study highlights the existing challenges for each component, followed by research directions and potential solutions. To the best of our knowledge, this survey is the most comprehensive and allows users, scholars, and entrepreneurs to get an in-depth understanding of the Metaverse ecosystem to find their opportunities and potentials for contribution

    Sign Language Translation from Instructional Videos

    Full text link
    The advances in automatic sign language translation (SLT) to spoken languages have been mostly benchmarked with datasets of limited size and restricted domains. Our work advances the state of the art by providing the first baseline results on How2Sign, a large and broad dataset. We train a Transformer over I3D video features, using the reduced BLEU as a reference metric for validation, instead of the widely used BLEU score. We report a result of 8.03 on the BLEU score, and publish the first open-source implementation of its kind to promote further advances.Comment: Paper accepted at WiCV @CVPR2

    The place where curses are manufactured : four poets of the Vietnam War

    Get PDF
    The Vietnam War was unique among American wars. To pinpoint its uniqueness, it was necessary to look for a non-American voice that would enable me to articulate its distinctiveness and explore the American character as observed by an Asian. Takeshi Kaiko proved to be most helpful. From his novel, Into a Black Sun, I was able to establish a working pair of 'bookends' from which to approach the poetry of Walter McDonald, Bruce Weigl, Basil T. Paquet and Steve Mason. Chapter One is devoted to those seemingly mismatched 'bookends,' Walt Whitman and General William C. Westmoreland, and their respective anthropocentric and technocentric visions of progress and the peculiarly American concept of the "open road" as they manifest themselves in Vietnam. In Chapter, Two, I analyze the war poems of Walter McDonald. As a pilot, writing primarily about flying, his poetry manifests General Westmoreland's technocentric vision of the 'road' as determined by and manifest through technology. Chapter Three focuses on the poems of Bruce Weigl. The poems analyzed portray the literal and metaphorical descent from the technocentric, 'numbed' distance of aerial warfare to the world of ground warfare, and the initiation of a 'fucking new guy,' who discovers the contours of the self's interior through a set of experiences that lead from from aerial insertion into the jungle to the degradation of burning human feces. Chapter Four, devoted to the thirteen poems of Basil T. Paquet, focuses on the continuation of the descent begun in Chapter Two. In his capacity as a medic, Paquet's entire body of poems details his quotidian tasks which entail tending the maimed, the mortally wounded and the dead. The final chapter deals with Steve Mason's JohnnY's Song, and his depiction of the plight of Vietnam veterans back in "The World" who are still trapped inside the interior landscape of their individual "ghettoes" of the soul created by their war-time experiences

    Augmented classification for electrical coil winding defects

    Get PDF
    A green revolution has accelerated over the recent decades with a look to replace existing transportation power solutions through the adoption of greener electrical alternatives. In parallel the digitisation of manufacturing has enabled progress in the tracking and traceability of processes and improvements in fault detection and classification. This paper explores electrical machine manufacture and the challenges faced in identifying failures modes during this life cycle through the demonstration of state-of-the-art machine vision methods for the classification of electrical coil winding defects. We demonstrate how recent generative adversarial networks can be used to augment training of these models to further improve their accuracy for this challenging task. Our approach utilises pre-processing and dimensionality reduction to boost performance of the model from a standard convolutional neural network (CNN) leading to a significant increase in accuracy

    Strategies for Early Learners

    Get PDF
    Welcome to learning about how to effectively plan curriculum for young children. This textbook will address: • Developing curriculum through the planning cycle • Theories that inform what we know about how children learn and the best ways for teachers to support learning • The three components of developmentally appropriate practice • Importance and value of play and intentional teaching • Different models of curriculum • Process of lesson planning (documenting planned experiences for children) • Physical, temporal, and social environments that set the stage for children’s learning • Appropriate guidance techniques to support children’s behaviors as the self-regulation abilities mature. • Planning for preschool-aged children in specific domains including o Physical development o Language and literacy o Math o Science o Creative (the visual and performing arts) o Diversity (social science and history) o Health and safety • Making children’s learning visible through documentation and assessmenthttps://scholar.utc.edu/open-textbooks/1001/thumbnail.jp

    Learning disentangled speech representations

    Get PDF
    A variety of informational factors are contained within the speech signal and a single short recording of speech reveals much more than the spoken words. The best method to extract and represent informational factors from the speech signal ultimately depends on which informational factors are desired and how they will be used. In addition, sometimes methods will capture more than one informational factor at the same time such as speaker identity, spoken content, and speaker prosody. The goal of this dissertation is to explore different ways to deconstruct the speech signal into abstract representations that can be learned and later reused in various speech technology tasks. This task of deconstructing, also known as disentanglement, is a form of distributed representation learning. As a general approach to disentanglement, there are some guiding principles that elaborate what a learned representation should contain as well as how it should function. In particular, learned representations should contain all of the requisite information in a more compact manner, be interpretable, remove nuisance factors of irrelevant information, be useful in downstream tasks, and independent of the task at hand. The learned representations should also be able to answer counter-factual questions. In some cases, learned speech representations can be re-assembled in different ways according to the requirements of downstream applications. For example, in a voice conversion task, the speech content is retained while the speaker identity is changed. And in a content-privacy task, some targeted content may be concealed without affecting how surrounding words sound. While there is no single-best method to disentangle all types of factors, some end-to-end approaches demonstrate a promising degree of generalization to diverse speech tasks. This thesis explores a variety of use-cases for disentangled representations including phone recognition, speaker diarization, linguistic code-switching, voice conversion, and content-based privacy masking. Speech representations can also be utilised for automatically assessing the quality and authenticity of speech, such as automatic MOS ratings or detecting deep fakes. The meaning of the term "disentanglement" is not well defined in previous work, and it has acquired several meanings depending on the domain (e.g. image vs. speech). Sometimes the term "disentanglement" is used interchangeably with the term "factorization". This thesis proposes that disentanglement of speech is distinct, and offers a viewpoint of disentanglement that can be considered both theoretically and practically

    Building body identities - exploring the world of female bodybuilders

    Get PDF
    This thesis explores how female bodybuilders seek to develop and maintain a viable sense of self despite being stigmatized by the gendered foundations of what Erving Goffman (1983) refers to as the 'interaction order'; the unavoidable presentational context in which identities are forged during the course of social life. Placed in the context of an overview of the historical treatment of women's bodies, and a concern with the development of bodybuilding as a specific form of body modification, the research draws upon a unique two year ethnographic study based in the South of England, complemented by interviews with twenty-six female bodybuilders, all of whom live in the U.K. By mapping these extraordinary women's lives, the research illuminates the pivotal spaces and essential lived experiences that make up the female bodybuilder. Whilst the women appear to be embarking on an 'empowering' radical body project for themselves, the consequences of their activity remains culturally ambivalent. This research exposes the 'Janus-faced' nature of female bodybuilding, exploring the ways in which the women negotiate, accommodate and resist pressures to engage in more orthodox and feminine activities and appearances

    Data-to-text generation with neural planning

    Get PDF
    In this thesis, we consider the task of data-to-text generation, which takes non-linguistic structures as input and produces textual output. The inputs can take the form of database tables, spreadsheets, charts, and so on. The main application of data-to-text generation is to present information in a textual format which makes it accessible to a layperson who may otherwise find it problematic to understand numerical figures. The task can also automate routine document generation jobs, thus improving human efficiency. We focus on generating long-form text, i.e., documents with multiple paragraphs. Recent approaches to data-to-text generation have adopted the very successful encoder-decoder architecture or its variants. These models generate fluent (but often imprecise) text and perform quite poorly at selecting appropriate content and ordering it coherently. This thesis focuses on overcoming these issues by integrating content planning with neural models. We hypothesize data-to-text generation will benefit from explicit planning, which manifests itself in (a) micro planning, (b) latent entity planning, and (c) macro planning. Throughout this thesis, we assume the input to our generator are tables (with records) in the sports domain. And the output are summaries describing what happened in the game (e.g., who won/lost, ..., scored, etc.). We first describe our work on integrating fine-grained or micro plans with data-to-text generation. As part of this, we generate a micro plan highlighting which records should be mentioned and in which order, and then generate the document while taking the micro plan into account. We then show how data-to-text generation can benefit from higher level latent entity planning. Here, we make use of entity-specific representations which are dynam ically updated. The text is generated conditioned on entity representations and the records corresponding to the entities by using hierarchical attention at each time step. We then combine planning with the high level organization of entities, events, and their interactions. Such coarse-grained macro plans are learnt from data and given as input to the generator. Finally, we present work on making macro plans latent while incrementally generating a document paragraph by paragraph. We infer latent plans sequentially with a structured variational model while interleaving the steps of planning and generation. Text is generated by conditioning on previous variational decisions and previously generated text. Overall our results show that planning makes data-to-text generation more interpretable, improves the factuality and coherence of the generated documents and re duces redundancy in the output document
    • …
    corecore