2 research outputs found

    Pixinwav: Residual steganography for hiding pixels in audio

    Steganography comprises the mechanics of hiding data in a host medium that may be publicly available. While previous works focused on unimodal setups (e.g., hiding images in images, or hiding audio in audio), PixInWav targets the multimodal case of hiding images in audio. To this end, we propose a novel residual architecture operating on top of short-time discrete cosine transform (STDCT) audio spectrograms. Among our results, we find that the residual steganography setup we propose allows an encoding of the hidden image that is independent of the host audio without compromising quality. Accordingly, while previous works require both host and hidden signals to hide a signal, PixInWav can encode images offline, and these encodings can later be hidden, in a residual fashion, in any audio signal.
    Work partially supported by the European Union through the Erasmus+ student mobility program, Science Foundation Ireland (SFI) under grant numbers SFI/15/SIRG/3283 and SFI/12/RC/2289 P2, and the Spanish Research Agency (AEI) under project PID2020-117142GB-I00 of the call MCIN/AEI/10.13039/501100011033.
    Peer reviewed. Postprint (author's final draft).

    EgoSign: A First-Person View Dataset for Sign Language

    Wearable headsets for interacting in virtual worlds are becoming increasingly popular as new interaction platforms, such as the Metaverse, emerge. This presents an opportunity to overcome communication gaps between the Deaf and Hearing communities. Sign language recognition, translation and production are challenging problems that are difficult to assess, among other reasons, due to the absence of good datasets. Towards this end, How2Sign, a multimodal and multiview continuous American Sign Language dataset, was published in 2021. In this project we introduce EgoSign, an egocentric dataset built as an extension of the existing How2Sign. This new dataset uses the hand tracking functionality of the Oculus Quest 2 headset to obtain high-quality hand pose estimation, a central feature for sign language understanding that cannot be obtained from the How2Sign videos with publicly available human pose estimators.