
    Modality Dropout for Multimodal Device Directed Speech Detection using Verbal and Non-Verbal Features

    Device-directed speech detection (DDSD) is the binary classification task of distinguishing queries directed at a voice assistant from side conversation or background speech. State-of-the-art DDSD systems use verbal cues, e.g., acoustic, text, and/or automatic speech recognition (ASR) features, to classify speech as device-directed or otherwise, and often have to contend with one or more of these modalities being unavailable when deployed in real-world settings. In this paper, we investigate fusion schemes for DDSD systems that can be made more robust to missing modalities. Concurrently, we study the use of non-verbal cues, specifically prosody features, in addition to verbal cues for DDSD. We present different approaches to combining scores and embeddings from prosody with the corresponding verbal cues, finding that prosody improves DDSD performance by up to 8.5% in terms of false acceptance rate (FA) at a given fixed operating point via non-linear intermediate fusion, while our use of modality dropout techniques improves the performance of these models by 7.4% in terms of FA when evaluated with missing modalities at inference time. Comment: 5 page
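The idea of modality dropout mentioned in the abstract above can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the modality names, the dropout probability, and the choice to zero out embeddings and then concatenate them are all assumptions made for illustration. During training, whole modalities are randomly blanked so that a downstream classifier learns to cope with missing inputs.

```python
import random


def modality_dropout(embeddings, p_drop=0.3, rng=None):
    """Randomly zero out whole modalities, always keeping at least one.

    `embeddings` maps a (hypothetical) modality name to a list of floats.
    """
    if rng is None:
        rng = random.Random()
    names = list(embeddings)
    dropped = [n for n in names if rng.random() < p_drop]
    if len(dropped) == len(names):
        # Never drop everything: restore one modality chosen at random.
        dropped.remove(rng.choice(dropped))
    return {
        n: ([0.0] * len(v) if n in dropped else list(v))
        for n, v in embeddings.items()
    }


def fuse(embeddings):
    """Concatenate modality embeddings in a fixed (sorted-by-name) order."""
    return [x for n in sorted(embeddings) for x in embeddings[n]]


if __name__ == "__main__":
    # Illustrative values only, not data from the paper.
    emb = {"acoustic": [0.2, 0.5], "text": [0.1], "prosody": [0.7, 0.3]}
    print(fuse(modality_dropout(emb, p_drop=0.5, rng=random.Random(0))))
```

Zeroing a modality rather than removing it keeps the fused vector at a fixed length, which is one simple way to let the same classifier run when a modality is unavailable at inference time.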

    Analysis and Mitigation of Remote Side-Channel and Fault Attacks on the Electrical Level

    In the ongoing miniaturization of integrated circuits, physical limits are being reached; single-atom transistors, for example, represent a possible lower bound on feature sizes. Moreover, manufacturing the latest generations of microchips is today financially feasible only for large multinational companies. Because of this development, miniaturization is no longer the driving force for further increasing the performance of electronic components. Instead, classical computer architectures with general-purpose processors are evolving into heterogeneous systems with high parallelism and specialized accelerators. However, protecting private data against attackers is also becoming increasingly difficult in these heterogeneous systems. New kinds of hardware components, new kinds of applications, and generally increased complexity are some of the factors that make security in such systems a challenge. Cryptographic algorithms are often truly secure only under certain assumptions about the attacker. For example, it is often assumed that the attacker can access only the inputs and outputs of a module, while internal signals and intermediate values remain hidden. In real implementations, however, side-channel and fault attacks expose the limits of this so-called black-box model. Whereas in side-channel attacks the attacker exploits data-dependent measurable quantities such as power consumption or electromagnetic radiation, fault attacks actively interfere with the computation and use the faulty outputs to find the secret data. These kinds of attacks on implementations were originally considered only in the context of a local attacker with access to the target device. However, attacks based on measuring the time taken by certain memory accesses have already shown that the threat also comes from attackers with remote access. This thesis addresses the threat of remote side-channel and fault attacks, which are closely tied to the trend toward more heterogeneous systems. One example of novel hardware in heterogeneous computing is Field-Programmable Gate Arrays (FPGAs), with which almost arbitrary circuits can be realized in programmable logic. These logic chips are already used as accelerators both in the cloud and in end devices. However, it has been shown how the flexibility of these accelerators can be exploited to implement sensors that estimate the supply voltage. In addition, a particular way of activating large amounts of logic can disturb computations in other circuits in order to mount fault attacks. This threat is analyzed further here, for example by extending existing attacks, and strategies for protecting against it are developed.

    WAKE WORD DETECTION AND ITS APPLICATIONS

    Always-on spoken language interfaces, e.g. personal digital assistants, rely on a wake word to start processing spoken input. Novel methods are proposed to train a wake word detection system from partially labeled training data, and to use it in online applications. In the system, the prerequisite of frame-level alignment is removed, permitting the use of untranscribed training examples that are annotated only for the presence or absence of the wake word. Also, an FST-based decoder is presented to perform online detection. The suite of methods greatly improves wake word detection performance across several datasets. A novel neural network for acoustic modeling in wake word detection is also investigated. Specifically, the performance of several variants of chunk-wise streaming Transformers tailored for wake word detection is explored, including looking ahead to the next chunk, gradient stopping, different positional embedding methods, and adding same-layer dependency between chunks. Experiments demonstrate that the proposed Transformer model significantly outperforms the baseline convolutional network with a comparable model size, while still maintaining linear complexity w.r.t. the input length. For the application of the detected wake word in ASR, the problem of improving speech recognition with the help of the detected wake word is investigated. Voice-controlled household devices face the difficulty of performing speech recognition of device-directed speech in the presence of interfering background speech. Two end-to-end models are proposed to tackle this problem with information extracted from the anchored segment. The anchored segment refers to the wake word segment of the audio stream, which contains valuable speaker information that can be used to suppress interfering speech and background noise. A multi-task learning setup is also explored where the ideal mask, obtained from a data synthesis procedure, is used to guide the model training. In addition, a way to synthesize "noisy" speech from "clean" speech is proposed to mitigate the mismatch between training and test data. The proposed methods show large word error reduction for Amazon Alexa live data with interfering background speech, without sacrificing performance on clean speech.

    Using spatiotemporal patterns to qualitatively represent and manage dynamic situations of interest : a cognitive and integrative approach

    Dynamic spatiotemporal situations are situations that evolve in space and time. They are part of humans’ daily life. One can be interested in a configuration of situations that occurred in the environment and can use it to make decisions. In the literature, such configurations are referred to as “situations of interest” or “spatiotemporal patterns”. In computer science, dynamic situations are generated by large-scale data acquisition systems, which are widely deployed thanks to recent technological advances and which produce increasingly large databases. Spatiotemporal pattern representation is a research subject that has attracted a lot of attention from two main research areas. In spatiotemporal analysis, various works have extended query languages to represent patterns and to query them from voluminous databases. In artificial intelligence, predicate-based models represent spatiotemporal patterns and detect their instances using rule-based mechanisms. Both approaches suffer from several shortcomings. For example, their limited expressiveness does not allow them to represent dynamic and complex spatiotemporal phenomena. Furthermore, their representation formalisms do not take into account the human’s mental model of the environment. This limits the potential for building agent-based solutions that reason about these patterns. In this thesis, we propose a novel approach to representing situations of interest using the concept of spatiotemporal patterns. We use Conceptual Graphs to offer a qualitative representation model of these patterns. Our model is based on the concepts of spatiotemporal events and states to represent dynamic spatiotemporal phenomena. It also incorporates contextual information in order to facilitate building the knowledge base of software agents and to let them reason with detected pattern instances. In addition, we propose an intelligent proximity tool based on a neuro-fuzzy classifier to support qualitative spatial relations in the pattern model. Finally, we propose a framework to manage spatiotemporal patterns in order to facilitate the integration of our pattern representation model into existing industrial applications. The main contributions of this thesis are as follows: a qualitative approach to modeling dynamic spatiotemporal situations of interest using Conceptual Graphs; a cognitive approach to representing spatiotemporal patterns by integrating contextual information; an automated tool to generate qualitative spatial proximity relations based on a neuro-fuzzy classifier; and a platform for the detection and management of spatiotemporal patterns using an extension of a Complex Event Processing engine.

    Biometric signals compression with time- and subject-adaptive dictionary for wearable devices

    This thesis is dedicated to the design of a lightweight compression technique for the real-time processing of biomedical signals in wearable devices. The proposed approach exploits the unsupervised learning algorithm of the time-adaptive self-organizing map (TASOM) to create a subject-adaptive codebook applied to the vector quantization of a signal. The codebook is obtained and then dynamically refined in an online fashion, without requiring any prior information on the signal itself.
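Vector quantization with a codebook that is refined online, as described in the abstract above, can be sketched as follows. This is a deliberately simplified winner-take-all update, not the TASOM algorithm used in the thesis (there is no neighborhood function and no adaptive learning rate); the function names, the fixed learning rate `lr`, and the toy codebook are assumptions for illustration.

```python
def nearest(codebook, x):
    """Index of the codeword closest to x in squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], x)),
    )


def encode_and_refine(codebook, x, lr=0.1):
    """Quantize x to a codeword index, then nudge that codeword toward x.

    The codebook is updated in place, so it adapts as more segments arrive.
    """
    i = nearest(codebook, x)
    codebook[i] = [c + lr * (v - c) for c, v in zip(codebook[i], x)]
    return i


if __name__ == "__main__":
    # Toy two-codeword codebook and one two-sample signal segment.
    codebook = [[0.0, 0.0], [1.0, 1.0]]
    index = encode_and_refine(codebook, [0.9, 0.8], lr=0.5)
    print(index, codebook)
```

Only the codeword index needs to be stored or transmitted for each segment, which is where the compression comes from. A real decoder would also need a way to track the evolving codebook, which this sketch does not address.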

    Stories within Immersive Virtual Environments

    How can we use immersive and interactive technologies to portray stories? How can we take advantage of the fact that, within immersive virtual environments, people tend to respond realistically to virtual situations and events in order to develop narrative content? Stories in such a medium would allow the participant to contribute to the story and interact with the virtual characters, while the narrative plot would not change, or would change only to the extent decided a priori. Participants in such a narrative would be able to interact freely within the virtual environments and yet still be aware of the main thrust of the stories presented. How can we preserve the ‘respond as if it is real’ phenomenon induced by these technologies, but also develop an unfolding plot in this environment? In other words, can we develop a story, conserving its structure, its psychological and cultural richness, and the emotional and cognitive involvement it entails, in an interactive and immersive audiovisual space? In recent years, Virtual Reality therapy has shown that an Immersive Virtual Environment (IVE) with a predetermined plot can be experienced as an interactive narrative. For example, in the context of Post-Traumatic Stress Disorder treatment, the reactions of the participants and the therapeutic impact suggest that an IVE is a qualitatively different experience from classical audiovisual content. However, the methods used to develop such content are not systematic, and the consistency of the experience is guaranteed only by a therapist or operator controlling the unfolding narrative in real time. Can a story with a strong classical plot be rendered in an automated and interactive immersive virtual environment?..

    The 2nd Conference of PhD Students in Computer Science


    14 Examples of How LLMs Can Transform Materials Science and Chemistry: A Reflection on a Large Language Model Hackathon

    Large language models (LLMs) such as GPT-4 have caught the interest of many scientists. Recent studies suggest that these models could be useful in chemistry and materials science. To explore these possibilities, we organized a hackathon. This article chronicles the projects built as part of this hackathon. Participants employed LLMs for various applications, including predicting properties of molecules and materials, designing novel interfaces for tools, extracting knowledge from unstructured data, and developing new educational applications. The diverse topics and the fact that working prototypes could be generated in less than two days highlight that LLMs will profoundly impact the future of our fields. The rich collection of ideas and projects also indicates that the applications of LLMs are not limited to materials science and chemistry but offer potential benefits to a wide range of scientific disciplines.