92 research outputs found

    Unsupervised modeling of latent topics and lexical units in speech audio

    Get PDF
    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2013.Cataloged from PDF version of thesis.Includes bibliographical references (p. 67-70).Zero-resource speech processing involves the automatic analysis of a collection of speech data in a completely unsupervised fashion without the benefit of any transcriptions or annotations of the data. In this thesis, we describe a zero-resource framework that automatically discovers important words, phrases and topical themes present in an audio corpus. This system employs a segmental dynamic time warping (S-DTW) algorithm for acoustic pattern discovery in conjunction with a probabilistic model which treats the topic and pseudo-word identity of each discovered pattern as hidden variables. By applying an Expectation-Maximization (EM) algorithm, our method estimates the latent probability distributions over the pseudo-words and topics associated with the discovered patterns. Using this information, we produce informative acoustic summaries of the dominant topical themes of the audio document collection.by David F. Harwath.S.M

    Exploiting Spatio-Temporal Coherence for Video Object Detection in Robotics

    Get PDF
    This paper proposes a method to enhance video object detection for indoor environments in robotics. Concretely, it exploits knowledge about the camera motion between frames to propagate previously detected objects to successive frames. The proposal is rooted in the concepts of planar homography to propose regions of interest where to find objects, and recursive Bayesian filtering to integrate observations over time. The proposal is evaluated on six virtual, indoor environments, accounting for the detection of nine object classes over a total of ∼ 7k frames. Results show that our proposal improves the recall and the F1-score by a factor of 1.41 and 1.27, respectively, as well as it achieves a significant reduction of the object categorization entropy (58.8%) when compared to a two-stage video object detection method used as baseline, at the cost of small time overheads (120 ms) and precision loss (0.92).</p

    Contribution to the construction of fingerprinting and watermarking schemes to protect mobile agents and multimedia content

    Get PDF
    The main characteristic of fingerprinting codes is the need of high error-correction capacity due to the fact that they are designed to avoid collusion attacks which will damage many symbols from the codewords. Moreover, the use of fingerprinting schemes depends on the watermarking system that is used to embed the codeword into the content and how it honors the marking assumption. In this sense, even though fingerprinting codes were mainly used to protect multimedia content, using them on software protection systems seems an option to be considered. This thesis, studies how to use codes which have iterative-decoding algorithms, mainly turbo-codes, to solve the fingerprinting problem. Initially, it studies the effectiveness of current approaches based on concatenating tradicioanal fingerprinting schemes with convolutional codes and turbo-codes. It is shown that these kind of constructions ends up generating a high number of false positives. Even though this thesis contains some proposals to improve these schemes, the direct use of turbo-codes without using any concatenation with a fingerprinting code as inner code has also been considered. It is shown that the performance of turbo-codes using the appropiate constituent codes is a valid alternative for environments with hundreds of users and 2 or 3 traitors. As constituent codes, we have chosen low-rate convolutional codes with maximum free distance. As for how to use fingerprinting codes with watermarking schemes, we have studied the option of using watermarking systems based on informed coding and informed embedding. It has been discovered that, due to different encodings available for the same symbol, its applicability to embed fingerprints is very limited. On this sense, some modifications to these systems have been proposed in order to properly adapt them to fingerprinting applications. Moreover the behavior and impact over a video produced as a collusion of 2 users by the YouTube’s s ervice has been s tudied. We have also studied the optimal parameters for viable tracking of users who have used YouTube and conspired to redistribute copies generated by a collusion attack. Finally, we have studied how to implement fingerprinting schemes and software watermarking to fix the problem of malicious hosts on mobile agents platforms. In this regard, four different alternatives have been proposed to protect the agent depending on whether you want only detect the attack or avoid it in real time. Two of these proposals are focused on the protection of intrusion detection systems based on mobile agents. Moreover, each of these solutions has several implications in terms of infrastructure and complexity.Els codis fingerprinting es caracteritzen per proveir una alta capacitat correctora ja que han de fer front a atacs de confabulació que malmetran una part important dels símbols de la paraula codi. D'atra banda, la utilització de codis de fingerprinting en entorns reals està subjecta a que l'esquema de watermarking que gestiona la incrustació sigui respectuosa amb la marking assumption. De la mateixa manera, tot i que el fingerprinting neix de la protecció de contingut multimèdia, utilitzar-lo en la protecció de software comença a ser una aplicació a avaluar. En aquesta tesi s'ha estudiat com aplicar codis amb des codificació iterativa, concretament turbo-codis, al problema del rastreig de traïdors en el context del fingerprinting digital. Inicialment s'ha qüestionat l'eficàcia dels enfocaments actuals en la utilització de codis convolucionals i turbo-codis que plantegen concatenacions amb esquemes habituals de fingerprinting. S'ha demostrat que aquest tipus de concatenacions portaven, de forma implícita, a una elevada probabilitat d'inculpar un usuari innocent. Tot i que s'han proposat algunes millores sobre aquests esquemes , finalment s'ha plantejat l'ús de turbocodis directament, evitant així la concatenació amb altres esquemes de fingerprinting. S'ha demostrat que, si s'utilitzen els codis constituents apropiats, el rendiment del turbo-descodificador és suficient per a ser una alternativa aplicable en entorns amb varis centenars d'usuaris i 2 o 3 confabuladors . Com a codis constituents s'ha optat pels codis convolucionals de baix ràtio amb distància lliure màxima. Pel que fa a com utilitzar els codis de fingerprinting amb esquemes de watermarking, s'ha estudiat l'opció d'utilitzar sistemes de watermarking basats en la codificació i la incrustació informada. S'ha comprovat que, degut a la múltiple codificació del mateix símbol, la seva aplicabilitat per incrustar fingerprints és molt limitada. En aquest sentit s'ha plantejat algunes modificacions d'aquests sistemes per tal d'adaptar-los correctament a aplicacions de fingerprinting. D'altra banda s'ha avaluat el comportament i l'impacte que el servei de YouTube produeix sobre un vídeo amb un fingerprint incrustat. A més , s'ha estudiat els paràmetres òptims per a fer viable el rastreig d'usuaris que han confabulat i han utilitzat YouTube per a redistribuir la copia fruït de la seva confabulació. Finalment, s'ha estudiat com aplicar els esquemes de fingerprinting i watermarking de software per solucionar el problema de l'amfitrió maliciós en agents mòbils . En aquest sentit s'han proposat quatre alternatives diferents per a protegir l'agent en funció de si és vol només detectar l'atac o evitar-lo en temps real. Dues d'aquestes propostes es centren en la protecció de sistemes de detecció d'intrusions basats en agents mòbils. Cadascuna de les solucions té diverses implicacions a nivell d'infrastructura i de complexitat.Postprint (published version

    Applied High-Order Singular Value Decomposition for Weight Compression and Expansion in Deep Neural Networks

    Get PDF
    Complex deep learning objectives such as object detection and saliency, semantic segmentation, sequence-to-sequence translation, and others have given rise to training processes requiring increasing amounts of time and computational resources. Human-in-the-loop solutions have addressed this problem in several ways; one such pain point is model hyperparameter search. Common methods of parameter search have high time costs and require iterative training of several models. Several algorithms have been proposed to manipulate a neural network's architecture and alleviate this cost. However, these algorithms require tuning of parameters to achieve desired performance and provide little to no intuition as to how such a change may affect overall performance. In this thesis, I present EigenRemove and WeakExpansion for removal and addition of weights providing a human-in-the-loop solution to the architecture search problem in both classical feedforward and convolutional neural network layers. EigenRemove yields results comparable to or better than the more popular Minimum Weight Selection pruning strategy, producing final test accuracies increased by 2-3% at larger compressions on the VGG16 object detection network. WeakExpand is compared with a trivial Zero Weight Expansion approach, where new connections are assigned no weight. WeakExpand is shown to produce final test accuracies in VGG16 comparable to that of Zero Weight Expansion, while providing new trainable weights rather than the dead weights produced by Zero Weight Expansion. Finally, I propose heuristics outlining how a user may use WeakExpand and EigenRemove to have a desired effect based on the current state of their network's training

    Recent Advances in Signal Processing

    Get PDF
    The signal processing task is a very critical issue in the majority of new technological inventions and challenges in a variety of applications in both science and engineering fields. Classical signal processing techniques have largely worked with mathematical models that are linear, local, stationary, and Gaussian. They have always favored closed-form tractability over real-world accuracy. These constraints were imposed by the lack of powerful computing tools. During the last few decades, signal processing theories, developments, and applications have matured rapidly and now include tools from many areas of mathematics, computer science, physics, and engineering. This book is targeted primarily toward both students and researchers who want to be exposed to a wide variety of signal processing techniques and algorithms. It includes 27 chapters that can be categorized into five different areas depending on the application at hand. These five categories are ordered to address image processing, speech processing, communication systems, time-series analysis, and educational packages respectively. The book has the advantage of providing a collection of applications that are completely independent and self-contained; thus, the interested reader can choose any chapter and skip to another without losing continuity

    Fundamentals

    Get PDF
    Volume 1 establishes the foundations of this new field. It goes through all the steps from data collection, their summary and clustering, to different aspects of resource-aware learning, i.e., hardware, memory, energy, and communication awareness. Machine learning methods are inspected with respect to resource requirements and how to enhance scalability on diverse computing architectures ranging from embedded systems to large computing clusters

    24th International Conference on Information Modelling and Knowledge Bases

    Get PDF
    In the last three decades information modelling and knowledge bases have become essentially important subjects not only in academic communities related to information systems and computer science but also in the business area where information technology is applied. The series of European – Japanese Conference on Information Modelling and Knowledge Bases (EJC) originally started as a co-operation initiative between Japan and Finland in 1982. The practical operations were then organised by professor Ohsuga in Japan and professors Hannu Kangassalo and Hannu Jaakkola in Finland (Nordic countries). Geographical scope has expanded to cover Europe and also other countries. Workshop characteristic - discussion, enough time for presentations and limited number of participants (50) / papers (30) - is typical for the conference. Suggested topics include, but are not limited to: 1. Conceptual modelling: Modelling and specification languages; Domain-specific conceptual modelling; Concepts, concept theories and ontologies; Conceptual modelling of large and heterogeneous systems; Conceptual modelling of spatial, temporal and biological data; Methods for developing, validating and communicating conceptual models. 2. Knowledge and information modelling and discovery: Knowledge discovery, knowledge representation and knowledge management; Advanced data mining and analysis methods; Conceptions of knowledge and information; Modelling information requirements; Intelligent information systems; Information recognition and information modelling. 3. Linguistic modelling: Models of HCI; Information delivery to users; Intelligent informal querying; Linguistic foundation of information and knowledge; Fuzzy linguistic models; Philosophical and linguistic foundations of conceptual models. 4. Cross-cultural communication and social computing: Cross-cultural support systems; Integration, evolution and migration of systems; Collaborative societies; Multicultural web-based software systems; Intercultural collaboration and support systems; Social computing, behavioral modeling and prediction. 5. Environmental modelling and engineering: Environmental information systems (architecture); Spatial, temporal and observational information systems; Large-scale environmental systems; Collaborative knowledge base systems; Agent concepts and conceptualisation; Hazard prediction, prevention and steering systems. 6. Multimedia data modelling and systems: Modelling multimedia information and knowledge; Contentbased multimedia data management; Content-based multimedia retrieval; Privacy and context enhancing technologies; Semantics and pragmatics of multimedia data; Metadata for multimedia information systems. Overall we received 56 submissions. After careful evaluation, 16 papers have been selected as long paper, 17 papers as short papers, 5 papers as position papers, and 3 papers for presentation of perspective challenges. We thank all colleagues for their support of this issue of the EJC conference, especially the program committee, the organising committee, and the programme coordination team. The long and the short papers presented in the conference are revised after the conference and published in the Series of “Frontiers in Artificial Intelligence” by IOS Press (Amsterdam). The books “Information Modelling and Knowledge Bases” are edited by the Editing Committee of the conference. We believe that the conference will be productive and fruitful in the advance of research and application of information modelling and knowledge bases. Bernhard Thalheim Hannu Jaakkola Yasushi Kiyok

    Deep Neural Network Architectures for Large-scale, Robust and Small-Footprint Speaker and Language Recognition

    Full text link
    Tesis doctoral inédita leída en la Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Tecnología Electrónica y de las Comunicaciones. Fecha de lectura : 27-04-2017Artificial neural networks are powerful learners of the information embedded in speech signals. They can provide compact, multi-level, nonlinear representations of temporal sequences and holistic optimization algorithms capable of surpassing former leading paradigms. Artificial neural networks are, therefore, a promising technology that can be used to enhance our ability to recognize speakers and languages–an ability increasingly in demand in the context of new, voice-enabled interfaces used today by millions of users. The aim of this thesis is to advance the state-of-the-art of language and speaker recognition through the formulation, implementation and empirical analysis of novel approaches for large-scale and portable speech interfaces. Its major contributions are: (1) novel, compact network architectures for language and speaker recognition, including a variety of network topologies based on fully-connected, recurrent, convolutional, and locally connected layers; (2) a bottleneck combination strategy for classical and neural network approaches for long speech sequences; (3) the architectural design of the first, public, multilingual, large vocabulary continuous speech recognition system; and (4) a novel, end-to-end optimization algorithm for text-dependent speaker recognition that is applicable to a range of verification tasks. Experimental results have demonstrated that artificial neural networks can substantially reduce the number of model parameters and surpass the performance of previous approaches to language and speaker recognition, particularly in the cases of long short-term memory recurrent networks (used to model the input speech signal), end-to-end optimization algorithms (used to predict languages or speakers), short testing utterances, and large training data collections.Las redes neuronales artificiales son sistemas de aprendizaje capaces de extraer la información embebida en las señales de voz. Son capaces de modelar de forma eficiente secuencias temporales complejas, con información no lineal y distribuida en distintos niveles semanticos, mediante el uso de algoritmos de optimización integral con la capacidad potencial de mejorar los sistemas aprendizaje automático existentes. Las redes neuronales artificiales son, pues, una tecnología prometedora para mejorar el reconocimiento automático de locutores e idiomas; siendo el reconocimiento de de locutores e idiomas, tareas con cada vez más demanda en los nuevos sistemas de control por voz, que ya utilizan millones de personas. Esta tesis tiene como objetivo la mejora del estado del arte de las tecnologías de reconocimiento de locutor y de idioma mediante la formulación, implementación y análisis empírico de nuevos enfoques basados en redes neuronales, aplicables a dispositivos portátiles y a su uso en gran escala. Las principales contribuciones de esta tesis incluyen la propuesta original de: (1) arquitecturas eficientes que hacen uso de capas neuronales densas, localmente densas, recurrentes y convolucionales; (2) una nueva estrategia de combinación de enfoques clásicos y enfoques basados en el uso de las denominadas redes de cuello de botella; (3) el diseño del primer sistema público de reconocimiento de voz, de vocabulario abierto y continuo, que es además multilingüe; y (4) la propuesta de un nuevo algoritmo de optimización integral para tareas de reconocimiento de locutor, aplicable también a otras tareas de verificación. Los resultados experimentales extraídos de esta tesis han demostrado que las redes neuronales artificiales son capaces de reducir el número de parámetros usados por los algoritmos de reconocimiento tradicionales, así como de mejorar el rendimiento de dichos sistemas de forma substancial. Dicha mejora relativa puede acentuarse a través del modelado de voz mediante redes recurrentes de memoria a largo plazo, el uso de algoritmos de optimización integral, el uso de locuciones de evaluation de corta duración y mediante la optimización del sistema con grandes cantidades de datos de entrenamiento

    Fundamentals

    Get PDF
    Volume 1 establishes the foundations of this new field. It goes through all the steps from data collection, their summary and clustering, to different aspects of resource-aware learning, i.e., hardware, memory, energy, and communication awareness. Machine learning methods are inspected with respect to resource requirements and how to enhance scalability on diverse computing architectures ranging from embedded systems to large computing clusters
    corecore