    Robust Sound Event Classification using Deep Neural Networks

    The automatic recognition of sound events by computers is an important aspect of emerging applications such as automated surveillance, machine hearing and auditory scene understanding. Recent advances in machine learning, as well as in computational models of the human auditory system, have contributed to progress in this increasingly popular research field. Robust sound event classification, the ability to recognise sounds under real-world noisy conditions, is an especially challenging task. Classification methods translated from the speech recognition domain, using features such as mel-frequency cepstral coefficients, have been shown to perform reasonably well for the sound event classification task, although spectrogram-based or auditory image analysis techniques reportedly achieve superior performance in noise. This paper outlines a sound event classification framework that compares auditory image front-end features with spectrogram image-based front-end features, using support vector machine and deep neural network classifiers. Performance is evaluated on a standard robust classification task at different levels of corrupting noise and with several system enhancements, and is shown to compare very well with current state-of-the-art classification techniques.

    From Knowledge Augmentation to Multi-tasking: Towards Human-like Dialogue Systems

    The goal of building dialogue agents that can converse naturally with humans has been a long-standing dream of researchers since the early days of artificial intelligence. The well-known Turing Test proposed judging the ultimate validity of an artificial intelligence agent by the indistinguishability of its dialogues from those of humans. It should come as no surprise that human-level dialogue systems are very challenging to build. However, while early efforts on rule-based systems found limited success, the emergence of deep learning has enabled great advances on this topic. In this thesis, we focus on methods that address the numerous issues underlying the gap between artificial conversational agents and human-level interlocutors. These methods were proposed and tested in ways inspired by general state-of-the-art AI methodologies, but they also target the specific characteristics of dialogue systems. Comment: PhD thesis.

    A hydrogen peroxide biosensor based on nanoparticle PANI/HRP electrode

    Recently, conducting polymers have attracted much interest in the development of biosensors. They contain a π-electron backbone responsible for their unusual electronic properties, such as electrical conductivity, low-energy optical transitions, low ionization potential and high electron affinity. When horseradish peroxidase (HRP) is immobilized on conducting polymers, these polymers possess the ability to bind oppositely charged complex entities in their neutral insulating state. Determination of hydrogen peroxide (H2O2) and other organic peroxides is of practical importance in clinical, environmental and many other fields. This study examines the role and properties of the PANI/HRP layer towards H2O2 by measuring its current. The Langmuir-Blodgett technique was used to form the PANI monolayer, and HRP was deposited on the PANI monolayer by electrodeposition. UV-visible spectra of PANI with and without HRP show two sharp absorption peaks at 320 nm and 720 nm. VPSEM revealed that PANI forms as nanoparticles. AFM images show the surface roughness before and after HRP was deposited on the PANI monolayer. The current response of the PANI/HRP electrode towards H2O2 increases, demonstrating effective electrocatalytic reduction of H2O2. The PANI/HRP electrode not only acts as an excellent material for rapid electron transfer but is also suitable for the fabrication of efficient biosensors.

    A survey on context awareness in big data analytics for business applications

    The concept of context awareness has been in existence since the 1990s. Though initially applied exclusively in computer science, over time it has increasingly been adopted by many different application domains such as business, health and the military. Contexts change continuously for objective reasons, such as the economic situation, political matters and social issues. The adoption of big data analytics by businesses is facilitating such change at an even faster rate and in much more complicated ways. The potential benefits of embedding contextual information into an application are already evidenced by the improved outcomes of existing context-aware methods in those applications. Since big data is growing very rapidly, context awareness in big data analytics has become more important and timely because of its proven efficiency in big data understanding and preparation, which contributes to extracting more, and more accurate, value from big data. Many surveys have been published on context-based methods such as context modelling and reasoning, workflow adaptations, computational intelligence techniques and mobile ubiquitous systems. However, to our knowledge, no survey of context-aware methods in big data analytics for business applications supported by enterprise-level software has been published to date. To bridge this research gap, this paper first presents a definition of context, its modelling and evaluation techniques, and highlights the importance of contextual information for big data analytics. Second, it thoroughly reviews work in three key business application areas that are context-aware and/or exploit big data analytics. Finally, the paper concludes by highlighting a number of contemporary research challenges, including issues concerning modelling, managing and applying business contexts to big data analytics. © 2020, Springer-Verlag London Ltd., part of Springer Nature.

    Econometrics meets sentiment: an overview of methodology and applications

    The advent of massive amounts of textual, audio, and visual data has spurred the development of econometric methodology to transform qualitative sentiment data into quantitative sentiment variables, and to use those variables in an econometric analysis of the relationships between sentiment and other variables. We survey this emerging research field and refer to it as sentometrics, a portmanteau of sentiment and econometrics. We provide a synthesis of the relevant methodological approaches, illustrate them with empirical results, and discuss useful software.

    Algorithms and representations for supporting online music creation with large-scale audio databases

    The rapid adoption of Internet and web technologies has created an opportunity for making music collaboratively by sharing information online. However, current applications for online music making do not take advantage of the potential of shared information. The goal of this dissertation is to provide and evaluate algorithms and representations for interacting with large audio databases that facilitate music creation by online communities. This work has been developed in the context of Freesound, a large-scale, community-driven database of audio recordings shared under Creative Commons (CC) licenses. The diversity of sounds available through this kind of platform is unprecedented. At the same time, the unstructured nature of community-driven processes poses new challenges for indexing and retrieving information to support musical creativity. In this dissertation we propose and evaluate algorithms and representations for dealing with the main elements required by online music making applications based on large-scale audio databases: sound files, including time-varying and aggregate representations, taxonomies for retrieving sounds, music representations and community models. As a generic low-level representation for audio signals, we analyze the framework of cepstral coefficients, evaluating their performance with example classification tasks. We found that switching to more recent auditory filters, such as gammatone filters, improves, at large scales, on traditional representations based on the mel scale. We then consider three common types of sounds for obtaining aggregated representations. We show that several time series analysis features computed from the cepstral coefficients complement traditional statistics for improved performance. For interacting with large databases of sounds, we propose a novel unsupervised algorithm that automatically generates taxonomical organizations based on the low-level signal representations. Based on user studies, we show that our approach can be used in place of traditional supervised classification approaches for providing a lexicon of acoustic categories suitable for creative applications. Next, a computational representation is described for music based on audio samples. We demonstrate through a user experiment that it facilitates collaborative creation and supports computational analysis using the lexicons generated by sound taxonomies. Finally, we deal with the representation and analysis of user communities. We propose a method for measuring collective creativity in audio sharing. By analyzing the activity of the Freesound community over a period of more than 5 years, we show that the proposed creativity measures can be significantly related to social structure characterized by network analysis.

    Automatically generated summaries of sports videos based on semantic content

    Sport has been part of our lives since the beginning of time, whether we are spectators or participants. The spread and growth of multimedia platforms has made this content available to everyone. Sports videos appeal to a large population all around the world and have become an important form of multimedia content that is streamed over the Internet and television networks. Moreover, sports content creators want to provide users with relevant information, such as live commentary and text or video summaries of games, using automatic tools. As a result, MOG-Technologies wants to create a tool capable of summarizing football matches based on semantic content, and this problem was explored in the scope of this Dissertation. The main objective is to convert the television football commentator's speech into text, taking advantage of Google's Speech-to-Text tool. Several machine learning models were then tested to classify sentences into important events. For model training, a dataset was created by combining transcriptions of 43 games from different television channels with 72 games from Google Search timeline commentary; the combined dataset contains 3260 sentences. To validate the proposed solution, the accuracy and F1 score were computed for each machine learning model. The results show that the developed tool is capable of predicting events in live matches with a low error rate. Also, combining multiple sources, not only the sports commentator's speech, will help to increase the performance of the tool. It is important to note that the dataset created during this Dissertation will allow MOG-Technologies to expand and perfect the concept discussed in this project.
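
    The two validation metrics named in this abstract, accuracy and F1 score, can be illustrated with a minimal sketch. The labels ("goal", "other"), the function names and the sample predictions below are hypothetical and chosen only for illustration; they are not taken from the dissertation, which does not state how its metrics were implemented.

```python
def accuracy(y_true, y_pred):
    """Fraction of sentences whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive):
    """F1 score (harmonic mean of precision and recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for four commentary sentences (illustration only).
y_true = ["goal", "other", "goal", "other"]
y_pred = ["goal", "goal", "other", "other"]

acc = accuracy(y_true, y_pred)
f1_goal = f1_score(y_true, y_pred, positive="goal")
```

    Reporting F1 alongside accuracy is a common choice when important events are rare compared with ordinary commentary, since accuracy alone can look high for a model that seldom predicts an event.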