
    Audio Event Classification Using Deep Learning Methods

    Whether crossing the road or enjoying a concert, sound carries important information about the world around us. Audio event classification refers to recognition tasks in which one or several labels, such as ‘dog bark’ or ‘doorbell’, are assigned to a particular audio signal. Teaching machines to perform this classification task can therefore assist humans in many fields. Since deep learning has shown great potential and usefulness in many AI applications, this thesis focuses on studying deep learning methods and building suitable neural networks for audio event classification. To evaluate the performance of different neural networks, we tested them on both Google AudioSet and the dataset for DCASE 2018 Task 2. Instead of providing the original audio files, AudioSet offers compact 128-dimensional embeddings produced by a modified VGG model from audio frames of 960 ms. For DCASE 2018 Task 2, we first preprocessed the soundtracks and then fine-tuned the VGG model that AudioSet used as a feature extractor. Thus, each soundtrack from both tasks is represented as a sequence of 128-dimensional features. We then compared DNN, LSTM, and multi-level attention models with different hyperparameters. The results show that fine-tuning the feature generation model for the DCASE task greatly improved the evaluation score. In addition, the attention models performed best in our settings for both tasks. The results indicate that using a CNN-like model as a feature extractor for log-mel spectrograms and modeling the temporal dynamics with an attention model can achieve state-of-the-art results in audio event classification. For future research, the thesis suggests training a better CNN model for feature extraction, utilizing multi-scale and multi-level features for better classification, and combining the audio features with other multimodal information for audiovisual data analysis.
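    The attention models referenced above pool a clip's sequence of frame embeddings into a single clip-level prediction. Below is a minimal single-level sketch of that idea (assumptions: PyTorch, a 41-class output as in DCASE 2018 Task 2, and illustrative layer sizes; the thesis's multi-level variant stacks several such blocks, and its exact configuration is not reproduced here).

```python
# Minimal sketch, not the thesis's exact model: attention pooling over
# a sequence of 128-dimensional VGGish-style frame embeddings.
import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    """Per-frame class probabilities combined with learned,
    frame-normalized attention weights."""
    def __init__(self, embed_dim=128, num_classes=41):
        super().__init__()
        self.cla = nn.Linear(embed_dim, num_classes)  # per-frame classifier
        self.att = nn.Linear(embed_dim, num_classes)  # per-frame attention logits

    def forward(self, x):                        # x: (batch, frames, 128)
        cla = torch.sigmoid(self.cla(x))         # per-frame class probabilities
        att = torch.softmax(self.att(x), dim=1)  # attention normalized over frames
        return (att * cla).sum(dim=1)            # (batch, num_classes)

# Usage: a batch of 4 clips, each a sequence of ten 128-d embeddings.
model = AttentionPooling()
clip = torch.randn(4, 10, 128)
probs = model(clip)                              # (4, 41) clip-level probabilities
```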

    Seventh Biennial Report: June 2003 – March 2005


    A Parametric Sound Object Model for Sound Texture Synthesis

    This thesis deals with the analysis and synthesis of sound textures based on parametric sound objects. An overview is provided of the acoustic and perceptual principles of textural acoustic scenes, and technical challenges for analysis and synthesis are considered. Four essential processing steps for sound texture analysis are identified, and existing sound texture systems are reviewed, using the four-step model as a guideline. A theoretical framework for analysis and synthesis is proposed. A parametric sound object synthesis (PSOS) model is introduced, which is able to describe individual recorded sounds through a fixed set of parameters. The model, which applies to harmonic and noisy sounds, is an extension of spectral modeling and uses spline curves to approximate spectral envelopes, as well as the evolution of parameters over time. In contrast to standard spectral modeling techniques, this representation uses the concept of objects instead of concatenated frames, and it provides a direct mapping between sounds of different length. Methods for automatic and manual conversion are shown. An evaluation is presented in which the ability of the model to encode a wide range of different sounds has been examined. Although there are aspects of sounds that the model cannot accurately capture, such as polyphony and certain types of fast modulation, the results indicate that high-quality synthesis can be achieved for many different acoustic phenomena, including instruments and animal vocalizations. In contrast to many other forms of sound encoding, the parametric model facilitates various techniques of machine learning and intelligent processing, including sound clustering and principal component analysis. Strengths and weaknesses of the proposed method are reviewed, and possibilities for future development are discussed.
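    A core ingredient of the representation described above is a spline approximation of a spectral envelope. Below is a minimal sketch of that general idea (assumptions: NumPy/SciPy, a Hann-windowed frame, and an arbitrary smoothing factor; the PSOS model's actual knot placement and parameter set are not reproduced here).

```python
# Minimal sketch: approximate a frame's log-magnitude spectrum with a
# smoothing spline, yielding a compact, resampleable envelope description.
import numpy as np
from scipy.interpolate import UnivariateSpline

sr = 44100
frame = np.random.randn(2048)                 # stand-in for one analysis frame
spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)

# The smoothing factor s trades envelope detail against compactness
# of the description; the value here is purely illustrative.
log_mag = np.log(spectrum + 1e-9)
envelope = UnivariateSpline(freqs, log_mag, s=2.0 * len(freqs))

# Evaluating the spline on any frequency grid recovers a smooth envelope,
# which is what allows a direct mapping between sounds of different length.
coarse_grid = np.linspace(0.0, sr / 2, 64)
envelope_values = envelope(coarse_grid)
```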

    Development of a practical and mobile brain-computer communication device for profoundly paralyzed individuals

    Thesis (Ph.D.)--Boston University. Brain-computer interface (BCI) technology has seen tremendous growth over the past several decades, with numerous groundbreaking research studies demonstrating technical viability (Sellers et al., 2010; Silvoni et al., 2011). Despite this progress, BCIs have remained primarily in controlled laboratory settings. This dissertation proffers a blueprint for translating research-grade BCI systems into real-world applications that are noninvasive and fully portable, and that employ intelligent user interfaces for communication. The proposed architecture is designed to be used by severely motor-impaired individuals, such as those with locked-in syndrome, while reducing the effort and cognitive load needed to communicate. Such a system requires the merging of two primary research fields: 1) electroencephalography (EEG)-based BCIs and 2) intelligent user interface design. The EEG-based BCI portion of this dissertation provides a history of the field, details of our software and hardware implementation, and results from an experimental study aimed at verifying the utility of a BCI based on the steady-state visual evoked potential (SSVEP), a robust brain response to visual stimulation at controlled frequencies. The visual stimulation, feature extraction, and classification algorithms for the BCI were specially designed to achieve successful real-time performance on a laptop computer. The BCI was developed in Python, an open-source programming language that combines ease of programming with effective handling of hardware and software requirements. The result of this work was The Unlock Project app software for BCI development. Using it, a four-choice SSVEP BCI setup was implemented and tested with five severely motor-impaired and fourteen control participants. The system showed a wide range of usability across participants, with classification rates ranging from 25% to 95%. The second portion of the dissertation discusses the viability of intelligent user interface design as a method for obtaining a more user-focused vocal output communication aid tailored to motor-impaired individuals. A proposed blueprint of this communication "app" was developed in this dissertation. It would make use of readily available laptop sensors to perform facial recognition, speech-to-text decoding, and geolocation. The ultimate goal is to couple sensor information with natural language processing to construct an intelligent user interface that shapes communication in a practical SSVEP-based BCI.
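    An SSVEP BCI works because gazing at a target flickering at a known frequency evokes EEG power at that frequency and its harmonics. The following is a minimal sketch of a four-choice detector in that spirit (assumptions: NumPy, illustrative stimulation frequencies, and a simple power-based decision rule; this is not the Unlock Project's actual pipeline).

```python
# Minimal sketch: pick the stimulation frequency whose fundamental plus
# second harmonic carries the most spectral power in an EEG epoch.
import numpy as np

def classify_ssvep(eeg, fs, stim_freqs=(12.0, 13.0, 14.0, 15.0)):
    """Return the index of the most likely attended target."""
    spectrum = np.abs(np.fft.rfft(eeg * np.hanning(len(eeg)))) ** 2
    freqs = np.fft.rfftfreq(len(eeg), d=1.0 / fs)
    scores = []
    for f in stim_freqs:
        fund = spectrum[np.argmin(np.abs(freqs - f))]       # fundamental bin
        harm = spectrum[np.argmin(np.abs(freqs - 2 * f))]   # 2nd-harmonic bin
        scores.append(fund + harm)
    return int(np.argmax(scores))

# Usage: 2 s of synthetic single-channel EEG at 256 Hz with a 13 Hz response.
fs = 256
t = np.arange(2 * fs) / fs
eeg = np.sin(2 * np.pi * 13.0 * t) + 0.5 * np.random.randn(t.size)
print(classify_ssvep(eeg, fs))   # expected: 1 (the 13 Hz target)
```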

    Modeling huge sound sources in a room acoustical calculation program


    Computer Vision and Architectural History at Eye Level: Mixed Methods for Linking Research in the Humanities and in Information Technology

    Information on the history of architecture is embedded in our daily surroundings, in vernacular and heritage buildings and in physical objects, photographs and plans. Historians study these tangible and intangible artefacts and the communities that built and used them. Valuable insights are thus gained into the past and the present, and they also provide a foundation for designing the future. Given that our understanding of the past is limited by the inadequate availability of data, the article demonstrates that advanced computer tools can help gain more and better-linked data from the past. Computer vision can make a decisive contribution to the identification of image content in historical photographs. This application is particularly interesting for architectural history, where visual sources play an essential role in understanding the built environment of the past, yet a lack of reliable metadata often hinders the use of materials. Automated recognition contributes to making a variety of image sources usable for research.
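    As one illustration of the kind of automated image-content identification described above, a pretrained classifier can propose candidate content tags for a digitized photograph that lacks metadata. The sketch below is an assumption-laden stand-in (PyTorch/torchvision, a generic ImageNet-trained ResNet-50, and a placeholder file path; the article's actual models and label vocabulary are not specified here).

```python
# Minimal sketch: propose candidate content tags for a historical photo
# using a generic pretrained classifier (not the project's own model).
import torch
from torchvision import models, transforms
from PIL import Image

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet statistics
                         std=[0.229, 0.224, 0.225]),
])

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.eval()

# Placeholder path for a digitized historical photograph.
image = Image.open("photo_1925_street.jpg").convert("RGB")
with torch.no_grad():
    logits = model(preprocess(image).unsqueeze(0))
top5 = torch.topk(logits.softmax(dim=1), k=5)   # top-5 candidate tags
```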