88 research outputs found

    Evaluation of preprocessors for neural network speaker verification

    Get PDF

    Hidden Markov models and neural networks for speech recognition

    Get PDF
    The Hidden Markov Model (HMMs) is one of the most successful modeling approaches for acoustic events in speech recognition, and more recently it has proven useful for several problems in biological sequence analysis. Although the HMM is good at capturing the temporal nature of processes such as speech, it has a very limited capacity for recognizing complex patterns involving more than first order dependencies in the observed data sequences. This is due to the first order state process and the assumption of state conditional independence between observations. Artificial Neural Networks (NNs) are almost the opposite: they cannot model dynamic, temporally extended phenomena very well, but are good at static classification and regression tasks. Combining the two frameworks in a sensible way can therefore lead to a more powerful model with better classification abilities. The overall aim of this work has been to develop a probabilistic hybrid of hidden Markov models and neural networks and ..

    1994 Science Information Management and Data Compression Workshop

    Get PDF
    This document is the proceedings from the 'Science Information Management and Data Compression Workshop,' which was held on September 26-27, 1994, at the NASA Goddard Space Flight Center, Greenbelt, Maryland. The Workshop explored promising computational approaches for handling the collection, ingestion, archival and retrieval of large quantities of data in future Earth and space science missions. It consisted of eleven presentations covering a range of information management and data compression approaches that are being or have been integrated into actual or prototypical Earth or space science data information systems, or that hold promise for such an application. The workshop was organized by James C. Tilton and Robert F. Cromp of the NASA Goddard Space Flight Center

    Bag-of-words representations for computer audition

    Get PDF
    Computer audition is omnipresent in everyday life, in applications ranging from personalised virtual agents to health care. From a technical point of view, the goal is to robustly classify the content of an audio signal in terms of a defined set of labels, such as, e.g., the acoustic scene, a medical diagnosis, or, in the case of speech, what is said or how it is said. Typical approaches employ machine learning (ML), which means that task-specific models are trained by means of examples. Despite recent successes in neural network-based end-to-end learning, taking the raw audio signal as input, models relying on hand-crafted acoustic features are still superior in some domains, especially for tasks where data is scarce. One major issue is nevertheless that a sequence of acoustic low-level descriptors (LLDs) cannot be fed directly into many ML algorithms as they require a static and fixed-length input. Moreover, also for dynamic classifiers, compressing the information of the LLDs over a temporal block by summarising them can be beneficial. However, the type of instance-level representation has a fundamental impact on the performance of the model. In this thesis, the so-called bag-of-audio-words (BoAW) representation is investigated as an alternative to the standard approach of statistical functionals. BoAW is an unsupervised method of representation learning, inspired from the bag-of-words method in natural language processing, forming a histogram of the terms present in a document. The toolkit openXBOW is introduced, enabling systematic learning and optimisation of these feature representations, unified across arbitrary modalities of numeric or symbolic descriptors. A number of experiments on BoAW are presented and discussed, focussing on a large number of potential applications and corresponding databases, ranging from emotion recognition in speech to medical diagnosis. The evaluations include a comparison of different acoustic LLD sets and configurations of the BoAW generation process. The key findings are that BoAW features are a meaningful alternative to statistical functionals, offering certain benefits, while being able to preserve the advantages of functionals, such as data-independence. Furthermore, it is shown that both representations are complementary and their fusion improves the performance of a machine listening system.Maschinelles Hören ist im täglichen Leben allgegenwärtig, mit Anwendungen, die von personalisierten virtuellen Agenten bis hin zum Gesundheitswesen reichen. Aus technischer Sicht besteht das Ziel darin, den Inhalt eines Audiosignals hinsichtlich einer Auswahl definierter Labels robust zu klassifizieren. Die Labels beschreiben bspw. die akustische Umgebung der Aufnahme, eine medizinische Diagnose oder - im Falle von Sprache - was gesagt wird oder wie es gesagt wird. Übliche Ansätze hierzu verwenden maschinelles Lernen, d.h., es werden anwendungsspezifische Modelle anhand von Beispieldaten trainiert. Trotz jüngster Erfolge beim Ende-zu-Ende-Lernen mittels neuronaler Netze, in welchen das unverarbeitete Audiosignal als Eingabe benutzt wird, sind Modelle, die auf definierten akustischen Merkmalen basieren, in manchen Bereichen weiterhin überlegen. Dies gilt im Besonderen für Einsatzzwecke, für die nur wenige Daten vorhanden sind. Allerdings besteht dabei das Problem, dass Zeitfolgen von akustischen Deskriptoren in viele Algorithmen des maschinellen Lernens nicht direkt eingespeist werden können, da diese eine statische Eingabe fester Länge benötigen. Außerdem kann es auch für dynamische (zeitabhängige) Klassifikatoren vorteilhaft sein, die Deskriptoren über ein gewisses Zeitintervall zusammenzufassen. Jedoch hat die Art der Merkmalsdarstellung einen grundlegenden Einfluss auf die Leistungsfähigkeit des Modells. In der vorliegenden Dissertation wird der sogenannte Bag-of-Audio-Words-Ansatz (BoAW) als Alternative zum Standardansatz der statistischen Funktionale untersucht. BoAW ist eine Methode des unüberwachten Lernens von Merkmalsdarstellungen, die von der Bag-of-Words-Methode in der Computerlinguistik inspiriert wurde, bei der ein Textdokument als Histogramm der vorkommenden Wörter beschrieben wird. Das Toolkit openXBOW wird vorgestellt, welches systematisches Training und Optimierung dieser Merkmalsdarstellungen - vereinheitlicht für beliebige Modalitäten mit numerischen oder symbolischen Deskriptoren - erlaubt. Es werden einige Experimente zum BoAW-Ansatz durchgeführt und diskutiert, die sich auf eine große Zahl möglicher Anwendungen und entsprechende Datensätze beziehen, von der Emotionserkennung in gesprochener Sprache bis zur medizinischen Diagnostik. Die Auswertungen beinhalten einen Vergleich verschiedener akustischer Deskriptoren und Konfigurationen der BoAW-Methode. Die wichtigsten Erkenntnisse sind, dass BoAW-Merkmalsvektoren eine geeignete Alternative zu statistischen Funktionalen darstellen, gewisse Vorzüge bieten und gleichzeitig wichtige Eigenschaften der Funktionale, wie bspw. die Datenunabhängigkeit, erhalten können. Zudem wird gezeigt, dass beide Darstellungen komplementär sind und eine Fusionierung die Leistungsfähigkeit eines Systems des maschinellen Hörens verbessert

    Mobile Real-Time Grasshopper Detection and Data Aggregation Framework

    Get PDF
    nsects of the family Orthoptera: Acrididae including grasshoppers and locust devastate crops and eco-systems around the globe. The effective control of these insects requires large numbers of trained extension agents who try to spot concentrations of the insects on the ground so that they can be destroyed before they take flight. This is a challenging and difficult task. No automatic detection system is yet available to increase scouting productivity, data scale and fidelity. Here we demonstrate MAESTRO, a novel grasshopper detection framework that deploys deep learning within RBG images to detect insects. MAeStRo uses a state-of-the-art two-stage training deep learning approach. the framework can be deployed not only on desktop computers but also on edge devices without internet connection such as smartphones. MAeStRo can gather data using cloud storage for further research and in-depth analysis. In addition, we provide a challenging new open dataset (GHCID) of highly variable grasshopper populations imaged in inner Mongolia. the detection performance of the stationary method and the mobile App are 78 and 49 percent respectively; the stationary method requires around 1000 ms to analyze a single image, whereas the mobile app uses only around 400 ms per image. The algorithms are purely data-driven and can be used for other detection tasks in agriculture (e.g. plant disease detection) and beyond. This system can play a crucial role in the collection and analysis of data to enable more effective control of this critical global pest

    Paper Session III-A - Data Compression and Error-Protection Coding

    Get PDF
    Data Compression and Error-protection coding are two of the now widely heard but not well understood terms associated with the Information Super Highway. But this was not always so. Their familiarity is a consequence of developments which were initiated nearly 50 years ago with the introduction of modern information theory by Claude Shannon. Both concepts and techniques which can dramatically improve the representation, storage and communication of digital data - the underlying component of modern information systems. Although often invisible to individual users, the commercial applications of compression and coding, which affect Our daily lives now, have become extremely broad. Few of these applications can claim they were not directly or indirectly influenced by prior investments in this technology by NASA and the military. This paper describes important specific ongoing NASA direct technology transfers of data compression and error-protection coding techniques/technology. First jointly used to improve the return of Voyager images from Uranus and Neptune by a factor of 4, these techniques and their NASA sponsored custom high-speed microcircuits are now independently enjoying widespread use. A simplified laymen\u27s description of these techniques and their performance characteristics is followed by a status on their technology transfer

    Data compression techniques applied to high resolution high frame rate video technology

    Get PDF
    An investigation is presented of video data compression applied to microgravity space experiments using High Resolution High Frame Rate Video Technology (HHVT). An extensive survey of methods of video data compression, described in the open literature, was conducted. The survey examines compression methods employing digital computing. The results of the survey are presented. They include a description of each method and assessment of image degradation and video data parameters. An assessment is made of present and near term future technology for implementation of video data compression in high speed imaging system. Results of the assessment are discussed and summarized. The results of a study of a baseline HHVT video system, and approaches for implementation of video data compression, are presented. Case studies of three microgravity experiments are presented and specific compression techniques and implementations are recommended

    Harnessing customizationinWeb Annotation: ASoftwareProduct Line approach

    Get PDF
    222 p.La anotación web ayuda a mediar la interacción de lectura y escritura al transmitir información, agregar comentarios e inspirar conversaciones en documentos web. Se utiliza en áreas de Ciencias Sociales y Humanidades, Investigación Periodística, Ciencias Biológicas o Educación, por mencionar algunas. Las actividades de anotación son heterogéneas, donde los usuarios finales (estudiantes, periodistas, conservadores de datos, investigadores, etc.) tienen requisitos muy diferentes para crear, modificar y reutilizar anotaciones. Esto resulta en una gran cantidad de herramientas de anotación web y diferentes formas de representar y almacenar anotaciones web. Para facilitar la reutilización y la interoperabilidad, se han realizado varios intentos durante las últimas décadas para estandarizar las anotaciones web (por ejemplo, Annotea u Open Annotation), lo que ha dado como resultado las recomendaciones de anotaciones del W3C publicadas en 2017. Las recomendaciones del W3C proporcionan un marco para la representación de anotaciones (modelo de datos y vocabulario) y transporte (protocolo). Sin embargo, todavía hay una brecha en cómo se desarrollan los clientes de anotación (herramientas e interfaces de usuario), lo que hace que los desarrolladores vuelvan a re-implementar funcionalidades comunes (esdecir, resaltar, comentar, almacenar,¿) para crear su herramienta de anotación personalizada.Esta tesis tiene como objetivo proporcionar una plataforma de reutilización para el desarrollo de herramientas de anotación web para la revisión. Con este fin, hemos desarrollado una línea de productos de software llamada WACline. WACline es una familia de productos de anotación que permite a los desarrolladores crear extensiones de navegador de anotación web personalizadas, lo que facilita la reutilización de los activos principales y su adaptación a su contexto de revisión específico. Se ha creado siguiendo un proceso de acumulación de conocimientos en el que cada producto de anotación aprende de los productos de anotación creados previamente. Finalmente, llegamos a una familia de clientes de anotación que brinda soporte para tres prácticas de revisión: extracción de datos de revisión sistemática de literatura (Highlight&Go), revisión de tareas de estudiantes en educación superior (Mark&Go), y revisión por pares de conferencias y revistas (Review&Go). Para cada uno de los contextos de revisión, se ha llevado a cabo una evaluación con partes interesadas reales para validar las mejoras de eficiencia y eficacia aportadas por las herramientas de anotación personalizadas en su práctica

    Multi-image classification and compression using vector quantization

    Get PDF
    Vector Quantization (VQ) is an image processing technique based on statistical clustering, and designed originally for image compression. In this dissertation, several methods for multi-image classification and compression based on a VQ design are presented. It is demonstrated that VQ can perform joint multi-image classification and compression by associating a class identifier with each multi-spectral signature codevector. We extend the Weighted Bayes Risk VQ (WBRVQ) method, previously used for single-component images, that explicitly incorporates a Bayes risk component into the distortion measure used in the VQ quantizer design and thereby permits a flexible trade-off between classification and compression priorities. In the specific case of multi-spectral images, we investigate the application of the Multi-scale Retinex algorithm as a preprocessing stage, before classification and compression, that performs dynamic range compression, reduces the dependence on lighting conditions, and generally enhances apparent spatial resolution. The goals of this research are four-fold: (1) to study the interrelationship between statistical clustering, classification and compression in a multi-image VQ context; (2) to study mixed-pixel classification and combined classification and compression for simulated and actual, multispectral and hyperspectral multi-images; (3) to study the effects of multi-image enhancement on class spectral signatures; and (4) to study the preservation of scientific data integrity as a function of compression. In this research, a key issue is not just the subjective quality of the resulting images after classification and compression but also the effect of multi-image dimensionality on the complexity of the optimal coder design
    corecore