
    AMR Compressed-Domain Analysis for Multimedia Forensics Double Compression Detection

    Get PDF
    An audio recording must be authentic to be admitted as evidence in a criminal prosecution, so that the speech is preserved with maximum fidelity and interpretation mistakes are prevented. The AMR (adaptive multi-rate) encoder is a worldwide standard for speech compression and for transmission over GSM mobile networks, including 3G and 4G. The same compression algorithm also underlies a standard audio file format with the extension AMR. Due to its extensive use in mobile networks and its availability on modern smartphones, the AMR format frequently appears in audio authenticity cases, where recordings are examined for forgery. Such examinations belong to the field of multimedia forensics, which includes, among other techniques, double compression detection, i.e., determining whether a given AMR file was decompressed and compressed again. AMR double compression detection is a complex engineering problem whose solution is still underway: in general terms, if an AMR file is double compressed, it is not an original and was likely doctored. Published works on double compression detection extract features from the decoded AMR waveform. In this paper, a new approach to AMR double compression detection is proposed which, instead of processing the decoded audio, uses the encoded bitstream to extract compressed-domain linear prediction (LP) coefficient-based features. A statistical analysis of these features shows that they can be used to detect AMR double compression effectively, making them a promising path toward solving the AMR double compression problem with artificial neural networks.
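
    As a rough illustration of what LP coefficient-based features might look like, the sketch below (Python, chosen here since the abstract specifies no implementation) computes LP coefficients per 20 ms frame with the Levinson-Durbin recursion and summarises them with per-coefficient statistics. It operates on decoded samples purely for readability; the paper's contribution is reading such coefficients directly from the AMR bitstream, and the frame length, LP order, and statistics chosen here are illustrative assumptions, not the paper's configuration.

```python
# Illustrative only: LP-coefficient features computed from waveform frames.
# The paper extracts LP coefficients in the compressed domain (from the AMR
# bitstream); frame length, LP order and statistics below are assumptions.
import numpy as np

def lp_coefficients(frame: np.ndarray, order: int = 10) -> np.ndarray:
    """LP coefficients of one frame via autocorrelation + Levinson-Durbin."""
    r = np.array([frame[:len(frame) - k] @ frame[k:] for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0] + 1e-12
    for i in range(1, order + 1):
        acc = r[i] + a[1:i] @ r[1:i][::-1]
        k = -acc / err
        a_prev = a.copy()
        a[1:i] = a_prev[1:i] + k * a_prev[1:i][::-1]
        a[i] = k
        err *= (1.0 - k * k)
    return a[1:]                                   # a_1 ... a_p

def lp_feature_vector(signal: np.ndarray, frame_len: int = 160, order: int = 10) -> np.ndarray:
    """Per-coefficient mean and standard deviation across 20 ms frames (8 kHz)."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    coeffs = np.array([lp_coefficients(f.astype(float), order) for f in frames])
    return np.concatenate([coeffs.mean(axis=0), coeffs.std(axis=0)])

# Toy usage on one second of synthetic "speech" sampled at 8 kHz (as in AMR)
x = np.random.default_rng(0).standard_normal(8000)
print(lp_feature_vector(x).shape)                  # (2 * order,) feature vector
```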

    Modularity and Neural Integration in Large-Vocabulary Continuous Speech Recognition

    Get PDF
    This thesis tackles the problem of modularity in Large-Vocabulary Continuous Speech Recognition using neural networks.

    Selecting and Generating Computational Meaning Representations for Short Texts

    Full text link
    Language conveys meaning, so natural language processing (NLP) requires representations of meaning. This work addresses two broad questions: (1) What meaning representation should we use? and (2) How can we transform text to our chosen meaning representation? In the first part, we explore different meaning representations (MRs) of short texts, ranging from surface forms to deep-learning-based models. We show the advantages and disadvantages of a variety of MRs for summarization, paraphrase detection, and clustering. In the second part, we use SQL as a running example for an in-depth look at how we can parse text into our chosen MR. We examine the text-to-SQL problem from three perspectives—methodology, systems, and applications—and show how each contributes to a fuller understanding of the task. (PhD thesis, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/143967/1/cfdollak_1.pd)
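
    To make the SQL running example concrete, here is a toy, hedged illustration of text-to-SQL as a meaning representation. The question, the schema, and the naive keyword rule are all invented for this sketch and are unrelated to the systems examined in the thesis.

```python
# Toy text-to-SQL illustration: a question paired with one possible SQL
# meaning representation over an invented schema. Real parsers use grammar-
# based or neural models, not keyword lookups like the one below.
QUESTION = "How many papers did each author publish after 2015?"

SQL_MR = """\
SELECT author, COUNT(*) AS n_papers
FROM papers
WHERE year > 2015
GROUP BY author;"""

def toy_text_to_sql(question: str) -> str:
    """Deliberately naive stand-in for a text-to-SQL parser."""
    q = question.lower()
    if "how many" in q and "after" in q:
        return SQL_MR
    raise ValueError("question not covered by this toy rule")

print(toy_text_to_sql(QUESTION))
```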

    Machine Learning for Beamforming in Audio, Ultrasound, and Radar

    Get PDF
    Multi-sensor signal processing plays a crucial role in several everyday technologies, from correctly understanding speech on smart home devices to ensuring aircraft fly safely. A specific type of multi-sensor signal processing called beamforming forms a central part of this thesis. Beamforming works by combining the information from several spatially distributed sensors to directionally filter information, boosting the signal from a certain direction while suppressing others. The idea of beamforming is key to the domains of audio, ultrasound, and radar. Machine learning is the other central part of this thesis. Machine learning, and especially its sub-field of deep learning, has enabled breakneck progress on several problems that were previously thought intractable. Today, machine learning powers many of the cutting-edge systems we see on the internet for image classification, speech recognition, language translation, and more. In this dissertation, we look at beamforming pipelines in audio, ultrasound, and radar through a machine learning lens and endeavor to improve different parts of these pipelines using ideas from machine learning. We start in the audio domain and derive a machine learning inspired beamformer to tackle the problem of ensuring that the audio captured by a camera matches its visual content, a problem we term audiovisual zooming. Staying in the audio domain, we then demonstrate how deep learning can be used to improve the perceptual quality of speech by repairing clipping, codec distortions, and gaps in speech. Transitioning to the ultrasound domain, we improve the performance of short-lag spatial coherence ultrasound imaging by applying robust principal component analysis to exploit the differences in tissue texture at each short-lag value. Next, we use deep learning as an alternative to beamforming in ultrasound and improve the information extraction pipeline by simultaneously generating both a segmentation map and a high-quality B-mode image directly from raw received ultrasound data. Finally, we move to the radar domain and study how deep learning can be used to improve signal quality in ultra-wideband synthetic aperture radar by suppressing radio frequency interference, random spectral gaps, and contiguous block spectral gaps. By training and applying the networks on raw single-aperture data prior to beamforming, the approach can work with myriad sensor geometries and different beamforming equations, a crucial requirement in synthetic aperture radar.
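
    For readers unfamiliar with beamforming, the following minimal delay-and-sum sketch shows the directional filtering idea described above: channels are phase-aligned for a chosen look direction and then averaged. It is a classical baseline, not the learning-based beamformers developed in the dissertation, and the array geometry, sampling rate, and steering angle are assumed values.

```python
# Minimal delay-and-sum beamformer for a uniform linear array (illustrative
# baseline only, not the dissertation's learning-based methods).
import numpy as np

def delay_and_sum(signals: np.ndarray, mic_positions: np.ndarray,
                  angle_deg: float, fs: float, c: float = 343.0) -> np.ndarray:
    """Align each channel for the look direction with frequency-domain
    fractional delays, then average across microphones."""
    _, n_samples = signals.shape
    # Relative propagation delay per microphone for a far-field source
    delays = mic_positions * np.sin(np.deg2rad(angle_deg)) / c        # seconds
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    spectra = np.fft.rfft(signals, axis=1)
    # Phase shifts that compensate those delays in the look direction
    steering = np.exp(2j * np.pi * freqs[None, :] * delays[:, None])
    return np.fft.irfft((spectra * steering).mean(axis=0), n=n_samples)

# Toy usage: 4 microphones spaced 5 cm apart, one second of noise at 16 kHz
fs = 16000
mics = np.arange(4) * 0.05
x = np.random.default_rng(1).standard_normal((4, fs))
y = delay_and_sum(x, mics, angle_deg=30.0, fs=fs)
print(y.shape)
```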

    Compressive Sensing with Low-Power Transfer and Accurate Reconstruction of EEG Signals

    Get PDF
    Tele-monitoring of EEG in a wireless body area network (WBAN) is essential, as EEG is one of the most powerful physiological signals for diagnosing neurological disorders. EEG generally needs to be recorded over long periods, which produces a large volume of data and leads to heavy storage and communication bandwidth requirements in the WBAN. Moreover, WBAN sensor nodes are battery operated and consume considerable energy. The aim of this research is therefore the low-power transmission of EEG signals over a WBAN and their accurate reconstruction at the receiver, enabling continuous online monitoring of EEG and real-time feedback to patients from medical experts. To reduce the data rate, and consequently the power consumption, compressive sensing (CS) may be employed prior to transmission. Nonetheless, for EEG signals the accuracy of CS reconstruction depends on a suitable dictionary in which the signal is sparse. As the EEG signal is not sparse in either the time or the frequency domain, identifying an appropriate dictionary is paramount. There is a plethora of choices for the dictionary. Wavelet bases are of interest due to the availability of associated systems and methods; however, the attributes of wavelet bases that lead to good reconstruction quality are not well understood. For the first time, this study demonstrates that when selecting a wavelet dictionary, its incoherence with the sensing matrix and its number of vanishing moments should be considered at the same time. A framework is proposed for selecting an appropriate wavelet dictionary for EEG signals, used in tandem with a sparse binary matrix (SBM) as the sensing matrix and the ST-SBL method as the reconstruction algorithm. Beylkin, which is highly incoherent with the SBM and has a relatively high number of vanishing moments, is identified as the best of the dictionaries evaluated in this thesis. The power requirements of the proposed framework are also quantified using a power model. The outcomes help to establish the computational complexity and online implementation requirements of CS for transmitting EEG in a WBAN. The proposed approach brings the energy budget well into the microwatt range, ensuring significant savings in battery life and overall system power. The study is intended to create a strong basis for the use of EEG in high-accuracy, low-power biomedical applications in WBANs.
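
    As a hedged sketch of the sensing side of such a pipeline, the code below compresses an EEG-like vector with a sparse binary matrix and checks its mutual coherence with a candidate dictionary. An orthonormal DCT basis stands in for the wavelet dictionaries (e.g. Beylkin) studied in the thesis, no ST-SBL reconstruction is shown, and the dimensions and number of ones per column are assumed values.

```python
# Hedged sketch: compressive sensing with a sparse binary matrix (SBM) plus a
# mutual-coherence check against a candidate dictionary. A DCT basis stands in
# for the wavelet dictionaries evaluated in the thesis; no ST-SBL recovery here.
import numpy as np
from scipy.fft import dct

def sparse_binary_matrix(m: int, n: int, ones_per_col: int = 4, seed: int = 0) -> np.ndarray:
    """m x n matrix with `ones_per_col` ones placed at random rows of each column."""
    rng = np.random.default_rng(seed)
    phi = np.zeros((m, n))
    for j in range(n):
        phi[rng.choice(m, size=ones_per_col, replace=False), j] = 1.0
    return phi

def mutual_coherence(phi: np.ndarray, psi: np.ndarray) -> float:
    """max |<row of Phi, column of Psi>| with both sets normalised to unit norm."""
    rows = phi / (np.linalg.norm(phi, axis=1, keepdims=True) + 1e-12)
    cols = psi / (np.linalg.norm(psi, axis=0, keepdims=True) + 1e-12)
    return float(np.abs(rows @ cols).max())

n, m = 256, 64                                    # signal length, measurements
phi = sparse_binary_matrix(m, n)                  # sensing matrix (SBM)
psi = dct(np.eye(n), axis=0, norm='ortho')        # orthonormal DCT dictionary (stand-in)
x = np.random.default_rng(1).standard_normal(n)   # placeholder for one EEG epoch
y = phi @ x                                       # compressed samples sent over the WBAN
print(y.shape, mutual_coherence(phi, psi))
```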

    The 1st Advanced Manufacturing Student Conference (AMSC21) Chemnitz, Germany 15–16 July 2021

    Get PDF
    The Advanced Manufacturing Student Conference (AMSC) is an educational format designed to foster the acquisition and application of skills related to research methods in the engineering sciences. Participating students are required to write and submit a conference paper and are given the opportunity to present their findings at the conference. The AMSC thus provides a tremendous opportunity for participants to practice critical skills associated with scientific publication. The conference proceedings will benefit readers by providing updates on critical topics and recent progress in advanced manufacturing engineering and technologies and, at the same time, will aid the transfer of valuable knowledge to the next generation of academics and practitioners. The first AMSC conference proceedings (AMSC21) addressed the following topics: Advances in "classical" Manufacturing Technologies; Technology and Application of Additive Manufacturing; Digitalization of Industrial Production (Industry 4.0); Advances in the field of Cyber-Physical Systems; Virtual and Augmented Reality Technologies throughout the entire product Life Cycle; Human-machine-environment interaction; and Management and life cycle assessment.

    Bag-of-words representations for computer audition

    Get PDF
    Computer audition is omnipresent in everyday life, in applications ranging from personalised virtual agents to health care. From a technical point of view, the goal is to robustly classify the content of an audio signal in terms of a defined set of labels, such as the acoustic scene, a medical diagnosis, or, in the case of speech, what is said or how it is said. Typical approaches employ machine learning (ML), which means that task-specific models are trained by means of examples. Despite recent successes in neural network-based end-to-end learning, which takes the raw audio signal as input, models relying on hand-crafted acoustic features are still superior in some domains, especially for tasks where data is scarce. One major issue is nevertheless that a sequence of acoustic low-level descriptors (LLDs) cannot be fed directly into many ML algorithms, as they require a static, fixed-length input. Moreover, even for dynamic classifiers, compressing the information of the LLDs over a temporal block by summarising them can be beneficial. However, the type of instance-level representation has a fundamental impact on the performance of the model. In this thesis, the so-called bag-of-audio-words (BoAW) representation is investigated as an alternative to the standard approach of statistical functionals. BoAW is an unsupervised method of representation learning, inspired by the bag-of-words method in natural language processing, which forms a histogram of the terms present in a document. The toolkit openXBOW is introduced, enabling systematic learning and optimisation of these feature representations, unified across arbitrary modalities of numeric or symbolic descriptors. A number of experiments on BoAW are presented and discussed, focussing on a large number of potential applications and corresponding databases, ranging from emotion recognition in speech to medical diagnosis. The evaluations include a comparison of different acoustic LLD sets and configurations of the BoAW generation process. The key findings are that BoAW features are a meaningful alternative to statistical functionals, offering certain benefits while preserving the advantages of functionals, such as data-independence. Furthermore, it is shown that both representations are complementary and that their fusion improves the performance of a machine listening system.
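
    To make the BoAW idea concrete, the following sketch quantises frame-level MFCCs against a k-means codebook and represents each recording as a normalised histogram of codeword counts. It deliberately does not use the openXBOW toolkit described in the thesis; the codebook size, MFCC settings, and synthetic audio are assumptions for the example.

```python
# Minimal bag-of-audio-words (BoAW) sketch: frame-level LLDs (here MFCCs) are
# quantised against a k-means codebook and each recording becomes a normalised
# histogram of codeword counts. Illustrative only; not openXBOW.
import numpy as np
import librosa
from sklearn.cluster import KMeans

def boaw_histogram(llds: np.ndarray, codebook: KMeans) -> np.ndarray:
    """Assign each LLD frame to its nearest codeword and count occurrences."""
    assignments = codebook.predict(llds)
    hist = np.bincount(assignments, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)       # term-frequency normalisation

# Toy corpus: three seconds of synthetic audio per "recording" at 16 kHz
rng = np.random.default_rng(0)
recordings = [rng.standard_normal(3 * 16000) for _ in range(5)]
llds = [librosa.feature.mfcc(y=r, sr=16000, n_mfcc=13).T for r in recordings]

codebook = KMeans(n_clusters=64, n_init=10, random_state=0).fit(np.vstack(llds))
features = np.array([boaw_histogram(x, codebook) for x in llds])
print(features.shape)                        # (n_recordings, codebook size)
```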

    Representation Learning for Natural Language Processing

    Get PDF
    This open access book provides an overview of the recent advances in representation learning theory, algorithms and applications for natural language processing (NLP). It is divided into three parts. Part I presents the representation learning techniques for multiple language entries, including words, phrases, sentences and documents. Part II then introduces the representation techniques for those objects that are closely related to NLP, including entity-based world knowledge, sememe-based linguistic knowledge, networks, and cross-modal entries. Lastly, Part III provides open resource tools for representation learning techniques, and discusses the remaining challenges and future research directions. The theories and algorithms of representation learning presented can also benefit other related domains such as machine learning, social network analysis, the Semantic Web, information retrieval, data mining and computational biology. This book is intended for advanced undergraduate and graduate students, post-doctoral fellows, researchers, lecturers, and industrial engineers, as well as anyone interested in representation learning and natural language processing.

    Advanced Operation and Maintenance in Solar Plants, Wind Farms and Microgrids

    Get PDF
    This reprint presents advances in the operation and maintenance of solar plants, wind farms and microgrids. This compendium of scientific articles helps clarify the current state of advances in the subject and is expected to be of value to the reader.