3 research outputs found

    A Bayesian approach to simultaneously characterize the stochastic and deterministic components of a system

    The present work provides a Bayesian approach to learn plausible models capable of characterizing complex time series in which deterministic and stochastic phenomena concur. Two main approaches are developed. The first approach is a simple superposition model grounded on the hypothesis that the interactions between the stochastic and deterministic phenomena are negligible. To enable this model to capture complex dynamics, the stochastic part is assumed to be a fractal signal. Under the assumptions of this model, an analysis method is proposed, enabling the characterization of the fractal stochastic component and the estimation of the deterministic part. The second main approach relies on Stochastic Differential Equations (SDEs) to model systems where the stochastic and deterministic parts interact. First, a non-parametric estimation method for SDEs is developed, using recent advances from Gaussian processes. Finally, the thesis studies how to overcome the main constraint that the use of SDEs imposes: the Markovianity assumption. To that end, a new structured variational autoencoder with latent SDE dynamics is proposed. All the methods are tested on both synthetic and real signals, demonstrating their ability to capture the behavior of complex systems.

    Voice Modeling Methods for Automatic Speaker Recognition

    Building a voice model means capturing the characteristics of a speaker's voice in a data structure. This data structure is then used by a computer for further processing, such as comparison with other voices. Voice modeling is a vital step in the process of automatic speaker recognition, which itself is the foundation of several applied technologies: (a) biometric authentication, (b) speech recognition and (c) multimedia indexing. Several challenges arise in the context of automatic speaker recognition. First, there is the problem of data shortage, i.e., the unavailability of sufficiently long utterances for speaker recognition. It stems from the fact that the speech signal conveys different aspects of the sound in a single, one-dimensional time series: linguistic (what is said?), prosodic (how is it said?), individual (who said it?), locational (where is the speaker?) and emotional features of the speech sound itself (to name a few) are contained in the speech signal, as well as acoustic background information. To analyze a specific aspect of the sound regardless of the other aspects, analysis methods have to be applied to a specific time scale (length) of the signal in which this aspect stands out from the rest. For example, linguistic information (i.e., which phone or syllable has been uttered?) is found in very short time spans of only milliseconds in length. In contrast, speaker-specific information emerges more clearly the longer the analyzed sound is. Long utterances, however, are not always available for analysis. Second, the speech signal is easily corrupted by background sound sources (noise, such as music or sound effects). Their characteristics tend to dominate a voice model, if present, such that model comparison might then be mainly due to background features instead of speaker characteristics.
Current automatic speaker recognition works well under relatively constrained circumstances, such as studio recordings, or when prior knowledge of the number and identity of occurring speakers is available. Under more adverse conditions, such as in feature films or amateur material on the web, the achieved speaker recognition scores drop below a rate that is acceptable for an end user or for further processing. For example, the typical speaker turn duration of only one second and the sound effect background in cinematic movies render most current automatic analysis techniques useless. In this thesis, methods for voice modeling that are robust with respect to short utterances and background noise are presented. The aim is to facilitate movie analysis with respect to occurring speakers. Therefore, algorithmic improvements are suggested that (a) improve the modeling of very short utterances, (b) facilitate voice model building even in the case of severe background noise and (c) allow for efficient voice model comparison to support the indexing of large multimedia archives. The proposed methods improve the state of the art in terms of recognition rate and computational efficiency. Going beyond selective algorithmic improvements, subsequent chapters also investigate the question of what is lacking in principle in current voice modeling methods. By reporting on a study with human participants, it is shown that the exclusion of time coherence information from a voice model induces an artificial upper bound on the recognition accuracy of automatic analysis methods. A proof-of-concept implementation confirms the usefulness of exploiting this kind of information by halving the error rate. This result questions the general speaker modeling paradigm of the last two decades and presents a promising new way. The approach taken to arrive at the previous results is based on a novel methodology of algorithm design and development called “eidetic design”.
It uses a human-in-the-loop technique that analyses existing algorithms in terms of their abstract intermediate results. The aim is to detect flaws or failures in them intuitively and to suggest solutions. The intermediate results often consist of large matrices of numbers whose meaning is not clear to a human observer. Therefore, the core of the approach is to transform them to a suitable domain of perception (e.g., the auditory domain of speech sounds in the case of speech feature vectors) where their content, meaning and flaws are intuitively clear to the human designer. This methodology is formalized, and the corresponding workflow is explicated by several use cases. Finally, the use of the proposed methods in video analysis and retrieval is presented. This shows the applicability of the developed methods and the accompanying software library sclib by means of improved results using a multimodal analysis approach. The sclib source code is available to the public upon request to the author. A summary of the contributions together with an outlook to short- and long-term future work concludes this thesis.

    Volume II: Mining Innovation

    Contemporary exploitation of natural raw materials by borehole, opencast, underground, seabed, and anthropogenic deposits is closely related to, among others, geomechanics, automation, computer science, and numerical methods. More and more often, individual fields of science coexist and complement each other, contributing to lowering exploitation costs, increasing production, and reducing the time needed to prepare and exploit the deposit. The continuous development of national economies is related to the increasing demand for energy, metal, rock, and chemical resources. Very often, exploitation is carried out in complex geological and mining conditions, which are accompanied by natural hazards such as rock bursts, methane, coal dust explosions, spontaneous combustion, water, gas, and temperature. In order to conduct a safe and economically justified operation, modern construction materials are being used more and more often in mining to support excavations, both under static and dynamic loads. The individual production stages are supported by specialized computer programs for cutting the deposit as well as for modeling the behavior of the rock mass after excavation. Currently, the automation and monitoring of mining works play a very important role, which will significantly contribute to the improvement of safety conditions. In this Special Issue of Energies, we focus on innovative laboratory, numerical, and industrial research that has a positive impact on the development of safety and exploitation in mining.