303 research outputs found

    Evaluation of preprocessors for neural network speaker verification

    Microphone smart device fingerprinting from video recordings

    This report summarizes the ongoing research carried out by DG-JRC within the institutional project Authors and Victims Identification of Child Abuse on-line, concerning the use of microphone fingerprinting for source device classification. Starting from an exhaustive study of the state of the art, the report describes a feasibility study on adopting microphone fingerprinting for source identification of video recordings. A set of operational scenarios was established in collaboration with EUROPOL law enforcers, according to investigators' needs. A critical analysis of the results demonstrated the feasibility of microphone fingerprinting and led to a set of recommendations, both in terms of usability and of future research in the field.
    JRC.E.3-Cyber and Digital Citizens' Security

    Robust speaker recognition based on latent variable models

    Automatic speaker recognition in uncontrolled environments is a very challenging task due to channel distortions, additive noise and reverberation. To address these issues, this thesis studies probabilistic latent variable models of short-term spectral information that leverage large amounts of data to achieve robustness in challenging conditions. Current speaker recognition systems represent an entire speech utterance as a single point in a high-dimensional space, known as a "supervector". This thesis starts by analyzing the properties of this representation and presents a novel procedure for visualizing supervectors that gives qualitative insight into the information they capture. We then propose the use of an overcomplete dictionary to explicitly decompose a supervector into a speaker-specific component and an undesired variability component. An algorithm to learn the dictionary from a large collection of data is discussed and analyzed: one subset of the dictionary entries is learned to represent speaker-specific information and another subset to represent distortions. After the supervector is encoded as a linear combination of the dictionary entries, the undesired variability is removed by discarding the contribution of the distortion components. This paradigm is closely related to the previously proposed Joint Factor Analysis modeling of supervectors; we establish a connection between the two approaches and show that our method improves both computation and recognition accuracy. An alternative way to handle undesired variability in supervectors is to first project them into a lower-dimensional space and then model them in that reduced subspace; this low-dimensional projection is known as an "i-vector". Unfortunately, i-vectors exhibit non-Gaussian behavior, so direct statistical modeling requires heavy-tailed distributions for optimal performance. These approaches lack closed-form solutions, which makes them hard to analyze, and they do not scale well to large datasets. Instead of modeling i-vectors directly, we propose to first apply a non-linear transformation and then use a linear-Gaussian model. We present two alternative transformations and show experimentally that the transformed i-vectors can be optimally modeled by a simple linear-Gaussian model (factor analysis). We evaluate our method on a benchmark dataset with a large amount of channel variability and show that the results compare favorably with competing approaches; moreover, our approach has closed-form solutions and scales gracefully to large datasets. Finally, a multi-classifier architecture trained in a multicondition fashion is proposed to address speaker recognition in the presence of additive noise. A large number of experiments are conducted to analyze the proposed architecture and to derive guidelines for optimal performance in noisy environments. Overall, multicondition training of multi-classifier architectures is shown not only to provide strong robustness in the anticipated conditions but also to generalize well to unseen conditions.
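The two compensation ideas described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the thesis's implementation. `remove_distortion` encodes a supervector over a dictionary split into speaker atoms and distortion atoms using ridge-regularized least squares (an assumed stand-in, since the abstract does not specify the coding algorithm) and keeps only the speaker part. `length_normalize` shows one plausible non-linear i-vector transformation (centering, whitening, then projection onto the unit sphere); this choice is also an assumption, because the abstract does not name its two transformations. All function names, the `ridge` parameter and the synthetic data are hypothetical.

```python
import numpy as np


def remove_distortion(supervector, d_speaker, d_channel, ridge=1e-3):
    """Encode a supervector over D = [D_s | D_c] and return only the speaker part.

    Assumption: ridge-regularized least squares stands in for the unspecified
    coding algorithm; the ridge term keeps an overcomplete system well-posed.
    """
    dictionary = np.hstack([d_speaker, d_channel])
    n_atoms = dictionary.shape[1]
    gram = dictionary.T @ dictionary + ridge * np.eye(n_atoms)
    coeffs = np.linalg.solve(gram, dictionary.T @ supervector)
    speaker_coeffs = coeffs[: d_speaker.shape[1]]
    # Discard the distortion coefficients: rebuild from speaker atoms only.
    return d_speaker @ speaker_coeffs


def fit_whitening(dev_ivectors):
    """Estimate the mean and a whitening matrix W with W @ cov @ W.T = I."""
    mean = dev_ivectors.mean(axis=0)
    cov = np.cov(dev_ivectors - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    return mean, (eigvecs / np.sqrt(eigvals)).T


def length_normalize(ivectors, mean, whitening):
    """Center and whiten i-vectors (one per row), then scale each to unit length."""
    whitened = (ivectors - mean) @ whitening.T
    return whitened / np.linalg.norm(whitened, axis=1, keepdims=True)


# Usage on synthetic data (hypothetical shapes and distributions).
rng = np.random.default_rng(0)
d_s, d_c = rng.normal(size=(30, 8)), rng.normal(size=(30, 6))
true_speaker = d_s @ rng.normal(size=8)
observed = true_speaker + d_c @ rng.normal(size=6)
clean = remove_distortion(observed, d_s, d_c, ridge=1e-10)
print(np.allclose(clean, true_speaker, atol=1e-6))  # True

dev = rng.standard_t(df=3, size=(500, 10))  # heavy-tailed stand-in for i-vectors
mean, whitening = fit_whitening(dev)
normed = length_normalize(dev, mean, whitening)
print(np.allclose(np.linalg.norm(normed, axis=1), 1.0))  # True
```

The demo deliberately uses an undercomplete dictionary (14 atoms in 30 dimensions) so the speaker/distortion split is unique and can be checked; with a genuinely overcomplete dictionary the split depends on the regularizer. The normalized i-vectors would then be modeled with a linear-Gaussian model such as factor analysis, which is omitted here.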

    A knowledge acquisition tool to assist case authoring from texts

    Case-Based Reasoning (CBR) is a technique in Artificial Intelligence in which a new problem is solved by reusing the solution to a similar past problem. People naturally solve problems in this way, often without thinking about it. For example, an occupational therapist (OT) assessing the needs of a new disabled person may be reminded of a previous client with similar disabilities, and may or may not decide to recommend the same devices depending on the outcome for that earlier client. Case-based reasoning draws on a collection of past problem-solving experiences, enabling users to exploit others' successes and failures to solve their own problems. This project has developed a CBR tool to assist in matching SmartHouse technology to the needs of the elderly and people with disabilities. The tool suggests SmartHouse devices that could help with given impairments, and its knowledge was obtained from past SmartHouse problem-solving textual reports. Creating a case-based reasoning system from textual sources is challenging because the text must be interpreted meaningfully, both to create cases that are effective in problem solving and to interpret queries reasonably. Effective case retrieval and query interpretation are only possible if a domain-specific conceptual model is available and if the different meanings a word can take are recognised in the text. Approaches based on information retrieval methods require large amounts of data and typically result in knowledge-poor representations, and the costs become prohibitive if an expert is engaged to craft cases manually or to hand-tag documents for learning. Furthermore, hierarchically structured case representations are preferred to flat ones for problem solving because they allow comparison at different levels of specificity, which results in more effective retrieval. This project has developed SmartCAT-T, a tool that creates knowledge-rich, hierarchically structured cases from semi-structured textual reports. SmartCAT-T highlights important phrases in the textual SmartHouse problem-solving reports and uses them to create a conceptual model of the domain. This model then serves as a standard structure onto which each semi-structured SmartHouse report is mapped to obtain the corresponding structured case. SmartCAT-T also relies on an unsupervised methodology that recognises word synonyms in text. This methodology is used to create a uniform vocabulary for the textual reports, and the resulting harmonised text is used to build the standard conceptual model of the domain; the same technique is also employed to interpret queries during problem solving. SmartCAT-T does not require large sets of tagged data for learning, and the concepts in the conceptual model are interpretable, allowing experts to refine the knowledge. Evaluation results show that the created cases contain knowledge that is useful for problem solving, and results improve when the text and queries are harmonised. A further evaluation indicates a high potential for the techniques developed in this research to be useful in domains other than SmartHouse. All of this has been implemented in the Smarter case-based reasoning system.
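To make the retrieval argument concrete, here is a minimal sketch of comparing hierarchically structured cases at two levels of specificity after harmonising synonyms. It is not SmartCAT-T's implementation: the synonym table, the level names (`category`, `difficulty`), the weights and the weighted-Jaccard similarity are all hypothetical choices made for illustration, and the synonym table is written by hand rather than learned by an unsupervised method.

```python
# Hypothetical synonym table; SmartCAT-T learns synonyms without supervision,
# whereas this table is written by hand purely for illustration.
SYNONYMS = {
    "poor vision": "visual impairment",
    "sight loss": "visual impairment",
    "unsteady on feet": "mobility impairment",
}


def harmonise(terms):
    """Map each term to its canonical form so that synonyms compare as equal."""
    return {SYNONYMS.get(term, term) for term in terms}


def similarity(query, case, weights):
    """Weighted Jaccard similarity summed over hierarchy levels.

    `query` and `case` map a level name to a set of concepts; a match at a
    general level (e.g. "category") still scores when specific levels differ.
    """
    score = 0.0
    for level, weight in weights.items():
        q = harmonise(query.get(level, set()))
        c = harmonise(case.get(level, set()))
        if q or c:
            score += weight * len(q & c) / len(q | c)
    return score / sum(weights.values())


def retrieve(query, cases, weights):
    """Return the stored case whose problem description is most similar."""
    return max(cases, key=lambda case: similarity(query, case["problem"], weights))


cases = [
    {"id": "case-1",
     "problem": {"category": {"visual impairment"}, "difficulty": {"reading post"}},
     "solution": ["device recommended in past report 1"]},
    {"id": "case-2",
     "problem": {"category": {"mobility impairment"}, "difficulty": {"answering door"}},
     "solution": ["device recommended in past report 2"]},
]
weights = {"category": 0.4, "difficulty": 0.6}
query = {"category": {"poor vision"}, "difficulty": {"reading letters"}}
print(retrieve(query, cases, weights)["id"])  # case-1
```

Here the query shares no specific difficulty with any case, yet "poor vision" harmonises to "visual impairment" and matches case-1 at the more general level, which a flat bag-of-words comparison would miss.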

    Exploring ICMetrics to detect abnormal program behaviour on embedded devices

    Execution of unknown or malicious software on an embedded system may trigger harmful behaviour aimed at stealing sensitive data and/or damaging the system, and is therefore a significant potential threat to the security of embedded systems. The resource-constrained nature of Commercial off-the-shelf (COTS) embedded devices, such as embedded medical equipment, generally does not allow computationally expensive protection solutions to be deployed on them, leaving them vulnerable. This paper proposes Self-Organising Map (SOM) based and Fuzzy C-means based approaches for detecting abnormal program behaviour to strengthen embedded system security. The presented technique extracts features derived from the processor's Program Counter (PC) and Cycles per Instruction (CPI), and then uses these features to identify abnormal behaviour with the SOM. Experimental results show that the proposed SOM based and Fuzzy C-means based methods can identify unknown program behaviours not included in the training set with 90.9% and 98.7% accuracy, respectively.
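The SOM-based detector can be illustrated with a small NumPy sketch: train a map on feature vectors from known-good executions, then flag a window whose quantization error (distance to its best-matching unit) exceeds a threshold taken from the training data. The two-dimensional features standing in for the PC- and CPI-derived features, the grid size, the learning schedule and the 99th-percentile threshold are all assumptions for illustration; the paper's exact feature extraction and decision rule are not given here, and the Fuzzy C-means variant is not shown.

```python
import numpy as np


def train_som(data, grid=(6, 6), epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Train a self-organising map with a decaying learning rate and Gaussian neighbourhood."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    # Grid positions of the nodes, used to compute neighbourhood distances.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    weights = data[rng.integers(0, len(data), rows * cols)].astype(float)
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / total
            lr = lr0 * (1.0 - frac)
            sigma = sigma0 * (1.0 - frac) + 0.5
            bmu = np.argmin(np.linalg.norm(weights - x, axis=1))  # best-matching unit
            grid_d2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
            h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
            # Pull the BMU and its map neighbours towards the sample.
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights


def quantization_errors(weights, samples):
    """Distance from each sample (row) to its best-matching unit."""
    d = np.linalg.norm(samples[:, None, :] - weights[None, :, :], axis=2)
    return d.min(axis=1)


# Hypothetical features: each row summarises one execution window,
# e.g. (normalised PC spread, mean CPI), from known-good runs only.
rng = np.random.default_rng(1)
normal_windows = rng.normal(loc=[0.0, 1.0], scale=0.1, size=(300, 2))
som = train_som(normal_windows)
threshold = np.percentile(quantization_errors(som, normal_windows), 99)

suspect = np.array([[0.05, 1.02], [1.5, 3.0]])  # second row mimics unfamiliar behaviour
print(quantization_errors(som, suspect) > threshold)
```

Because the map is trained only on normal behaviour, any program whose features fall far from every learned prototype is flagged, which matches the goal of detecting behaviours not seen in training; the percentile sets the accepted false-alarm rate on training data.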

    Semi-continuous hidden Markov models for automatic speaker verification

    Audio Splicing Detection and Localization Based on Acquisition Device Traces

    In recent years, the multimedia forensics community has put great effort into developing solutions to assess the integrity and authenticity of multimedia objects, focusing especially on manipulations performed with advanced deep learning techniques. However, in addition to complex forgeries such as deepfakes, very simple yet effective manipulation techniques that require no state-of-the-art editing tools still exist and can be dangerous. One such technique is audio splicing of speech signals, i.e., concatenating multiple speech segments taken from different recordings of a person to create a new, fake speech. Indeed, simply adding a few words to an existing speech can completely alter its meaning. In this work, we address the overlooked problem of detecting and localizing audio splicing from different acquisition device models. Our goal is to determine whether an audio track under analysis is pristine or has been manipulated by splicing in one or more segments obtained from different device models. Moreover, if a recording is detected as spliced, we identify where in time the modification was introduced. The proposed method is based on a Convolutional Neural Network (CNN) that extracts model-specific features from the audio recording. After the features are extracted, a clustering algorithm determines whether a manipulation has occurred. Finally, the point where the modification was introduced is identified through a distance-measuring technique. The proposed method can detect and localize multiple splicing points within a recording.
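The detection-then-localization pipeline can be sketched as follows, assuming the CNN has already produced one device-model embedding per short window of the recording (the network itself is not reproduced). As stand-ins for the paper's unspecified clustering algorithm and distance-measuring technique, this sketch uses 2-means clustering with farthest-point initialization for detection (a large gap between the two centroids suggests two device models) and a jump in Euclidean distance between consecutive windows for localization. The thresholds, window layout and function names are hypothetical.

```python
import numpy as np


def two_means(x, iters=50):
    """Plain 2-means clustering with deterministic farthest-point initialisation."""
    first = x[0]
    second = x[np.argmax(np.linalg.norm(x - first, axis=1))]
    centroids = np.stack([first, second]).astype(float)
    for _ in range(iters):
        dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        for k in range(2):
            if np.any(labels == k):  # keep the old centroid if a cluster empties
                centroids[k] = x[labels == k].mean(axis=0)
    return labels, centroids


def detect_and_localize(embeddings, detect_thr, jump_thr):
    """Return (is_spliced, splice_points); point i means a splice between windows i-1 and i."""
    _, centroids = two_means(embeddings)
    if np.linalg.norm(centroids[0] - centroids[1]) < detect_thr:
        return False, []
    jumps = np.linalg.norm(np.diff(embeddings, axis=0), axis=1)
    return True, [int(i) + 1 for i in np.flatnonzero(jumps > jump_thr)]


# Synthetic embeddings: windows 0-9 and 20-29 from device model A, 10-19 from B.
rng = np.random.default_rng(0)
dev_a, dev_b = np.zeros(8), np.ones(8)
spliced = np.vstack([dev_a + 0.1 * rng.normal(size=(10, 8)),
                     dev_b + 0.1 * rng.normal(size=(10, 8)),
                     dev_a + 0.1 * rng.normal(size=(10, 8))])
print(detect_and_localize(spliced, detect_thr=1.0, jump_thr=1.5))  # (True, [10, 20])
```

Separating detection (a global clustering decision) from localization (a local distance test) mirrors the two stages described in the abstract and lets the sketch report several splicing points in one recording; a pristine recording yields two nearby centroids and is returned as `(False, [])`.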