Phonetic Temporal Neural Model for Language Identification
Deep neural models, particularly the LSTM-RNN model, have shown great
potential for language identification (LID). However, the use of phonetic
information has been largely overlooked by most existing neural LID methods,
although this information has been used very successfully in conventional
phonetic LID systems. We present a phonetic temporal neural model for LID,
which is an LSTM-RNN LID system that accepts phonetic features produced by a
phone-discriminative DNN as the input, rather than raw acoustic features. This
new model is similar to traditional phonetic LID methods, but the phonetic
knowledge here is much richer: it is at the frame level and involves compacted
information of all phones. Our experiments conducted on the Babel database and
the AP16-OLR database demonstrate that the temporal phonetic neural approach is
very effective, and significantly outperforms existing acoustic neural models.
It also outperforms the conventional i-vector approach on short utterances and
in noisy conditions. Comment: Submitted to TASL
A Duality Theorem for Quantitative Semantics
This paper studies quantitative possibility theory in the framework of domain theory. Using Sugeno's integral and the notion of module, a duality theorem is obtained between the extended possibilistic powerdomain over a continuous domain X and the extended fuzzy predicates on X. This duality provides a reassuring link between the spaces of quantitative meaning and the corresponding Scott-topological space.
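As a rough illustration of the Sugeno integral mentioned in this abstract, here is a minimal sketch on a finite set, which is a much simpler setting than the continuous domains the paper works with. The function names, the dictionary encoding, and the example possibility distribution `pi` are all assumptions made for illustration.

```python
def sugeno_integral(f, measure):
    """Sugeno integral of f (dict: element -> value in [0, 1]) on a finite set.

    `measure` is a monotone set function taking a frozenset of elements.
    Computes max over i of min(f(x_i), measure({x_1, ..., x_i})),
    where the elements are sorted by decreasing value of f.
    """
    ordered = sorted(f, key=f.get, reverse=True)
    best = 0.0
    for i in range(1, len(ordered) + 1):
        level_set = frozenset(ordered[:i])  # elements with the i largest values
        best = max(best, min(f[ordered[i - 1]], measure(level_set)))
    return best


def possibility_measure(pi):
    """Possibility measure induced by a distribution pi (hypothetical helper)."""
    return lambda subset: max(pi[x] for x in subset)


# Hypothetical fuzzy predicate and possibility distribution.
f = {"x": 0.9, "y": 0.5, "z": 0.2}
pi = {"x": 0.3, "y": 1.0, "z": 0.6}
result = sugeno_integral(f, possibility_measure(pi))
```

For a possibility measure, the integral reduces to the largest value of min(f(x), pi(x)) over all elements, which gives a quick way to check the result by hand.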
Exploring Communities in Large Profiled Graphs
Given a graph and a query vertex, the community search (CS) problem
aims to efficiently find a subgraph whose vertices are closely related
to that vertex. Communities are prevalent in social and biological networks, and can be
used in product advertisement and social event recommendation. In this paper,
we study profiled community search (PCS), where CS is performed on a profiled
graph. This is a graph in which each vertex has labels arranged in a
hierarchical manner. Extensive experiments show that PCS can identify
communities with themes that are common to their vertices, and is more
effective than existing CS approaches. As a naive solution for PCS is highly
expensive, we have also developed a tree index, which facilitates efficient and
online solutions for PCS.
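To make the plain community search problem concrete, the sketch below uses one common notion of "closely related": every vertex in the community has at least k neighbours inside it (a k-core). This is an assumed baseline for illustration only. It is not the PCS method or the tree index from the abstract, and the function name and graph encoding are hypothetical.

```python
from collections import deque


def k_core_community(adj, q, k):
    """Return the connected k-core containing q, or an empty set if none exists.

    adj: dict mapping each vertex to the set of its neighbours (undirected).
    """
    alive = set(adj)
    degree = {v: len(adj[v]) for v in adj}
    queue = deque(v for v in adj if degree[v] < k)
    # Peel off vertices whose remaining degree is below k.
    while queue:
        v = queue.popleft()
        if v not in alive:
            continue
        alive.discard(v)
        for u in adj[v]:
            if u in alive:
                degree[u] -= 1
                if degree[u] < k:
                    queue.append(u)
    if q not in alive:
        return set()
    # Collect the surviving vertices reachable from q.
    community = {q}
    frontier = deque([q])
    while frontier:
        v = frontier.popleft()
        for u in adj[v]:
            if u in alive and u not in community:
                community.add(u)
                frontier.append(u)
    return community


# Hypothetical graph: a triangle a-b-c with a tail c-d-e.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
community = k_core_community(graph, "a", 2)
```

A profiled variant would additionally need to compare the hierarchical labels of the vertices, which this sketch does not attempt.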
Phone-aware Neural Language Identification
Pure acoustic neural models, particularly the LSTM-RNN model, have shown
great potential in language identification (LID). However, phonetic
information has been largely overlooked by most existing neural LID models,
although this information has been used in conventional phonetic LID
systems with great success. We present a phone-aware neural LID architecture,
which is a deep LSTM-RNN LID system that accepts output from an RNN-based ASR
system. By utilizing the phonetic knowledge, the LID performance can be
significantly improved. Interestingly, even if the test language is not
involved in the ASR training, the phonetic knowledge still makes a large
contribution. Our experiments conducted on four languages within the Babel
corpus demonstrated that the phone-aware approach is highly effective. Comment: arXiv admin note: text overlap with arXiv:1705.0315
A Study on Replay Attack and Anti-Spoofing for Automatic Speaker Verification
For practical automatic speaker verification (ASV) systems, replay attacks
pose a real risk: ASV systems tend to be easily fooled by a replayed,
pre-recorded speech signal of the genuine speaker. An effective replay detection
method is therefore highly desirable. In this study, we investigate a major
difficulty in replay detection: the over-fitting problem caused by variability
factors in the speech signal. An F-ratio probing tool is proposed and three
variability factors are investigated using this tool: speaker identity, speech
content, and playback and recording device. The analysis shows that the device is the
most influential factor, contributing the highest over-fitting risk. A
frequency warping approach is studied to alleviate the over-fitting problem, as
verified on the ASV-spoof 2017 database.
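The abstract does not spell out how its F-ratio probing tool is built. As a point of reference, a generic F-ratio compares the spread between groups with the spread inside groups, and the sketch below computes that for a single scalar feature. The unnormalised form (no degrees-of-freedom correction), the function name, and the sample data are assumptions for illustration.

```python
from collections import defaultdict


def f_ratio(values, labels):
    """Between-group over within-group sum of squares for one scalar feature.

    A larger ratio means the grouping factor (for example the device)
    explains more of the feature's variability.
    """
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    overall_mean = sum(values) / len(values)
    between = 0.0
    within = 0.0
    for members in groups.values():
        group_mean = sum(members) / len(members)
        between += len(members) * (group_mean - overall_mean) ** 2
        within += sum((v - group_mean) ** 2 for v in members)
    return between / within


# Hypothetical feature values grouped by two recording devices.
devices = ["dev1", "dev1", "dev2", "dev2"]
strongly_device_dependent = f_ratio([0.0, 2.0, 10.0, 12.0], devices)
weakly_device_dependent = f_ratio([0.0, 2.0, 1.0, 3.0], devices)
```

The function divides by the within-group sum of squares, so it assumes at least one group has some internal variation.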
Timed-pNets: A Communication Behavioural Semantic Model for Distributed Systems (extended version)
This paper presents an approach to building a communication behavioural semantic model for heterogeneous distributed systems that include synchronous and asynchronous communications. Since each node of such a system has its own physical clock, correctly specifying the system's time constraints is a challenge. Based on the logical clocks proposed by Lamport, CCSL proposed by the Aoste team at INRIA, and pNets from the Oasis team at INRIA, we develop timed-pNets to model communication behaviour for distributed systems. Timed-pNets are tree-style hierarchical structures. Each node is associated with a timed specification, which consists of a set of logical clocks and some relations on clocks. The leaves are represented by timed-pLTSs, and non-leaf nodes are represented by timed-pNets including some holes, which are filled by leaves or non-leaf nodes. Both timed-pLTSs and timed-pNets nodes can be translated to timed specifications. All these notions and methods are illustrated on a simple use case of car insertion from the area of Intelligent Transportation Systems (ITS), and the TimeSquare tool is then used to simulate and check the validity of our model.
This article presents a new approach to defining a behavioural semantic model for distributed systems with both synchronous and asynchronous communications. Since each site in such a system has its own clock, correctly defining the global time constraints of the system is a challenge. Building on Lamport's virtual clocks, the CCSL language introduced by the AOSTE team at INRIA, and the pNets model from the OASIS team, we develop our Timed-pNets model to express the behaviours and communication of these distributed systems. Timed-pNets are tree-like hierarchical structures. Each node is associated with a timed specification composed of a set of clocks and relations between these clocks. Leaf nodes are represented by Timed-pLTSs (timed parameterized transition systems), and the other nodes are either, recursively, Timed-pNets or holes to be filled later by Timed-pNets. We define algorithms for synthesizing the timed specification of Timed-pLTSs and Timed-pNets. All these notions are illustrated on an example of automated vehicle driving from the field of intelligent transportation systems (ITS); finally, we use the TimeSquare software to simulate our model and check its validity.
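The abstract builds on Lamport's logical clocks. For readers unfamiliar with them, here is a minimal sketch of the classic scalar clock rule. It illustrates only that background notion, not timed-pNets or CCSL, and the class and method names are hypothetical.

```python
class LamportClock:
    """Scalar logical clock: a counter that respects message causality."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Advance the clock for a local event and return the new time."""
        self.time += 1
        return self.time

    def send(self):
        """Sending is an event; the returned timestamp travels with the message."""
        return self.tick()

    def receive(self, timestamp):
        """Jump past both the local time and the sender's timestamp."""
        self.time = max(self.time, timestamp) + 1
        return self.time


# Two nodes with independent clocks exchanging one message.
node_a = LamportClock()
node_b = LamportClock()
node_a.tick()                          # local event on A, time 1
message_time = node_a.send()           # A sends at time 2
received_at = node_b.receive(message_time)
```

The receive rule guarantees that a message is always stamped later at the receiver than at the sender, even though the two nodes never share a physical clock.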
Timed-pNets: a communication behavioural semantic model for distributed systems
This paper presents an approach to build a communication behavioural semantic model for heterogeneous distributed systems that include synchronous and asynchronous communications. Since each node of such system has its own physical clock, it brings the challenges of correctly specifying the system time constraints. Based on the logical clocks proposed by Lamport, and CCSL proposed by Aoste team in INRIA, as well as pNets from Oasis team in INRIA, we develop timed-pNets to model communication behaviours for distributed systems. Timed-pNets are tree style hierarchical structures. Each node is associated with a timed specification which consists of a set of logical clocks and some relations on clocks. The leaves are represented by timed-pLTSs. Non-leaf nodes (called timed-pNets nodes) are synchronisation devices that synchronize the behaviours of subnets (these subnets can be leaves or non-leaf nodes). Both timed-pLTSs and timed-pNets nodes can be translated to timed specifications. All these notions and methods are illustrated on a simple use-case of car insertion from the area of intelligent transportation systems (ITS). In the end the TimeSquare tool is used to simulate and check the validity of our model.
Deep Speaker Feature Learning for Text-independent Speaker Verification
Recently, deep neural networks (DNNs) have been used to learn speaker
features. However, the quality of the learned features is not sufficiently
good, so a complex back-end model, either neural or probabilistic, has to be
used to address the residual uncertainty when applied to speaker verification,
just as with raw features. This paper presents a convolutional time-delay deep
neural network structure (CT-DNN) for speaker feature learning. Our
experimental results on the Fisher database demonstrated that this CT-DNN can
produce high-quality speaker features: even with a single feature (0.3 seconds
including the context), the EER can be as low as 7.68%. This effectively
confirmed that the speaker trait is largely a deterministic short-time property
rather than a long-time distributional pattern, and therefore can be extracted
from just dozens of frames. Comment: deep neural networks, speaker verification, speaker featur
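The EER (equal error rate) quoted in this abstract is the operating point at which the false acceptance rate equals the false rejection rate. The sketch below estimates it from two lists of verification scores by sweeping a threshold. The function name and the scores are hypothetical, and real evaluations typically interpolate between thresholds rather than taking the closest one.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate EER: sweep thresholds and take the point where FAR and FRR are closest.

    A trial is accepted when its score is greater than or equal to the threshold.
    """
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    best_gap = None
    eer = None
    for threshold in thresholds:
        # False rejection: genuine trials scored below the threshold.
        frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
        # False acceptance: impostor trials scored at or above the threshold.
        far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap = gap
            eer = (far + frr) / 2
    return eer


# Hypothetical scores: higher means "more likely the same speaker".
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]
eer = equal_error_rate(genuine, impostor)
```

With these toy scores, one genuine trial falls below the best threshold and one impostor trial lands above it, so both error rates are one in four.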