Search CORE

382 research outputs found

Phonetic Temporal Neural Model for Language Identification

Author: Abel Andrew
Chen Yixiang
Li Lantian
Tang Zhiyuan
Wang Dong
Publication date: 25/08/2017

Deep neural models, particularly the LSTM-RNN model, have shown great potential for language identification (LID). However, the use of phonetic information has been largely overlooked by most existing neural LID methods, although this information has been used very successfully in conventional phonetic LID systems. We present a phonetic temporal neural model for LID, which is an LSTM-RNN LID system that accepts phonetic features produced by a phone-discriminative DNN as the input, rather than raw acoustic features. This new model is similar to traditional phonetic LID methods, but the phonetic knowledge here is much richer: it is at the frame level and involves compacted information of all phones. Our experiments conducted on the Babel database and the AP16-OLR database demonstrate that the temporal phonetic neural approach is very effective, and significantly outperforms existing acoustic neural models. It also outperforms the conventional i-vector approach on short utterances and in noisy conditions.

Comment: Submitted to TASL

arXiv.org e-Print Archive

Crossref

University of Strathclyde Institutional Repository

A Duality Theorem for Quantitative Semantics

Author: Chen Yixiang
Wu Hengyang
Publication venue: 'Elsevier BV'
Publication date: 03/12/2009

This paper mainly studies quantitative possibility theory in the framework of domain theory. Using Sugeno's integral and the notion of module, a duality theorem is obtained between the extended possibilistic powerdomain over a continuous domain X and the extended fuzzy predicates on X. This duality provides a reassuring link between the spaces of quantitative meaning and the corresponding Scott-topological space.

Elsevier - Publisher Connector

Exploring Communities in Large Profiled Graphs

Author: Chen Xiaojun
Chen Yankai
Cheng Reynold
Fang Yixiang
Li Yun
Zhang Jie
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018

Given a graph $G$ and a vertex $q \in G$, the community search (CS) problem aims to efficiently find a subgraph of $G$ whose vertices are closely related to $q$. Communities are prevalent in social and biological networks, and can be used in product advertisement and social event recommendation. In this paper, we study profiled community search (PCS), where CS is performed on a profiled graph. This is a graph in which each vertex has labels arranged in a hierarchical manner. Extensive experiments show that PCS can identify communities with themes that are common to their vertices, and is more effective than existing CS approaches. As a naive solution for PCS is highly expensive, we have also developed a tree index, which facilitates efficient and online solutions for PCS.

arXiv.org e-Print Archive

DR-NTU (Digital Repository of NTU)

Phone-aware Neural Language Identification

Author: Chen Yixiang
Li Lantian
Shi Ying
Tang Zhiyuan
Wang Dong
Publication date: 22/05/2017

Pure acoustic neural models, particularly the LSTM-RNN model, have shown great potential in language identification (LID). However, phonetic information has been largely overlooked by most existing neural LID models, although this information has been used with great success in conventional phonetic LID systems. We present a phone-aware neural LID architecture, which is a deep LSTM-RNN LID system that accepts output from an RNN-based ASR system. By utilizing the phonetic knowledge, the LID performance can be significantly improved. Interestingly, even if the test language is not involved in the ASR training, the phonetic knowledge still makes a large contribution. Our experiments conducted on four languages within the Babel corpus demonstrated that the phone-aware approach is highly effective.

Comment: arXiv admin note: text overlap with arXiv:1705.0315

arXiv.org e-Print Archive

Crossref

A Study on Replay Attack and Anti-Spoofing for Automatic Speaker Verification

Author: Chen Yixiang
Li Lantian
Wang Dong
Zheng Thomas Fang
Publication date: 07/06/2017

For practical automatic speaker verification (ASV) systems, replay attacks pose a real risk. ASV systems tend to be easily fooled by a replayed, pre-recorded speech signal of the genuine speaker. An effective replay detection method is therefore highly desirable. In this study, we investigate a major difficulty in replay detection: the over-fitting problem caused by variability factors in the speech signal. An F-ratio probing tool is proposed and three variability factors are investigated using this tool: speaker identity, speech content, and playback and recording device. The analysis shows that the device is the most influential factor and contributes the highest over-fitting risk. A frequency warping approach is studied to alleviate the over-fitting problem, as verified on the ASV-spoof 2017 database.

arXiv.org e-Print Archive

Crossref

Timed-pNets: A Communication Behavioural Semantic Model for Distributed Systems (extended version)

Author: Chen Yanwen
Chen Yixiang
Madelaine Eric
Publication venue: HAL CCSD
Publication date: 07/05/2014

This paper presents an approach to build a communication behavioural semantic model for heterogeneous distributed systems that include synchronous and asynchronous communications. Since each node of such a system has its own physical clock, correctly specifying the system's time constraints is a challenge. Based on the logical clocks proposed by Lamport and CCSL proposed by the Aoste team at INRIA, as well as pNets from the Oasis team at INRIA, we develop timed-pNets to model communication behaviour for distributed systems. Timed-pNets are tree-style hierarchical structures. Each node is associated with a timed specification which consists of a set of logical clocks and some relations on clocks. The leaves are represented by timed-pLTSs, and non-leaf nodes are represented by timed-pNets including some holes which are filled by leaves or non-leaf nodes. Both timed-pLTSs and timed-pNets nodes can be translated to timed specifications. All these notions and methods are illustrated on a simple use-case of car insertion from the area of Intelligent Transportation Systems (ITS), and then the TimeSquare tool is used to simulate and check the validity of our model.

HAL-UNICE

INRIA a CCSD electronic archive server

Timed-pNets: a communication behavioural semantic model for distributed systems

Author: Chen Yanwen
Chen Yixiang
Madelaine Eric
Publication venue: Springer Verlag
Publication date: 14/11/2014

This paper presents an approach to build a communication behavioural semantic model for heterogeneous distributed systems that include synchronous and asynchronous communications. Since each node of such a system has its own physical clock, it brings the challenges of correctly specifying the system time constraints. Based on the logical clocks proposed by Lamport, and CCSL proposed by the Aoste team at INRIA, as well as pNets from the Oasis team at INRIA, we develop timed-pNets to model communication behaviours for distributed systems. Timed-pNets are tree-style hierarchical structures. Each node is associated with a timed specification which consists of a set of logical clocks and some relations on clocks. The leaves are represented by timed-pLTSs. Non-leaf nodes (called timed-pNets nodes) are synchronisation devices that synchronize the behaviours of subnets (these subnets can be leaves or non-leaf nodes). Both timed-pLTSs and timed-pNets nodes can be translated to timed specifications. All these notions and methods are illustrated on a simple use-case of car insertion from the area of intelligent transportation systems (ITS). In the end the TimeSquare tool is used to simulate and check the validity of our model.

Crossref

HAL-UNICE

INRIA a CCSD electronic archive server

Deep Speaker Feature Learning for Text-independent Speaker Verification

Author: Chen Yixiang
Li Lantian
Shi Ying
Tang Zhiyuan
Wang Dong
Publication date: 10/05/2017

Recently, deep neural networks (DNNs) have been used to learn speaker features. However, the quality of the learned features is not sufficiently good, so a complex back-end model, either neural or probabilistic, has to be used to address the residual uncertainty when applied to speaker verification, just as with raw features. This paper presents a convolutional time-delay deep neural network structure (CT-DNN) for speaker feature learning. Our experimental results on the Fisher database demonstrated that this CT-DNN can produce high-quality speaker features: even with a single feature (0.3 seconds including the context), the EER can be as low as 7.68%. This effectively confirmed that the speaker trait is largely a deterministic short-time property rather than a long-time distributional pattern, and therefore can be extracted from just dozens of frames.

Comment: deep neural networks, speaker verification, speaker featur

arXiv.org e-Print Archive

Crossref