26 research outputs found

A VOICE ACTIVITY DETECTION METHOD BASED ON GMM

Authors: 林茜, 蔡骏, 陈奇川
Publication date: 01/01/2009

To improve the robustness of voice activity detection (VAD), this paper proposes a GMM-based VAD method. Two Gaussian mixture models (GMMs) are built in the spectral feature space, one for background noise and one for speech, and each signal frame is classified by matching it against the two models. The GMM parameters are updated adaptively so that the models track changes in the environment. Experimental results show that the method achieves higher accuracy than traditional VAD approaches in noisy environments. Funding: Natural Science Foundation of Fujian Province (2006J0043

Xiamen University Institutional Repository
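
The model-matching step described in the abstract above can be sketched as a per-frame log-likelihood comparison between a speech GMM and a noise GMM. This is only a minimal illustration under assumptions: the diagonal covariances, the toy two-dimensional parameters, the `threshold` argument, and the function names are hypothetical, and the adaptive parameter update mentioned in the abstract is not shown.

```python
import math

def diag_gmm_loglik(frame, weights, means, variances):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM."""
    component_logs = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for x, m, v in zip(frame, mu, var):
            log_p += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        component_logs.append(log_p)
    # Log-sum-exp over components for numerical stability.
    peak = max(component_logs)
    return peak + math.log(sum(math.exp(c - peak) for c in component_logs))

def is_speech(frame, speech_gmm, noise_gmm, threshold=0.0):
    """Label a frame as speech when the speech model fits better than the noise model."""
    score = diag_gmm_loglik(frame, *speech_gmm) - diag_gmm_loglik(frame, *noise_gmm)
    return score > threshold

# Toy models (weights, means, variances); all values are made up for illustration.
noise_gmm = ([1.0], [[0.0, 0.0]], [[1.0, 1.0]])
speech_gmm = ([0.5, 0.5], [[4.0, 4.0], [6.0, 5.0]], [[1.0, 1.0], [1.0, 1.0]])

print(is_speech([4.2, 3.9], speech_gmm, noise_gmm))
print(is_speech([0.1, -0.2], speech_gmm, noise_gmm))
```

In a real system the frames would be spectral feature vectors and the model parameters would be estimated from data; the comparison itself stays the same.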

A Real Time Speaker Recognition System Based on GMM

Authors: 洪青阳, 胡益平, 蔡骏
Publication date: 17/06/2007

This paper presents the design and implementation of a real-time speaker recognition system based on the Gaussian mixture model (GMM). The system supports real-time speaker identification and real-time speaker verification. Under laboratory conditions, the system was tested with different numbers of Gaussian mixture components and different sampling rates, and the adaptation performance of the models was also tested. The results show that the system achieves good recognition accuracy. Funding: Xiamen University "985 Project" Phase II "Information Technology" innovation platform project, project no. 0000-X0720

Xiamen University Institutional Repository

A Parallel Algorithm for Fresnel Tomography

Authors: 张建中, 杨国辉, 林文, 蔡骏
Publication venue: 计算机研究与发展 (Journal of Computer Research and Development)
Publication date: 01/10/2007

In contrast with ray-based traveltime tomography, Fresnel tomography accounts for the band-limited nature of seismic waves and yields higher-resolution tomograms. Because Fresnel tomography demands much more memory and running time, a parallel algorithm for it is proposed. Following the back-projection principle, the tomographic inversion is transformed into solving a series of single equations independently, each equation corresponding to one Fresnel zone. The forward and inverse computations for a Fresnel zone are allocated to one process and are independent of other processes, so the storage and calculation of the large-scale matrix are avoided. No messages pass between the slave processes, and only a small amount of data passes between the master process and each slave process once the slave has finished its computation, which keeps communication overhead very low. The algorithm is implemented with the Message Passing Interface (MPI) standard on Linux, which allows the work to be distributed over several PCs connected via standard Ethernet in an in-house network and greatly expands the applicability of Fresnel tomography. Tests on synthetic and observed seismic traveltime data show that the parallel algorithm achieves high parallelism and speedup on a Linux PC cluster. Funding: National Natural Science Foundation of China (40774065); Natural Science Foundation of Fujian Province (2006J0044

Xiamen University Institutional Repository

Key Technology Research for Speech Recognition

Authors: 周昌乐, 息晓静, 林坤辉, 蔡骏
Publication date: 11/04/2006

Acoustic modeling with the hidden Markov model (HMM) is one of the main reasons for the breakthrough in large-vocabulary continuous speech recognition. However, some unreasonable modeling assumptions on which the HMM relies, together with its non-discriminative training algorithm, are becoming a bottleneck for further progress in speech recognition. Artificial neural networks (ANNs) can be adopted as an alternative modeling paradigm: through their connection weights they provide long-term memory and store the knowledge acquired in training, but they retain the instantaneous response to input patterns poorly. To overcome the flaws of the HMM paradigm, a hybrid HMM/ANN model is designed. In this hybrid model, a nonparametric probabilistic model (a BP neural network) replaces the Gaussian mixtures in computing the observation probabilities required for the HMM states. The structure of the network is also optimized, and experiments show that the hybrid model performs well in speech recognition. Funding: Xiamen University "985 Project" Phase II Information Innovation Platform project

Xiamen University Institutional Repository
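
The abstract above says that the network supplies the observation probabilities for the HMM states but does not say how its outputs are converted. One common convention in hybrid HMM/ANN systems, assumed here and not confirmed by the source, is to divide the network's state posteriors by the state priors to obtain scaled likelihoods. The function names and toy numbers below are hypothetical.

```python
import math

def softmax(logits):
    """Turn raw network outputs into state posteriors P(state | frame)."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_log_likelihoods(logits, state_priors):
    """Return log(P(state | frame) / P(state)) for every HMM state.

    By Bayes' rule this equals log p(frame | state) up to a constant that is
    the same for all states, so it can stand in for the GMM observation score
    during decoding.
    """
    posteriors = softmax(logits)
    return [math.log(p) - math.log(prior)
            for p, prior in zip(posteriors, state_priors)]

# Toy example: three states, made-up network outputs and priors.
scores = scaled_log_likelihoods([2.0, 0.5, -1.0], [0.5, 0.3, 0.2])
print(scores)
```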

Expandsion of Adaptive Optics Simulation Modeling on SciSimu

Authors: 林嘉文, 蔡骏, 谢晓钢, 陶应学
Publication date: 01/01/2009

To improve the adaptive optics simulation and modeling capabilities of SciSimu, an independently developed component-based modeling and simulation platform, the Code for Adaptive Optics Systems (CAOS) is added to SciSimu as an extension. This paper presents a design principle that implements the extension using an interface library, automatic management, and a built-in compiler. Computational tests on a large number of simulation projects show that this design gives SciSimu adaptive optics simulation capability, with better usability and extensibility than CAOS. Funding: National "863" Program fund

Xiamen University Institutional Repository

Method and implementation of transcribing speech corpora based on human-computation

Authors: 刘勇进, 史晓东, 沈映泉, 蔡骏
Publication date: 01/01/2009

A method is proposed for transcribing speech corpora based on human computation. The main idea is to collect orthographic and phonetic transcriptions entered by a large number of learners (users) through a Web-based language learning system, and to select the most frequent user input as the correct transcription of each item. To guarantee the quality of transcriptions obtained this way, computer-aided mechanisms are used to verify the reliability of the collected transcriptions. The main advantage is that corpus transcription is combined with language learning, so no dedicated workforce is needed for the tedious transcription work, which reduces its cost. The paper discusses the human-computation-based transcription technique and describes the design of the language learning system that collects user input and of the transcription generation system. Application of the system shows that the method generates orthographic and phonetic transcriptions of speech corpora effectively and at low cost. Funding: State Scholarship Fund of China (2006104705); Natural Science Foundation of Fujian Province (2006J0043); Xiamen University "985 Project" Phase II Information Innovation Platform (0000-X07204

Xiamen University Institutional Repository
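
The selection rule in the abstract above (keep the most frequent user input as the transcription) can be sketched as a simple vote. The normalization step and the `min_votes` and `min_share` thresholds are assumptions added for illustration; the abstract mentions reliability checks but does not specify them.

```python
from collections import Counter

def consensus_transcription(user_inputs, min_votes=3, min_share=0.5):
    """Pick the most frequent transcription, or None if agreement is too weak.

    Inputs are lower-cased and whitespace-normalized before counting so that
    trivial formatting differences do not split the vote.
    """
    normalized = [" ".join(text.lower().split())
                  for text in user_inputs if text.strip()]
    if not normalized:
        return None
    label, votes = Counter(normalized).most_common(1)[0]
    if votes >= min_votes and votes / len(normalized) >= min_share:
        return label
    return None

inputs = ["Hello world", "hello  world", "hello word", "HELLO WORLD"]
print(consensus_transcription(inputs))
```

Items that return `None` would need more user input or manual review before being accepted.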

Reconstruction Methods for Quantitative Susceptibility Mapping and Their Applications (定量磁化率成像重建方法及其应用)

Authors: 刘伟骏, 林建忠, 王阿莉, 蔡聪波, 陈忠
Publication venue: 波谱学杂志 (Chinese Journal of Magnetic Resonance)
Publication date: 01/01/2014

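In magnetic resonance imaging (MRI), phase images contain rich information about variations in tissue magnetic susceptibility, and acquiring them requires no additional scan time. Paramagnetic substances in tissue affect tissue susceptibility differences and thereby cause local magnetic field inhomogeneity. Quantifying paramagnetic substances in tissue benefits the diagnosis of many cerebrovascular and neurological diseases, but reconstructing the tissue susceptibility distribution from local phase information is an ill-posed inverse problem, and many issues remain to be solved. This paper reviews the principles of quantitative susceptibility mapping (QSM), its reconstruction methods, and its applications in MRI. Funding: National Natural Science Foundation of China (81171331, 11174239); Fundamental Research Funds for the Central Universities (2010121101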

Xiamen University Institutional Repository

Fast Algorithm for Likelihood Ratio Based on SIMD

Authors: 林茜, 欧建林, 蔡骏
Publication date: 01/01/2009

The likelihood ratio computation in large-vocabulary continuous speech recognition (LVCSR) systems based on continuous-density hidden Markov models (HMMs) is analyzed, and the feasibility of computing it in parallel is demonstrated. On this basis, a fast SIMD-based algorithm for the likelihood ratio is proposed and implemented by modifying the likelihood computation module of the HTK 3.4 speech recognition toolkit. Experimental results show that the algorithm effectively speeds up the likelihood computation without lowering recognition accuracy. Funding: State Scholarship Fund of China (2006104705); Xiamen University "985 Project" Phase II Information Innovation Platform fund (0000-X07204); Natural Science Foundation of Fujian Province (2006J0043

Xiamen University Institutional Repository

Analysis of the Transmission Characteristics of the ZF 9-Speed Automatic Transmission (采埃孚9速自动变速器传动特性分析)

Authors: 王俊彦, 范鑫, 蔡骏宇
Publication venue: Editorial Office of Journal of Mechanical Transmission
Publication date: 01/01/2021

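Taking the ZF 9-speed automatic transmission (9HP48) as the research object, the power flow path in each gear is identified, the gear ratio and torque ratio (accounting for friction losses) of each gear are derived theoretically, and the Kreines formula is used to calculate the transmission efficiency of the nine forward gears and one reverse gear. To verify the theoretical derivation, a simulation model of the transmission is built in the AMESim simulation software, and the gear ratio and transmission efficiency of each gear are obtained by simulation. Comparison shows that the simulation results agree with the theoretical analysis. The simulation results verify the correctness of the theoretical derivation and provide a basis for optimizing the transmission scheme of automatic transmissions.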

Directory of Open Access Journals
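
Gear-ratio derivations of the kind mentioned in the abstract above usually start from the kinematic (Willis) equation of a single planetary gear set, n_s + k·n_r = (1 + k)·n_c, where k is the ring-to-sun tooth ratio. The sketch below covers only that single-set relation; the tooth counts are hypothetical, and it does not reproduce the 9HP48 gear scheme or the efficiency calculation, neither of which is given in the listing.

```python
def carrier_speed(sun_speed, ring_speed, k):
    """Carrier speed of a simple planetary set from the Willis equation.

    n_s + k * n_r = (1 + k) * n_c, with k = ring teeth / sun teeth.
    """
    return (sun_speed + k * ring_speed) / (1.0 + k)

# Hypothetical tooth counts, chosen only to make the arithmetic easy to follow.
sun_teeth, ring_teeth = 30, 90
k = ring_teeth / sun_teeth

# Ring gear held by a brake, sun gear driven, carrier as output.
n_sun = 1000.0
n_carrier = carrier_speed(n_sun, 0.0, k)
ratio = n_sun / n_carrier
print(n_carrier, ratio)
```

With the ring held, the ratio reduces to 1 + k, so a multi-set transmission's overall ratio in a given gear follows from chaining such equations according to which clutches and brakes are engaged.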

Multi-word Trigger Pair Language Model Using FP-tree

Authors: 史晓东, 蔡骏, 许永林
Publication date: 30/12/2005

In speech recognition systems, the trigger model is a type of language model used to describe long-distance dependencies between words. Previous trigger language models, however, mostly use a single word as the trigger. Drawing on the Apriori algorithm for association rule discovery in data mining, this paper uses the more efficient FP-tree algorithm to generate multi-word trigger pairs and builds a multi-word trigger pair language model from them. The model makes more use of the history to predict the current word and overcomes the shortcoming of the traditional N-gram model, which only describes word pairs less than N words apart.

Xiamen University Institutional Repository
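
To make the idea of a multi-word trigger pair concrete, the sketch below counts how often a set of history words precedes a target word and keeps the combinations that meet a support threshold. It uses brute-force counting over word combinations, not the FP-tree algorithm named in the abstract above; the function name, the `trigger_size` and `min_support` parameters, and the toy sentences are all hypothetical.

```python
from collections import Counter
from itertools import combinations

def mine_multiword_triggers(sentences, trigger_size=2, min_support=2):
    """Count (trigger word set -> target word) co-occurrences within sentences.

    For every target word, each combination of `trigger_size` distinct words
    from its preceding history is treated as a candidate trigger.
    """
    counts = Counter()
    for words in sentences:
        for i, target in enumerate(words):
            history = sorted(set(words[:i]) - {target})
            for trigger in combinations(history, trigger_size):
                counts[(trigger, target)] += 1
    return {rule: c for rule, c in counts.items() if c >= min_support}

sentences = [
    ["stock", "market", "fell", "shares"],
    ["stock", "market", "rose", "shares"],
    ["fish", "market", "open"],
]
print(mine_multiword_triggers(sentences))
```

An FP-tree would reach the same frequent combinations without enumerating every candidate, which matters once the vocabulary and history length grow.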