Restricted Boltzmann machine vectors for speaker clustering and tracking tasks in TV broadcast shows
(This article belongs to the Special Issue IberSPEECH 2018: Speech and Language Technologies for Iberian Languages.)
Restricted Boltzmann Machines (RBMs) have been applied successfully in both the front-end and the back-end of speaker verification systems. In this paper, we propose applying RBMs in the front-end for speaker clustering and speaker tracking in TV broadcast shows. RBMs are trained to transform utterances into a vector-based representation. Because little data is available for each test speaker, we propose adapting RBMs from a global model. First, the global model, referred to as the universal RBM, is trained on all available background data. Then an adapted RBM is trained on the data of each test speaker. The visible-to-hidden weight matrix of each adapted model is concatenated with its bias vectors, and the result is whitened to produce a vector representation of the speaker. These vectors, referred to as RBM vectors, were shown to preserve speaker-specific information and are used for speaker clustering and speaker tracking. The evaluation was performed on audio recordings of Catalan TV broadcast shows. The experimental results show that the proposed speaker clustering system achieved up to 12% relative improvement in Equal Impurity (EI) over the baseline system. In speaker tracking, our system achieved relative improvements of 11% and 7% over the baseline system with cosine and Probabilistic Linear Discriminant Analysis (PLDA) scoring, respectively.
Peer Reviewed. Postprint (published version)
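The RBM-vector pipeline described above has five stages: train a universal RBM, adapt it per speaker, concatenate weights and biases, whiten, and score with cosine similarity. The following is a minimal pure-Python sketch of that pipeline. Several assumptions here are not taken from the paper:

- Binary visible units, with toy synthetic binary frames in place of real acoustic features.
- CD-1 training with arbitrary learning-rate and epoch values.
- Per-dimension standardization as a simplified, diagonal stand-in for full whitening.

All function names are hypothetical.

```python
import copy
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_cd1(data, params, lr=0.05, epochs=10, seed=0):
    """Return a copy of a Bernoulli-Bernoulli RBM trained with one-step contrastive
    divergence (CD-1). params = (W, b_vis, b_hid), with W of shape n_vis x n_hid."""
    W, b_vis, b_hid = copy.deepcopy(params)
    rng = random.Random(seed)
    n_vis, n_hid = len(b_vis), len(b_hid)
    for _ in range(epochs):
        for v0 in data:
            # Positive phase: hidden probabilities given the observed frame.
            h0 = [sigmoid(b_hid[j] + sum(v0[i] * W[i][j] for i in range(n_vis))) for j in range(n_hid)]
            h0_sample = [1.0 if rng.random() < p else 0.0 for p in h0]
            # Negative phase: one Gibbs step down to the visible layer and back up.
            v1 = [sigmoid(b_vis[i] + sum(h0_sample[j] * W[i][j] for j in range(n_hid))) for i in range(n_vis)]
            h1 = [sigmoid(b_hid[j] + sum(v1[i] * W[i][j] for i in range(n_vis))) for j in range(n_hid)]
            # Update: data statistics minus reconstruction statistics.
            for i in range(n_vis):
                for j in range(n_hid):
                    W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
                b_vis[i] += lr * (v0[i] - v1[i])
            for j in range(n_hid):
                b_hid[j] += lr * (h0[j] - h1[j])
    return W, b_vis, b_hid

def rbm_vector(params):
    """Concatenate the flattened visible-to-hidden weights with both bias vectors."""
    W, b_vis, b_hid = params
    return [w for row in W for w in row] + list(b_vis) + list(b_hid)

def standardize(vectors, eps=1e-8):
    """Per-dimension mean/variance normalization: a diagonal stand-in for whitening."""
    dims, n = len(vectors[0]), len(vectors)
    means = [sum(v[d] for v in vectors) / n for d in range(dims)]
    stds = [math.sqrt(sum((v[d] - means[d]) ** 2 for v in vectors) / n) + eps for d in range(dims)]
    return [[(v[d] - means[d]) / stds[d] for d in range(dims)] for v in vectors]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

rng = random.Random(1)

def speaker_frames(active_units, n_frames=20, n_vis=6):
    """Hypothetical binary 'feature frames': active units fire with p=0.9, others with p=0.1."""
    return [[1.0 if rng.random() < (0.9 if i in active_units else 0.1) else 0.0
             for i in range(n_vis)] for _ in range(n_frames)]

N_VIS, N_HID = 6, 3
init = ([[rng.gauss(0.0, 0.01) for _ in range(N_HID)] for _ in range(N_VIS)], [0.0] * N_VIS, [0.0] * N_HID)

# 1) Universal RBM trained on pooled background data.
background = speaker_frames({0, 1, 2}) + speaker_frames({3, 4, 5}) + speaker_frames({1, 3, 5})
universal = train_cd1(background, init, epochs=5)

# 2) Adapt a copy of the universal RBM to each test speaker's small data set.
speakers = {"A": {0, 1, 2}, "B": {3, 4, 5}, "C": {0, 2, 4}}
raw = {name: rbm_vector(train_cd1(speaker_frames(units, n_frames=8), universal, epochs=3))
       for name, units in speakers.items()}

# 3) Normalize across speakers, then compare with cosine scoring.
names = sorted(raw)
normalized = dict(zip(names, standardize([raw[n] for n in names])))
print(round(cosine(normalized["A"], normalized["B"]), 3))
```

Adaptation starts from the universal parameters rather than from random weights. This is the point of the adaptation step: a few frames per speaker can still yield a vector in a space shared by all speakers.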
Music in the international market : differences and distribution : the case of Italy and China
Historically, there has been limited transmission of musical ideas between Italy and China. When music travels between cultures, it is subject to change and transformation, and this cultural exchange is the foundation of popular music as we know it today. This dissertation first analyses what makes music enjoyable for people through an analysis of genre. It then performs a comparative analysis of the two countries' regional music genres and examines the similarities between them. Through this, we can understand the similarities between the two markets and identify possible modes of entry for Italian musicians into the Chinese market. The motivation for this analysis is to ascertain whether there is space for Italian musicians to find an audience in China. By understanding the similarities between the countries, we can find elements within Italian musicians’ product that will reduce alienation in the Chinese market.
Traditionally, the transmission of musical notions and concepts between Italy and China has been limited. When music travels between cultures, it is subject to change and transformation, and this cultural exchange is the basis of popular music as we know it today. This dissertation first aims to analyse what gives music a positive effect on people, through an analysis of genre. Next, it carries out a comparative analysis of the different regional music genres, examining the similarities between them. This study makes it possible to understand the similarities between the two countries and to see how Italian music could enter the Chinese context. The objective of this analysis is to verify whether there is an audience in China for Italian musicians. By understanding the similarities between these two countries, it may be possible to find elements in the Italian musical spectrum that help reduce the high level of indifference to Italian music in the Chinese market.
Affective Image Content Analysis: Two Decades Review and New Perspectives
Images can convey rich semantics and induce various emotions in viewers. Recently, with the rapid advancement of emotional intelligence and the explosive growth of visual data, extensive research efforts have been dedicated to affective image content analysis (AICA). In this survey, we comprehensively review the development of AICA over the past two decades, focusing especially on state-of-the-art methods with respect to three main challenges: the affective gap, perception subjectivity, and label noise and absence. We begin with an introduction to the key emotion representation models that have been widely employed in AICA. We then describe the available datasets for evaluation, with a quantitative comparison of label noise and dataset bias. Next, we summarize and compare representative approaches to:
(1) emotion feature extraction, including both handcrafted and deep features;
(2) learning methods for dominant emotion recognition, personalized emotion prediction, emotion distribution learning, and learning from noisy data or few labels;
(3) AICA-based applications.
Finally, we discuss some challenges and promising future research directions, such as image content and context understanding, group emotion clustering, and viewer-image interaction.
Comment: Accepted by IEEE TPAM
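Two of the learning settings listed above are dominant emotion recognition and emotion distribution learning. They differ mainly in their targets, and perception subjectivity is what makes the distinction matter. The following minimal sketch turns per-viewer labels into a distribution and compares it with a prediction. It rests on three assumptions the abstract does not specify:

- Mikels' eight emotion categories as the label set.
- Hypothetical votes from ten viewers.
- KL divergence as the distribution metric (a common choice).

```python
import math

# Mikels' eight emotion categories, used here only as an example label set.
EMOTIONS = ["amusement", "awe", "contentment", "excitement",
            "anger", "disgust", "fear", "sadness"]

def votes_to_distribution(votes):
    """Turn per-viewer labels into a normalized emotion distribution."""
    counts = [votes.count(e) for e in EMOTIONS]
    total = sum(counts)
    return [c / total for c in counts]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q): how poorly a predicted distribution q explains the ground truth p."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def dominant_emotion(dist):
    """Dominant emotion recognition collapses the distribution to its argmax."""
    return EMOTIONS[max(range(len(dist)), key=dist.__getitem__)]

# Hypothetical votes from ten viewers for one image: viewers disagree.
votes = ["awe"] * 5 + ["contentment"] * 3 + ["fear"] * 2
truth = votes_to_distribution(votes)
uniform = [1 / len(EMOTIONS)] * len(EMOTIONS)
print(dominant_emotion(truth))                # awe
print(round(kl_divergence(truth, truth), 6))  # 0.0
print(kl_divergence(truth, uniform) > 0)      # True
```

The argmax label discards the 30% of viewers who felt contentment and the 20% who felt fear. A distribution target keeps that information.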
Data mining in manufacturing: a review based on the kind of knowledge
In modern manufacturing environments, vast amounts of data are collected in database management systems and data warehouses from all involved areas, including product and process design, assembly, materials planning, quality control, scheduling, maintenance, and fault detection. Data mining has emerged as an important tool for acquiring knowledge from manufacturing databases. This paper reviews the literature on knowledge discovery and data mining applications in the broad domain of manufacturing, with special emphasis on the type of function to be performed on the data. The major data mining functions are characterization and description, association, classification, prediction, clustering, and evolution analysis, and the reviewed papers have been categorized according to these functions. The review shows rapid growth in the application of data mining to manufacturing processes and enterprises over the last 3 years. It also reveals progressive applications and existing gaps in data mining for manufacturing. In addition, a novel text mining approach was applied to the abstracts and keywords of 150 papers to identify research gaps and to find linkages between knowledge area, knowledge type, and the applied data mining tools and techniques.
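The abstract does not describe its text mining approach. The following is therefore only a hypothetical illustration of the general idea of linking knowledge types to techniques, not the authors' method. It uses naive substring matching to count how many abstracts mention each (knowledge type, technique) pair, and it treats pairs that never co-occur as candidate gaps. The term lists and abstracts are invented for the example.

```python
from collections import Counter
from itertools import product

# Hypothetical term lists; the review's own vocabulary is not reproduced here.
KNOWLEDGE_TYPES = ["classification", "prediction", "clustering", "association"]
TECHNIQUES = ["decision tree", "neural network", "k-means", "apriori"]

def cooccurrence(abstracts):
    """Count abstracts that mention each (knowledge type, technique) pair."""
    counts = Counter()
    for text in abstracts:
        text = text.lower()
        kinds = [k for k in KNOWLEDGE_TYPES if k in text]
        tools = [t for t in TECHNIQUES if t in text]
        counts.update(product(kinds, tools))
    return counts

def gaps(counts):
    """Pairs never seen together: candidate research gaps."""
    return [pair for pair in product(KNOWLEDGE_TYPES, TECHNIQUES) if counts[pair] == 0]

# Invented example abstracts.
abstracts = [
    "A decision tree approach to classification of assembly defects.",
    "Neural network prediction of tool wear in milling.",
    "K-means clustering of machine failure logs for maintenance scheduling.",
    "Classification of wafer maps with a neural network.",
]
counts = cooccurrence(abstracts)
print(counts[("classification", "neural network")])  # 1
print(len(gaps(counts)))                             # 12
```

Substring matching is deliberately crude; for example, it would also match "prediction" inside "predictions". A real study would likely need stemming, synonym lists, and the papers' keyword fields.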
Community in Chinese Street Music: Sound, Song and Social Life
Jiqing guangchang is a form of amateur music performance event in Wuhan, a major city in central China. Groups of singers take turns to perform well-known Chinese popular songs for a few hours each afternoon and evening in squares, on street corners, and in parks around the city. Audiences take an active part by offering performers cash tips. Certain discourses surrounding contemporary urban life have portrayed experiences with popular music in these modern city contexts as distant from communal meaning. My ethnography of these performances and their surrounding social worlds is geared towards assessing the significance of community here, while also contributing to an understanding of the notion in contemporary urban China.
Musical activity in jiqing guangchang is mundane, mainstream and rarely inspires fervent commitment or responses from participants. I analyse material from its spatial and sonic, economic, performative and social sides to look beyond understandings of community that are based on ideologies of kinship and belonging. I develop the discussion towards community’s embodied and material-level foundations, manifest in the mutual orientation and coexistence strategies of participants, their modes of sociability, and the designation and sharing of social territories. Thus, various limitations in current discourses of music and community can be transcended, particularly those tied to binary understandings of community’s position in relation to society, individualism, and several other key concepts. I aim to highlight that in contemporary urban situations, music’s ability to engender collective meaning is not only tied to ritualised contexts or those where divisive identity issues are prominent. Instead, my analysis of jiqing guangchang brings to the fore underlying and everyday modes of collective engagement that may be of deep-seated significance in interpreting all kinds of musical contexts.
Efficient machine learning: models and accelerations
One of the key enablers of the recent unprecedented success of machine learning is the adoption of very large models. Modern machine learning models, such as deep neural networks, typically consist of multiple cascaded layers and have from millions to hundreds of millions of parameters (i.e., weights). Larger models tend to extract more complex high-level features and therefore significantly improve overall accuracy. On the other hand, the deep layered structure and large model size also demand more computational capability and memory. To achieve higher scalability, performance, and energy efficiency for deep learning systems, two orthogonal research and development trends have attracted enormous interest: the first is acceleration and the second is model compression. The underlying goal of both is to retain high model quality so that the models still provide accurate predictions. In this thesis, we address these two problems and use different computing paradigms to solve real-life deep learning problems.
To explore these two domains, this thesis first presents the cogent confabulation network for the sentence completion problem. We use the Chinese language as a case study to describe our exploration of cogent confabulation-based text recognition models. These models have been explored and optimized through various comparisons, and the optimized network achieves better sentence completion accuracy. To accelerate sentence completion on a multi-processing system, we propose a parallel framework for the confabulation recall algorithm. The parallel implementation reduces runtime, improves recall accuracy by breaking the fixed evaluation order and introducing more generalization, and maintains balanced progress in status updates across all neurons. A lexicon scheduling algorithm is presented to further improve model performance.
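The abstract does not give the recall rule. The following is therefore a simplified, hypothetical sketch of cogent confabulation for completing the last word of a sentence, built on these assumptions:

- Knowledge links are estimated from co-occurrence counts at fixed relative offsets.
- A link has strength ln(p(c|t)/p0) and exists only when p(c|t) exceeds a threshold p0.
- A toy English word corpus stands in for Chinese text.

The candidate with the largest summed link strength (its excitation) wins. This is the serial recall only; the thesis's parallel recall framework and lexicon scheduling are not shown.

```python
import math
from collections import Counter, defaultdict

def learn_links(sentences, max_offset=2):
    """Estimate p(context word k positions before | target word) from co-occurrence counts."""
    pair_counts = defaultdict(Counter)  # (offset, target) -> Counter of context words
    target_counts = Counter()
    for words in sentences:
        for pos, target in enumerate(words):
            target_counts[target] += 1
            for k in range(1, max_offset + 1):
                if pos - k >= 0:
                    pair_counts[(k, target)][words[pos - k]] += 1
    return pair_counts, target_counts

def confabulate(context, candidates, pair_counts, target_counts, p0=0.01):
    """Return the candidate with the largest summed link strength ln(p(c|t) / p0)."""
    def excitation(target):
        total = 0.0
        for k, word in enumerate(reversed(context), start=1):
            n_target = target_counts[target]
            p = pair_counts[(k, target)][word] / n_target if n_target else 0.0
            if p > p0:  # only links above the threshold p0 contribute
                total += math.log(p / p0)
        return total
    return max(candidates, key=excitation)

# Toy corpus (hypothetical).
sentences = [
    ["the", "cat", "drinks", "milk"],
    ["the", "dog", "drinks", "water"],
    ["a", "cat", "drinks", "milk"],
    ["the", "cat", "eats", "fish"],
]
pair_counts, target_counts = learn_links(sentences)
print(confabulate(["cat", "drinks"], ["milk", "water", "fish"], pair_counts, target_counts))  # milk
print(confabulate(["dog", "drinks"], ["milk", "water", "fish"], pair_counts, target_counts))  # water
```

"milk" wins the first query because both context words support it. "water" and "fish" each have only one supporting link.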
As deep neural networks have proven effective in many real-life applications and are being deployed on low-power devices, we then investigate accelerating neural network inference with a hardware-friendly computing paradigm: stochastic computing. Stochastic computing is an approximate computing paradigm that requires a small hardware footprint and achieves high energy efficiency. Applying stochastic computing to deep convolutional neural networks, we design the functional hardware blocks and optimize them jointly to minimize the accuracy loss caused by the approximation. Synthesis results show that the proposed design achieves remarkably low hardware cost and power/energy consumption.
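Stochastic computing represents a value in [0, 1] as the probability of a 1 in a random bitstream. Under this encoding, multiplication reduces to a single AND gate and scaled addition to a multiplexer, which is where the small footprint comes from. The following is a minimal sketch of those two standard unipolar building blocks. It is not the thesis's own hardware blocks for convolutional networks, and the stream length N is an assumption chosen for illustration.

```python
import random

def to_stream(p, length, rng):
    """Unipolar stochastic encoding: each bit is 1 with probability p (0 <= p <= 1)."""
    return [1 if rng.random() < p else 0 for _ in range(length)]

def from_stream(bits):
    """Decode a unipolar stream: the value is the fraction of ones."""
    return sum(bits) / len(bits)

def sc_multiply(a_bits, b_bits):
    """Unipolar multiplication is a bitwise AND of two independent streams."""
    return [x & y for x, y in zip(a_bits, b_bits)]

def sc_scaled_add(a_bits, b_bits, select_bits):
    """Scaled addition (a + b) / 2: a multiplexer driven by a p=0.5 select stream."""
    return [x if s else y for x, y, s in zip(a_bits, b_bits, select_bits)]

rng = random.Random(42)
N = 4096  # stream length: longer streams trade latency for accuracy
a = to_stream(0.5, N, rng)
b = to_stream(0.4, N, rng)
sel = to_stream(0.5, N, rng)
product_est = from_stream(sc_multiply(a, b))
sum_est = from_stream(sc_scaled_add(a, b, sel))
print(product_est)  # close to 0.5 * 0.4 = 0.20
print(sum_est)      # close to (0.5 + 0.4) / 2 = 0.45
```

The results are only approximate. Their standard error shrinks roughly as 1/sqrt(N), which is the accuracy loss that the joint optimization described above tries to contain.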
Modern neural networks usually contain a huge number of parameters that cannot fit into embedded devices, so compressing deep learning models, together with accelerating them, is also a focus of this thesis. We introduce structured-matrix-based neural networks to address this problem. The circulant matrix is one such structured matrix: the whole matrix can be represented by a single vector, so it is compressed. We further investigate a more flexible structure based on the circulant matrix, the block-circulant matrix, which partitions a matrix into smaller blocks and makes each submatrix circulant. The compression ratio is therefore controllable through the block size. With Fourier transform-based equivalent computation, inference of the deep neural network can be accelerated energy-efficiently on FPGAs. We also optimize the training algorithm for block-circulant-matrix-based neural networks so that they achieve high accuracy after compression.
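In a block-circulant layer, each k x k block is stored as one length-k vector. An m x n weight matrix therefore needs mn/k parameters, and each block's product with the input becomes a circular convolution computable with FFTs. The following is a minimal pure-Python sketch of that equivalence. It uses a textbook radix-2 FFT, so it assumes a power-of-two block size and real inputs. It illustrates the arithmetic only, not the thesis's FPGA design or training algorithm.

```python
import cmath

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    twiddled = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(n // 2)]
    return ([even[k] + twiddled[k] for k in range(n // 2)] +
            [even[k] - twiddled[k] for k in range(n // 2)])

def ifft(X):
    """Inverse FFT via the identity ifft(X) = conj(fft(conj(X))) / n."""
    n = len(X)
    return [v.conjugate() / n for v in fft([v.conjugate() for v in X])]

def circulant_matvec(c, x):
    """y = C x with C[r][s] = c[(r - s) % k]: a circular convolution, so y = IFFT(FFT(c) * FFT(x))."""
    return [v.real for v in ifft([p * q for p, q in zip(fft(c), fft(x))])]

def block_circulant_matvec(blocks, x, k):
    """blocks[i][j] is the length-k vector defining circulant block (i, j)."""
    y = []
    for block_row in blocks:
        acc = [0.0] * k
        for j, c in enumerate(block_row):
            part = circulant_matvec(c, x[j * k:(j + 1) * k])
            acc = [s + p for s, p in zip(acc, part)]
        y.extend(acc)
    return y

def dense_from_blocks(blocks, k):
    """Expand the block-circulant matrix to dense form (used only to check the result)."""
    return [[c[(r - s) % k] for c in block_row for s in range(k)]
            for block_row in blocks for r in range(k)]

k = 4  # block size: each 4x4 block is stored as 4 numbers instead of 16
blocks = [[[1, 2, 3, 4], [0, 1, 0, 0]],
          [[2, 0, 0, 1], [1, 1, 1, 1]]]  # an 8x8 matrix held in 16 parameters
x = [1, 0, 2, 0, 1, 1, 0, 3]
y_fft = block_circulant_matvec(blocks, x, k)
y_dense = [sum(w * v for w, v in zip(row, x)) for row in dense_from_blocks(blocks, k)]
print(max(abs(p - q) for p, q in zip(y_fft, y_dense)) < 1e-9)  # True
```

Choosing a larger k raises the compression ratio (64/16 = 4 here) and lowers per-block cost from O(k^2) to O(k log k). The price is that the weight matrix is more constrained.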
Kaili, the homeland of 100 festivals: Space, music, and sound in a small city
This thesis examines the production of social space in Kaili, a small city in southwest China, through its branding as “the homeland of one hundred festivals”, inhabitants’ conceptualizations of music, amateur music-making practices, and the construction of the built environment. Drawing on Henri Lefebvre's triad of social space as a basic framework, I explore the complexity of the city through multiple aspects of the relationship between space, music and sound: how the built environment of post-Mao China hinders and hides amateur music, even in a city branded as a place of authentic (yuanshengtai) ethnic folk music; how disparities between the branding and living of Kaili have produced a discourse whereby citizens relocate authentic musical practices to an imagined rural space outside the city; and how amateur musicians have constructed hierarchies of amateur musical space within the city.
This thesis makes a distinctive contribution across a range of disciplinary and theoretical interests: Chinese studies, multi-disciplinary debates about Lefebvre’s spatial theory, and urban studies. For Chinese studies, it gives detailed scrutiny to Lefebvre’s spatial theory in considering the historical and recent formation of urban space in China, and in so doing goes beyond the truism that social space is socially produced. It intervenes in ongoing discussions about Lefebvrian theory outside the parameters of Chinese studies by grounding what has been a predominantly abstract discussion in ethnographically and textually based research. My discussion of city branding and everyday musical activity elaborates Lefebvre’s theory, both modifying and adding to his triad of perceived, conceived and lived space.
- …