1,785 research outputs found

    Restricted Boltzmann machine vectors for speaker clustering and tracking tasks in TV broadcast shows

    (This article belongs to the Special Issue IberSPEECH 2018: Speech and Language Technologies for Iberian Languages.) Restricted Boltzmann Machines (RBMs) have shown success in both the front end and the back end of speaker verification systems. In this paper, we propose applying RBMs to the front end for the tasks of speaker clustering and speaker tracking in TV broadcast shows. RBMs are trained to transform utterances into a vector-based representation. Because of the lack of data for a test speaker, we propose RBM adaptation to a global model. First, the global model, referred to as the universal RBM, is trained with all the available background data. Then an adapted RBM model is trained with the data of each test speaker. The visible-to-hidden weight matrices of the adapted models are concatenated along with the bias vectors and are whitened to generate the vector representation of speakers. These vectors, referred to as RBM vectors, were shown to preserve speaker-specific information and are used in the tasks of speaker clustering and speaker tracking. The evaluation was performed on audio recordings of Catalan TV broadcast shows. The experimental results show that our proposed speaker clustering system gained up to 12% relative improvement, in terms of Equal Impurity (EI), over the baseline system. In the task of speaker tracking, our system achieves relative improvements of 11% and 7% over the baseline system using cosine and Probabilistic Linear Discriminant Analysis (PLDA) scoring, respectively. Peer reviewed. Postprint (published version).
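A minimal sketch of how an RBM vector might be assembled and scored, based only on the description in the abstract above. The function names, the toy dimensions and the choice to append both the visible and hidden biases are assumptions for illustration. The whitening step mentioned in the abstract is deliberately omitted here, so this is not the authors' full pipeline.

```python
import math

def rbm_vector(weights, visible_bias, hidden_bias):
    """Flatten W (visible x hidden) row by row and append both bias vectors.

    Assumption: both bias vectors are appended; the abstract only says
    "the bias vectors". Whitening is NOT applied in this sketch.
    """
    flat = [w for row in weights for w in row]
    return flat + list(visible_bias) + list(hidden_bias)

def cosine_score(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy adapted models with 2 visible and 3 hidden units (made-up numbers).
spk_a = rbm_vector([[0.1, -0.2, 0.3], [0.0, 0.5, -0.1]],
                   [0.01, 0.02], [0.1, 0.0, -0.1])
spk_b = rbm_vector([[0.1, -0.1, 0.3], [0.1, 0.4, -0.1]],
                   [0.00, 0.02], [0.1, 0.1, -0.1])
score = cosine_score(spk_a, spk_b)
```

In a clustering or tracking setting, a score like this would be compared against a threshold or fed to a clustering algorithm; the abstract also reports PLDA scoring as an alternative, which is not sketched here.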

    Music in the international market : differences and distribution : the case of Italy and China

    Historically, there has been limited transmission of musical ideas between Italy and China. When music travels between cultures it is subject to change and transformation, and this cultural exchange is the foundation of popular music as we know it today. In this dissertation, we first analyse what makes music enjoyable for people through an analysis of genre. We then perform a comparative analysis of the two countries' regional music genres and examine the similarities between them. Through this we can understand the similarities between the two markets and identify possible modes of entry for Italian musicians into the Chinese market. The motivation for this analysis is to ascertain whether there is space for Italian musicians to find an audience in China. By understanding the similarities between the countries, we can find elements within Italian musicians' product that will reduce the amount of alienation within the Chinese market.

    Affective Image Content Analysis: Two Decades Review and New Perspectives

    Images can convey rich semantics and induce various emotions in viewers. Recently, with the rapid advancement of emotional intelligence and the explosive growth of visual data, extensive research efforts have been dedicated to affective image content analysis (AICA). In this survey, we comprehensively review the development of AICA over the past two decades, focusing especially on state-of-the-art methods with respect to three main challenges: the affective gap, perception subjectivity, and label noise and absence. We begin with an introduction to the key emotion representation models that have been widely employed in AICA and a description of the datasets available for evaluation, with a quantitative comparison of label noise and dataset bias. We then summarize and compare the representative approaches to (1) emotion feature extraction, including both handcrafted and deep features, (2) learning methods for dominant emotion recognition, personalized emotion prediction, emotion distribution learning, and learning from noisy data or few labels, and (3) AICA-based applications. Finally, we discuss some challenges and promising future research directions, such as image content and context understanding, group emotion clustering, and viewer-image interaction. Comment: Accepted by IEEE TPAM

    Affective image content analysis: two decades review and new perspectives


    Data mining in manufacturing: a review based on the kind of knowledge

    In modern manufacturing environments, vast amounts of data are collected in database management systems and data warehouses from all involved areas, including product and process design, assembly, materials planning, quality control, scheduling, maintenance, fault detection, etc. Data mining has emerged as an important tool for knowledge acquisition from manufacturing databases. This paper reviews the literature dealing with knowledge discovery and data mining applications in the broad domain of manufacturing, with a special emphasis on the type of functions to be performed on the data. The major data mining functions to be performed include characterization and description, association, classification, prediction, clustering and evolution analysis. The papers reviewed have therefore been categorized into these five categories. It has been shown that there has been rapid growth in the application of data mining in the context of manufacturing processes and enterprises in the last 3 years. This review reveals the progressive applications and existing gaps identified in the context of data mining in manufacturing. A novel text mining approach has also been used on the abstracts and keywords of 150 papers to identify the research gaps and find the linkages between knowledge area, knowledge type and the applied data mining tools and techniques.

    Community in Chinese Street Music: Sound, Song and Social Life

    Jiqing guangchang is a form of amateur music performance event in Wuhan, a major city in central China. Groups of singers take turns to perform well-known Chinese popular songs for a few hours each afternoon and evening in squares, on street corners, and in parks around the city. Audiences take an active part by offering performers cash tips. Certain discourses surrounding contemporary urban life have portrayed experiences with popular music in these modern city contexts as distant from communal meaning. My ethnography of these performances and their surrounding social worlds is geared towards assessing the significance of community here, while also contributing to an understanding of the notion in contemporary urban China. Musical activity in jiqing guangchang is mundane, mainstream and rarely inspires fervent commitment or responses from participants. I analyse material from its spatial and sonic, economic, performative and social sides to look beyond understandings of community that are based on ideologies of kinship and belonging. I develop the discussion towards community’s embodied and material-level foundations, manifest in the mutual orientation and coexistence strategies of participants, their modes of sociability, and the designation and sharing of social territories. Thus, various limitations in current discourses of music and community can be transcended, particularly those tied to binary understandings of community’s position in relation to society, individualism, and several other key concepts. I aim to highlight that in contemporary urban situations, music’s ability to engender collective meaning is not only tied to ritualised contexts or those where divisive identity issues are prominent. Instead, my analysis of jiqing guangchang brings to the fore underlying and everyday modes of collective engagement that may be of deep-seated significance in interpreting all kinds of musical contexts

    Organization and administration of an audio-visual teaching aids library at Fort Benton Montana


    Efficient machine learning: models and accelerations

    One of the key enablers of the recent unprecedented success of machine learning is the adoption of very large models. Modern machine learning models, such as deep neural networks, typically consist of multiple cascaded layers and contain millions to hundreds of millions of parameters (i.e., weights). Larger-scale models tend to enable the extraction of more complex high-level features and therefore lead to a significant improvement in overall accuracy. On the other hand, the layered deep structure and large model sizes also increase the demand for computational capability and memory. To achieve higher scalability, performance, and energy efficiency for deep learning systems, two orthogonal research and development trends have attracted enormous interest: the first is acceleration and the second is model compression. The underlying goal of both trends is to preserve the high quality of the models so that they provide accurate predictions. In this thesis, we address these two problems and utilize different computing paradigms to solve real-life deep learning problems. To explore these two domains, this thesis first presents the cogent confabulation network for the sentence completion problem. We use the Chinese language as a case study to describe our exploration of cogent confabulation based text recognition models. The exploration and optimization of these models have been conducted through various comparisons. The optimized network offered better accuracy for sentence completion. To accelerate sentence completion in a multi-processing system, we propose a parallel framework for the confabulation recall algorithm. The parallel implementation reduces runtime, improves recall accuracy by breaking the fixed evaluation order and introducing more generalization, and maintains balanced progress in status updates among all neurons. A lexicon scheduling algorithm is presented to further improve the model performance. As deep neural networks have proven effective in many real-life applications and are deployed on low-power devices, we then investigate the acceleration of neural network inference using a hardware-friendly computing paradigm, stochastic computing. It is an approximate computing paradigm that requires a small hardware footprint and achieves high energy efficiency. Applying stochastic computing to deep convolutional neural networks, we design the functional hardware blocks and optimize them jointly to minimize the accuracy loss due to the approximation. The synthesis results show that the proposed design achieves remarkably low hardware cost and power/energy consumption. Modern neural networks usually involve a huge number of parameters, which cannot fit into embedded devices. Compression of deep learning models together with acceleration therefore attracts our attention. We introduce structured-matrix-based neural networks to address this problem. The circulant matrix is one such structured matrix: the whole matrix can be represented by a single vector, so the matrix is compressed. We further investigate a more flexible structure based on the circulant matrix, called the block-circulant matrix. It partitions a matrix into several smaller blocks and makes each submatrix circulant, so the compression ratio is controllable. With the help of Fourier transform based equivalent computation, the inference of the deep neural network can be accelerated energy-efficiently on FPGAs. We also optimize the training algorithm for block-circulant-matrix-based neural networks to obtain high accuracy after compression.
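The circulant idea in the abstract above can be illustrated with a small sketch. It assumes the common convention that a circulant matrix C is defined by its first column c, with C[i][j] = c[(i - j) mod n], so that multiplying by C is a circular convolution and can be computed with FFTs. The function names and the pure-Python radix-2 FFT are illustrative choices, not the thesis's implementation. A block-circulant layer would apply the same routine to each block.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two.

    The inverse transform is left unnormalized (caller divides by n).
    """
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out

def circulant_matvec_fft(c, x):
    """Compute C @ x for the circulant C with first column c, via FFT."""
    n = len(c)
    prod = [a * b for a, b in zip(fft(c), fft(x))]
    return [v.real / n for v in fft(prod, inverse=True)]

def circulant_matvec_dense(c, x):
    """Reference O(n^2) product using C[i][j] = c[(i - j) mod n]."""
    n = len(c)
    return [sum(c[(i - j) % n] * x[j] for j in range(n)) for i in range(n)]

c = [1.0, 2.0, 3.0, 4.0]   # n stored values instead of n * n
x = [0.5, -1.0, 2.0, 0.0]
fast = circulant_matvec_fft(c, x)
dense = circulant_matvec_dense(c, x)
```

The FFT route stores n values per block instead of n * n and needs O(n log n) operations per product rather than O(n^2), which is the source of both the compression and the acceleration described above.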

    Kaili, the homeland of 100 festivals: Space, music, and sound in a small city

    This thesis examines the production of social space in Kaili, a small city in southwest China, through its branding as “the homeland of one hundred festivals”, inhabitants’ conceptualizations of music, amateur music-making practices, and the construction of the built environment. Drawing on Henri Lefebvre's triad of social space as a basic framework, I explore the complexity of the city through multiple aspects of the relationship between space, music and sound: how the built environment of post-Mao China hinders and hides amateur music, even in a city branded as a place of authentic (yuanshengtai) ethnic folk music; how disparities between the branding and living of Kaili have produced a discourse whereby citizens relocate authentic musical practices to an imagined rural space outside the city; and how amateur musicians have constructed hierarchies of amateur musical space within the city. This thesis makes a distinctive contribution across a range of disciplinary and theoretical interests: Chinese studies, multi-disciplinary debates about Lefebvre’s spatial theory, and urban studies. For Chinese studies, it gives detailed scrutiny to Lefebvre’s spatial theory in considering the historical and recent formation of urban space in China, and in so doing goes beyond the truism that social space is socially produced. It intervenes in ongoing discussions about Lefebvrian theory outside the parameters of Chinese studies, by grounding what has been a predominantly abstract discussion in ethnographically and textually-based research. My discussion of city branding and everyday musical activity elaborates Lefebvre’s theory, both modifying and adding to his triad of perceived, conceived and lived space