Polyphonic music information retrieval based on multi-label cascade classification system
Recognizing and separating the sounds played by individual instruments is very useful for labeling audio files with semantic information. This is a non-trivial task requiring sound analysis, but the results can aid automatic indexing and browsing of music data when searching for melodies played by user-specified instruments. Melody matching based on pitch-detection technology has drawn much attention, and many MIR systems have been developed for this task; musical instrument recognition, however, remains an unsolved problem in the domain. Numerous approaches to acoustic feature extraction have already been proposed for timbre recognition. Unfortunately, none of these monophonic timbre-estimation algorithms can be applied successfully to polyphonic sounds, which are the more common case in real-world music. This has stimulated research on multi-label instrument classification and on new features for content-based automatic music information retrieval. Raw audio signals are large volumes of unstructured sequential values and are not suitable for traditional data-mining algorithms, while acoustic features alone are sometimes insufficient for instrument recognition in polyphonic sounds, because they are higher-level representations of the raw signal that lack details of the original information. To capture patterns that evolve on the time scale, new temporal features are introduced to supply additional temporal information for timbre recognition. We introduce a multi-label classification system that estimates multiple timbres in a polyphonic sound by classification based on acoustic features and short-term power-spectrum matching. To achieve a higher estimation rate, we introduce a hierarchically structured cascade classification system inspired by the human perceptual process.
This cascade classification system first estimates the higher-level decision attribute, which stands for the musical instrument family. Further estimation is then done within that specific family. Experiments showed that the hierarchical system outperforms the traditional flat classification method, which estimates the instrument directly without the higher-level family analysis.
Traditional hierarchical structures were constructed from human semantics; they are meaningful from a human perspective but not well suited to the cascade system. We introduce a new hierarchical instrument schema derived from clustering the acoustic features. This new schema better describes the similarity among different instruments, and among different playing techniques of the same instrument. The classification results show higher accuracy for the cascade system with the new schema than with the traditional schemas. A query-answering system is built on top of the cascade classifier.
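As a rough illustration of the cascade idea described above, the sketch below first assigns a sound's feature vector to an instrument family and then classifies it within that family only. The nearest-centroid rule, the feature values, and the instrument taxonomy are all hypothetical stand-ins, not the system's actual classifiers or features.

```python
# Minimal sketch of a two-stage cascade: estimate the instrument family
# first, then the instrument within that family. Uses a simple
# nearest-centroid rule; features and taxonomy are hypothetical.
import math
from collections import defaultdict

def centroid(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def nearest(x, centroids):
    return min(centroids, key=lambda k: math.dist(x, centroids[k]))

class CascadeClassifier:
    def fit(self, X, families, instruments):
        by_family, by_inst = defaultdict(list), defaultdict(list)
        self.family_of = {}
        for x, fam, inst in zip(X, families, instruments):
            by_family[fam].append(x)
            by_inst[inst].append(x)
            self.family_of[inst] = fam
        self.family_centroids = {f: centroid(v) for f, v in by_family.items()}
        self.inst_centroids = {i: centroid(v) for i, v in by_inst.items()}
        return self

    def predict(self, x):
        fam = nearest(x, self.family_centroids)          # level 1: family
        within = {i: c for i, c in self.inst_centroids.items()
                  if self.family_of[i] == fam}           # level 2: restrict
        return fam, nearest(x, within)
```

The flat baseline the abstract compares against would instead search all instrument centroids at once; the cascade narrows the second-stage search to one family.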
Music Genre Classification: A Semi-supervised Approach
Music genres can be seen as categorical descriptions used to classify music based on various characteristics such as instrumentation, pitch, rhythmic structure, and harmonic content. Automatic music genre classification is important for music retrieval in the large music collections on the web. We build a classifier that learns from very few labeled examples plus a large quantity of unlabeled data, and show that our methodology outperforms existing supervised and unsupervised approaches. We also identify salient features useful for music genre classification. We achieve 97.1% accuracy on 10-way classification of real-world audio collections.
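The semi-supervised idea of learning from a few labeled songs plus many unlabeled ones can be sketched as a generic self-training loop: the model pseudo-labels the unlabeled point it is most confident about, then retrains. The nearest-centroid base learner and the toy feature vectors are stand-ins, not the authors' actual method.

```python
# Generic self-training sketch: grow the labeled set one confident
# pseudo-label at a time. Base learner is nearest-centroid (a stand-in).
import math

def self_train(labeled, unlabeled, rounds=5):
    labeled = dict(labeled)            # {tuple(features): genre}
    pool = [tuple(x) for x in unlabeled]
    for _ in range(rounds):
        if not pool:
            break
        # build per-genre centroids from the current labeled set
        groups = {}
        for x, y in labeled.items():
            groups.setdefault(y, []).append(x)
        cents = {y: [sum(c[i] for c in v) / len(v) for i in range(len(v[0]))]
                 for y, v in groups.items()}
        # pseudo-label the single closest (most confident) unlabeled point
        best = min(pool, key=lambda x: min(math.dist(x, c)
                                           for c in cents.values()))
        labeled[best] = min(cents, key=lambda y: math.dist(best, cents[y]))
        pool.remove(best)
    return labeled
```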
Ability-Based Emotional Intelligence Is Associated With Greater Cardiac Vagal Control and Reactivity
Several distinct models of emotional intelligence (EI) have been developed over the past two decades. The ability model conceptualizes EI as a narrow set of interconnected, objectively measured, cognitive-emotional abilities, including the ability to perceive, manage, facilitate, and understand the emotions of the self and others. By contrast, trait or mixed models focus on subjective ratings of emotional/social competencies. Theoretically, EI is associated with neurobiological processes involved in emotional regulation and reactivity. The neurovisceral integration (NVI) model proposes a positive relationship between cardiac vagal control (CVC) and cognitive-emotional abilities similar to those encompassed by EI. The current study examined the association between CVC and EI. Because ability EI is directly tied to actual performance on emotional tasks, we hypothesized that individuals with higher ability-based EI scores would show greater levels of CVC at rest and in response to a stressful task. Because mixed models of EI are not linked directly to observable emotional behavior, we predicted no association with CVC. Consistent with expectations, individuals with higher levels of ability EI, but not mixed EI, had higher levels of CVC. We also found that individuals with greater levels of CVC who demonstrated reactivity to a stress induction had significantly higher EI than individuals who did not respond to the stress induction. Our findings support the theoretically expected overlap between constructs within the NVI model and the ability EI model; however, the observed effect size was small, and the associations between EI and CVC should not be taken to indicate a causal connection. Results suggest that variance in the ability to understand emotional processes in oneself and to reason about one's visceral experience may facilitate better CVC.
Future work manipulating either CVC or EI may prove informative in teasing apart the causal role driving their observed relationship.
Joint Warfighter Medical Research Program [W81XWH-16-1-0062]. Open access journal.
Categorization of song lyrics from a web portal using clustering
Classification and clustering algorithms have been widely applied in Music Information Retrieval (MIR) to organize music repositories into categories or clusters such as genre, mood, or topic, using audio alone or audio combined with lyrics. However, there is little research on clustering using lyrics alone. The main goal of this work is to define an unsupervised text-mining model for grouping the lyrics compiled in a web portal, using lyrics features only, in order to offer better search options to the portal's users. The proposed model first identifies the language of each lyric using Naive Bayes and n-grams (for this work, 30,000 lyrics in Spanish and 30,000 in English were identified). The lyrics are then represented in a Bag-of-Words (BOW) vector space model, using Part-of-Speech (POS) features and transforming the data to TF-IDF weights. Next, the appropriate number of clusters (K) is estimated, and partitional and hierarchical algorithms are used to obtain differentiated groups of lyrics. The clustering results are evaluated with measures such as the Davies-Bouldin Index (DBI) and internal and external cluster-similarity measures. Finally, the clusters are tagged using frequent words and association rules identified in each group. Experiments show that music can be organized into related groups such as genre, mood, sentiment, and topic, and tagged with unsupervised techniques using lyrics information only.
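The TF-IDF weighting step of the pipeline can be sketched in a few lines. The tokenisation here is naive whitespace splitting and the two "lyrics" are toy stand-ins; the actual model also applies POS filtering, which is omitted.

```python
# Bag-of-words vectors with TF-IDF weighting, as in the lyrics pipeline.
# tf = term frequency within a lyric; idf = log(N / document frequency).
import math
from collections import Counter

def tf_idf(docs):
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({w for d in tokenised for w in d})
    df = {w: sum(w in d for d in tokenised) for w in vocab}
    n = len(docs)
    vectors = []
    for d in tokenised:
        tf = Counter(d)
        vectors.append([tf[w] / len(d) * math.log(n / df[w]) for w in vocab])
    return vocab, vectors
```

Words that occur in every lyric get weight zero, so the clustering that follows is driven by the terms that actually discriminate between groups.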
Gender is a multifaceted concept: evidence that specific life experiences differentially shape the concept of gender
Gender has been the focus of linguistic and psychological studies, but little is known about its conceptual representation. We investigate whether the conceptual structure of gender—as expressed in participants’ free-listing responses—varies according to gender-related experiences, in line with research on conceptual flexibility. Specifically, we tested groups that varied by gender identity, sexual orientation, and gender normativity. We found that different people stressed distinct aspects of the concept. For example, normative individuals mainly relied on a bigenderist conception (e.g., male/female; man/woman), while non-normative individuals produced more aspects related to social context (e.g., queer, fluidity, construction). At a broader level, our results support the idea that gender is a multifaceted and flexible concept, constituted by social, biological, cultural, and linguistic components. Importantly, the meaning of gender is not exhausted by the classical dichotomy opposing sex, a biological fact, to gender as its cultural counterpart. Instead, both aspects are differentially salient depending on specific life experiences.
Consensus clustering framework for analysing fMRI datasets.
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London.
Neuroimaging of humans has gained a position of status within neuroscience. The modern functional magnetic resonance imaging (fMRI) technique provides neuroscientists with a powerful tool to depict the complex architecture of human brains. fMRI generates large amounts of data, and many analysis methods have been proposed to extract useful information from them. Clustering has been one of the most popular data-driven techniques for studying brain functional connectivity, and it excels where traditional model-based approaches are difficult to implement. However, the reliability and consistency of many findings are jeopardised by the proliferation of analysis methods and parameters, and sometimes by the small number of samples used. In this thesis, a consensus clustering framework for analysing fMRI data has been developed, aiming to overcome both the clustering-algorithm selection problem and the reliability issues in neuroimaging. The framework identifies groups of voxels, representing brain regions, that consistently exhibit correlated BOLD activity across many experimental conditions, by integrating clustering results from multiple clustering algorithms and various parameter settings, such as the number of clusters. Within the framework, the generation of individual clustering results is aided by high-performance grid computing to reduce the overall computation time. The integration of clustering results is implemented by a technique named binarisation of consensus partition matrix (Bi-CoPaM), adapted and enhanced for fMRI data analysis. The whole framework has been validated and is robust to participants' individual variability, yielding more complete and reproducible clusters than the traditional single-clustering approach. The framework has been applied to two real fMRI studies investigating brain responses to listening to emotional music under different preferences. In the first study, three brain structures related to visual, reward, and auditory processing were found to have intrinsic temporal patterns of coherent neural activity during affective processing, making this one of the few data-driven studies to observe such patterns. In the second study, different levels of engagement with music, from intentional to unintentional, were found to have distinct effects on auditory-limbic connectivity during listening, an effect that had not been well investigated or understood in the neuroscience of music. We believe the work in this thesis demonstrates an effective and competent approach to addressing the reliability and consistency concerns in fMRI data analysis.
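The consensus step behind Bi-CoPaM-style integration can be sketched as follows: several partitions vote into a co-association matrix, which is then binarised so that only items co-clustered in (here) every run form a consensus group. The toy partitions stand in for results from different algorithms and parameter settings; the real Bi-CoPaM binarisation is more elaborate.

```python
# Consensus clustering sketch: co-association matrix plus binarisation.
# partitions: list of cluster-label lists, one per clustering run.
def co_association(partitions, n):
    # m[i][j] = fraction of partitions in which items i and j co-cluster
    m = [[0.0] * n for _ in range(n)]
    for p in partitions:
        for i in range(n):
            for j in range(n):
                if p[i] == p[j]:
                    m[i][j] += 1 / len(partitions)
    return m

def consensus_groups(partitions, n, threshold=1.0):
    # binarise: keep only pairs co-clustered in >= threshold of the runs
    m = co_association(partitions, n)
    groups, seen = [], set()
    for i in range(n):
        if i in seen:
            continue
        g = {j for j in range(n) if m[i][j] >= threshold}
        seen |= g
        groups.append(sorted(g))
    return groups
```

Items that only sometimes co-cluster fall out of every consensus group, which is exactly the robustness-to-method-choice property the thesis aims for.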
Experiencing musical beauty: emotional subtypes and their physiological and musico-acoustic correlates
A listener’s aesthetic engagement with a musical piece often reaches peaks in response to passages experienced as especially beautiful. The present study examined the extent to which responses to such self-identified beautiful passages (BPs), in self-selected music, may be distinguishable in terms of their affective qualities. In an online survey, participants indicated pieces in which they considered specific passages to be outstandingly beautiful. In the lab, they listened to these pieces while physiological recordings were taken. Afterwards, they provided ratings on their experience of the BPs, with items targeting emotion response, underlying engagement mechanisms, and aesthetic evaluation. Cluster analyses based on the emotion ratings suggested three BP subtypes that we labelled low-Tension-low-Energy (LTLE), low-Tension-high-Energy (LTHE), and high-Tension-high-Energy (HTHE) BPs. LTHE and HTHE BPs induced greater interest and were more liked than LTLE BPs. Further, LTHE and HTHE clusters were associated with increases in skin conductance, in accordance with the higher arousal reported for these BPs, while LTLE BPs resulted in the increases in smiling and respiration rate previously associated with processing fluency and positive valence. LTLE BPs were also lower in tempo and polyphony than the other BP types. Finally, while both HTHE and LTHE BPs were associated with changes in dynamics, they nevertheless showed distinct patterns: HTHE BPs were associated with increases in pitch register, and LTHE BPs with reductions in harmonic ambiguity. Thus, in line with our assumption that there is more than one kind of experience of musical beauty, our study reveals three distinct subtypes, distinguishable on a range of facets.
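The three subtypes can be pictured as regions of a tension-energy rating plane. The toy rule below assigns a passage's (tension, energy) ratings to the nearest of three illustrative cluster centres; the centre coordinates are hypothetical, not the study's fitted values.

```python
# Assign a beautiful passage to the nearest of three subtype centres in the
# tension-energy plane. Centre coordinates are illustrative only.
import math

CENTRES = {
    "LTLE": (0.2, 0.2),   # low tension, low energy
    "LTHE": (0.2, 0.8),   # low tension, high energy
    "HTHE": (0.8, 0.8),   # high tension, high energy
}

def subtype(tension, energy):
    return min(CENTRES, key=lambda k: math.dist((tension, energy), CENTRES[k]))
```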
Speech data analysis for semantic indexing of video of simulated medical crises.
The Simulation for Pediatric Assessment, Resuscitation, and Communication (SPARC) group within the Department of Pediatrics at the University of Louisville was established to enhance the care of children by using simulation-based educational methodologies to improve patient safety and strengthen clinician-patient interactions. After each simulation session, the physician must manually review and annotate the recordings and then debrief the trainees. The physician responsible for the simulation has recorded hundreds of videos and is seeking solutions that can automate the process. This dissertation introduces our system for efficient segmentation and semantic indexing of videos of medical simulations using machine learning methods. It provides the physician with automated tools to review important sections of the simulation by identifying who spoke, when, and with what emotion. Only the audio information is extracted and analyzed, because the quality of the image recording is low and the visual environment is static for most of the session. Our proposed system includes four main components: preprocessing, speaker segmentation, speaker identification, and emotion recognition. The preprocessing consists of first extracting the audio track from the video recording, then extracting various low-level audio features to detect and remove silence segments. We investigate and compare two different approaches for this task: the first is threshold-based and the second is classification-based. The second main component of the proposed system detects speaker change points in order to segment the audio stream; we propose two fusion methods for this task. The speaker identification and emotion recognition components of our system are designed to let users browse the video and retrieve shots that identify "who spoke, when, and with what emotion" for further analysis.
For this component, we propose two feature representation methods that map audio segments of arbitrary length to a feature vector with fixed dimensions. The first is based on soft bag-of-words (BoW) feature representations; in particular, we define three types of BoW, based on crisp, fuzzy, and possibilistic voting. The second feature representation is a generalization of the BoW and is based on the Fisher Vector (FV). The FV uses the Fisher Kernel principle and combines the benefits of generative and discriminative approaches. The proposed feature representations are used within two learning frameworks. The first is supervised learning, which assumes that a large collection of labeled training data is available; within this framework, we use standard classifiers including K-nearest neighbors (K-NN), support vector machines (SVM), and Naive Bayes. The second framework is based on semi-supervised learning, where only a limited amount of labeled training samples is available; here we use an approach based on label propagation. Our proposed algorithms were evaluated using 15 medical simulation sessions. The results were analyzed and compared to those obtained using state-of-the-art algorithms. We show that our proposed speech segmentation fusion algorithms and feature mappings outperform existing methods. We also integrated all proposed algorithms and developed a GUI prototype system for subjective evaluation. This prototype processes a medical simulation video and provides the user with a visual summary of the different speech segments. It also allows the user to browse videos and retrieve scenes that answer semantic queries such as: who spoke and when? who interrupted whom? and what was the emotion of the speaker? The GUI prototype can also provide summary statistics for each simulation video, for example: for how long did each person speak? what is the longest uninterrupted speech segment? Is there an unusually large number of pauses within the speech segments of a given speaker?
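The fixed-length mapping idea behind the soft BoW representations can be sketched as follows: each frame of a variable-length segment votes softly over a codebook, and the normalised votes give a vector whose dimension depends only on the codebook size. The inverse-distance weighting and the toy codebook below are illustrative; the dissertation's crisp, fuzzy, and possibilistic variants differ in how these vote weights are computed.

```python
# Soft bag-of-words sketch: map a variable-length sequence of frame
# features to a fixed-dimension vector of normalised codeword votes.
import math

def soft_bow(frames, codebook):
    votes = [0.0] * len(codebook)
    for f in frames:
        # inverse-distance soft vote (one simple fuzzy-style weighting)
        w = [1.0 / (1e-9 + math.dist(f, c)) for c in codebook]
        s = sum(w)
        for k in range(len(codebook)):
            votes[k] += w[k] / s
    return [v / len(frames) for v in votes]
```

However many frames a segment contains, the output has one entry per codeword and sums to one, which is what lets segments of arbitrary length feed a fixed-input classifier such as K-NN or an SVM.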