    Verifying tag annotation and performing genre classification in music data via association analysis

    Music Information Retrieval aims to automate access to large volumes of music data, including browsing, retrieval, and storage. The work presented in this thesis tackles two non-trivial problems in the field. The first problem concerns music tags, which provide rich descriptive information about a music piece, including its genre, artist, emotion, and instrument. At present, tag annotation is largely a manual process, which often results in tags that are subjective, ambiguous, and error-prone. We propose a novel approach to verify the quality of tag annotation in a music dataset through association analysis. Second, we employ association analysis to predict music genres based on features extracted directly from the music. We build an association-based classifier, which finds inherent associations between music features and genres. We demonstrate the effectiveness of our approaches through a series of simulations and experiments using various benchmark music datasets.

    Vocal Detection: An evaluation between general versus focused models

    This thesis presents a technique for improving current vocal detection methods. One of the most popular methods takes a statistical approach: a model is first trained on both vocal and non-vocal example data and then used to classify audio signals as vocal or non-vocal. The problem with this method is that the trained model is typically very general and must do its best on many different types of data. Since the audio signals containing vocals that we care about are songs, we propose to improve vocal detection accuracy by creating focused models targeted at predicting vocal segments according to song artist and artist gender. Useful information such as the artist name is often overlooked, which restricts opportunities to process songs in a way specific to their type and hinders potential success. Experimental results with several models built according to artist and artist gender reveal improvements of up to 17% compared with the general approach. With such improvements, applications such as real-time automatic synchronization of lyrics to vocal segments may become more achievable and more accurate.

    Sentiment Classification Using Negation as a Proxy for Negative Sentiment

    We explore the relationship between negated text and negative sentiment in the task of sentiment classification. We propose a novel adjustment factor based on negation occurrences as a proxy for negative sentiment that can be applied to lexicon-based classifiers equipped with a negation detection pre-processing step. We performed an experiment on a multi-domain customer reviews dataset obtaining accuracy improvements over a baseline, and we further improved our results using out-of-domain data to calibrate the adjustment factor. We see future work possibilities in exploring negation detection refinements, and expanding the experiment to a broader spectrum of opinionated discourse, beyond that of customer reviews.

    The analysis of canonical and non-canonical questions in an English language podcast

    This bachelor’s thesis studies direct questions in “Grammar Day,” an episode of the linguistics podcast Talk the Talk. The aim is to analyze the formulation and function of canonical and non-canonical direct questions in natural oral discourse. The approach used is similar to that of conversation analysis and, hence, the analysis does not proceed from any specific hypotheses. Instead, the thesis takes a data-driven approach. The thesis consists of five sections: the introduction, the literature review, an analysis of direct questions in the annotated transcript of the podcast episode, the conclusion, and the list of references. The introduction highlights the importance of examining canonical and non-canonical questions as well as the reasons for choosing a podcast as the source for compiling the corpus of this study. https://www.ester.ee/record=b5239088*es