Aligned Sub-Hierarchies: A Structure-Based Approach to the Cover Song Task
Extending previous structure-based approaches to song comparison tasks such as the fingerprint and cover song tasks, this paper introduces the aligned sub-hierarchies (AsH) representation. Built by applying a post-processing technique to the aligned hierarchies of a song, the AsH representation is the set of unique aligned hierarchies for repeats (called AHR) encoded in the original aligned hierarchies of the whole song. Effectively, each AHR within AsH is a section of the aligned hierarchies for the original song. Like aligned hierarchies, the AsH representation can be embedded into a classification space with a natural metric that makes inter-song comparisons based on sections of the songs. Experiments addressing a version of the cover song task on score-based data, using AsH as the basis of inter-song comparison, demonstrate the potential of AsH-based approaches for MIR tasks.
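The abstract does not specify how an AHR is encoded or which metric the classification space uses. As a minimal sketch of section-based comparison, assume each AHR has been resampled to a binary matrix of a common shape (rows for repeat groups, columns for time steps), that duplicates are dropped to form the AsH set, and that two songs are compared with a modified Hausdorff distance over their AHRs. The names `unique_ahrs`, `ahr_distance`, and `ash_distance`, the common-shape encoding, and the choice of metric are hypothetical illustrations, not the paper's method.

```python
import numpy as np


def unique_ahrs(ahrs):
    """Drop duplicate AHR matrices, keeping the first occurrence of each.

    Hypothetical encoding: each AHR is a binary NumPy matrix.
    """
    seen, out = set(), []
    for m in ahrs:
        key = (m.shape, m.astype(np.uint8).tobytes())
        if key not in seen:
            seen.add(key)
            out.append(m)
    return out


def ahr_distance(a, b):
    """L1 distance between two binary AHR matrices of the same (assumed) shape."""
    return float(np.abs(a.astype(int) - b.astype(int)).sum())


def ash_distance(song_a, song_b):
    """Modified Hausdorff distance between two sets of AHRs (a hypothetical choice).

    For each AHR in one song, take the distance to its closest AHR in the other
    song, average those values, and keep the larger of the two directions.
    """
    def directed(xs, ys):
        return float(np.mean([min(ahr_distance(x, y) for y in ys) for x in xs]))

    return max(directed(song_a, song_b), directed(song_b, song_a))
```

Because each AHR is matched to its nearest counterpart, two songs that share a section score close on that section even if the rest of their structure differs, which is the kind of section-based comparison the abstract describes.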
Improving Structure Evaluation Through Automatic Hierarchy Expansion
Structural segmentation is the task of partitioning a recording into non-overlapping time intervals and labeling each segment with an identifying marker such as A, B, or verse. Hierarchical structure annotation expands this idea to allow an annotator to segment a song at multiple levels of granularity. While there has been recent progress in developing evaluation criteria for comparing two hierarchical annotations of the same recording, the existing methods have known deficiencies when dealing with inexact label matchings and sequential label repetition. In this article, we investigate methods for automatically enhancing structural annotations by inferring (and expanding) hierarchical information from the segment labels. The proposed method complements existing techniques for comparing hierarchical structural annotations by coarsening or refining labels with variation markers to either collapse similarly labeled segments together or separate identically labeled segments from each other. Using the multi-level structure annotations provided in the SALAMI dataset, we demonstrate that automatic hierarchy expansion allows structure comparison methods to more accurately assess similarity between annotations.
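The abstract does not give the exact expansion rules. A minimal sketch, assuming variation markers are trailing primes (as in a label like `A'`): coarsening strips the primes so that variants collapse into one label, and refining appends an occurrence index so that identically labeled segments become distinct. The functions `coarsen`, `refine`, and `expand` and the `.k` suffix convention are hypothetical.

```python
import re
from collections import defaultdict


def coarsen(labels):
    """Collapse variants by stripping trailing primes: A, A', A'' -> A (assumed convention)."""
    return [re.sub(r"'+$", "", lab) for lab in labels]


def refine(labels):
    """Separate identical labels by occurrence: A, B, A -> A.1, B.1, A.2 (hypothetical suffix)."""
    counts = defaultdict(int)
    out = []
    for lab in labels:
        counts[lab] += 1
        out.append(f"{lab}.{counts[lab]}")
    return out


def expand(labels):
    """Return a three-level hierarchy: coarse labels, original labels, fine labels."""
    return [coarsen(labels), list(labels), refine(labels)]
```

For example, `expand(["A", "A'", "B", "A"])` yields `[["A", "A", "B", "A"], ["A", "A'", "B", "A"], ["A.1", "A'.1", "B.1", "A.2"]]`: the coarse level treats `A'` as a repeat of `A`, while the fine level keeps the two plain `A` segments apart.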
SuPP & MaPP: Adaptable Structure-Based Representations for MIR Tasks
Accurate and flexible representations of music data are paramount to addressing MIR tasks, yet many existing approaches are difficult to interpret or rigid in nature. This work introduces two new song representations for structure-based retrieval methods: Surface Pattern Preservation (SuPP), a continuous song representation, and Matrix Pattern Preservation (MaPP), SuPP's discrete counterpart. These representations come equipped with several user-defined parameters so that they are adaptable to a range of MIR tasks. Experimental results show that MaPP is successful in addressing the cover song task on a set of Mazurka scores, with a mean precision of 0.965 and recall of 0.776. SuPP and MaPP also show promise in other MIR applications, such as novel-segment detection and genre classification, the latter of which demonstrates their suitability as inputs for machine learning problems.
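The abstract does not describe how SuPP and MaPP are constructed. Purely to illustrate the continuous-versus-discrete distinction, the sketch below assumes repeats are given as (start, length) pairs, builds a continuous surface as a sum of Gaussian bumps, and then samples and thresholds it on a grid to obtain a binary matrix. The Gaussian kernel, the parameters `sigma`, `n_bins`, and `threshold`, and the function names are all hypothetical stand-ins for the paper's user-defined parameters, not its construction.

```python
import numpy as np


def supp_like_surface(repeats, sigma=1.0):
    """Return a continuous function f(start, length) built from repeat pairs.

    Each (start, length) repeat contributes a Gaussian bump of width `sigma`
    (an assumed smoothing parameter).
    """
    pts = np.asarray(repeats, dtype=float)

    def f(s, l):
        d2 = (s - pts[:, 0]) ** 2 + (l - pts[:, 1]) ** 2
        return float(np.exp(-d2 / (2 * sigma**2)).sum())

    return f


def mapp_like_matrix(f, max_time, n_bins=8, threshold=0.5):
    """Sample f on an n_bins x n_bins grid over [0, max_time]^2 and binarize.

    Rows index start times and columns index lengths.
    """
    grid = np.linspace(0, max_time, n_bins)
    vals = np.array([[f(s, l) for l in grid] for s in grid])
    return (vals >= threshold).astype(np.uint8)
```

Under these assumptions, `sigma` controls how much nearby repeats blur together in the continuous surface, while `n_bins` and `threshold` control how coarse the discrete matrix is, which mirrors the kind of adaptability the abstract attributes to user-defined parameters.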
Integrating Data Science Ethics into an Undergraduate Major
We present a programmatic approach to incorporating ethics into an undergraduate major in statistical and data sciences. We discuss departmental-level initiatives designed to meet the National Academy of Sciences recommendation for weaving ethics into the curriculum from top to bottom, as our majors progress from our introductory courses to our senior capstone course, and from side to side, through co-curricular programming. We also provide six examples of data science ethics modules used in five different courses at our liberal arts college, each focusing on a different ethical consideration. The modules are designed to be portable so that they can be flexibly incorporated into existing courses at different levels of instruction with minimal disruption to syllabi. We conclude with next steps and preliminary assessments.
Crowdsourcing Classroom Observations to Identify Misconceptions in Data Science
Web-browsing histories, online newspapers, streaming music, and stock prices all show that we live in an age of data. Extracting meaning from data is necessary in many fields to comprehend the information flow. This need has fueled rapid growth in data science education aiming to serve the next generation of policy makers, data science researchers, and global citizens. Initially, teaching practices were drawn from data science's parent disciplines (e.g., computer science and mathematics). This project addresses the early stages of developing a concept inventory of student difficulty within the newly emerging field of data science. In particular, this project will address three primary research objectives: (1) identify student misconceptions in data science courses; (2) document students' prior knowledge and identify courses that teach early data science concepts; and (3) confirm expert identification of data science concepts and their importance for introductory-level data science curricula. During the first year of this grant, we collected approximately 200 responses to a survey designed to confirm concepts from an existing body of knowledge presented by the EDISON Project. Survey respondents consist of faculty and industry practitioners within data science and closely related fields. Preliminary analysis of these results will be presented with respect to our third research objective. In addition, we developed and launched a pilot assessment for identifying student difficulties within data science courses. The protocol includes regular responses to reflective questions by faculty, teaching assistants, and students from selected data science courses offered at the three participating institutions. Preliminary analyses will be presented along with implications for future data collection in year two of the project.
In addition to the anticipated results, we expect that the data collection and analysis methodologies will be of interest to many scholars who have engaged or will engage in discipline-based educational research.
Evaluation of EDISON's Data Science Competency Framework Through a Comparative Literature Analysis
During the emergence of Data Science as a distinct discipline, discussions of what exactly constitutes Data Science have been a source of contention, with no clear resolution. These disagreements have been exacerbated by the lack of a clear single disciplinary 'parent.' Many early efforts at defining curricula and courses exist, with the EDISON Project's Data Science Framework (EDISON-DSF) from the European Union being the most complete. The EDISON-DSF includes both a Data Science Body of Knowledge (DS-BoK) and a Competency Framework (CF-DS). This paper takes a critical look at how EDISON's CF-DS compares to recent work and other published curricular or course materials. We identify areas of strong agreement and disagreement with the framework. Results from the literature analysis provide strong insights into what topics the broader community sees as belonging in (or not in) Data Science, at both the curricular and course levels. This analysis can provide important guidance for groups working to formalize the discipline and for any college or university looking to build its own undergraduate Data Science degree or program.
On definitions of "mathematician"
The definition of who is or what makes a "mathematician" is an important and urgent issue to be addressed in the mathematics community. Too often, a narrower definition of who is considered a mathematician (and what is considered mathematics) is used to exclude people from the discipline, both explicitly and implicitly. However, using a narrow definition of a mathematician also allows us to examine and challenge systemic barriers that exist in certain spaces of the community. This paper explores and illuminates tensions between narrow and broad definitions and how they can be used to promote both inclusion and exclusion simultaneously. In this article, we present a framework of definitions based on identity, function, and qualification, and we explore several different meanings of "mathematician". By interrogating various definitions, we highlight their risks and opportunities, with an emphasis on implications for broadening and/or narrowing participation of underrepresented groups.
Comment: 21 pages, 2 figures
repytah: An Open-Source Python Package for Building Aligned Hierarchies for Sequential Data
We introduce repytah, a Python package that constructs the aligned hierarchies representation, which contains all possible structure-based hierarchical decompositions for a finite-length piece of sequential data aligned on a common time axis. In particular, this representation, introduced by Kinnaird (2016) with music-based data such as musical recordings or scores as the primary motivation, is intended for sequential data where repetitions have particular meaning (such as a verse, chorus, motif, or theme). Although the original motivation for the aligned hierarchies representation was finding structure in music-based data streams, nothing inherent in the construction of these representations limits repytah to music-based sequential data.
The repytah package builds these aligned hierarchies by first extracting repeated structures (of all meaningful lengths) from the self-dissimilarity matrix (SDM) for a piece of sequential data. By design, repytah uses the SDM as the starting point for constructing the aligned hierarchies: an SDM cannot be reverse-engineered back to the original signal, which allows researchers to collaborate on signals that are protected by copyright or by privacy considerations. This package is a Python translation of the original MATLAB code by Kinnaird (2014) with additional documentation, and the code has been updated to leverage efficiencies in Python.
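The following NumPy sketch shows the general idea of starting from an SDM: compute pairwise distances between time steps, then look for runs along off-diagonals where the dissimilarity stays low, since each such run marks a segment that repeats later. It does not reproduce repytah's own functions; the names `self_dissimilarity_matrix` and `diagonal_repeats`, the Euclidean distance, and the fixed threshold are assumptions for illustration.

```python
import numpy as np


def self_dissimilarity_matrix(features):
    """Pairwise Euclidean distances between time steps.

    `features` has shape (n_features, n_steps); each column is one time step.
    """
    x = np.asarray(features, dtype=float).T  # shape (n_steps, n_features)
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff**2).sum(axis=-1))


def diagonal_repeats(sdm, thresh, min_len=2):
    """Find runs along off-diagonals where the SDM stays at or below `thresh`.

    Returns (i, j, length) triples meaning steps i..i+length-1 repeat at
    j..j+length-1. `min_len` is an assumed minimum meaningful repeat length.
    """
    n = sdm.shape[0]
    hits = sdm <= thresh
    out = []
    for offset in range(1, n):
        run = 0
        for i in range(n - offset):
            if hits[i, i + offset]:
                run += 1
            else:
                if run >= min_len:
                    out.append((i - run, i - run + offset, run))
                run = 0
        if run >= min_len:  # close a run that reaches the end of the diagonal
            end = n - offset
            out.append((end - run, end - run + offset, run))
    return out
```

For a toy sequence whose feature columns follow the pattern A B C A B C, `diagonal_repeats` with a small threshold reports a single repeat `(0, 3, 3)`. The sketch also hints at why an SDM does not determine its signal: Euclidean distances are unchanged when every feature vector is shifted by the same constant (or rotated), so many different signals share one SDM.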
Teaching Computational Machine Learning (without Statistics)
This paper presents an undergraduate machine learning course that emphasizes algorithmic understanding and programming skills while assuming no statistical training. Emphasizing the development of good habits of mind, this course trains students to be independent machine learning practitioners through an iterative, cyclical framework for teaching concepts that adds depth and nuance with each pass. Beginning with unsupervised learning, the course is sequenced as a series of machine learning ideas and concepts, with specific algorithms acting as concrete examples. This paper also details course organization, including evaluation practices and logistics.