A supervised classification approach for note tracking in polyphonic piano transcription
In the field of Automatic Music Transcription, note tracking systems are key to the overall success of the task, as they compute the expected note-level abstraction from a frame-based pitch activation representation. Despite its relevance, note tracking is most commonly performed with a set of hand-crafted rules adjusted manually for the data at hand. The present work introduces a machine learning approach, more precisely supervised classification, that aims to infer such policies automatically for the case of piano music. The idea is to segment each pitch band of a frame-based pitch activation into single instances which are subsequently classified as active or non-active note events. Results using a comprehensive set of supervised classification strategies on the MAPS piano dataset demonstrate its competitiveness against other commonly considered note tracking strategies, as well as an improvement of more than +10% in F-measure over the considered baseline for both frame-level and note-level evaluations.
This research work is partially supported by Universidad de Alicante through the FPU program [UAFPU2014–5883] and the Spanish Ministerio de Economía y Competitividad through project TIMuL [No. TIN2013–48152–C2–1–R, supported by EU FEDER funds]. EB is supported by a UK RAEng Research Fellowship [grant number RF/128]
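The segment-and-classify idea can be illustrated with a minimal sketch (all names and thresholds here are hypothetical, and a simple hand-set rule stands in for the trained classifier described in the abstract):

```python
import numpy as np

def segment_pitch_band(activations, pitch, thresh=0.5):
    """Split one pitch band of a frame-based activation matrix into
    contiguous candidate note segments (start, end, mean activation)."""
    band = activations[pitch] > thresh
    segments, start = [], None
    for t, on in enumerate(band):
        if on and start is None:
            start = t
        elif not on and start is not None:
            segments.append((start, t, float(activations[pitch, start:t].mean())))
            start = None
    if start is not None:
        segments.append((start, len(band), float(activations[pitch, start:].mean())))
    return segments

def classify_segments(segments, min_len=2, min_mean=0.6):
    """Stand-in for the trained classifier: keep a candidate segment as an
    active note event if it is long and salient enough."""
    return [(s, e) for s, e, m in segments if (e - s) >= min_len and m >= min_mean]

acts = np.zeros((2, 10))
acts[0, 2:6] = 0.9   # a clear note
acts[0, 8] = 0.7     # a one-frame blip that should be rejected
notes = classify_segments(segment_pitch_band(acts, 0))
```

In the actual work, features extracted from each candidate segment would be fed to a trained supervised classifier rather than this fixed rule.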
Computer-aided Melody Note Transcription Using the Tony Software: Accuracy and Efficiency
We present Tony, a software tool for the interactive annotation of melodies from monophonic audio recordings, and evaluate its usability and the accuracy of its note extraction method. The scientific study of acoustic performances of melodies, whether sung or played, requires the accurate transcription of notes and pitches. To achieve the desired transcription accuracy for a particular application, researchers manually correct results obtained by automatic methods. Tony is an interactive tool aimed directly at making this correction task efficient. It provides (a) state-of-the-art algorithms for pitch and note estimation, (b) visual and auditory feedback for easy error-spotting, (c) an intelligent graphical user interface through which the user can rapidly correct estimation errors, and (d) extensive export functions enabling further processing in other applications. We show that Tony's built-in automatic note transcription method compares favourably with existing tools. We report annotation times for a set of 96 solo vocal recordings and study the effects of piece, the number of edits made, and the annotator's increasing mastery of the software. Tony is Open Source software, with source code and compiled binaries for Windows, Mac OS X and Linux available from https://code.soundsoftware.ac.uk/projects/tony/
Automatic chord transcription from audio using computational models of musical context
PhD
This thesis is concerned with the automatic transcription of chords from audio, with an emphasis on modern popular music. Musical context, such as the key and the structural segmentation, aids the interpretation of chords by human listeners. In this thesis we propose computational models that integrate such musical context into the automatic chord estimation process.
We present a novel dynamic Bayesian network (DBN) which integrates models of metric
position, key, chord, bass note and two beat-synchronous audio features (bass and treble
chroma) into a single high-level musical context model. We simultaneously infer the most probable
sequence of metric positions, keys, chords and bass notes via Viterbi inference. Several
experiments with real world data show that adding context parameters results in a significant
increase in chord recognition accuracy and faithfulness of chord segmentation. The most complex of the proposed methods transcribes chords with a state-of-the-art accuracy of 73% on the song collection used for the 2009 MIREX Chord Detection tasks. This method serves as a baseline for two further enhancements.
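The joint decoding idea, smoothing chord estimates through a context model with Viterbi inference, can be sketched with a toy two-state example (the transition and observation probabilities below are invented for illustration; the thesis model jointly tracks metric position, key, chord and bass note):

```python
import numpy as np

def viterbi(log_init, log_trans, log_obs):
    """Most probable state sequence; log_obs has shape (T, n_states)."""
    T, S = log_obs.shape
    delta = log_init + log_obs[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans      # scores[from, to]
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_obs[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Two toy chord states; a self-transition bias plays the role of context
# smoothing, favouring stable chord segments over frame-wise flipping.
log_init = np.log(np.array([0.5, 0.5]))
log_trans = np.log(np.array([[0.9, 0.1], [0.1, 0.9]]))
log_obs = np.log(np.array([[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]))
path = viterbi(log_init, log_trans, log_obs)
```

The decoded path switches chord exactly once, illustrating how the transition model suppresses spurious frame-level chord changes.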
Firstly, we aim to improve chord confusion behaviour by modifying the audio front end
processing. We compare the effect of learning chord profiles as Gaussian mixtures to the effect
of using chromagrams generated from an approximate pitch transcription method. We show
that using chromagrams from approximate transcription results in the most substantial increase
in accuracy. The best method achieves 79% accuracy and significantly outperforms the state of
the art.
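A chromagram derived from an approximate pitch transcription can be sketched as folding a pitch-activation matrix into 12 pitch classes (the `low_midi` offset and the toy activation below are assumptions for illustration, not the thesis front end):

```python
import numpy as np

def chroma_from_activation(activation, low_midi=21):
    """Fold a (n_pitches, n_frames) pitch activation into a 12-bin
    chromagram by summing octave equivalents, then L1-normalise frames."""
    n_pitches, n_frames = activation.shape
    chroma = np.zeros((12, n_frames))
    for p in range(n_pitches):
        chroma[(low_midi + p) % 12] += activation[p]
    norms = chroma.sum(axis=0, keepdims=True)
    norms[norms == 0] = 1.0                  # avoid dividing silent frames by zero
    return chroma / norms

act = np.zeros((88, 4))                      # 88 piano pitches, 4 frames
act[39, :] = 1.0                             # MIDI 60 (C), given low_midi=21
act[43, :] = 1.0                             # MIDI 64 (E)
act[46, :] = 1.0                             # MIDI 67 (G) -> a C major triad
ch = chroma_from_activation(act)
```

Because the activation comes from a pitch transcription rather than a raw spectrum, overtones are largely absorbed into their source pitches before folding, which is one plausible reading of why such chromagrams improve chord accuracy.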
Secondly, we propose a method by which chromagram information is shared between
repeated structural segments (such as verses) in a song. This can be done fully automatically
using a novel structural segmentation algorithm tailored to this task. We show that the technique
leads to a significant increase in accuracy and readability. The segmentation algorithm itself
also obtains state-of-the-art results. A method that combines both of the above enhancements
reaches an accuracy of 81%, a statistically significant improvement over the best result (74%)
in the 2009 MIREX Chord Detection tasks.
Engineering and Physical Sciences Research Council, UK
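The segment-sharing idea can be sketched as averaging the chromagram over repeated structural segments and writing the average back to each occurrence (the equal-length segments below are a simplifying assumption; the thesis detects repeats automatically with its segmentation algorithm):

```python
import numpy as np

def share_chroma(chroma, segments):
    """Average the chromagram over repeated structural segments
    (e.g. all verses) and write the average back to each occurrence.
    `segments` is a list of (start, end) frame ranges of equal length."""
    length = segments[0][1] - segments[0][0]
    avg = np.mean([chroma[:, s:s + length] for s, _ in segments], axis=0)
    out = chroma.copy()
    for s, _ in segments:
        out[:, s:s + length] = avg
    return out

# toy 2-bin chromagram in which two "verses" occupy frames 0-1 and 2-3
ch = np.array([[1.0, 0.0, 0.0, 0.0],
               [0.0, 0.0, 1.0, 0.0]])
shared = share_chroma(ch, [(0, 2), (2, 4)])
```

Averaging across repeats reduces noise in each occurrence, which is one way to read the reported gains in accuracy and readability.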
Singing information processing: techniques and applications
The singing voice is an essential component of music in every culture of the world, as it is an incredibly natural form of musical expression. Consequently, the automatic processing of the singing voice has great impact from an industrial, cultural and scientific perspective. In this context, this Thesis contributes a varied set of techniques and applications related to singing voice processing, together with a review of the associated state of the art in each case.
First, several of the best-known pitch estimators are compared for the query-by-humming use case. The results show that \cite{Boersma1993} (with a non-obvious parameter setting) and \cite{Mauch2014} perform very well in this use case, owing to the smoothness of the extracted pitch contours.
In addition, a novel singing transcription system is proposed, based on a hysteresis process defined in time and frequency, together with a Matlab tool for singing transcription evaluation. The interest of the proposed method is that it achieves error rates close to the state of the art with a very simple approach. The proposed evaluation tool, in turn, is a useful resource for better defining the problem and for better assessing the solutions proposed by future researchers.
This Thesis also presents a method for the automatic assessment of vocal performances. It uses dynamic time warping to align the user's performance with a reference, thereby providing scores for intonation and rhythm accuracy. The evaluation of the system shows a high correlation between the scores given by the system and those annotated by a group of expert musicians.
Furthermore, a method for realistic intensity transformation of the singing voice is presented. This transformation is based on a parametric model of the spectral envelope, and substantially improves the perceived realism compared with commercial software such as Melodyne or Vocaloid. The drawback of the proposed approach is that it requires manual intervention, but the results obtained yield important conclusions towards automatic intensity modification with realistic results.
Finally, a method for the correction of dissonances in isolated chords is proposed. It is based on a multiple-F0 analysis and a frequency shift of the sinusoidal components. The evaluation, carried out by a group of trained musicians, shows a clear increase in perceived consonance after the proposed transformation.
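The DTW-based performance rating mentioned in this abstract can be sketched as follows (the pitch contours, tolerance and scoring rule are hypothetical illustrations; rhythm scoring is omitted here):

```python
import numpy as np

def dtw_path(ref, perf):
    """Dynamic time warping between two 1-D pitch contours (in semitones);
    returns the alignment path as (ref_idx, perf_idx) pairs."""
    n, m = len(ref), len(perf)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(ref[i - 1] - perf[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    path, i, j = [], n, m                    # backtrack from the corner
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

def intonation_score(ref, perf, tol=0.5):
    """Fraction of aligned frames within `tol` semitones of the reference."""
    path = dtw_path(ref, perf)
    hits = sum(abs(ref[i] - perf[j]) <= tol for i, j in path)
    return hits / len(path)

ref  = np.array([60.0, 60.0, 62.0, 64.0])
perf = np.array([60.2, 62.1, 62.1, 65.0])   # rushed, and the last note a semitone sharp
score = intonation_score(ref, perf)
```

The alignment absorbs timing deviations so that only pitch deviations are penalised, which is the point of warping before scoring.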
Computational Methods for the Alignment and Score-Informed Transcription of Piano Music
PhD
This thesis is concerned with computational methods for the alignment and score-informed transcription of piano music. Firstly, several methods are proposed to improve alignment robustness and accuracy when various versions of one piece of music show complex differences with respect to acoustic conditions or musical interpretation. Secondly, score-to-performance alignment is applied to enable score-informed transcription.
Although music alignment methods have considerably improved in accuracy in recent
years, the task remains challenging. The research in this thesis aims to improve the
robustness for some cases where there are substantial differences between versions and
state-of-the-art methods may fail to identify a correct alignment. This thesis first exploits the availability of multiple versions of the piece to be aligned. By processing these versions jointly, the alignment process can be stabilised, drawing on additional examples of how a section might be interpreted or which acoustic conditions may arise. Two methods are
proposed, progressive alignment and profile HMM, both adapted from the multiple biological
sequence alignment task. Experiments demonstrate that these methods can indeed
improve the alignment accuracy and robustness over comparable pairwise methods.
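The progressive-alignment idea can be sketched as folding each new version into a running profile via pairwise DTW (the 1-D features and the mean-based folding rule below are simplifying assumptions; the thesis works on audio features and also proposes a profile HMM variant):

```python
import numpy as np

def dtw_path(a, b):
    """Warping path between two 1-D feature sequences (absolute-cost DTW)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = abs(a[i - 1] - b[j - 1]) + \
                      min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        moves = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = moves[int(np.argmin([D[x, y] for x, y in moves]))]
    return path[::-1]

def progressive_align(versions):
    """Progressively align each version to a running profile (frame-wise
    mean of the versions aligned so far), as in multiple sequence alignment."""
    profile = np.asarray(versions[0], dtype=float)
    count = 1
    paths = [[(i, i) for i in range(len(profile))]]
    for v in versions[1:]:
        path = dtw_path(profile, v)
        # fold the new version into the profile along the warping path
        warped = np.array([np.mean([v[j] for (pi, j) in path if pi == i])
                           for i in range(len(profile))])
        profile = (profile * count + warped) / (count + 1)
        count += 1
        paths.append(path)
    return profile, paths

# second "version" lingers on one note (a repeated frame)
versions = [[0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 1.0, 2.0, 3.0]]
profile, paths = progressive_align(versions)
```

Each new version is compared against the accumulated evidence of all previous ones rather than a single counterpart, which is the stabilising effect the thesis reports.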
Secondly, this thesis presents a score to performance alignment method that can improve
the robustness in cases where some musical voices, such as the melody, are played asynchronously
to others – a stylistic device used in musical expression. The asynchronies between
the melody and the accompaniment are handled by treating the voices as separate
timelines in a multi-dimensional variant of dynamic time warping (DTW). The method
measurably improves the alignment accuracy for pieces with asynchronous voices and
preserves the accuracy otherwise.
Once an accurate alignment between a score and an audio recording is available, the
score information can be exploited as prior knowledge in automatic music transcription
(AMT) in scenarios where the score is available, such as music tutoring. Score-informed dictionary
learning is used to learn the spectral pattern of each pitch that describes the energy
distribution of the associated notes in the recording. More precisely, the dictionary learning
process in non-negative matrix factorization (NMF) is constrained using the aligned
score. This way, by adapting the dictionary to a given recording, the proposed method
improves the accuracy over the state of the art.
China Scholarship Council
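The score-informed NMF constraint can be sketched as a binary mask on the activation matrix, re-imposed after every multiplicative update (the toy spectra and the Euclidean-distance update rules below are illustrative assumptions, not the thesis model):

```python
import numpy as np

def score_informed_nmf(V, n_pitches, score_mask, n_iter=200, eps=1e-9):
    """NMF V ~= W H with multiplicative (Euclidean) updates, where H is
    constrained by a binary score mask: activations the aligned score
    marks as silent stay at zero, so W adapts to this recording only
    where the score licenses energy."""
    rng = np.random.default_rng(0)
    n_bins, _ = V.shape
    W = rng.random((n_bins, n_pitches)) + eps
    H = (rng.random((n_pitches, V.shape[1])) + eps) * score_mask
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        H *= score_mask                      # re-impose the score constraint
        W *= (V @ H.T) / (W @ (H @ H.T) + eps)
    return W, H

# toy spectrogram: two "pitches" with distinct spectral patterns
W_true = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
H_true = np.array([[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]])
V = W_true @ H_true
mask = (H_true > 0).astype(float)            # derived from the aligned score
W, H = score_informed_nmf(V, 2, mask)
```

Because the mask pins the support of H, the learned dictionary columns can only explain energy where the score says a pitch sounds, which is the sense in which the dictionary adapts to the given recording.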
- …