Visual analysis for drum sequence transcription
A system is presented for analysing drum performance video sequences. A novel ellipse detection algorithm is introduced that automatically locates drum tops. This algorithm fits ellipses to edge clusters, and ranks them according to various fitness criteria. A background/foreground segmentation method is then used to extract the silhouette of the drummer and drum sticks. Coupled with a motion
intensity feature, this allows for the detection of ‘hits’ in each of the extracted regions. In order to obtain a transcription of the performance, each of these regions is automatically labeled with the corresponding instrument class. A partial audio transcription and color cues are used to measure the compatibility between a region and its label; the Kuhn-Munkres algorithm is then employed to find the optimal labeling. Experimental results demonstrate the ability of visual analysis to enhance the performance of an audio drum transcription system.
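The labeling step described above can be illustrated with a small sketch. It assumes a square compatibility matrix in which entry (i, j) scores how well region i matches instrument label j, and it finds the labeling with the highest total compatibility using the Kuhn-Munkres (Hungarian) algorithm. The function names, the instrument names and the scores are hypothetical; the abstract does not say how the compatibility values are computed.

```python
def kuhn_munkres(cost):
    """Minimum-cost assignment for a square cost matrix.

    Returns (assignment, total), where assignment[i] is the column
    chosen for row i. Standard O(n^3) potentials-based formulation.
    """
    n = len(cost)
    inf = float("inf")
    u = [0.0] * (n + 1)      # row potentials (1-based)
    v = [0.0] * (n + 1)      # column potentials (1-based)
    p = [0] * (n + 1)        # p[j]: row currently matched to column j
    way = [0] * (n + 1)      # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0 = p[j0]
            delta = inf
            j1 = 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j] = cur
                        way[j] = j0
                    if minv[j] < delta:
                        delta = minv[j]
                        j1 = j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the matching along the augmenting path.
        while j0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[p[j] - 1] = j - 1
    total = sum(cost[i][assignment[i]] for i in range(n))
    return assignment, total


def label_regions(compat):
    """Maximise total compatibility by minimising the negated scores."""
    cost = [[-score for score in row] for row in compat]
    assignment, neg_total = kuhn_munkres(cost)
    return assignment, -neg_total


# Hypothetical scores: rows are detected regions, columns are labels.
labels = ["snare", "hi-hat", "kick"]
compat = [
    [0.9, 0.8, 0.1],
    [0.9, 0.2, 0.1],
    [0.3, 0.4, 0.5],
]
assignment, total = label_regions(compat)
region_labels = [labels[j] for j in assignment]
```

In this example a greedy row-by-row choice would give region 0 the first label and leave region 1 with a poor match, whereas the optimal assignment swaps them.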
Weakly-Supervised Temporal Localization via Occurrence Count Learning
We propose a novel model for temporal detection and localization which allows
the training of deep neural networks using only counts of event occurrences as
training labels. This powerful weakly-supervised framework alleviates the
burden of the imprecise and time-consuming process of annotating event
locations in temporal data. Unlike existing methods, in which localization is
explicitly achieved by design, our model learns localization implicitly as a
byproduct of learning to count instances. This unique feature is a direct
consequence of the model's theoretical properties. We validate the
effectiveness of our approach in a number of experiments (drum hit and piano
onset detection in audio, digit detection in images) and demonstrate
performance comparable to that of fully-supervised state-of-the-art methods,
despite much weaker training requirements.
Comment: Accepted at ICML 201
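To make the idea of training from counts alone more concrete, here is a minimal sketch of one way a count label can be tied to per-time-step predictions. It assumes, and the abstract does not state this, that the model emits an independent event probability at each time step, so the total count follows a Poisson-binomial distribution and the loss is the negative log-likelihood of the observed count. The function names and the small epsilon are illustrative choices.

```python
import math


def count_distribution(probs):
    """Distribution of the number of events given independent
    per-step event probabilities (Poisson-binomial, by dynamic
    programming). dist[k] is the probability of exactly k events."""
    dist = [1.0]
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)   # no event at this step
            nxt[k + 1] += mass * p       # one event at this step
        dist = nxt
    return dist


def count_nll(probs, observed_count, eps=1e-12):
    """Negative log-likelihood of the observed count (assumed loss)."""
    return -math.log(count_distribution(probs)[observed_count] + eps)


# Two candidate predictions for a clip annotated with a count of 2.
sharp = [0.95, 0.05, 0.05, 0.95]   # two confident, localized events
flat = [0.5, 0.5, 0.5, 0.5]        # same expected count, no localization
loss_sharp = count_nll(sharp, 2)
loss_flat = count_nll(flat, 2)
```

Under this assumed loss, the sharply peaked prediction scores better than the flat one even though both have an expected count of 2, which gives some intuition for how localization could emerge from count supervision.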
Vision-based Detection of Acoustic Timed Events: a Case Study on Clarinet Note Onsets
Acoustic events often have a visual counterpart. Knowledge of visual
information can aid the understanding of complex auditory scenes, even when
only a stereo mixdown is available in the audio domain, e.g., identifying which
musicians are playing in large musical ensembles. In this paper, we consider a
vision-based approach to note onset detection. As a case study we focus on
challenging, real-world clarinetist videos and carry out preliminary
experiments on a 3D convolutional neural network based on multiple streams and
purposely avoiding temporal pooling. We release an audiovisual dataset with 4.5
hours of clarinetist videos together with cleaned annotations which include
about 36,000 onsets and the coordinates for a number of salient points and
regions of interest. By performing several training trials on our dataset, we
learned that the problem is challenging. We found that the CNN model is highly
sensitive to the optimization algorithm and hyper-parameters, and that treating
the problem as binary classification may prevent the joint optimization of
precision and recall. To encourage further research, we publicly share our
dataset, annotations and all models and detail which issues we came across
during our preliminary experiments.
Comment: Proceedings of the First International Conference on Deep Learning
and Music, Anchorage, US, May, 2017 (arXiv:1706.08675v1 [cs.NE])
Learning and Production of Movement Sequences: Behavioral, Neurophysiological, and Modeling Perspectives
A growing wave of behavioral studies, using a wide variety of paradigms that were introduced or greatly refined in recent years, has generated a new wealth of parametric observations about serial order behavior. What was a mere trickle of neurophysiological studies has grown to a more steady stream of probes of neural sites and mechanisms underlying sequential behavior. Moreover, simulation models of serial behavior generation have begun to open a channel to link cellular dynamics with cognitive and behavioral dynamics. Here we summarize the major results from prominent sequence learning and performance tasks, namely immediate serial recall, typing, 2XN, discrete sequence production, and serial reaction time. These populate a continuum from higher to lower degrees of internal control of sequential organization. The main movement classes covered are speech and keypressing, both involving small amplitude movements that are very amenable to parametric study. A brief synopsis of classes of serial order models, vis-à-vis the detailing of major effects found in the behavioral data, leads to a focus on competitive queuing (CQ) models. Recently, the many behavioral predictive successes of CQ models have been joined by successful prediction of distinctively patterned electrophysiological recordings in prefrontal cortex, wherein the parallel activation dynamics of multiple neural ensembles strikingly match the parallel dynamics predicted by CQ theory. An extended CQ simulation model, the N-STREAMS neural network model, is then examined to highlight issues in ongoing attempts to accommodate a broader range of behavioral and neurophysiological data within a CQ-consistent theory.
Important contemporary issues such as the nature of working memory representations for sequential behavior, and the development and role of chunks in hierarchical control are prominent throughout.
Defense Advanced Research Projects Agency/Office of Naval Research (N00014-95-1-0409); National Institute of Mental Health (R01 DC02852)
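The core mechanism of a competitive queuing model can be sketched in a few lines. This is a generic textbook-style illustration, not the N-STREAMS model: all planned items are active in parallel with graded strengths, the most active item wins the competition and is produced, and it is then suppressed so the next strongest item can win. The function name and the activation values are hypothetical.

```python
def competitive_queue(activations):
    """Produce a sequence from a parallel activation gradient.

    activations maps each planned item to its initial activation.
    On every cycle the most active item is selected (competitive
    choice) and then removed from the plan (self-inhibition).
    """
    plan = dict(activations)          # parallel planning layer
    produced = []
    while plan:
        winner = max(plan, key=plan.get)
        produced.append(winner)
        del plan[winner]              # suppress the executed item
    return produced


# A primacy gradient: earlier items start with higher activation.
gradient = {"c": 0.9, "a": 0.7, "t": 0.4}
sequence = competitive_queue(gradient)
```

Because serial order is carried by relative activation levels rather than by explicit links between items, perturbing two nearby activations in such a scheme swaps their output positions, which is the kind of transposition error these models are often used to explain.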
Collecting ground truth annotations for drum detection in polyphonic music
In order to train and test algorithms that can automatically detect drum events in polyphonic music, ground truth data is needed. This paper describes a setup used for gathering manual annotations for 49 real-world music fragments containing different drum event types. Apart from the drum events, the beat was also annotated. The annotators were experienced drummers or percussionists. This paper is primarily aimed at other drum detection researchers, but might also be of interest to others dealing with automatic music analysis, manual annotation and data gathering. Its purpose is threefold: providing annotation data for algorithm training and evaluation, describing a practical way of setting up a drum annotation task, and reporting issues that came up during the annotation sessions, along with some thoughts on points that could be taken into account when setting up similar tasks in the future.
Disruption of the basal body protein POC1B results in autosomal-recessive cone-rod dystrophy
Exome sequencing revealed a homozygous missense mutation (c.317C>G [p.Arg106Pro]) in POC1B, encoding POC1 centriolar protein B, in three siblings with autosomal-recessive cone dystrophy or cone-rod dystrophy and compound-heterozygous POC1B mutations (c.199_201del [p.Gln67del] and c.810+1G>T) in an unrelated person with cone-rod dystrophy. Upon overexpression of POC1B in human TERT-immortalized retinal pigment epithelium 1 cells, the encoded wild-type protein localized to the basal body of the primary cilium, whereas this localization was lost for p.Arg106Pro and p.Gln67del variant forms of POC1B. Morpholino-oligonucleotide-induced knockdown of poc1b translation in zebrafish resulted in a dose-dependent small-eye phenotype, impaired optokinetic responses, and decreased length of photoreceptor outer segments. These ocular phenotypes could partially be rescued by wild-type human POC1B mRNA, but not by c.199_201del and c.317C>G mutant human POC1B mRNAs. Yeast two-hybrid screening of a human retinal cDNA library revealed FAM161A as a binary interaction partner of POC1B. This was confirmed in coimmunoprecipitation and colocalization assays, which both showed loss of FAM161A interaction with p.Arg106Pro and p.Gln67del variant forms of POC1B. FAM161A was previously implicated in autosomal-recessive retinitis pigmentosa and shown to be located at the base of the photoreceptor connecting cilium, where it interacts with several other ciliopathy-associated proteins. Altogether, this study demonstrates that POC1B mutations result in a defect of the photoreceptor sensory cilium and thus affect cone and rod photoreceptors.