
    Transcribing Content from Structural Images with Spotlight Mechanism

    Transcribing content from structural images, e.g., writing notes from music scores, is a challenging task, as not only must the content objects be recognized, but the internal structure must also be preserved. Existing image recognition methods mainly work on images with simple content (e.g., text lines with characters), but are not capable of identifying ones with more complex content (e.g., structured symbols), which often follow a fine-grained grammar. To this end, in this paper, we propose a hierarchical Spotlight Transcribing Network (STN) framework followed by a two-stage "where-to-what" solution. Specifically, we first decide "where-to-look" through a novel spotlight mechanism to focus on different areas of the original image following its structure. Then, we decide "what-to-write" by developing a GRU-based network with the spotlight areas for transcribing the content accordingly. Moreover, we propose two implementations on the basis of STN, i.e., STNM and STNR, where the spotlight movement follows the Markov property and Recurrent modeling, respectively. We also design a reinforcement method to refine the framework by self-improving the spotlight mechanism. We conduct extensive experiments on many structural image datasets, where the results clearly demonstrate the effectiveness of the STN framework. Comment: Accepted by the KDD 2018 Research Track. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'18)
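    The "where-to-look" step can be illustrated with a minimal sketch. The abstract does not define the spotlight mathematically, so the Gaussian weighting, the parameter names (`cx`, `cy`, `sigma`), and both function names below are assumptions made for illustration, not the STN authors' formulation. The sketch only shows how a movable, soft focus area could pool a grid of image features into one vector that a "what-to-write" decoder might then consume.

```python
import math


def spotlight_weights(height, width, cx, cy, sigma):
    """Normalized Gaussian weights centered at (cx, cy) on a height x width grid.

    Assumption: a Gaussian falloff with radius `sigma`; the paper may use a
    different weighting.
    """
    raw = [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]
    total = sum(sum(row) for row in raw)
    return [[w / total for w in row] for row in raw]


def spotlight_read(feature_map, cx, cy, sigma):
    """Weighted average of the per-cell feature vectors under the spotlight.

    `feature_map[y][x]` is a list of floats (one feature vector per cell).
    """
    height, width = len(feature_map), len(feature_map[0])
    depth = len(feature_map[0][0])
    weights = spotlight_weights(height, width, cx, cy, sigma)
    out = [0.0] * depth
    for y in range(height):
        for x in range(width):
            for d in range(depth):
                out[d] += weights[y][x] * feature_map[y][x][d]
    return out


# Toy 4x4 feature map with a single "symbol" at row 2, column 1.
toy_map = [[[1.0] if (y, x) == (2, 1) else [0.0] for x in range(4)]
           for y in range(4)]
on_symbol = spotlight_read(toy_map, cx=1, cy=2, sigma=0.3)
off_symbol = spotlight_read(toy_map, cx=3, cy=0, sigma=0.3)
```

    In this toy setting, a spotlight centered on the symbol returns a value close to 1, while a spotlight placed elsewhere returns a value close to 0. Moving `(cx, cy)` from step to step is what a Markov or recurrent controller, as in STNM and STNR, would be responsible for.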

    A robust braille recognition system

    Braille is the most effective means of written communication between visually-impaired and sighted people. This paper describes a new system that recognizes Braille characters in scanned Braille document pages. Unlike most other approaches, an inexpensive flatbed scanner is used and the system requires minimal interaction with the user. A unique feature of this system is the use of context at different levels (from the pre-processing of the image through to the post-processing of the recognition results) to enhance robustness and, consequently, recognition results. Braille dots composing characters are identified on both single and double-sided documents of average quality with over 99% accuracy, while Braille characters are also correctly recognised in over 99% of documents of average quality (in both single and double-sided documents)

    Sensing and mapping for interactive performance

    This paper describes a trans-domain mapping (TDM) framework for translating meaningful activities from one creative domain onto another. The multi-disciplinary framework is designed to facilitate an intuitive and non-intrusive interactive multimedia performance interface that offers the users or performers real-time control of multimedia events using their physical movements. It is intended to be a highly dynamic real-time performance tool, sensing and tracking activities and changes, in order to provide interactive multimedia performances. From a straightforward definition of the TDM framework, this paper reports several implementations and multi-disciplinary collaborative projects using the proposed framework, including a motion and colour-sensitive system, a sensor-based system for triggering musical events, and a distributed multimedia server for audio mapping of a real-time face tracker, and discusses different aspects of mapping strategies in their context. Plausible future directions, developments and exploration with the proposed framework, including stage augmentation, virtual and augmented reality, which involve sensing and mapping of physical and non-physical changes onto multimedia control events, are discussed

    Extending a network-of-elaborations representation to polyphonic music: Schenker and species counterpoint.

    A system of representing melodies as a network of elaborations has been developed, and used as the basis for software which generates melodies in response to the movements of a dancer. This paper examines the issues of extending this representation system to polyphonic music, and of deriving a structural representation of this kind from a musical score. The theories of Heinrich Schenker and of Species Counterpoint are proposed as potentially fruitful bases

    D-touch: A Consumer-Grade Tangible Interface Module and Musical Applications

    We define a class of tangible media applications that can be implemented on consumer-grade personal computers. These applications interpret user manipulation of physical objects in a restricted space and produce unlocalized outputs. We propose a generic approach to the implementation of such interfaces using flexible fiducial markers, which identify objects to a robust and fast video-processing algorithm, so they can be recognized and tracked in real time. We describe an implementation of the technology, then report two new, flexible music performance applications that demonstrate and validate it

    Multimedia information technology and the annotation of video

    The state of the art in multimedia information technology has not progressed to the point where a single solution is available to meet all reasonable needs of documentalists and users of video archives. In general, we do not have an optimistic view of the usability of new technology in this domain, but digitization and digital power can be expected to cause a small revolution in the area of video archiving. The volume of data leads to two views of the future: on the pessimistic side, overload of data will cause lack of annotation capacity, and on the optimistic side, there will be enough data from which to learn selected concepts that can be deployed to support automatic annotation. At the threshold of this interesting era, we make an attempt to describe the state of the art in technology. We sample the progress in text, sound, and image processing, as well as in machine learning

    Identifying music documents in a collection of images

    Digital libraries and search engines are now well-equipped to find images of documents based on queries. Many images of music scores are now available, often mixed up with textual documents and images. For example, using the Google “images” search feature, a search for “Beethoven” will return a number of scores and manuscripts as well as pictures of the composer. In this paper we report on an investigation into methods to mechanically determine if a particular document is indeed a score, so that the user can specify that only musical scores should be returned. The goal is to find a minimal set of features that can be used as a quick test that will be applied to large numbers of documents. A variety of filters were considered, and two promising ones (run-length ratios and Hough transform) were evaluated. We found that a method based on run-lengths in vertical scans (RL) outperforms a comparable algorithm using the Hough transform (HT). On a test set of 1030 images, RL achieved recall and precision of 97.8% and 88.4% respectively, while HT achieved 97.8% and 73.5%. In terms of processor time, RL was more than five times as fast as HT
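    To make the idea of run-lengths in vertical scans concrete, here is a minimal sketch on a binary image (1 = black, 0 = white). The abstract does not give the exact feature, so the `dominant_run_ratio` score, the function names, and the synthetic staff image are illustrative assumptions rather than the authors' RL method. The intuition is that vertical scans through staff lines produce many black runs of the same short length, whereas ordinary text or photographs produce a wider spread of run lengths.

```python
from itertools import groupby


def vertical_runs(image, col):
    """Return (pixel_value, run_length) pairs scanning one column top to bottom."""
    column = [row[col] for row in image]
    return [(value, len(list(group))) for value, group in groupby(column)]


def black_run_histogram(image, cols):
    """Count black (value 1) vertical runs by length over the given columns."""
    hist = {}
    for col in cols:
        for value, length in vertical_runs(image, col):
            if value == 1:
                hist[length] = hist.get(length, 0) + 1
    return hist


def dominant_run_ratio(image, cols):
    """Fraction of black runs that share the most common length.

    Hypothetical score: values near 1.0 suggest uniform line thickness,
    as expected for staff lines. Returns 0.0 if there are no black runs.
    """
    hist = black_run_histogram(image, cols)
    total = sum(hist.values())
    return max(hist.values()) / total if total else 0.0


def make_staff(width=20, height=30, line_rows=(5, 9, 13, 17, 21)):
    """Synthetic image containing five one-pixel-thick horizontal lines."""
    return [[1 if r in line_rows else 0 for _ in range(width)]
            for r in range(height)]


staff = make_staff()
staff_hist = black_run_histogram(staff, range(20))
staff_score = dominant_run_ratio(staff, range(20))
```

    Because only a handful of columns need to be scanned, a test of this kind is cheap, which is consistent with the abstract's report that RL ran more than five times as fast as HT. A real classifier would also need a decision threshold, which the abstract does not specify.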