
    Optical Music Recognition with Convolutional Sequence-to-Sequence Models

    Optical Music Recognition (OMR) is an important technology within Music Information Retrieval. Deep learning models show promising results on OMR tasks, but symbol-level annotated data sets of sufficient size to train such models are not available and are difficult to develop. We present a deep learning architecture called a Convolutional Sequence-to-Sequence model to both move towards an end-to-end trainable OMR pipeline and apply a learning process that trains on full sentences of sheet music instead of individually labeled symbols. The model is trained and evaluated on a human-generated data set, with various image augmentations based on real-world scenarios. This data set is the first publicly available set in OMR research with sufficient size to train and evaluate deep learning models. With the introduced augmentations, a pitch recognition accuracy of 81% and a duration accuracy of 94% are achieved, resulting in a note-level accuracy of 80%. Finally, the model is compared to commercially available methods, showing a large improvement over these applications. Comment: ISMIR 201

    Hemorrhage Detection and Analysis in Traumatic Pelvic Injuries

    Traumatic pelvic injuries associated with high-energy pelvic fractures are life-threatening injuries. Extensive bleeding is relatively common with pelvic fractures and is especially prevalent with high-energy fractures. Hemorrhage remains the major cause of deaths that occur within the first 24 hours after a traumatic pelvic injury. Emergent life-saving treatment is required for high-energy pelvic fractures associated with hemorrhage. A thorough understanding of potential sources of bleeding within a short period is essential for diagnosis and treatment planning. Computed Tomography (CT) images have been widely used to identify the potential sources of bleeding. A pelvic CT scan contains a large number of images. Analyzing each slice in a scan via simple visual inspection is very time-consuming. Time is a crucial factor in emergency medicine. Therefore, a computer-assisted pelvic trauma decision-making system is advantageous for assisting physicians in fast and accurate decision making and treatment planning. The proposed project presents an automated system to detect and segment hemorrhage and combines it with the other extracted features from pelvic images and demographic data to provide recommendations to trauma caregivers for diagnosis and treatment. The first part of the project is to develop automated methods to detect arteries by incorporating bone information. This part of the project merges bone edges and segments bone using a seed growing technique. Later, the segmented bone information is utilized along with the best template matching to locate arteries and extract gray level information of the located arteries in the pelvic region. The second part of the project focuses on locating the source of hemorrhage and its segmentation. The hemorrhage is segmented using a novel rule-based hemorrhage segmentation approach. This approach segments hemorrhage through hemorrhage matching, rule optimization, and region growing.
Later, the position of hemorrhage in the image and the volume of the hemorrhage are determined to analyze hemorrhage severity. The third part of the project is to automatically classify the outcome using features extracted from the medical images and patient medical records and demographics. A multi-stage feature selection algorithm is used to select the predominant features among all the features. Finally, a boosted logistic model tree is used to classify the outcome. The methods are tested on CT images of traumatic pelvic injury patients. The hemorrhage segmentation and classification results seem promising and demonstrate that the proposed method is not only capable of automatically segmenting hemorrhage and classifying outcome, but also has the potential to be used for clinical applications. Finally, the project is extended to abdominal trauma, and a novel knowledge-based heuristic technique is used to detect and segment the spleen from the abdominal CT images. This technique is tested on a limited number of subjects and the results are promising.
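
The abstract above names region growing as one step of the segmentation but does not give its rules. As a rough illustration only, the following is a minimal sketch of generic seed-based region growing on a toy 2D gray-level grid. The function name `region_grow`, the 4-connectivity, and the fixed `tolerance` criterion are assumptions made for this example and are not the authors' rule-based method.

```python
from collections import deque


def region_grow(image, seed, tolerance):
    """Grow a region from `seed` (hypothetical helper, not the paper's method).

    A 4-connected neighbor joins the region when its gray level differs from
    the seed's gray level by at most `tolerance`.
    """
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    region = {seed}
    frontier = deque([seed])
    while frontier:
        r, c = frontier.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in region:
                if abs(image[nr][nc] - seed_value) <= tolerance:
                    region.add((nr, nc))
                    frontier.append((nr, nc))
    return region


# Toy "slice": bright values stand in for a candidate region of interest.
toy_slice = [
    [10, 12, 90, 91],
    [11, 13, 92, 10],
    [10, 95, 94, 11],
]
bright = region_grow(toy_slice, seed=(0, 2), tolerance=5)
pixel_count = len(bright)  # a crude per-slice size measure
```

Summing such pixel counts over consecutive slices, scaled by the voxel size, would be one simple way to approximate a volume; whether the authors compute volume this way is not stated in the abstract.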

    A Generative Method for Textured Motion: Analysis and Synthesis

    Abstract. Natural scenes contain rich stochastic motion patterns which are characterized by the movement of a large number of small elements, such as falling snow, raining, flying birds, firework and waterfall. In this paper, we call these motion patterns textured motion and present a generative method that combines statistical models and algorithms from both texture and motion analysis. The generative method includes the following three aspects. 1) Photometrically, an image is represented as a superposition of linear bases in atomic decomposition using an over-complete dictionary, such as Gabor or Laplacian. Such base representation is known to be generic for natural images, and it is low dimensional as the number of bases is often 100 times smaller than the number of pixels. 2) Geometrically, each moving element (called moveton), such as the individual snowflake and bird, is represented by a deformable template which is a group of several spatially adjacent bases. Such templates are learned through clustering. 3) Dynamically, the movetons ar

    Visual Speech Recognition

    Lip reading is used to understand or interpret speech without hearing it, a technique especially mastered by people with hearing difficulties. The ability to lip read enables a person with a hearing impairment to communicate with others and to engage in social activities, which otherwise would be difficult. Recent advances in the fields of computer vision, pattern recognition, and signal processing have led to a growing interest in automating this challenging task of lip reading. Indeed, automating the human ability to lip read, a process referred to as visual speech recognition (VSR) (or sometimes speech reading), could open the door for other novel related applications. VSR has received a great deal of attention in the last decade for its potential use in applications such as human-computer interaction (HCI), audio-visual speech recognition (AVSR), speaker recognition, talking heads, sign language recognition and video surveillance. Its main aim is to recognise spoken word(s) by using only the visual signal that is produced during speech. Hence, VSR deals with the visual domain of speech and involves image processing, artificial intelligence, object detection, pattern recognition, statistical modelling, etc. Comment: Speech and Language Technologies (Book), Prof. Ivo Ipsic (Ed.), ISBN: 978-953-307-322-4, InTech (2011

    Applications of stochastic geometry in image analysis


    On the 3D point cloud for human-pose estimation

    This thesis aims at investigating methodologies for estimating a human pose from a 3D point cloud that is captured by a static depth sensor. Human-pose estimation (HPE) is important for a range of applications, such as human-robot interaction, healthcare, surveillance, and so forth. Yet, HPE is challenging because of the uncertainty in sensor measurements and the complexity of human poses. In this research, we focus on addressing challenges related to two crucial components in the estimation process, namely, human-pose feature extraction and human-pose modeling. In feature extraction, the main challenge involves reducing feature ambiguity. We propose a 3D-point-cloud feature called viewpoint and shape feature histogram (VISH) to reduce feature ambiguity by capturing geometric properties of the 3D point cloud of a human. The feature extraction consists of three steps: 3D-point-cloud pre-processing, hierarchical structuring, and feature extraction. In the pre-processing step, 3D points corresponding to a human are extracted and outliers from the environment are removed to retain the 3D points of interest. This step is important because it allows us to reduce the number of 3D points by keeping only those points that correspond to the human body for further processing. In the hierarchical structuring, the pre-processed 3D point cloud is partitioned and replicated into a tree structure as nodes. Viewpoint feature histogram (VFH) and shape features are extracted from each node in the tree to provide a descriptor to represent each node. As the features are obtained based on histograms, coarse-level details are highlighted in large regions and fine-level details are highlighted in small regions. Therefore, the features from the point cloud in the tree can capture coarse level to fine level information to reduce feature ambiguity. 
In human-pose modeling, the main challenges involve reducing the dimensionality of human-pose space and designing appropriate factors that represent the underlying probability distributions for estimating human poses. To reduce the dimensionality, we propose a non-parametric action-mixture model (AMM). It represents high-dimensional human-pose space using low-dimensional manifolds in searching human poses. In each manifold, a probability distribution is estimated based on feature similarity. The distributions in the manifolds are then redistributed according to the stationary distribution of a Markov chain that models the frequency of human actions. After the redistribution, the manifolds are combined according to a probability distribution determined by action classification. Experiments were conducted using VISH features as input to the AMM. The results showed that the overall error and standard deviation of the AMM were reduced by about 7.9% and 7.1%, respectively, compared with a model without action classification. To design appropriate factors, we consider the AMM as a Bayesian network and propose a mapping that converts the Bayesian network to a neural network called NN-AMM. The proposed mapping consists of two steps: structure identification and parameter learning. In structure identification, we have developed a bottom-up approach to build a neural network while preserving the Bayesian-network structure. In parameter learning, we have created a part-based approach to learn synaptic weights by decomposing a neural network into parts. Based on the concept of distributed representation, the NN-AMM is further modified into a scalable neural network called NND-AMM. A neural-network-based system is then built by using VISH features to represent 3D-point-cloud input and the NND-AMM to estimate 3D human poses. The results showed that the proposed mapping can be utilized to design AMM factors automatically. 
The NND-AMM can provide more accurate human-pose estimates with fewer hidden neurons than both the AMM and NN-AMM can. Both the NN-AMM and NND-AMM can adapt to different types of input, showing the advantage of using neural networks to design factors.
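
The action-mixture idea described above relies on the stationary distribution of a Markov chain over actions. The sketch below is a simplified illustration, not the thesis's model: it computes a stationary distribution by power iteration and uses it directly as mixture weights over per-action pose distributions. The names `stationary_distribution` and `mix_pose_distributions`, the two-action transition matrix, and the toy pose probabilities are all assumptions for this example; the abstract does not specify the actual redistribution or the action-classification step.

```python
def stationary_distribution(transition, iterations=1000, tol=1e-12):
    """Power iteration for pi satisfying pi = pi * P (rows of P sum to 1)."""
    n = len(transition)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [sum(pi[i] * transition[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


def mix_pose_distributions(per_action, weights):
    """Weighted sum of per-action pose distributions, renormalized to sum to 1."""
    n_poses = len(per_action[0])
    mixed = [
        sum(w * dist[k] for w, dist in zip(weights, per_action))
        for k in range(n_poses)
    ]
    total = sum(mixed)
    return [m / total for m in mixed]


# Hypothetical two-action chain: action 0 tends to persist, action 1 less so.
transition = [
    [0.9, 0.1],
    [0.5, 0.5],
]
action_weights = stationary_distribution(transition)

# Hypothetical probabilities over three candidate poses, one row per action.
per_action = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
]
mixed = mix_pose_distributions(per_action, action_weights)
```

For this toy chain the balance condition 0.1·π₀ = 0.5·π₁ gives π = (5/6, 1/6), so the frequently occurring action dominates the combined pose distribution.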