555 research outputs found

    Avoiding staff removal stage in optical music recognition: application to scores written in white mensural notation

    Get PDF
    Staff detection and removal is one of the most important issues in optical music recognition (OMR) tasks since common approaches for symbol detection and classification are based on this process. Due to its complexity, staff detection and removal is often inaccurate, leading to a great number of errors in posterior stages. For this reason, a new approach that avoids this stage is proposed in this paper, which is expected to overcome these drawbacks. Our approach is put into practice in a case of study focused on scores written in white mensural notation. Symbol detection is performed by using the vertical projection of the staves. The cross-correlation operator for template matching is used at the classification stage. The goodness of our proposal is shown in an experiment in which our proposal attains an extraction rate of 96 % and a classification rate of 92 %, on average. The results found have reinforced the idea of pursuing a new research line in OMR systems without the need of the removal of staff lines.This work has been funded by the Ministerio de Educación, Cultura y Deporte of the Spanish Government under a FPU Fellowship No. AP20120939, by the Ministerio de Economía y Competitividad of the Spanish Government under Project No. TIN2013-48152-C2-1-R and Project No. TIN2013-47276-C6-2-R, by the Consejería de Educación de la Comunidad Valenciana under Project No. PROMETEO/2012/017 and by the Junta de Andalucía under Project No. P11-TIC-7154

    Block Matching Algorithms for the Estimation of Motion in Image Sequences: Analysis

    Get PDF
    Several video coding standards and techniques have been introduced for multimedia applications, particularly the h.26x series for video processing. These standards employ motion estimation processing to reduce the amount of data that is required to store or transmit the video. The motion estimation process is an inextricable part of the video coding as it removes the temporal redundancy between successive frames of video sequences. This paper is about these motion estimation algorithms, their search procedures, complexity, advantages, and limitations. A survey of motion estimation algorithms including full search, many fast, and fast full search block-based algorithms has been presented. An evaluation of up-to-date motion estimation algorithms, based on several empirical results on several test video sequences, is presented as well

    Detail-preserving and Content-aware Variational Multi-view Stereo Reconstruction

    Full text link
    Accurate recovery of 3D geometrical surfaces from calibrated 2D multi-view images is a fundamental yet active research area in computer vision. Despite the steady progress in multi-view stereo reconstruction, most existing methods are still limited in recovering fine-scale details and sharp features while suppressing noises, and may fail in reconstructing regions with few textures. To address these limitations, this paper presents a Detail-preserving and Content-aware Variational (DCV) multi-view stereo method, which reconstructs the 3D surface by alternating between reprojection error minimization and mesh denoising. In reprojection error minimization, we propose a novel inter-image similarity measure, which is effective to preserve fine-scale details of the reconstructed surface and builds a connection between guided image filtering and image registration. In mesh denoising, we propose a content-aware p\ell_{p}-minimization algorithm by adaptively estimating the pp value and regularization parameters based on the current input. It is much more promising in suppressing noise while preserving sharp features than conventional isotropic mesh smoothing. Experimental results on benchmark datasets demonstrate that our DCV method is capable of recovering more surface details, and obtains cleaner and more accurate reconstructions than state-of-the-art methods. In particular, our method achieves the best results among all published methods on the Middlebury dino ring and dino sparse ring datasets in terms of both completeness and accuracy.Comment: 14 pages,16 figures. Submitted to IEEE Transaction on image processin

    A Low Cost and Computationally Efficient Approach for Occlusion Handling in Video Surveillance Systems

    Get PDF
    In the development of intelligent video surveillance systems for tracking a vehicle, occlusions are one of the major challenges. It becomes difficult to retain features during occlusion especially in case of complete occlusion. In this paper, a target vehicle tracking algorithm for Smart Video Surveillance (SVS) is proposed to track an unidentified target vehicle even in case of occlusions. This paper proposes a computationally efficient approach for handling occlusions named as Kalman Filter Assisted Occlusion Handling (KFAOH) technique. The algorithm works through two periods namely tracking period when no occlusion is seen and detection period when occlusion occurs, thus depicting its hybrid nature. Kanade-Lucas-Tomasi (KLT) feature tracker governs the operation of algorithm during the tracking period, whereas, a Cascaded Object Detector (COD) of weak classifiers, specially trained on a large database of cars governs the operation during detection period or occlusion with the assistance of Kalman Filter (KF). The algorithm’s tracking efficiency has been tested on six different tracking scenarios with increasing complexity in real-time. Performance evaluation under different noise variances and illumination levels shows that the tracking algorithm has good robustness against high noise and low illumination. All tests have been conducted on the MATLAB platform. The validity and practicality of the algorithm are also verified by success plots and precision plots for the test cases

    Template matching based TRN using Flash LiDAR

    Get PDF
    학위논문 (석사)-- 서울대학교 대학원 : 공과대학 기계항공공학부, 2018. 8. 박찬국.This thesis compares and analyzes a performance of template matching based terrain referenced navigation (TMTRN) using correlation functions according to different error types and correlation functions. Conventional batch processing TRN generally utilizes the radar altimeter and adopts mean square difference (MSD), mean absolute difference (MAD), and normalized cross correlation (NCC) for matching a batch profile with terrain database. If a flash LiDAR is utilized instead of the radar, it is possible to build a profile in one-shot. A point cloud of the flash LiDAR can be transformed into 2D profile, unlike a vector profile obtained from batch processing. Therefore, by using the flash LiDAR we can apply new correlation functions such as image Euclidean distance (IMED) and image normalized cross correlation (IMNCC) which have been used in computer vision field. The simulation result shows that IMED is the most robust for different types of errors.Chapter 1 Introduction 1 1.1 Motivation and background . . . . . . . . . . . . . . . . . . . . . 1 1.2 Objectives and contributions . . . . . . . . . . . . . . . . . . . . 3 Chapter 2 Related Works 5 2.1 Terrain Referenced Navigation . . . . . . . . . . . . . . . . . . . 5 2.1.1 LiDAR-based TRN . . . . . . . . . . . . . . . . . . . . . . 10 2.1.2 Image-based TRN . . . . . . . . . . . . . . . . . . . . . . 13 2.2 Template Matching . . . . . . . . . . . . . . . . . . . . . . . . . . 16 2.2.1 General idea of template matching . . . . . . . . . . . . . 16 2.2.2 Correlation function . . . . . . . . . . . . . . . . . . . . . 17 Chapter 3 Template matching based TRN 22 3.1 Relationship with BPTRN . . . . . . . . . . . . . . . . . . . . . . 22 3.2 TMTRN algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 24 Chapter 4 Simulation Results 30 4.1 Template matching of terrain PC . . . . . . . . . . . . . . . . . . 30 4.2 TMTRN simulation . . . . . . . . . . . . . . . . . . . . . . . . . 33 Chapter 5 Conclusions and Future Works 46 5.1 Summary of the contribution . . . . . . . . . . . . . . . . . . . . 46 5.2 Future works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47Maste

    Lung nodule modeling and detection for computerized image analysis of low dose CT imaging of the chest.

    Get PDF
    From a computerized image analysis prospective, early diagnosis of lung cancer involves detection of doubtful nodules and classification into different pathologies. The detection stage involves a detection approach, usually by template matching, and an authentication step to reduce false positives, usually conducted by a classifier of one form or another; statistical, fuzzy logic, support vector machines approaches have been tried. The classification stage matches, according to a particular approach, the characteristics (e.g., shape, texture and spatial distribution) of the detected nodules to common characteristics (again, shape, texture and spatial distribution) of nodules with known pathologies (confirmed by biopsies). This thesis focuses on the first step; i.e., nodule detection. Specifically, the thesis addresses three issues: a) understanding the CT data of typical low dose CT (LDCT) scanning of the chest, and devising an image processing approach to reduce the inherent artifacts in the scans; b) devising an image segmentation approach to isolate the lung tissues from the rest of the chest and thoracic regions in the CT scans; and c) devising a nodule modeling methodology to enhance the detection rate and lend benefits for the ultimate step in computerized image analysis of LDCT of the lungs, namely associating a pathology to the detected nodule. The methodology for reducing the noise artifacts is based on noise analysis and examination of typical LDCT scans that may be gathered on a repetitive fashion; since, a reduction in the resolution is inevitable to avoid excessive radiation. Two optimal filtering methods are tested on samples of the ELCAP screening data; the Weiner and the Anisotropic Diffusion Filters. Preference is given to the Anisotropic Diffusion Filter, which can be implemented on 7x7 blocks/windows of the CT data. The methodology for lung segmentation is based on the inherent characteristics of the LDCT scans, shown as distinct bi-modal gray scale histogram. A linear model is used to describe the histogram (the joint probability density function of the lungs and non-lungs tissues) by a linear combination of weighted kernels. The Gaussian kernels were chosen, and the classic Expectation-Maximization (EM) algorithm was employed to estimate the marginal probability densities of the lungs and non-lungs tissues, and select an optimal segmentation threshold. The segmentation is further enhanced using standard shape analysis based on mathematical morphology, which improves the continuity of the outer and inner borders of the lung tissues. This approach (a preliminary version of it appeared in [14]) is found to be adequate for lung segmentation as compared to more sophisticated approaches developed at the CVIP Lab (e.g., [15][16]) and elsewhere. The methodology developed for nodule modeling is based on understanding the physical characteristics of the nodules in LDCT scans, as identified by human experts. An empirical model is introduced for the probability density of the image intensity (or Hounsfield units) versus the radial distance measured from the centroid – center of mass - of typical nodules. This probability density showed that the nodule spatial support is within a circle/square of size 10 pixels; i.e., limited to 5 mm in length; which is within the range that the radiologist specify to be of concern. This probability density is used to fill in the intensity (or Hounsfield units) of parametric nodule models. For these models (e.g., circles or semi-circles), given a certain radius, we calculate the intensity (or Hounsfield units) using an exponential expression for the radial distance with parameters specified from the histogram of an ensemble of typical nodules. This work is similar in spirit to the earlier work of Farag et al., 2004 and 2005 [18][19], except that the empirical density of the radial distance and the histogram of typical nodules provide a data-driven guide for estimating the intensity (or Hounsfield units) of the nodule models. We examined the sensitivity and specificity of parametric nodules in a template-matching framework for nodule detection. We show that false positives are inevitable problems with typical machine learning methods of automatic lung nodule detection, which invites further efforts and perhaps fresh thinking into automatic nodule detection. A new approach for nodule modeling is introduced in Chapter 5 of this thesis, which brings high promise in both the detection, and the classification of nodules. Using the ELCAP study, we created an ensemble of four types of nodules and generated a nodule model for each type based on optimal data reduction methods. The resulting nodule model, for each type, has lead to drastic improvements in the sensitivity and specificity of nodule detection. This approach may be used as well for classification. In conclusion, the methodologies in this thesis are based on understanding the LDCT scans and what is to be expected in terms of image quality. Noise reduction and image segmentation are standard. The thesis illustrates that proper nodule models are possible and indeed a computerized approach for image analysis to detect and classify lung nodules is feasible. Extensions to the results in this thesis are immediate and the CVIP Lab has devised plans to pursue subsequent steps using clinical data

    Deep Feature Learning and Adaptation for Computer Vision

    Get PDF
    We are living in times when a revolution of deep learning is taking place. In general, deep learning models have a backbone that extracts features from the input data followed by task-specific layers, e.g. for classification. This dissertation proposes various deep feature extraction and adaptation methods to improve task-specific learning, such as visual re-identification, tracking, and domain adaptation. The vehicle re-identification (VRID) task requires identifying a given vehicle among a set of vehicles under variations in viewpoint, illumination, partial occlusion, and background clutter. We propose a novel local graph aggregation module for feature extraction to improve VRID performance. We also utilize a class-balanced loss to compensate for the unbalanced class distribution in the training dataset. Overall, our framework achieves state-of-the-art (SOTA) performance in multiple VRID benchmarks. We further extend our VRID method for visual object tracking under occlusion conditions. We motivate visual object tracking from aerial platforms by conducting a benchmarking of tracking methods on aerial datasets. Our study reveals that the current techniques have limited capabilities to re-identify objects when fully occluded or out of view. The Siamese network based trackers perform well compared to others in overall tracking performance. We utilize our VRID work in visual object tracking and propose Siam-ReID, a novel tracking method using a Siamese network and VRID technique. In another approach, we propose SiamGauss, a novel Siamese network with a Gaussian Head for improved confuser suppression and real time performance. Our approach achieves SOTA performance on aerial visual object tracking datasets. A related area of research is developing deep learning based domain adaptation techniques. We propose continual unsupervised domain adaptation, a novel paradigm for domain adaptation in data constrained environments. We show that existing works fail to generalize when the target domain data are acquired in small batches. We propose to use a buffer to store samples that are previously seen by the network and a novel loss function to improve the performance of continual domain adaptation. We further extend our continual unsupervised domain adaptation research for gradually varying domains. Our method outperforms several SOTA methods even though they have the entire domain data available during adaptation
    corecore