10 research outputs found

    Incremental learning-based visual tracking with weighted discriminative dictionaries

    Existing sparse representation-based visual tracking methods detect the target positions by minimizing the reconstruction error. However, due to complex backgrounds, illumination changes, and occlusions, these methods have difficulty locating the target properly. In this article, we propose a novel visual tracking method based on weighted discriminative dictionaries and a pyramidal feature selection strategy. First, we utilize color features and texture features of the training samples to obtain multiple discriminative dictionaries. Then, we use the position information of those samples to assign weights to the base vectors in the dictionaries. For robust visual tracking, we propose a pyramidal sparse feature selection strategy in which the weights of the base vectors and the reconstruction errors of the different features are integrated to select the best target regions. At the same time, we measure feature reliability to dynamically adjust the weights of the different features. In addition, we introduce a scenario-aware mechanism and an incremental dictionary update method based on noise energy analysis. Comparison experiments show that the proposed algorithm outperforms several state-of-the-art methods, and useful quantitative and qualitative analyses are also carried out.
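    To make the candidate-scoring idea concrete, here is a minimal sketch, not the authors' exact method, of sparse-representation tracking: candidate regions are encoded against a learned dictionary and the candidate with the smallest weighted reconstruction error is taken as the target. The uniform atom weights, patch sizes, and sparsity level are illustrative placeholders; the paper derives its weights from the position information of the training samples.

```python
# A sketch of dictionary-based candidate scoring, using sklearn's generic
# dictionary learner in place of the paper's weighted discriminative
# dictionaries. All data here is synthetic.
import numpy as np
from sklearn.decomposition import DictionaryLearning
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)

# Hypothetical training samples: rows are vectorized patches of one
# feature channel (e.g., color or texture).
train_patches = rng.standard_normal((200, 64))

# Learn a small dictionary for this feature channel.
dico = DictionaryLearning(n_components=32, transform_algorithm="omp",
                          transform_n_nonzero_coefs=5, random_state=0)
dico.fit(train_patches)
D = dico.components_                          # shape (32, 64)

# Per-atom weights; the paper derives these from sample positions,
# here a uniform placeholder.
atom_weights = np.ones(D.shape[0])

def weighted_reconstruction_error(candidate, D, atom_weights, k=5):
    """Encode one candidate patch and return its weighted residual."""
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=k, fit_intercept=False)
    omp.fit(D.T, candidate)                   # solve candidate ~ D.T @ coef
    coef = omp.coef_ * atom_weights           # emphasize reliable atoms
    return np.linalg.norm(candidate - D.T @ coef)

# Score hypothetical candidate regions; the lowest error wins.
candidates = rng.standard_normal((10, 64))
errors = [weighted_reconstruction_error(c, D, atom_weights) for c in candidates]
print(int(np.argmin(errors)))                 # index of the tracked region
```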

    Fault diagnosis of rotating machinery under time-varying speed based on order tracking and deep learning

    Traditional order analysis methods rely on prior knowledge and expert experience, while deep learning alone cannot accurately extract features under time-varying conditions. To overcome these disadvantages, a fault diagnosis method for rotating machinery under time-varying conditions based on tacholess order tracking (TOT) and deep learning is proposed in this paper. Firstly, frequency-domain periodic signals and estimated speed information are obtained by order tracking. Secondly, the frequency-domain periodic signal is speed-normalized using the estimated speed information. Finally, normalized features are extracted by a deep learning network to form a feature vector, which is fed into a softmax layer to complete the fault diagnosis of the gearbox. The gearbox fault diagnosis results are compared with those of other traditional methods and show that the proposed method can effectively identify the faults and achieves higher diagnosis accuracy under time-varying speed.
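    The heart of any order-tracking method is the resampling of a uniformly time-sampled signal onto uniform shaft-angle increments. The sketch below illustrates only that step, assuming an instantaneous speed profile has already been estimated (the role of the paper's TOT stage); the speed ramp and vibration signal are synthetic.

```python
# Order-tracking resampling sketch: a tone at a fixed shaft order smears
# in the time-frequency plane under varying speed, but becomes a clean
# spectral peak after resampling to the angle domain.
import numpy as np

rng = np.random.default_rng(0)
fs = 1000.0                                    # sampling rate [Hz]
t = np.arange(0, 2.0, 1.0 / fs)                # 2 s of data

# Hypothetical speed ramp: shaft frequency rises from 10 Hz to 30 Hz.
f_shaft = 10.0 + 10.0 * t
phase = 2.0 * np.pi * np.cumsum(f_shaft) / fs  # shaft angle [rad]

# Vibration with a 3rd-order component plus noise.
x = np.sin(3.0 * phase) + 0.1 * rng.standard_normal(t.size)

# Resample at uniform angular increments (the order-tracking step).
samples_per_rev = 64
theta = np.arange(phase[0], phase[-1], 2.0 * np.pi / samples_per_rev)
x_angle = np.interp(theta, phase, x)

# In the angle domain the FFT yields an order spectrum whose peaks
# align with fault orders regardless of the speed variation.
spectrum = np.abs(np.fft.rfft(x_angle))
orders = np.fft.rfftfreq(x_angle.size, d=1.0 / samples_per_rev)
print(orders[np.argmax(spectrum[1:]) + 1])     # ~3.0
```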

    Subspace Representations and Learning for Visual Recognition

    Pervasive and affordable sensor and storage technology enables the acquisition of an ever-rising amount of visual data. The ability to extract semantic information by interpreting, indexing and searching visual data is impacting domains such as surveillance, robotics, intelligence, human-computer interaction, navigation, healthcare, and several others. This further stimulates the investigation of automated extraction techniques that are more efficient and robust against the many sources of noise affecting the already complex visual data carrying the semantic information of interest. We address the problem by designing novel visual data representations, based on learning data subspace decompositions that are invariant against noise while being informative for the task at hand. We use this guiding principle to tackle several visual recognition problems, including detection and recognition of human interactions from surveillance video, face recognition in unconstrained environments, and domain generalization for object recognition.
    By interpreting visual data with a simple additive noise model, we consider the subspaces spanned by the model portion (model subspace) and the noise portion (variation subspace). We observe that decomposing the variation subspace against the model subspace gives rise to the so-called parity subspace. Decomposing the model subspace against the variation subspace instead gives rise to what we name the invariant subspace. We extend the use of kernel techniques for the parity subspace. This enables modeling the highly non-linear temporal trajectories describing human behavior, and performing detection and recognition of human interactions. In addition, we introduce supervised low-rank matrix decomposition techniques for learning the invariant subspace for two other tasks. We learn invariant representations for face recognition from grossly corrupted images, and we learn object recognition classifiers that are invariant to the so-called domain bias.
    Extensive experiments using the benchmark datasets publicly available for each of the three tasks show that learning representations based on subspace decompositions invariant to the sources of noise leads to results comparable to or better than the state-of-the-art.
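    The model/variation decomposition described above can be illustrated with plain linear algebra. The sketch below is a toy under an additive-noise assumption, not the thesis's exact formulation: it estimates a model subspace with an SVD and splits each sample into its model-subspace component and an orthogonal residual analogous to the variation part.

```python
# Toy model/variation subspace split via SVD on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
d, n, r = 50, 200, 5

# Synthetic data = rank-r model part + additive noise ("variation").
U_true = np.linalg.qr(rng.standard_normal((d, r)))[0]
X = U_true @ rng.standard_normal((r, n)) + 0.1 * rng.standard_normal((d, n))

# Estimate the model subspace as the span of the top-r left singular vectors.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
U_model = U[:, :r]

P_model = U_model @ U_model.T           # projector onto the model subspace
P_perp = np.eye(d) - P_model            # projector onto its complement

x = X[:, 0]
x_model = P_model @ x                   # informative, noise-suppressed part
x_residual = P_perp @ x                 # variation-like residual part
print(np.linalg.norm(x_model), np.linalg.norm(x_residual))
```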

    Multi-frame reconstruction using super-resolution, inpainting, segmentation and codecs

    In this thesis, different aspects of video and light field reconstruction are considered, such as super-resolution, inpainting, segmentation and codecs. Each of these strategies is analyzed with respect to a specific goal and a specific database; accordingly, databases relevant to the film industry, sport videos, light fields and hyperspectral videos are used. This thesis is constructed around six related manuscripts, in which several approaches are proposed for multi-frame reconstruction. Initially, a novel multi-frame reconstruction strategy is proposed for light field super-resolution, in which graph-based regularization is applied along with edge-preserving filtering to improve the spatio-angular quality of the light field. Second, a novel video reconstruction method is proposed, built on compressive sensing (CS), Gaussian mixture models (GMM) and sparse 3D transform-domain block matching. The motivation of the proposed technique is to improve the visual quality of the video frames and to decrease the reconstruction error in comparison with earlier video reconstruction methods. In the next approach, Student-t mixture models and edge-preserving filtering are applied for video super-resolution. The Student-t mixture model has heavy tails, which make it robust and suitable as a video frame patch prior and rich in terms of log likelihood for information retrieval. In another approach, a hyperspectral video database is considered, and a Bayesian dictionary learning process is used for hyperspectral video super-resolution. To that end, a Beta process is used in the Bayesian dictionary learning, and a sparse coding is generated for the hyperspectral video super-resolution. The spatial super-resolution is followed by a spectral video restoration strategy, and the whole process leverages two different learned dictionaries: the first is trained for the spatial super-resolution and the second for the spectral restoration. Furthermore, in another approach, a novel framework is proposed for automatically replacing advertisement content in soccer videos using deep learning strategies. For this purpose, a U-Net architecture (a convolutional neural network technique for image segmentation) is applied for content segmentation and detection. Subsequently, after reconstructing the segmented content in the video frames (accounting for the apparent loss in detection), the unwanted content is replaced by new content using a homography mapping procedure. In addition, in another research work, a novel video compression framework is presented using autoencoder networks that encode and decode videos using less chroma information than luma information. For this purpose, instead of converting Y'CbCr 4:2:2/4:2:0 videos to and from RGB 4:4:4, the video is kept in Y'CbCr 4:2:2/4:2:0 and the luma and chroma channels are merged after the luma is downsampled to match the chroma size. An inverse function is performed in the decoder. The performance of these models is evaluated using the CPSNR, MS-SSIM, and VMAF metrics. The experiments reveal that, compared to video compression involving conversion to and from RGB 4:4:4, the proposed method increases the video quality by about 5.5% for Y'CbCr 4:2:2 and 8.3% for Y'CbCr 4:2:0 while reducing the amount of computation by nearly 37% for Y'CbCr 4:2:2 and 40% for Y'CbCr 4:2:0.
The thread that ties these approaches together is the reconstruction of video and light field frames in the presence of problems such as loss of information, blur in the frames, residual noise after reconstruction, unwanted content, excessive data size and high computational overhead. In three of the proposed approaches, we use the Plug-and-Play ADMM model for the first time for the reconstruction of videos and light fields, in order to address both information retrieval in the frames and the removal of noise/blur at the same time. In two of the proposed models, we apply sparse dictionary learning to reduce the data dimension and to represent the data as an efficient linear combination of basis frame patches. Two of the proposed approaches are developed in collaboration with industry, in which deep learning frameworks are used to handle large sets of features and to learn high-level features from the data.
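    Since Plug-and-Play ADMM recurs in three of the approaches, a minimal sketch may help. The version below solves a toy frame-inpainting problem rather than any of the thesis's actual reconstructions: the forward model is a pixel mask, and the plugged-in denoiser is a simple Gaussian filter standing in for a stronger prior such as BM3D; all parameters are illustrative.

```python
# Plug-and-Play ADMM sketch for masked-pixel frame inpainting.
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)
frame = gaussian_filter(rng.standard_normal((64, 64)), 3.0)  # toy "frame"
mask = rng.random(frame.shape) < 0.5                         # observed pixels
y = np.where(mask, frame, 0.0)

rho = 1.0
x = y.copy()
z = y.copy()
u = np.zeros_like(y)

for _ in range(50):
    # x-update: closed form for (1/2)||mask*(x - y)||^2 + (rho/2)||x - z + u||^2
    x = (mask * y + rho * (z - u)) / (mask + rho)
    # z-update: denoise x + u; this is where any off-the-shelf denoiser plugs in.
    z = gaussian_filter(x + u, sigma=1.0)
    # dual update enforcing x = z at convergence
    u = u + x - z

print(np.linalg.norm((x - frame)[~mask]))     # error on the missing pixels
```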

    Advances in Image Processing, Analysis and Recognition Technology

    For many decades, researchers have been trying to make computers’ analysis of images as effective as the human visual system. For this purpose, many algorithms and systems have been created. The whole process covers various stages, including image processing, representation and recognition. The results of this work can be applied to many computer-assisted areas of everyday life. They improve particular activities and provide handy tools, which are sometimes only for entertainment but quite often significantly increase our safety. In fact, the range of practical applications of image processing algorithms is particularly wide. Moreover, the rapid growth of computing power has allowed for the development of more sophisticated and effective algorithms and tools. Although significant progress has been made so far, many issues remain, resulting in the need for the development of novel approaches.

    SSIM-Inspired Quality Assessment, Compression, and Processing for Visual Communications

    Objective Image and Video Quality Assessment (I/VQA) measures predict image/video quality as perceived by human beings, the ultimate consumers of visual data. Existing research in the area is mainly limited to benchmarking and monitoring of visual data. The use of I/VQA measures in the design and optimization of image/video processing algorithms and systems is more desirable, challenging and fruitful, but has not been well explored. Among the recently proposed objective I/VQA approaches, the structural similarity (SSIM) index and its variants have emerged as promising measures that show superior performance compared with the widely used mean squared error (MSE) and are computationally simple compared with other state-of-the-art perceptual quality measures. In addition, SSIM has a number of desirable mathematical properties for optimization tasks. The goal of this research is to break the tradition of using MSE as the optimization criterion for image and video processing algorithms. We tackle several important problems in visual communication applications by exploiting SSIM-inspired design and optimization to achieve significantly better performance.
    Firstly, the original SSIM is a Full-Reference IQA (FR-IQA) measure that requires access to the original reference image, making it impractical in many visual communication applications. We propose a general-purpose Reduced-Reference IQA (RR-IQA) method that can estimate SSIM with high accuracy with the help of a small number of RR features extracted from the original image. Furthermore, we introduce and demonstrate the novel idea of partially repairing an image using RR features. Secondly, image processing algorithms such as image denoising and image super-resolution are required at various stages of visual communication systems, from image acquisition to image display at the receiver. We incorporate SSIM into the framework of sparse signal representation and non-local means methods and demonstrate improved performance in image denoising and super-resolution. Thirdly, we incorporate SSIM into the framework of perceptual video compression. We propose an SSIM-based rate-distortion optimization scheme and an SSIM-inspired divisive optimization method that transforms the DCT-domain frame residuals into a perceptually uniform space. Both approaches demonstrate the potential to largely improve the rate-distortion performance of state-of-the-art video codecs. Finally, in real-world visual communications, it is a common experience that end users receive video with significantly time-varying quality due to variations in video content/complexity, codec configuration, and network conditions. How human visual quality of experience (QoE) changes with such time-varying video quality is not yet well understood. We propose a quality adaptation model that is asymmetrically tuned to increasing and decreasing quality. The model improves upon the direct SSIM approach in predicting the subjective perceptual experience of time-varying video quality.
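    For reference, the SSIM index underlying all of this work combines luminance, contrast and structure comparisons. The sketch below computes a single-window, global SSIM from the standard formula with the usual constants C1 = (0.01L)^2 and C2 = (0.03L)^2; practical implementations such as skimage.metrics.structural_similarity use local sliding windows instead.

```python
# Global (single-window) SSIM between two images; synthetic test data.
import numpy as np

def ssim_global(x, y, L=255.0):
    """SSIM over the whole image treated as one window."""
    C1, C2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + C1) * (2 * cov + C2)) / \
           ((mx ** 2 + my ** 2 + C1) * (vx + vy + C2))

rng = np.random.default_rng(0)
ref = rng.uniform(0, 255, (64, 64))
noisy = np.clip(ref + rng.normal(0, 20, ref.shape), 0, 255)
print(ssim_global(ref, ref))    # 1.0 for identical images
print(ssim_global(ref, noisy))  # < 1.0 under distortion
```

    Unlike MSE alone, this expression is a smooth function of the image values, which is part of what makes SSIM tractable as an optimization criterion in the designs described above.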

    Robust Algorithms for Low-Rank and Sparse Matrix Models

    Data in statistical signal processing problems is often inherently matrix-valued, and a natural first step in working with such data is to impose a model with structure that captures the distinctive features of the underlying data. Under the right model, one can design algorithms that can reliably tease weak signals out of highly corrupted data. In this thesis, we study two important classes of matrix structure: low-rankness and sparsity. In particular, we focus on robust principal component analysis (PCA) models that decompose data into the sum of low-rank and sparse (in an appropriate sense) components. Robust PCA models are popular because they are useful models for data in practice and because efficient algorithms exist for solving them. This thesis focuses on developing new robust PCA algorithms that advance the state-of-the-art in several key respects. First, we develop a theoretical understanding of the effect of outliers on PCA and the extent to which one can reliably reject outliers from corrupted data using thresholding schemes. We apply these insights and other recent results from low-rank matrix estimation to design robust PCA algorithms with improved low-rank models that are well-suited for processing highly corrupted data. On the sparse modeling front, we use sparse signal models like spatial continuity and dictionary learning to develop new methods with important adaptive representational capabilities. We also propose efficient algorithms for implementing our methods, including an extension of our dictionary learning algorithms to the online or sequential data setting. The underlying theme of our work is to combine ideas from low-rank and sparse modeling in novel ways to design robust algorithms that produce accurate reconstructions from highly undersampled or corrupted data. We consider a variety of application domains for our methods, including foreground-background separation, photometric stereo, and inverse problems such as video inpainting and dynamic magnetic resonance imaging.
    PhD, Electrical Engineering: Systems. University of Michigan, Horace H. Rackham School of Graduate Studies.
    https://deepblue.lib.umich.edu/bitstream/2027.42/143925/1/brimoor_1.pd
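    The basic low-rank-plus-sparse model that this thesis builds on can be solved by principal component pursuit with alternating singular-value and entrywise soft thresholding. The sketch below is that textbook baseline, not the thesis's refined algorithms; the lam and mu choices follow common defaults from the principal component pursuit literature.

```python
# Robust PCA by principal component pursuit (inexact ALM style) on
# synthetic low-rank data with sparse outliers.
import numpy as np

def soft(X, tau):
    """Entrywise soft thresholding."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def rpca_pcp(M, n_iters=100):
    m, n = M.shape
    lam = 1.0 / np.sqrt(max(m, n))
    mu = m * n / (4.0 * np.abs(M).sum())
    S = np.zeros_like(M)
    Y = np.zeros_like(M)
    for _ in range(n_iters):
        # low-rank update: singular value thresholding
        U, sig, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = U @ np.diag(soft(sig, 1.0 / mu)) @ Vt
        # sparse update: entrywise soft thresholding
        S = soft(M - L + Y / mu, lam / mu)
        # dual ascent on the constraint M = L + S
        Y = Y + mu * (M - L - S)
    return L, S

rng = np.random.default_rng(0)
L0 = rng.standard_normal((60, 5)) @ rng.standard_normal((5, 60))  # rank 5
S0 = np.zeros((60, 60))
idx = rng.random(S0.shape) < 0.05
S0[idx] = 10 * rng.standard_normal(idx.sum())                     # outliers
L, S = rpca_pcp(L0 + S0)
print(np.linalg.norm(L - L0) / np.linalg.norm(L0))                # small
```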

    Compressed sensing reconstruction in 3D medical ultrasound and Doppler imaging

    This thesis is dedicated to the application of the novel compressed sensing theory to the acquisition and reconstruction of 3D US images and Doppler signals. In 3D US imaging, one of the major difficulties concerns the number of RF lines that must be acquired to cover the complete volume. Acquiring each line takes an irreducible amount of time due to the finite velocity of the ultrasound wave. One possible solution for increasing the frame rate consists in reducing the acquisition time by skipping some RF lines; reconstructing the missing information in post-processing is then a typical application of compressed sensing. Another excellent candidate for this theory is duplex Doppler imaging, which alternates two modes of emission, one for B-mode imaging and the other for flow estimation. For 3D imaging, we propose a compressed sensing framework using learned overcomplete dictionaries. Such dictionaries allow much sparser representations of the signals, since they are optimized for a particular class of images, such as US images. We also focus on the measurement setup and propose a line-wise sampling of entire RF lines, which decreases the amount of data and is feasible with a relatively simple configuration of the 3D US equipment. The algorithm was validated on simulated and experimental 3D data. For the Doppler application, we propose a CS-based framework that randomly interleaves Doppler and B-mode emissions. The proposed method reconstructs the Doppler signal using a block sparse Bayesian learning algorithm that exploits the correlation structure within the signal and can recover partially sparse signals as long as they are correlated. This method is validated on simulated and experimental Doppler data.
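    A minimal sketch of the skip-and-reconstruct idea follows: a signal sparse in a fixed DCT basis (a stand-in for the thesis's learned overcomplete dictionaries and block sparse Bayesian solver) is observed through randomly selected samples, analogous to skipped RF lines, and recovered with an l1-regularized solver. Signal size, sparsity level, and the regularization weight are illustrative choices.

```python
# Compressed sensing toy: random subsampling + l1 recovery in a DCT basis.
import numpy as np
from scipy.fft import idct
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n = 256
alpha_true = np.zeros(n)
alpha_true[rng.choice(n, 8, replace=False)] = rng.standard_normal(8)
Psi = idct(np.eye(n), norm="ortho", axis=0)   # DCT synthesis basis
x = Psi @ alpha_true                          # the full "RF" signal

# Keep only 40% of the samples, chosen at random (skipped acquisitions).
keep = rng.choice(n, int(0.4 * n), replace=False)
A = Psi[keep, :]                              # sensing matrix
y = x[keep]

# l1 reconstruction of the sparse coefficients, then synthesis.
lasso = Lasso(alpha=1e-3, max_iter=50_000)
lasso.fit(A, y)
x_hat = Psi @ lasso.coef_
print(np.linalg.norm(x_hat - x) / np.linalg.norm(x))  # should be small
```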

    Hierarchical feature extraction from spatiotemporal data for cyber-physical system analytics

    With the advent of ubiquitous sensing, robust communication and advanced computation, data-driven modeling is increasingly becoming popular for many engineering problems. Eliminating the difficulties of physics-based modeling and avoiding simplifying assumptions and ad hoc empirical models are significant among the many advantages of data-driven approaches, especially for large-scale complex systems. While classical statistics and signal processing algorithms have been widely used by the engineering community, advanced machine learning techniques have not been sufficiently explored in this regard. This study summarizes various categories of machine learning tools that have been applied, or may be candidates, for addressing engineering problems. While there is an increasing number of machine learning algorithms, the main steps involved in applying such techniques consist of data collection and pre-processing, feature extraction, model training and inference for decision-making. To support decision-making processes in many applications, hierarchical feature extraction is key. Among various feature extraction principles, recent studies emphasize hierarchical approaches that extract salient features at multiple abstraction levels from the data. In this context, the focus of this dissertation is on developing hierarchical feature extraction algorithms within the framework of machine learning in order to solve challenging cyber-physical problems in various domains such as electromechanical systems and agricultural systems. Furthermore, the feature extraction techniques are described using the spatial, temporal and spatiotemporal data types collected from the systems. The wide applicability of such features in solving selected real-life domain problems is demonstrated throughout this study.
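    As a concrete illustration of hierarchical feature extraction from temporal data, the sketch below computes low-level statistics over short windows and then summarizes how those statistics evolve over longer horizons, giving two abstraction levels. The window lengths and the particular statistics are illustrative choices, not the dissertation's specific features.

```python
# Two-level hierarchical feature extraction from a synthetic time series.
import numpy as np

def window_features(x, win):
    """Level 1: per-window mean, std, and peak magnitude."""
    n = len(x) // win
    w = x[: n * win].reshape(n, win)
    return np.column_stack([w.mean(1), w.std(1), np.abs(w).max(1)])

def hierarchical_features(x, win=50, group=10):
    """Level 2: statistics of level-1 features over groups of
    consecutive windows, yielding a coarser temporal abstraction."""
    f1 = window_features(x, win)                   # (n_windows, 3)
    n = len(f1) // group
    g = f1[: n * group].reshape(n, group, -1)
    return np.concatenate([g.mean(1), g.std(1)], axis=1)  # (n_groups, 6)

rng = np.random.default_rng(0)
t = np.arange(50_000)
signal = np.sin(0.02 * t) + 0.1 * rng.standard_normal(t.size)
features = hierarchical_features(signal)
print(features.shape)   # one 6-dimensional feature row per coarse group
```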