
    Content-based retrieval in large collections of heterogeneous images

    Advisor: Alexandre Xavier Falcão. Doctoral thesis (Doutorado em Ciência da Computação), Universidade Estadual de Campinas, Instituto de Matemática, Estatística e Computação Científica. Abstract: Content-based image retrieval (CBIR) is an area that has received increasing attention from the scientific community due to the exponential growth in the number of available images, mainly on the WWW. As the volume of stored images grows, so does the interest in systems able to retrieve those images efficiently from their visual content. Our work has focused on techniques suitable for broad image domains. In a broad image domain, no a priori knowledge about the visual and/or semantic content of the images can be assumed, and the cost of semi-automatic image analysis techniques (those requiring human intervention) is prohibitive because of the heterogeneity and sheer number of images that must be analyzed. We have directed our work to color-based image retrieval, and have focused on the three main issues that must be addressed to achieve it: (1) how to analyze and extract color information from images in an automatic and efficient way; (2) how to represent that information in a compact and effective way; and (3) how to efficiently compare the visual features that describe two images. The main contributions of our work are two algorithms for the automatic analysis of the visual content of images (CBC and BIC), two distance functions for comparing the features extracted from images (MiCRoM and dLog), and an alternative representation for CBIR approaches that decompose and represent images as a grid of equal-sized cells (CCH).
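
    The BIC analysis and the dLog distance mentioned above lend themselves to a compact illustration. Below is a minimal sketch reconstructed from published descriptions of border/interior pixel classification and log-scale histogram comparison; the 2-bit-per-channel quantization and the omission of histogram normalization are assumptions made for brevity, so this should be read as the general idea rather than the thesis's exact implementation.

        # Sketch of BIC-style pixel classification and the dLog distance.
        import math
        import numpy as np

        def bic_histograms(image_rgb, bits_per_channel=2):
            """image_rgb: (H, W, 3) uint8 array. Returns the concatenated
            border/interior color histograms."""
            shift = 8 - bits_per_channel
            q = (image_rgb >> shift).astype(np.int32)
            # One integer color label per pixel from the quantized channels.
            labels = ((q[..., 0] << (2 * bits_per_channel)) |
                      (q[..., 1] << bits_per_channel) | q[..., 2])
            n_colors = 1 << (3 * bits_per_channel)
            padded = np.pad(labels, 1, mode='edge')
            center = padded[1:-1, 1:-1]
            # A pixel is "interior" when all 4-neighbors share its color.
            interior = ((center == padded[:-2, 1:-1]) & (center == padded[2:, 1:-1]) &
                        (center == padded[1:-1, :-2]) & (center == padded[1:-1, 2:]))
            h_border = np.bincount(labels[~interior], minlength=n_colors)
            h_interior = np.bincount(labels[interior], minlength=n_colors)
            return np.concatenate([h_border, h_interior])

        def dlog(h1, h2):
            """Compare histograms on a log scale so that a handful of
            dominant colors cannot overwhelm the distance."""
            def f(x):
                if x == 0:
                    return 0
                if x <= 1:
                    return 1
                return int(math.ceil(math.log2(x))) + 1
            return sum(abs(f(a) - f(b)) for a, b in zip(h1, h2))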

    An algorithmic framework for visualising and exploring multidimensional data

    To help understand multidimensional data, information visualisation techniques are often applied to take advantage of human visual perception in exposing latent structure. A popular means of presenting such data is via two-dimensional scatterplots where the inter-point proximities reflect some notion of similarity between the entities represented. This can result in potentially interesting structure becoming almost immediately apparent. Traditional algorithms for carrying out this dimension reduction tend to have different strengths and weaknesses in terms of run times and layout quality. However, it has been found that combining algorithms can produce hybrid variants that exhibit significantly lower run times while maintaining accurate depictions of high-dimensional structure. The author's initial contribution in the creation of such algorithms led to the design and implementation of a software system (HIVE) for the development and investigation of new hybrid variants and the subsequent analysis of the data they transform. This development was motivated by the fact that there are potentially many hybrid algorithmic combinations to explore, and therefore an environment that is conducive to their development, analysis and use is beneficial not only in exploring the data they transform but also in exploring the growing number of visualisation tools that these algorithms beget. This thesis describes three areas of the author's contribution to the field of information visualisation. Firstly, work on hybrid algorithms for dimension reduction is presented and their analysis shows their effectiveness. Secondly, the development of a framework for the creation of tailored hybrid algorithms is illustrated. Thirdly, a system embodying the framework, providing an environment conducive to the development, evaluation and use of the algorithms, is described. Case studies are provided to demonstrate how the author and others have used and found value in the system across areas as diverse as environmental science, social science and investigative psychology, where multidimensional data are in abundance.
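
    The core idea behind such hybrids, laying out a small sample exactly and interpolating the remaining points, can be sketched briefly. The version below uses classical MDS on a sqrt(N) sample and a deliberately naive nearest-anchor placement rule; both choices are illustrative assumptions, not the algorithms developed in the thesis.

        # Sketch of a sample-then-interpolate hybrid layout.
        import numpy as np

        def classical_mds(d, dim=2):
            """Classical MDS from a distance matrix via eigendecomposition."""
            n = d.shape[0]
            j = np.eye(n) - np.ones((n, n)) / n      # centering matrix
            b = -0.5 * j @ (d ** 2) @ j              # double-centered Gram matrix
            vals, vecs = np.linalg.eigh(b)
            idx = np.argsort(vals)[::-1][:dim]
            return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0))

        def hybrid_layout(x, seed=0):
            """Lay out a sqrt(N) sample exactly, then place every other
            point relative to its two nearest sampled anchors."""
            rng = np.random.default_rng(seed)
            n = len(x)
            sample = rng.choice(n, size=max(int(np.sqrt(n)), 3), replace=False)
            d = np.linalg.norm(x[sample, None] - x[None, sample], axis=-1)
            y = np.zeros((n, 2))
            y[sample] = classical_mds(d)
            for i in np.setdiff1d(np.arange(n), sample):
                d_hi = np.linalg.norm(x[i] - x[sample], axis=1)
                near, second = np.argsort(d_hi)[:2]
                # Step away from the second anchor so the high-dimensional
                # distance to the nearest anchor is preserved in 2-D.
                direction = y[sample][near] - y[sample][second]
                norm = np.linalg.norm(direction) or 1.0
                y[i] = y[sample][near] + direction / norm * d_hi[near]
            return y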

    Car Detection by Classification of Image Segments


    GAMIBHEAR: whole-genome haplotype reconstruction from Genome Architecture Mapping data

    MOTIVATION: Genome Architecture Mapping (GAM) was recently introduced as a digestion- and ligation-free method to detect chromatin conformation. Orthogonal to existing approaches based on chromatin conformation capture (3C), GAM's ability to capture both inter- and intra-chromosomal contacts from low amounts of input data makes it particularly well suited for allele-specific analyses in a clinical setting. Allele-specific analyses are powerful tools to investigate the effects of genetic variants on many cellular phenotypes including chromatin conformation, but require the haplotypes of the individuals under study to be known a priori. So far, however, no algorithm exists for haplotype reconstruction and phasing of genetic variants from GAM data, hindering the allele-specific analysis of chromatin contact points in non-model organisms or individuals with unknown haplotypes. RESULTS: We present GAMIBHEAR, a tool for accurate haplotype reconstruction from GAM data. GAMIBHEAR aggregates allelic co-observation frequencies from GAM data and employs a GAM-specific probabilistic model of haplotype capture to optimise phasing accuracy. Using a hybrid mouse embryonic stem cell line with known haplotype structure as a benchmark dataset, we assess the correctness and completeness of the reconstructed haplotypes, and demonstrate the power of GAMIBHEAR to infer accurate genome-wide haplotypes from GAM data. AVAILABILITY: GAMIBHEAR is available as an R package under the open source GPL-2 license at https://bitbucket.org/schwarzlab/gamibhear
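
    The aggregation step described above has a natural small-scale analogue: alleles seen together in the same thin nuclear slice tend to lie on the same chromosome copy, so pairwise co-observation counts carry phase information. The sketch below propagates phase along the strongest pairwise evidence with a parity-carrying union-find; the input layout and this greedy heuristic are assumptions for illustration, not GAMIBHEAR's probabilistic model.

        # Sketch of phasing from allelic co-observation counts.
        from collections import defaultdict

        def phase(profiles, n_variants):
            """profiles: one dict per GAM nuclear profile, mapping variant
            index -> observed allele (0 or 1). Returns a 0/1 haplotype
            assignment per variant (relative within each linked block)."""
            cis, trans = defaultdict(int), defaultdict(int)
            for prof in profiles:
                items = sorted(prof.items())
                for a in range(len(items)):
                    i, ai = items[a]
                    for j, aj in items[a + 1:]:
                        (cis if ai == aj else trans)[(i, j)] += 1
            parent = list(range(n_variants))
            rel = [0] * n_variants        # phase of v relative to its parent
            def find(v):
                if parent[v] == v:
                    return v, 0
                root, r = find(parent[v])
                parent[v], rel[v] = root, (rel[v] + r) % 2
                return root, rel[v]
            # Strongest evidence first, as in a maximum spanning forest.
            edges = sorted(set(cis) | set(trans),
                           key=lambda e: -abs(cis[e] - trans[e]))
            for i, j in edges:
                ri, pi = find(i)
                rj, pj = find(j)
                if ri != rj:
                    same = cis[(i, j)] >= trans[(i, j)]
                    parent[rj] = ri
                    rel[rj] = (pi + pj + (0 if same else 1)) % 2
            return [find(v)[1] for v in range(n_variants)]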

    Lossless compression of hyperspectral images

    Band ordering and the prediction scheme are the two major aspects of hyperspectral image compression studied in this thesis to improve the performance of the compression system. In the prediction module, we propose spatio-spectral prediction methods. Two non-linear spectral prediction methods have been proposed in this thesis. NPHI (Non-linear Prediction for Hyperspectral Images) is based on a band look-ahead technique wherein a reference band is included in the prediction of pixels in the current band. The prediction technique estimates the variation between the contexts of the two bands to modify the weights computed in the reference band to predict the pixels in the current band. EPHI (Edge-based Prediction for Hyperspectral Images) is a modified NPHI technique wherein an edge-based analysis is used to classify pixels into edges and non-edges before predicting each pixel in the current band. Three ordering methods have been proposed in this thesis. The first ordering method computes local and global features in each band to group the bands. The bands in each group are ordered by estimating the compression ratios achieved between the bands in the group and then ordering them using Kruskal's algorithm. The other two ordering methods compute the compression ratios between b-neighbors when performing the band ordering.
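
    The Kruskal-based ordering admits a short sketch: treat bands as graph nodes, weight each edge by how cheaply one band predicts the other, build a minimum spanning tree, and read the ordering off a traversal. Using the entropy of the difference band as a stand-in for the achieved compression ratio is an assumption made to keep the example self-contained.

        # Sketch of compression-driven band ordering with Kruskal's algorithm.
        import numpy as np

        def residual_entropy(b1, b2):
            """Entropy of the difference band: a cheap proxy for how well
            one band predicts another (smaller is better)."""
            resid = (b1.astype(np.int32) - b2.astype(np.int32)).ravel()
            _, counts = np.unique(resid, return_counts=True)
            p = counts / counts.sum()
            return float(-(p * np.log2(p)).sum())

        def order_bands(cube):
            """cube: (n_bands, h, w). Returns a band ordering from a DFS
            over the minimum spanning tree of pairwise prediction costs."""
            n = cube.shape[0]
            edges = sorted((residual_entropy(cube[i], cube[j]), i, j)
                           for i in range(n) for j in range(i + 1, n))
            parent = list(range(n))
            def find(v):
                while parent[v] != v:
                    parent[v] = parent[parent[v]]   # path compression
                    v = parent[v]
                return v
            adj = [[] for _ in range(n)]
            for _, i, j in edges:           # Kruskal: keep tree-joining edges
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj
                    adj[i].append(j)
                    adj[j].append(i)
            order, stack, seen = [], [0], {0}
            while stack:                    # DFS traversal yields the order
                v = stack.pop()
                order.append(v)
                for u in adj[v]:
                    if u not in seen:
                        seen.add(u)
                        stack.append(u)
            return order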

    Methods for Estimation of Intrinsic Dimensionality

    Dimension reduction is an important tool used to describe the structure of complex data (explicitly or implicitly) through a small but sufficient number of variables, and thereby make data analysis more efficient. It is also useful for visualization purposes. Dimension reduction helps statisticians to overcome the ‘curse of dimensionality’. However, most dimension reduction techniques require the intrinsic dimension of the low-dimensional subspace to be fixed in advance, so the availability of reliable intrinsic dimension (ID) estimation techniques is of major importance. The main goal of this thesis is to develop algorithms for determining the intrinsic dimensions of recorded data sets in a nonlinear context. Whilst this is a well-researched topic for linear planes, based mainly on principal components analysis, relatively little attention has been paid to ways of estimating this number for non-linear variable interrelationships. The proposed algorithms here are based on existing concepts that can be categorized into local methods, relying on randomly selected subsets of a recorded variable set, and global methods, utilizing the entire data set. This thesis provides an overview of ID estimation techniques, with special consideration given to recent developments in non-linear techniques, such as manifold charting and fractal-based methods. Although such techniques are established in the literature, their practical implementation is far from straightforward. The intrinsic dimension is estimated via Brand’s algorithm by examining the growth of a point process that counts the number of points falling in hyper-spheres; the estimation needs a starting point for each hyper-sphere, and this thesis provides settings for selecting starting points which work well for most data sets. Additionally, we propose approaches for estimating dimensionality via Brand’s algorithm, the Dip method and the Regression method. Other approaches are proposed for estimating the intrinsic dimension by fractal dimension estimation methods, which exploit the intrinsic geometry of a data set. The most popular concept from this family of methods is the correlation dimension, which requires the estimation of the correlation integral for a ball of radius tending to 0. In this thesis we propose new approaches to approximate the correlation integral in this limit: the Intercept method, the Slope method and the Polynomial method. In addition we propose a localized global method, which could be defined as a local version of global ID methods; its objective is to improve algorithms based on a local ID method, which could significantly reduce the negative bias. Experimental results on real-world and simulated data are used to demonstrate the algorithms and compare them to other methodology, and a simulation study verifies the effectiveness of the proposed methods. Finally, these algorithms are contrasted using a recorded data set from an industrial melter process.
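
    The correlation dimension mentioned above is concrete enough to sketch. The standard estimate regresses log C(r) on log r over small radii, where C(r) is the fraction of point pairs closer than r; the radius range below is an arbitrary choice, and this is the plain slope estimate rather than the Intercept, Slope or Polynomial refinements proposed in the thesis.

        # Sketch of the standard correlation-dimension estimate.
        import numpy as np
        from scipy.spatial.distance import pdist

        def correlation_dimension(x, n_radii=20):
            """x: (n_points, n_features). Slope of log C(r) vs log r."""
            d = pdist(x)                    # all pairwise distances
            radii = np.logspace(np.log10(d[d > 0].min()),
                                np.log10(np.median(d)), n_radii)
            c = np.array([(d < r).mean() for r in radii])  # C(r)
            mask = c > 0
            slope, _ = np.polyfit(np.log(radii[mask]), np.log(c[mask]), 1)
            return slope

        # Points on a 2-D plane embedded in 10-D should score close to 2.
        rng = np.random.default_rng(0)
        plane = rng.normal(size=(2000, 2)) @ rng.normal(size=(2, 10))
        print(correlation_dimension(plane))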

    APPLICATION OF IMAGE ANALYSIS TECHNIQUES TO SATELLITE CLOUD MOTION TRACKING

    Cloud motion wind (CMW) determination requires tracking of individual cloud targets. This is achieved by first clustering and then tracking each cloud cluster. Ideally, different cloud clusters correspond to different pressure levels. Two new clustering techniques have been developed for the identification of cloud types in multi-spectral satellite imagery. The first technique is the Global-Local clustering algorithm, a cascade of a histogram clustering algorithm and a dynamic clustering algorithm. The histogram clustering algorithm divides the multi-spectral histogram into non-overlapping regions, and these regions are used to initialise the dynamic clustering algorithm. The dynamic clustering algorithm assumes clusters have a Gaussian-distributed probability density function with different population sizes and variances. The second technique uses graph theory to exploit the spatial information which is often ignored in per-pixel clustering. The algorithm is in two stages: spatial clustering and spectral clustering. The first stage extracts homogeneous objects in the image using a family of algorithms based on stepwise optimization, which can be further divided into two approaches: top-down and bottom-up. The second stage groups similar segments into clusters using a statistical hypothesis test on their similarities. The clusters generated are less noisy along class boundaries and are in hierarchical order. A criterion based on mutual information is derived to monitor the spatial clustering process and to suggest an optimal number of segments. An automated cloud motion tracking program has been developed. Three images (each separated by 30 minutes) are used to track cloud motion, and the middle image is clustered using Global-Local clustering prior to tracking. Compared with traditional methods based on raw images, it is found that separating cloud types before cloud tracking reduces the ambiguity due to multiple layers of cloud moving at different speeds and directions. Three matching techniques are used and their reliability compared. Target sizes ranging from 4 x 4 to 32 x 32 are tested and their errors compared, and the optimum target size for first-generation METEOSAT images has been found. (Meteorological Office, Bracknell)
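
    The matching step at the heart of such tracking can be sketched compactly: a target window from the first image is searched for in the next image, and the best-scoring displacement over the 30-minute interval gives the motion vector. The normalized cross-correlation criterion and the window and search-range sizes below are illustrative assumptions; the thesis compares three matching techniques, of which this shows only the general shape.

        # Sketch of target matching by normalized cross-correlation.
        import numpy as np

        def ncc(a, b):
            """Normalized cross-correlation of two equal-sized windows."""
            a = a - a.mean()
            b = b - b.mean()
            denom = np.sqrt((a * a).sum() * (b * b).sum()) or 1.0
            return float((a * b).sum() / denom)

        def track_target(img0, img1, top, left, size=16, search=8):
            """Locate the size x size target at (top, left) of img0 within
            a +/- search pixel neighbourhood of img1."""
            target = img0[top:top + size, left:left + size].astype(float)
            best = (-2.0, 0, 0)             # (score, dv, dh)
            for dv in range(-search, search + 1):
                for dh in range(-search, search + 1):
                    t, l = top + dv, left + dh
                    if (t < 0 or l < 0 or
                            t + size > img1.shape[0] or l + size > img1.shape[1]):
                        continue
                    window = img1[t:t + size, l:l + size].astype(float)
                    best = max(best, (ncc(target, window), dv, dh))
            return best                     # displacement = motion estimate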

    Machine Learning Methods for Flow Cytometry Analysis and Visualization

    Flow cytometry is a popular analytical cell-biology instrument that uses specific wavelengths of light to profile heterogeneous populations of cells at the individual level. Current cytometers can analyze up to 20 parameters on over a million cells, but despite the complexity of these datasets, a typical workflow relies on subjective, labor-intensive, manual sequential analysis. The research presented in this dissertation provides two machine learning methods to increase the objectivity, efficiency, and discovery in flow cytometry data analysis. The first, a supervised learning method, utilizes previously analyzed data to evaluate new flow cytometry files containing similar parameters. The probability distribution of each dimension in a file is matched to each related dimension of a reference file through color indexing and histogram intersection methods. Once a similar reference file is selected, the cell populations previously classified are used to create a tailored support vector machine capable of classifying cell populations as an expert would. This method has produced results highly correlated with manual sequential analysis, providing an efficient alternative for analyzing a large number of samples. The second, a novel unsupervised method, is used to explore and visualize single-cell data in an objective manner. To accomplish this, a hypergraph sampling method was created to preserve rare events within the flow data before divisively clustering the sampled data using singular value decomposition. The unsampled data is added to the discovered set of clusters using a support vector machine classifier, and the final analysis is displayed as a minimum spanning tree. This tree is capable of distinguishing rare subsets of cells comprising less than 1% of the original data.
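
    The reference-selection step is easy to make concrete: each dimension of a new file is reduced to a histogram and compared with the corresponding dimension of every previously analyzed file by histogram intersection. The per-dimension averaging and bin count below are assumptions for illustration, not the dissertation's exact matching pipeline.

        # Sketch of reference-file selection via histogram intersection.
        import numpy as np

        def intersection(h1, h2):
            """Similarity in [0, 1] between two normalized histograms."""
            return float(np.minimum(h1, h2).sum())

        def best_reference(sample, references, bins=64):
            """sample and each references[k]: (cells, parameters) arrays
            with matching columns. Returns the best-matching file index."""
            def hist(col, lo, hi):
                h, _ = np.histogram(col, bins=bins, range=(lo, hi))
                return h / max(h.sum(), 1)
            scores = []
            for ref in references:
                per_dim = []
                for d in range(sample.shape[1]):
                    lo = min(sample[:, d].min(), ref[:, d].min())
                    hi = max(sample[:, d].max(), ref[:, d].max())
                    per_dim.append(intersection(hist(sample[:, d], lo, hi),
                                                hist(ref[:, d], lo, hi)))
                scores.append(float(np.mean(per_dim)))
            return int(np.argmax(scores)), scores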

    Genomic protein functionality classification algorithms in frequency domain.

    Author: Tak-Chung Lau. Thesis (M.Phil.), Chinese University of Hong Kong, 2004. Includes bibliographical references (leaves 190-198); abstracts in English and Chinese. Contents (chapter level): 1. Introduction; 2. Survey (dynamic programming, general alignment tools, K-nearest neighbor, decision trees, hidden Markov models); 3. Related Work (the Resonant Recognition Model: encoding, transformation and evaluation stages); 4. Group Classification (design, experimental settings and results on protein groups drawn from the Database of Secondary Structure in Proteins); 5. Individual Classification (group profiles, clustering, hybridization with sequence alignment, and evaluation via ROC curves and KNN variants); 6. Application (correlation graphs and minimum spanning trees for group and individual classification); 7. Discussion on Other Analysis (encoding schemes, sequence similarity, functional blocks in proteins); 8. Future Works; 9. Conclusion; Appendix A: Fourier Transform.
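
    The frequency-domain pipeline outlined in the contents (encode residues numerically, Fourier-transform, evaluate a cross-spectrum) follows the Resonant Recognition Model and can be sketched briefly. The residue-to-number table below is a uniform placeholder rather than the published electron-ion interaction potential values, so the sketch conveys the mechanics only.

        # Sketch of RRM-style frequency-domain sequence comparison.
        import numpy as np

        # Placeholder encoding; the RRM uses published EIIP values instead.
        ENCODING = {aa: 0.01 * i for i, aa in enumerate("ACDEFGHIKLMNPQRSTVWY")}

        def spectrum(seq, length=512):
            """Magnitude spectrum of the encoded, zero-padded sequence."""
            x = np.array([ENCODING[aa] for aa in seq], dtype=float)
            x -= x.mean()                 # remove the DC component
            return np.abs(np.fft.rfft(x, n=length))

        def consensus_peak(sequences):
            """Multiply the spectra pointwise; in the RRM, a sharp common
            peak marks a frequency shared by one functional group."""
            cross = np.prod([spectrum(s) for s in sequences], axis=0)
            return int(np.argmax(cross[1:]) + 1), cross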