32 research outputs found

    Multidimensional Data Visual Exploration by Interactive Information Segments

    Visualization techniques play an important role in the KDD process for data analysis and mining. However, a single image does not always successfully convey the information inherent in high-dimensional, very large databases. In this paper we introduce VSIS (Visual Set of Information Segments), an interactive tool for visually exploring multidimensional, very large, numerical data. Within supervised learning, our proposal approaches the classification problem by searching for meaningful intervals in the most relevant attributes. These intervals are displayed as multi-colored bars in which the degree of impurity with respect to class membership can be easily perceived. The bars can be re-explored interactively with new values of user-defined parameters. A case study applying VSIS to some UCI repository data sets shows the usefulness of our tool in supporting the exploration of multidimensional, very large data.

    Exploring synergetic effects of dimensionality reduction and resampling tools on hyperspectral imagery data classification

    The present paper addresses the classification of hyperspectral images with multiple imbalanced classes and very high dimensionality. Class imbalance is handled by resampling the data set, whereas PCA and a supervised filter are applied to reduce the number of spectral bands. This preliminary study aims to investigate the benefits of combining several techniques to tackle the imbalance and high-dimensionality problems, and to evaluate the order of application that leads to the best classification performance. Experimental results demonstrate the significance of using these two preprocessing tools together to improve the performance of hyperspectral imagery classification. Although the most effective order appears to be a resampling strategy followed by a feature selection (or extraction) algorithm, this question still needs much more thorough investigation in the future. This work has been partially supported by the Spanish Ministry of Education and Science under grants CSD2007–00018, AYA2008–05965–0596 and TIN2009–14205, the Fundació Caixa Castelló–Bancaixa under grant P1–1B2009–04, and the Generalitat Valenciana under grant PROMETEO/2010/02.

    Simultaneous model-based clustering and visualization in the Fisher discriminative subspace

    Clustering in high-dimensional spaces is nowadays a recurrent problem in many scientific domains, but it remains a difficult task in terms of both clustering accuracy and the interpretability of results. This paper presents a discriminative latent mixture (DLM) model which fits the data in a latent orthonormal discriminative subspace with an intrinsic dimension lower than the dimension of the original space. By constraining model parameters within and between groups, a family of 12 parsimonious DLM models is obtained, which can be fitted to various situations. An estimation algorithm, called the Fisher-EM algorithm, is also proposed for estimating both the mixture parameters and the discriminative subspace. Experiments on simulated and real datasets show that the proposed approach performs better than existing clustering methods while providing a useful representation of the clustered data. The method is also applied to the clustering of mass spectrometry data.

    Statistical strategies for avoiding false discoveries in metabolomics and related experiments

    Genome-wide association study in frontal fibrosing alopecia identifies four susceptibility loci including HLA-B*07:02

    Frontal fibrosing alopecia (FFA) is a recently described inflammatory and scarring type of hair loss affecting almost exclusively women. Despite a dramatic recent increase in incidence, the aetiopathogenesis of FFA remains unknown. We undertake genome-wide association studies in females from a UK cohort, comprising 844 cases and 3,760 controls, a Spanish cohort of 172 cases and 385 controls, and perform statistical meta-analysis. We observe genome-wide significant association with FFA at four genomic loci: 2p22.2, 6p21.1, 8q24.22 and 15q2.1. Within the 6p21.1 locus, fine-mapping indicates that the association is driven by the HLA-B*07:02 allele. At 2p22.1, we implicate a putative causal missense variant in CYP1B1, encoding the homonymous xenobiotic- and hormone-processing enzyme. Transcriptomic analysis of affected scalp tissue highlights overrepresentation of transcripts encoding components of innate and adaptive immune response pathways. These findings provide insight into disease pathogenesis and characterise FFA as a genetically predisposed immuno-inflammatory disorder driven by HLA-B*07:02.

    Hifocon: Object and dimensional coherence and correlation in multidimensional visualization

    In any multidimensional visualization, some information has to be compromised when projecting multidimensional data to two- or three-dimensional space. We introduce the concepts of dimensional and object coherence and correlation to analyze and classify multidimensional visualization techniques. These concepts are used as principles for our design of Hifocon, a new multidimensional data visualization system.