    Temporal patterns of gene expression via nonmetric multidimensional scaling analysis

    Motivation: Microarray experiments produce large-scale data sets that require extensive mining and refinement to extract useful information. We have been developing an efficient, novel algorithm for nonmetric multidimensional scaling (nMDS) analysis of very large data sets as a maximally unsupervised data-mining device, and we wish to demonstrate its usefulness in bioinformatics. A further aim is to demonstrate that intrinsically nonlinear methods are generally advantageous in data mining. Results: The Pearson correlation distance is used to measure the dissimilarity of gene activities in the transcriptional response of cell-cycle-synchronized human fibroblasts to serum [Iyer et al., Science vol. 283, p83 (1999)]. Analyzing these dissimilarity data with our nMDS algorithm produces an almost circular arrangement of the genes, and the temporal expression patterns of the genes rotate along this circle. If an appropriate preprocessing procedure is applied to the original data set, linear methods such as principal component analysis (PCA) can achieve reasonable results, but without preprocessing they cannot produce a useful picture. Furthermore, even with appropriate preprocessing, the outcomes of linear procedures are not as clear-cut as those of nMDS without preprocessing. Comment: 11 pages, 6 figures + online only 2 color figures, submitted to Bioinformatic
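    The dissimilarity input to nMDS can be sketched as below. This is a minimal illustration, assuming the common definition of the Pearson correlation distance d = 1 - r (the abstract does not spell out the formula); the function names are hypothetical. Because r ignores amplitude and offset, two genes with the same temporal shape but different expression levels get distance 0, while mirrored profiles get distance 2.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    # A constant profile has zero spread, so r (and the distance) is undefined for it.
    return cov / (sx * sy)


def correlation_distance(x, y):
    """Assumed definition d = 1 - r: 0 for identical shape, 2 for mirrored shape."""
    return 1.0 - pearson_r(x, y)


def dissimilarity_matrix(profiles):
    """Symmetric matrix of pairwise correlation distances, usable as nMDS input."""
    n = len(profiles)
    return [[correlation_distance(profiles[i], profiles[j]) if i != j else 0.0
             for j in range(n)] for i in range(n)]


profiles = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
D = dissimilarity_matrix(profiles)
```

    Here `D[0][1]` is (up to rounding) 0, since the second profile is the first scaled by two, and `D[0][2]` is about 2, since the third profile is the first reversed.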

    Distances in evidence theory: Comprehensive survey and generalizations

    The purpose of the present work is to survey the dissimilarity measures defined so far in the mathematical framework of evidence theory, and to propose a classification of these measures based on their formal properties. This research is motivated by the fact that while dissimilarity measures have been widely studied and surveyed in the fields of probability theory and fuzzy set theory, no comprehensive survey is yet available for evidence theory. The main results presented herein include a synthesis of the properties of the measures defined so far in the scientific literature; the proposed generalizations naturally extend the body of previously known measures and lead to the definition of numerous new measures. Building on this analysis, we show that Dempster’s conflict cannot be considered a genuine dissimilarity measure between two belief functions and propose an alternative based on a cosine function. Other original results include a justification for using two-dimensional indexes as (cosine; distance) couples and a general formulation for this class of new indexes. We base our exposition on a geometrical interpretation of evidence theory and show that most of the dissimilarity measures published so far are based on inner products, in some cases degenerate ones. Experimental results based on Monte Carlo simulations illustrate interesting relationships between existing measures.
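    Why Dempster’s conflict fails as a dissimilarity, and what a (cosine; distance) couple looks like, can be sketched as follows. This is an illustration under stated assumptions, not the survey's own definitions: the inner product is assumed to be weighted by the Jaccard index |A ∩ B| / |A ∪ B|, as in Jousselme's distance, and the cosine index is taken relative to that inner product. Mass functions are represented as dicts from non-empty frozensets to masses; all function names are hypothetical.

```python
import math
from itertools import product

# A mass function maps non-empty frozensets (focal sets) to masses that sum to 1.


def dempster_conflict(m1, m2):
    """Dempster's conflict K: total mass assigned to pairs of disjoint focal sets."""
    return sum(v1 * v2
               for (a, v1), (b, v2) in product(m1.items(), m2.items())
               if not (a & b))


def jaccard_inner(m1, m2):
    """Inner product weighted by |A & B| / |A | B| (assumed Jousselme-style weighting)."""
    return sum(v1 * v2 * len(a & b) / len(a | b)
               for (a, v1), (b, v2) in product(m1.items(), m2.items()))


def cosine_similarity(m1, m2):
    """Cosine of the angle between two mass functions under jaccard_inner."""
    return jaccard_inner(m1, m2) / math.sqrt(jaccard_inner(m1, m1) * jaccard_inner(m2, m2))


def jousselme_distance(m1, m2):
    """sqrt(0.5 * ||m1 - m2||^2), expanded through the same inner product."""
    sq = jaccard_inner(m1, m1) + jaccard_inner(m2, m2) - 2 * jaccard_inner(m1, m2)
    return math.sqrt(max(0.0, 0.5 * sq))  # clamp tiny negative rounding error


# Half the mass on {a}, half on {b}: compared with itself, this belief function
# still shows conflict 0.5, although it is identical to itself.
m = {frozenset({"a"}): 0.5, frozenset({"b"}): 0.5}
K = dempster_conflict(m, m)
couple = (cosine_similarity(m, m), jousselme_distance(m, m))
```

    The nonzero `K` for a belief function compared with itself is the kind of behavior that disqualifies a measure as a genuine dissimilarity, whereas the couple evaluates to (1, 0) as expected for identical inputs.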

    The Metric Nearness Problem

    Metric nearness refers to the problem of optimally restoring metric properties to distance measurements that happen to be nonmetric due to measurement errors or otherwise. Metric data can be important in various settings, for example, in clustering, classification, metric-based indexing, query processing, and graph-theoretic approximation algorithms. This paper formulates and solves the metric nearness problem: given a set of pairwise dissimilarities, find a “nearest” set of distances that satisfy the properties of a metric, principally the triangle inequality. To solve this problem, the paper develops efficient triangle-fixing algorithms based on an iterative projection method. An intriguing aspect of the metric nearness problem is that a special case turns out to be equivalent to the all-pairs shortest paths problem. The paper exploits this equivalence and develops a new algorithm for the latter problem using a primal-dual method. Applications to graph clustering are provided as an illustration. We include experiments that demonstrate the computational superiority of triangle fixing over general-purpose convex programming software. Finally, we conclude by suggesting various useful extensions and generalizations of metric nearness.
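    The idea of triangle fixing by iterative projection can be sketched for the unweighted l2 case. This is not the paper's algorithm; it is an assumption-labeled illustration using Hildreth's method: sweep over every constraint d_ij <= d_ik + d_kj, project onto that one half-space, and keep a nonnegative dual variable per constraint so the iterates converge to the nearest metric rather than to an arbitrary feasible one. The function name `triangle_fixing_l2` and its `tol` and `max_sweeps` parameters are hypothetical.

```python
from itertools import combinations


def triangle_fixing_l2(d, tol=1e-10, max_sweeps=10_000):
    """Nearest (unweighted l2) metric to a symmetric dissimilarity matrix d.

    Hildreth-style cyclic projection with one nonnegative dual variable per
    triangle constraint. Assumes a zero diagonal; returns a new matrix.
    """
    n = len(d)
    x = [row[:] for row in d]
    z = {}  # dual variable for constraint x_ij <= x_ik + x_kj, keyed by (i, j, k)
    for _ in range(max_sweeps):
        max_change = 0.0
        for i, j in combinations(range(n), 2):
            for k in range(n):
                if k == i or k == j:
                    continue
                # The constraint normal touches three pairs, so its squared norm is 3.
                violation = x[i][j] - x[i][k] - x[k][j]
                zc = z.get((i, j, k), 0.0)
                theta = max(-zc, violation / 3.0)  # keeps the dual variable >= 0
                if theta != 0.0:
                    x[i][j] -= theta
                    x[j][i] = x[i][j]
                    x[i][k] += theta
                    x[k][i] = x[i][k]
                    x[k][j] += theta
                    x[j][k] = x[k][j]
                    z[(i, j, k)] = zc + theta
                    max_change = max(max_change, abs(theta))
        if max_change < tol:
            break
    return x


# d_12 = 5 exceeds d_10 + d_02 = 2 by 3; the fix spreads that excess over the three edges.
fixed = triangle_fixing_l2([[0, 1, 1], [1, 0, 5], [1, 5, 0]])
```

    For this input only one constraint is violated, so a single projection lands on `[[0, 2, 2], [2, 0, 4], [2, 4, 0]]`. Weighted norms or other lp objectives, which the paper also treats, would change the projection step.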

    Multidimensional scaling

    A Statistical Investigation of Nonmetric Vertebral Traits with a Skeletal Population Sample from the Dakhleh Oasis, Egypt

    This paleogenetic study uses 17 nonmetric epigenetic vertebral traits to determine their suitability for studying past genetic relationships. The samples came from Egypt’s Dakhleh Oasis. Although infracranial nonmetric traits have a limited role in the study of past population genetics, this study has shown their value for elucidating past genetic patterns in intragroup analysis. The key to using them is to test for epigenetic factors (e.g., age, sex, symmetry, and intertrait correlations), which was done with several statistical tests, including the phi coefficient, the G-test, and the odds ratio. This study also applied a novel set of spatial statistics to examine within-group genetic dynamics of the Kellis 2 cemetery. Five traits support previous research showing that this cemetery was organized along patrilocal and patrilineal lines. This thesis has demonstrated the genetic value of vertebral epigenetic traits and argues for their continued use in paleogenetic research.
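    The three tests named above all operate on a 2x2 contingency table, for example sex (rows) against trait present or absent (columns); that layout is an assumption for illustration, and the function name is hypothetical. The sketch computes the signed phi coefficient, the odds ratio, and the G statistic with its chi-square (1 degree of freedom) p-value.

```python
import math


def two_by_two_stats(a, b, c, d):
    """Phi, odds ratio, G statistic and p-value for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    phi = (a * d - b * c) / math.sqrt(row1 * row2 * col1 * col2)
    # Undefined when b or c is 0; adding 0.5 to every cell is a common workaround.
    odds_ratio = (a * d) / (b * c) if b * c else math.inf
    # G = 2 * sum(O * ln(O / E)), E = row total * column total / n; empty cells add 0.
    g = 0.0
    for obs, r, col in ((a, row1, col1), (b, row1, col2), (c, row2, col1), (d, row2, col2)):
        if obs > 0:
            g += obs * math.log(obs / (r * col / n))
    g *= 2.0
    # Upper tail of chi-square with 1 df: P(X > g) = erfc(sqrt(g / 2)).
    p_value = math.erfc(math.sqrt(g / 2.0))
    return phi, odds_ratio, g, p_value


# Proportional rows mean no association: phi 0, odds ratio 1, G 0, p 1.
independent = two_by_two_stats(10, 20, 20, 40)
associated = two_by_two_stats(10, 20, 30, 40)
```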

    CHARACTERIZING BENTHIC MACROINVERTEBRATE COMMUNITY RESPONSES TO NUTRIENT ADDITION USING NMDS AND BACI ANALYSES

    Nonmetric multidimensional scaling (NMDS) is an ordination technique often used for information visualization and for exploring similarities or dissimilarities in ecological data. In principle, NMDS maximizes the rank-order correlation between the original distance measures and distances in the ordination space. Ordination points are adjusted to minimize stress, a measure of the discordance between these two kinds of distances. Before-After-Control-Impact (BACI) is a classical analysis-of-variance method for measuring the potential influence of an environmental disturbance. Such effects can be assessed by comparing conditions before and after a planned activity. In certain ecological applications, the extent of the impact is also expressed relative to conditions in a control area after a particular anthropogenic activity has occurred. In this paper, two statistical techniques are employed to investigate the effects of stream nutrient addition on a riverine benthic macroinvertebrate community. NMDS is used to explore the clustering of sampling units, based on multiple macroinvertebrate metrics, across predetermined river zones. BACI is then used to test for the potential impact of nutrient addition on the specified macroinvertebrate response metrics. Combining the two approaches provides a powerful and sensitive tool for detecting complex second-order effects in river food chains. The techniques are demonstrated using eight years of benthic macroinvertebrate survey data collected on an ultra-oligotrophic reach of the Kootenai River in northern Idaho and western Montana, downstream of a hydroelectric dam.
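    The two quantities described above can be sketched as follows, under stated assumptions. Stress is taken to be Kruskal's stress-1, with disparities obtained by monotone (isotonic) regression of ordination distances on the rank order of the dissimilarities via the pool-adjacent-violators algorithm; ties are ordered arbitrarily. The BACI effect is taken to be the interaction contrast: change at the impact site minus change at the control site. This only evaluates stress for a given configuration; a full NMDS would move the points iteratively to reduce it. All names are hypothetical.

```python
import math
from statistics import fmean


def pava(values):
    """Pool-adjacent-violators: least-squares nondecreasing fit to a sequence."""
    blocks = []  # each block is [sum, count]
    for v in values:
        blocks.append([v, 1])
        # Merge while the previous block's mean exceeds the newest block's mean.
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted


def kruskal_stress1(dissim, coords):
    """Kruskal's stress-1 of an ordination.

    dissim maps pairs (i, j), i < j, to dissimilarities; coords lists point tuples.
    """
    pairs = sorted(dissim, key=dissim.get)  # rank order of the dissimilarities
    dists = [math.dist(coords[i], coords[j]) for i, j in pairs]
    disparities = pava(dists)  # best monotone fit to the ordination distances
    num = sum((dd - dh) ** 2 for dd, dh in zip(dists, disparities))
    den = sum(dd ** 2 for dd in dists)
    return math.sqrt(num / den)


def baci_contrast(control_before, control_after, impact_before, impact_after):
    """BACI interaction: change at the impact site minus change at the control site."""
    return ((fmean(impact_after) - fmean(impact_before))
            - (fmean(control_after) - fmean(control_before)))


coords = [(0.0,), (1.0,), (3.0,)]
perfect = kruskal_stress1({(0, 1): 0.1, (0, 2): 0.9, (1, 2): 0.5}, coords)
reversed_order = kruskal_stress1({(0, 1): 0.9, (0, 2): 0.1, (1, 2): 0.5}, coords)
effect = baci_contrast([10, 12], [11, 13], [10, 10], [15, 17])
```

    When ordination distances follow the same rank order as the dissimilarities, stress is 0 (`perfect`); when the order is fully reversed, PAVA pools all distances to their mean and stress rises to sqrt(1/7), about 0.378. In the BACI example, the impact site rises by 6 while the control rises by 1, giving an estimated effect of 5.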