Temporal patterns of gene expression via nonmetric multidimensional scaling analysis
Motivation: Microarray experiments produce large-scale data sets that require extensive mining and refinement to extract useful information. We have been developing an efficient, novel algorithm for nonmetric multidimensional scaling (nMDS) analysis of very large data sets as a maximally unsupervised data-mining device, and we wish to demonstrate its usefulness in the context of bioinformatics. A further aim is to demonstrate that intrinsically nonlinear methods are generally advantageous in data mining.
Results: The Pearson correlation distance measure is used to quantify the dissimilarity of gene activities in the transcriptional response of cell-cycle-synchronized human fibroblasts to serum [Iyer et al., Science vol. 283, p. 83 (1999)]. Analyzing these dissimilarity data with our nMDS algorithm produces an almost circular arrangement of the genes, along which the temporal expression patterns of the genes rotate. If an appropriate preprocessing procedure is applied to the original data set, linear methods such as principal component analysis (PCA) can achieve reasonable results, but without preprocessing they cannot produce a useful picture. Furthermore, even with appropriate preprocessing, the outcomes of linear procedures are not as clear-cut as those of nMDS without preprocessing.
Comment: 11 pages, 6 figures + 2 online-only color figures, submitted to Bioinformatics
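As a minimal sketch of the dissimilarity input described above, the snippet below builds a Pearson correlation distance matrix from gene time courses. It assumes the common definition d = 1 − r; the paper may use a different transform of r, the gene profiles are hypothetical, and the function names are illustrative only.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length expression profiles."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    # Undefined (division by zero) for a perfectly flat profile.
    return sxy / math.sqrt(sxx * syy)


def correlation_distance_matrix(profiles):
    """Symmetric matrix of d = 1 - r (assumed form): 0 for identical shapes, 2 for mirror images."""
    n = len(profiles)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = 1.0 - pearson_r(profiles[i], profiles[j])
    return d


# Hypothetical time courses (four time points per gene).
genes = [
    [1.0, 2.0, 3.0, 4.0],  # rising
    [2.0, 4.0, 6.0, 8.0],  # rising, twice the amplitude
    [4.0, 3.0, 2.0, 1.0],  # falling
]
D = correlation_distance_matrix(genes)
```

Genes 0 and 1 share a rising pattern at different amplitudes, so their distance is 0, while the falling gene sits at the maximum distance of 2 from both. This amplitude invariance is what makes the measure track the shape of temporal patterns.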
Dispersal in microbes: fungi in indoor air are dominated by outdoor air and show dispersal limitation at short distances.
The indoor microbiome is a complex system that is thought to depend on dispersal from the outdoor biome and the occupants' microbiome, combined with selective pressures imposed by the occupants' behaviors and the building itself. We set out to determine the pattern of fungal diversity and composition in indoor air on a local scale and to identify the processes behind that pattern. We surveyed airborne fungal assemblages over 1-month periods in two seasons, with high replication, indoors and outdoors, within and across standardized residences at a university housing facility. Indoor fungal assemblages were diverse and strongly determined by dispersal from outdoors, and no fungal taxa were identified as indicators of indoor air. Season affected the fungi found in both indoor and outdoor air, and quantitatively more fungal biomass was detected outdoors than indoors. A strong signal of isolation by distance existed in both outdoor and indoor airborne fungal assemblages, despite the small geographic scale of the study (<500 m). Moreover, room and occupant behavior had no detectable effect on the fungi found in indoor air. These results show that at the local level, outdoor air fungi dominate the patterning of indoor air. More broadly, they add to the growing evidence that dispersal limitation, even on small geographic scales, is a key process in structuring the often-observed distance-decay biogeographic pattern in microbial communities.
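Isolation by distance is commonly quantified by correlating a community dissimilarity matrix with a geographic distance matrix and assessing that correlation with a Mantel permutation test. The sketch below does this with Bray-Curtis dissimilarity; the choice of Bray-Curtis, the one-sided test, the helper names, and all data are assumptions for illustration, not the study's reported procedure.

```python
import math
import random


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors (0 = identical)."""
    den = sum(a + b for a, b in zip(u, v))
    return sum(abs(a - b) for a, b in zip(u, v)) / den if den > 0 else 0.0


def pairwise(items, f):
    """Full symmetric matrix of f applied to every pair of items."""
    return [[f(a, b) for b in items] for a in items]


def upper(m, perm=None):
    """Upper-triangle entries of m, optionally with rows/columns relabelled by perm."""
    n = len(m)
    p = perm if perm is not None else list(range(n))
    return [m[p[i]][p[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(d_comm, d_geo, permutations=999, seed=0):
    """One-sided Mantel test: is community dissimilarity positively correlated with distance?"""
    rng = random.Random(seed)
    x = upper(d_comm)
    r_obs = pearson(x, upper(d_geo))
    hits = 1  # the observed labelling counts as one permutation
    labels = list(range(len(d_geo)))
    for _ in range(permutations):
        rng.shuffle(labels)  # relabel sites, keeping each matrix internally consistent
        if pearson(x, upper(d_geo, labels)) >= r_obs:
            hits += 1
    return r_obs, hits / (permutations + 1)


# Hypothetical samples along a 300 m transect, two fungal taxa each.
coords = [(0, 0), (100, 0), (200, 0), (300, 0)]
abund = [[10, 0], [7, 3], [3, 7], [0, 10]]
r, p = mantel(pairwise(abund, bray_curtis), pairwise(coords, math.dist))
```

With only four hypothetical samples there are just 24 distinct relabellings, so the p-value is coarse; surveys with many replicates give a much finer permutation distribution.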
Distances in evidence theory: Comprehensive survey and generalizations
The purpose of the present work is to survey the dissimilarity measures defined so far in the mathematical framework of evidence theory and to propose a classification of these measures based on their formal properties. This research is motivated by the fact that, while dissimilarity measures have been widely studied and surveyed in probability theory and fuzzy set theory, no comprehensive survey is yet available for evidence theory. The main results presented herein include a synthesis of the properties of the measures defined so far in the scientific literature; the proposed generalizations naturally extend the body of previously known measures and lead to the definition of numerous new measures. Building on this analysis, we show that Dempster’s conflict cannot be considered a genuine dissimilarity measure between two belief functions, and we propose an alternative based on a cosine function. Other original results include the justification of using two-dimensional indexes as (cosine; distance) couples and a general formulation for this class of new indexes. We base our exposition on a geometrical interpretation of evidence theory and show that most of the dissimilarity measures published so far are based on inner products, in some cases degenerate ones. Experimental results based on Monte Carlo simulations illustrate interesting relationships between existing measures.
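One simple way to see why a conflict measure can fail as a dissimilarity is to compare it with an inner-product distance on a small frame. The sketch below represents mass functions as dictionaries over focal sets and computes Jousselme's distance (an inner-product distance weighted by the Jaccard index of focal sets), a cosine under the same inner product, and Dempster's conflict K. The frame {a, b, c} and the mass functions are hypothetical, and the cosine shown is a generic construction, not necessarily the one proposed in the paper.

```python
import math
from itertools import combinations


def focal_candidates(frame):
    """All non-empty subsets of the frame of discernment, as frozensets."""
    items = sorted(frame)
    return [frozenset(c) for k in range(1, len(items) + 1) for c in combinations(items, k)]


def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| between two non-empty focal sets."""
    return len(a & b) / len(a | b)


def inner(m1, m2, subsets):
    """Jaccard-weighted inner product: sum over A, B of m1(A) * D(A, B) * m2(B)."""
    return sum(m1.get(a, 0.0) * jaccard(a, b) * m2.get(b, 0.0) for a in subsets for b in subsets)


def jousselme_distance(m1, m2, subsets):
    """sqrt(0.5 * (m1 - m2)^T D (m1 - m2)), expanded via the inner product."""
    q = inner(m1, m1, subsets) + inner(m2, m2, subsets) - 2 * inner(m1, m2, subsets)
    return math.sqrt(max(0.0, 0.5 * q))  # clamp tiny negative rounding error


def cosine(m1, m2, subsets):
    """Cosine of the angle between m1 and m2 under the same inner product."""
    return inner(m1, m2, subsets) / math.sqrt(inner(m1, m1, subsets) * inner(m2, m2, subsets))


def dempster_conflict(m1, m2):
    """Dempster's conflict K: total product mass on pairs of disjoint focal sets."""
    return sum(v1 * v2 for a, v1 in m1.items() for b, v2 in m2.items() if not (a & b))


S = focal_candidates({"a", "b", "c"})
m = {frozenset("a"): 0.5, frozenset("b"): 0.5}  # hypothetical mass function
m2 = {frozenset("a"): 1.0}                      # hypothetical categorical mass function
```

Comparing m with itself gives K(m, m) = 0.5, while the distance is 0 and the cosine is 1, so K is nonzero for identical bodies of evidence. K(m, m2) is also 0.5, so K cannot separate the identical pair from the distinct one, whereas the distance rises to 0.5.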
The Metric Nearness Problem
Metric nearness refers to the problem of optimally restoring metric properties to distance measurements that happen to be nonmetric because of measurement errors or other causes. Metric data can be important in various settings, for example in clustering, classification, metric-based indexing, query processing, and graph-theoretic approximation algorithms. This paper formulates and solves the metric nearness problem: given a set of pairwise dissimilarities, find a “nearest” set of distances that satisfy the properties of a metric, principally the triangle inequality. To solve this problem, the paper develops efficient triangle fixing algorithms based on an iterative projection method. An intriguing aspect of the metric nearness problem is that a special case turns out to be equivalent to the all-pairs shortest paths problem. The paper exploits this equivalence and develops a new algorithm for the latter problem using a primal-dual method. Applications to graph clustering are provided as an illustration. We include experiments that demonstrate the computational superiority of triangle fixing over general-purpose convex programming software. Finally, we conclude by suggesting various useful extensions and generalizations of metric nearness.
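The abstract connects a special case of metric nearness to all-pairs shortest paths. A related, easily checked fact is that, for a symmetric, non-negative, zero-diagonal dissimilarity matrix, replacing every entry by its shortest-path distance (Floyd-Warshall) only decreases entries and yields the largest matrix below the input that satisfies the triangle inequality. The sketch below shows that decrease-only repair together with a violation check. It is an illustration under those assumptions, not the paper's triangle fixing algorithm (an iterative projection method); the input matrix and function names are hypothetical.

```python
def triangle_violations(d, tol=1e-12):
    """All (i, j, k) with d[i][j] > d[i][k] + d[k][j], i.e. broken triangle inequalities."""
    n = len(d)
    return [(i, j, k) for i in range(n) for j in range(n) for k in range(n)
            if d[i][j] > d[i][k] + d[k][j] + tol]


def shortest_path_repair(d):
    """Floyd-Warshall closure of a symmetric, non-negative, zero-diagonal matrix.

    Entries can only decrease, and the result satisfies the triangle inequality.
    """
    n = len(d)
    m = [list(row) for row in d]  # work on a copy
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = m[i][k] + m[k][j]
                if via_k < m[i][j]:
                    m[i][j] = via_k
    return m


d = [[0, 1, 5], [1, 0, 1], [5, 1, 0]]  # hypothetical: 5 > 1 + 1 breaks the triangle inequality
repaired = shortest_path_repair(d)
```

Here only the offending pair changes (5 becomes 2); triangle fixing instead seeks a nearest metric under a chosen norm and may also increase entries.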
A Statistical Investigation of Nonmetric Vertebral Traits with a Skeletal Population Sample from the Dakhleh Oasis, Egypt
This paleogenetic study uses 17 nonmetric epigenetic vertebral traits to determine their suitability for studying past genetic relationships. The samples come from Egypt’s Dakhleh Oasis. Although infracranial nonmetric traits have had a limited role in the study of past population genetics, this study has shown their value for elucidating past genetic patterns in intragroup analysis. The key to their use is testing for epigenetic factors (e.g., age, sex, symmetry, and intertrait correlations), which was done with a number of statistical tests, including the phi coefficient, the G-test, and the odds ratio. This study used a novel set of spatial statistics to examine within-group genetic dynamics of the Kellis 2 cemetery. Five traits support previous research demonstrating that this cemetery was organized along patrilocal and patrilineal lines. This thesis has demonstrated the genetic value of vertebral epigenetic traits and argues for their continued use in paleogenetic research.
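For a single trait scored as present or absent in two groups (for example, by sex), the tests named above reduce to simple formulas on a 2 × 2 contingency table [[a, b], [c, d]]: φ = (ad − bc) / √((a+b)(c+d)(a+c)(b+d)), odds ratio = ad / (bc), and G = 2 Σ O ln(O/E) with expected counts E = row total × column total / N. The sketch below computes all three under that assumed table layout; the counts are hypothetical and the function name is illustrative.

```python
import math


def two_by_two_stats(a, b, c, d):
    """Phi, odds ratio, G statistic and p-value for the 2x2 table [[a, b], [c, d]].

    Assumed layout: rows = group 1 / group 2, columns = trait present / absent.
    """
    r1, r2, c1, c2 = a + b, c + d, a + c, b + d
    n = r1 + r2
    phi = (a * d - b * c) / math.sqrt(r1 * r2 * c1 * c2)
    # The odds ratio is undefined when b or c is zero; returned as inf here.
    odds_ratio = (a * d) / (b * c) if b * c > 0 else math.inf
    g = 0.0
    for observed, row, col in [(a, r1, c1), (b, r1, c2), (c, r2, c1), (d, r2, c2)]:
        expected = row * col / n
        if observed > 0:  # zero cells contribute nothing to G
            g += observed * math.log(observed / expected)
    g = max(2.0 * g, 0.0)  # guard against tiny negative rounding error
    # A 2x2 table has 1 degree of freedom; the chi-square(1) upper tail is erfc(sqrt(G/2)).
    p_value = math.erfc(math.sqrt(g / 2.0))
    return phi, odds_ratio, g, p_value


phi, odds, g, p = two_by_two_stats(20, 10, 10, 20)  # hypothetical trait counts
```

For these counts φ = 1/3, the odds ratio is 4, and G ≈ 6.80 with p well below 0.05.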
CHARACTERIZING BENTHIC MACROINVERTEBRATE COMMUNITY RESPONSES TO NUTRIENT ADDITION USING NMDS AND BACI ANALYSES
Nonmetric multidimensional scaling (NMDS) is an ordination technique that is often used for information visualization and for exploring similarities or dissimilarities in ecological data. In principle, NMDS maximizes the rank-order correlation between the measured dissimilarities and the distances in the ordination space. Ordination points are adjusted so as to minimize stress, a measure of the discordance between these two kinds of distances. Before-After-Control-Impact (BACI) is a classical analysis-of-variance method for measuring the potential influence of an environmental disturbance: such effects are assessed by comparing conditions before and after a planned activity. In certain ecological applications, the extent of the impact is also expressed relative to conditions in a control area after a particular anthropogenic activity has occurred. In this paper, two statistical techniques are employed to investigate the effects of stream nutrient addition on a riverine benthic macroinvertebrate community. NMDS is used to explore the clustering of sampling units, based on multiple macroinvertebrate metrics, across predetermined river zones. BACI is then used to test for the potential impact of nutrient addition on the specified macroinvertebrate response metrics. Combining the two approaches provides a powerful and sensitive tool for detecting complex second-order effects in river food chains. The statistical techniques are demonstrated using eight years of benthic macroinvertebrate survey data collected on an ultra-oligotrophic reach of the Kootenai River in northern Idaho and western Montana, downstream of a hydroelectric dam.
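Stress can be made concrete with Kruskal's stress-1, one common definition: S = √(Σ (d_ij − d̂_ij)² / Σ d_ij²), where d_ij are distances in the ordination and d̂_ij are disparities from a monotone (isotonic) regression of d_ij on the rank order of the input dissimilarities. The sketch below computes it with the pool-adjacent-violators algorithm and also computes the BACI interaction contrast, (impact after − impact before) − (control after − control before), from group means. Both are minimal illustrations with hypothetical data and function names; ties in the dissimilarities are handled only by a stable sort, and the contrast is a point estimate, whereas its significance would come from the interaction term of the BACI analysis of variance.

```python
import math
from statistics import mean


def isotonic_fit(y):
    """Pool-adjacent-violators: least-squares non-decreasing fit to the sequence y."""
    blocks = []  # each block is [sum, count] of pooled values
    for v in y:
        blocks.append([v, 1])
        # Merge backwards while a block's mean exceeds the following block's mean.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fit = []
    for s, c in blocks:
        fit.extend([s / c] * c)
    return fit


def kruskal_stress1(delta, coords):
    """Kruskal stress-1 of an ordination `coords` against dissimilarity matrix `delta`."""
    n = len(coords)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    dist = [math.dist(coords[i], coords[j]) for i, j in pairs]
    # Rank pairs by input dissimilarity (stable sort; ties are not treated specially).
    order = sorted(range(len(pairs)), key=lambda k: delta[pairs[k][0]][pairs[k][1]])
    fitted = isotonic_fit([dist[k] for k in order])
    disparity = [0.0] * len(pairs)
    for pos, k in enumerate(order):
        disparity[k] = fitted[pos]
    num = sum((d - h) ** 2 for d, h in zip(dist, disparity))
    den = sum(d ** 2 for d in dist)
    return math.sqrt(num / den)


def baci_contrast(control_before, control_after, impact_before, impact_after):
    """Interaction contrast from group means: change at impact minus change at control."""
    return (mean(impact_after) - mean(impact_before)) - (mean(control_after) - mean(control_before))


delta = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.5], [2.0, 1.5, 0.0]]  # hypothetical dissimilarities
good = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]  # distances follow the rank order of delta
bad = [(0.0, 0.0), (3.0, 0.0), (1.0, 0.0)]   # distances reverse the rank order
effect = baci_contrast([10, 12], [11, 13], [10, 11], [15, 16])  # hypothetical metric values
```

The `good` layout has zero stress because only rank order matters, while the reversed `bad` layout is pooled to constant disparities and scores about 0.378. The hypothetical BACI data give a contrast of 4: the impact site rose by 5 units while the control rose by 1.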
- …