
    Iterative class discovery and feature selection using Minimal Spanning Trees

    BACKGROUND: Clustering is one of the most commonly used methods for discovering hidden structure in microarray gene expression data. Most current methods for clustering samples are based on distance metrics utilizing all genes. This has the effect of obscuring clustering in samples that may be evident only when looking at a subset of genes, because noise from irrelevant genes dominates the signal from the relevant genes in the distance calculation. RESULTS: We describe an algorithm for automatically detecting clusters of samples that are discernible only in a subset of genes. We use iteration between Minimal Spanning Tree based clustering and feature selection to remove noise genes in a step-wise manner while simultaneously sharpening the clustering. Evaluation of this algorithm on synthetic data shows that it resolves planted clusters with high accuracy in spite of noise and the presence of other clusters. It also shows a low probability of detecting spurious clusters. Testing the algorithm on some well-known microarray data sets reveals known biological classes as well as novel clusters. CONCLUSIONS: The iterative clustering method offers considerable improvement over clustering in all genes. This method can be used to discover partitions and their biological significance can be determined by comparing with clinical correlates and gene annotations. The MATLAB programs for the iterative clustering algorithm are available fro
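The Minimal Spanning Tree clustering step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' MATLAB implementation and it omits the feature-selection iteration entirely; it only assumes that a two-way partition is obtained by building a Euclidean MST over the samples and cutting its longest edge. The function names `mst_edges` and `split_on_longest_edge` are hypothetical.

```python
import math


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns a list of (weight, i, j) tuples, one per MST edge.
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known distance from the tree to each node
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the node outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        # Relax distances to the remaining nodes.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def split_on_longest_edge(points):
    """Cut the longest MST edge (an assumed splitting rule) and label
    the component containing sample 0 as 0, everything else as 1."""
    kept = sorted(mst_edges(points))[:-1]  # drop the longest edge
    adj = {i: [] for i in range(len(points))}
    for _, i, j in kept:
        adj[i].append(j)
        adj[j].append(i)
    labels = [1] * len(points)
    labels[0] = 0
    stack, seen = [0], {0}
    while stack:  # depth-first walk of the component containing sample 0
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                labels[v] = 0
                stack.append(v)
    return labels
```

Calling `split_on_longest_edge` on two well-separated groups of points should assign each group its own label. In the iterative scheme the abstract describes, a partition like this would then drive the selection of genes that support it, and the clustering would be repeated on the reduced gene set.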

    Improved processing of microarray data using image reconstruction techniques

    Spotted cDNA microarray data analysis suffers from various problems such as noise from a variety of sources, missing data, inconsistency, and, of course, the presence of outliers. This paper introduces a new method that dramatically reduces the noise when processing the original image data. The proposed approach recreates the microarray slide image, as it would have been with all the genes removed. By subtracting this background recreation from the original, the gene ratios can be calculated with more precision and less influence from outliers and other artifacts that would normally make the analysis of this data more difficult. The new technique is also beneficial, as it does not rely on the accurate fitting of a region to each gene, with its only requirement being an approximate coordinate. In experiments conducted, the new method was tested against one of the mainstream methods of processing spotted microarray images. Our method is shown to produce much less variation in gene measurements. This evidence is supported by clustering results that show a marked improvement in accuracy

    Manifold Alignment Aware Ants: a Markovian process for manifold extraction

    Dimensionality reduction and clustering are often used as preliminary steps for many complex machine learning tasks. The presence of noise and outliers can deteriorate the performance of such preprocessing and therefore impair the subsequent analysis tremendously. In manifold learning, several studies indicate solutions for removing background noise or noise close to the structure when the density is substantially higher than that exhibited by the noise. However, in many applications, including astronomical datasets, the density varies alongside manifolds that are buried in a noisy background. We propose a novel method to extract manifolds in the presence of noise based on the idea of Ant colony optimization. In contrast to the existing random walk solutions, our technique captures points which are locally aligned with major directions of the manifold. Moreover, we empirically show that the biologically inspired formulation of ant pheromone reinforces this behavior enabling it to recover multiple manifolds embedded in extremely noisy data clouds. The algorithm's performance is demonstrated in comparison to the state-of-the-art approaches, such as Markov Chain, LLPD, and Disperse, on several synthetic and real astronomical datasets stemming from an N-body simulation of a cosmological volum

    Lensing Studies with Diffuse Backgrounds

    The current weak lensing measurements of the large scale structure are mostly related to statistical study of background galaxy ellipticities. We consider a possibility to extend lensing studies with intrinsically unresolved sources and suggest that spatial fluctuations in the integrated diffuse emission from these sources can be used for a lensing reconstruction. Examples of upcoming possibilities include the diffuse background generated by dusty starburst galaxies at far-infrared wavelengths, first stars and galaxies in near-infrared wavelengths, and the background related to 21 cm emission by neutral gas in the general intergalactic medium prior to reionization. While methods developed to extract lensing information from cosmic microwave background (CMB) temperature and polarization data can be easily modified to study lensing properties using diffuse backgrounds at other wavelengths, we suggest that the lensing extraction from these backgrounds using higher order non-Gaussian clustering information alone may not be the best approach. In contrast to CMB anisotropies, reasons for this include the lack of features in the clustering power spectrum such that the resulting lensing modification to the angular power spectrum of low-redshift diffuse backgrounds, at arcminute angular scales, is insignificant. While the use of low redshift backgrounds for lensing studies will be challenging, due to confusing foregrounds among other reasons, the use of suggested backgrounds will extend the reconstruction of the integrated matter power spectrum out to redshifts of 15 to 30, and will bridge the gap between current and upcoming galaxy lensing studies out to, at most, a redshift of a few and planned weak lensing studies with CMB out to the last scattering surface at a redshift of 1100. Comment: 25 pages, 4 figure

    Unsupervised spike detection and sorting with wavelets and superparamagnetic clustering

    This study introduces a new method for detecting and sorting spikes from multiunit recordings. The method combines the wavelet transform, which localizes distinctive spike features, with superparamagnetic clustering, which allows automatic classification of the data without assumptions such as low variance or Gaussian distributions. Moreover, an improved method for setting amplitude thresholds for spike detection is proposed. We describe several criteria for implementation that render the algorithm unsupervised and fast. The algorithm is compared to other conventional methods using several simulated data sets whose characteristics closely resemble those of in vivo recordings. For these data sets, we found that the proposed algorithm outperformed conventional methods

    Metastable Clusters and Channels Formed by Active Particles with Aligning Interactions

    We introduce a novel model for active particles with short-range aligning interactions and study their behaviour in crowded environments using numerical simulations. When only active particles are present, we observe a transition from a gaseous state to the emergence of metastable clusters as the level of orientational noise is reduced. When passive particles are also present, we observe the emergence of a network of metastable channels. Comment: 11 pages, 7 figure

    The Large-Scale Structure of the X-ray Background and its Cosmological Implications

    A careful analysis of the HEAO1 A2 2-10 keV full-sky map of the X-ray background (XRB) reveals clustering on the scale of several degrees. After removing the contribution due to beam smearing, the intrinsic clustering of the background is found to be consistent with an auto-correlation function of the form (3.6 +- 0.9) x 10^{-4} theta^{-1} where theta is measured in degrees. If current AGN models of the hard XRB are reasonable and the cosmological constant-cold dark matter cosmology is correct, this clustering implies an X-ray bias factor of b_X ~ 2. Combined with the absence of a correlation between the XRB and the cosmic microwave background, this clustering can be used to limit the presence of an integrated Sachs-Wolfe (ISW) effect and thereby to constrain the value of the cosmological constant, Omega_Lambda < 0.60 (95% C.L.). This constraint is inconsistent with much of the parameter space currently favored by other observations. Finally, we marginally detect the dipole moment of the diffuse XRB and find it to be consistent with the dipole due to our motion with respect to the mean rest frame of the XRB. The limit on the amplitude of any intrinsic dipole is delta I / I < 5 x 10^{-3} at the 95% C.L. When compared to the local bulk velocity, this limit implies a constraint on the matter density of the universe of Omega_m^{0.6}/b_X(0) > 0.24. Comment: 15 pages, 8 postscript figures, to appear in the Astrophysical Journal. The postscript version appears not to print, so use the PDF versio
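For reference, the fitted auto-correlation form and the two limits quoted in this abstract can be typeset as follows. The symbol $w(\theta)$ for the auto-correlation function is a label chosen here for illustration; it is not notation taken from the abstract.

```latex
\[
w(\theta) = (3.6 \pm 0.9) \times 10^{-4}\, \theta^{-1}
\quad (\theta \text{ in degrees}),
\qquad
\Omega_\Lambda < 0.60 \ (95\%\ \text{C.L.}),
\qquad
\frac{\Omega_m^{0.6}}{b_X(0)} > 0.24 .
\]
```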

    New measurements of the cosmic infrared background fluctuations in deep Spitzer/IRAC survey data and their cosmological implications

    We extend previous measurements of cosmic infrared background (CIB) fluctuations to ~ 1 deg using new data from the Spitzer Extended Deep Survey. Two fields, with depths of ~12 hr/pixel over 3 epochs, are analyzed at 3.6 and 4.5 μm. Maps of the fields were assembled using a self-calibration method uniquely suitable for probing faint diffuse backgrounds. Resolved sources were removed from the maps to a magnitude limit of AB mag ~ 25, as indicated by the level of the remaining shot noise. The maps were then Fourier-transformed and their power spectra were evaluated. Instrumental noise was estimated from the time-differenced data, and subtracting this isolates the spatial fluctuations of the actual sky. The power spectra of the source-subtracted fields remain identical (within the observational uncertainties) for the three epochs, indicating that zodiacal light contributes negligibly to the fluctuations. Comparing to 8 μm power spectra shows that Galactic cirrus cannot account for the fluctuations. The signal appears isotropically distributed on the sky as required for an extragalactic origin. The CIB fluctuations continue to diverge to > 10 times those of known galaxy populations on angular scales out to < 1 deg. The low shot noise levels remaining in the diffuse maps indicate that the large scale fluctuations arise from the spatial clustering of faint sources well below the confusion noise. The spatial spectrum of these fluctuations is in reasonable agreement with an origin in populations clustered according to the standard cosmological model (LCDM) at epochs coinciding with the first stars era. Comment: ApJ, to be publishe

    Intensity mapping with neutral hydrogen and the Hidden Valley simulations

    This paper introduces the Hidden Valley simulations, a set of trillion-particle N-body simulations in gigaparsec volumes aimed at intensity mapping science. We present details of the simulations and their convergence, then specialize to the study of 21-cm fluctuations between redshifts 2 and 6. Neutral hydrogen is assigned to halos using three prescriptions, and we investigate the clustering in real and redshift-space at the 2-point level. In common with earlier work we find the bias of HI increases from near 2 at z = 2 to 4 at z = 6, becoming more scale dependent at high z. The level of scale-dependence and decorrelation with the matter field are as predicted by perturbation theory. Due to the low mass of the hosting halos, the impact of fingers of god is small on the range relevant for proposed 21-cm instruments. We show that baryon acoustic oscillations and redshift-space distortions could be well measured by such instruments. Taking advantage of the large simulation volume, we assess the impact of fluctuations in the ultraviolet background, which change HI clustering primarily at large scales. Comment: 36 pages, 21 figures. Simulations available at http://cyril.astro.berkeley.edu/HiddenValley/ Minor changes in HI normalization described in footnote of section