
    Multidimensional Balance-Based Cluster Boundary Detection for High-Dimensional Data

    © 2018 IEEE. The balance of the neighborhood space around a central point is an important concept in cluster analysis and can be used to detect cluster boundary objects effectively. Existing neighborhood analysis methods focus on the distribution of the data, i.e., they analyze the characteristics of the neighborhood space from a single perspective and therefore cannot capture rich data characteristics. In this paper, we analyze the high-dimensional neighborhood space from multiple perspectives. By treating each dimension of a data point's k-nearest-neighbor (kNN) space as a lever, we apply the lever principle to compute the balance fulcrum of each dimension after proving its existence and uniqueness. Then, we model the distance between the data point's projected coordinate and the balance fulcrum on each dimension and construct the DHBlan coefficient to measure the balance of the neighborhood space. Based on this theoretical model, we propose a simple yet effective cluster boundary detection algorithm called Lever. Experiments on both low- and high-dimensional data sets validate the effectiveness and efficiency of the proposed algorithm.
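
    The lever idea can be illustrated with a minimal sketch. It assumes that every neighbor carries unit mass, so the balance fulcrum on each dimension (where the moments sum to zero) reduces to the mean neighbor coordinate. The abstract does not give the DHBlan formula, so `imbalance_score` below is a hypothetical stand-in (mean per-dimension distance between the point and the fulcrum, scaled by the neighborhood's range); all function names are illustrative, not the authors' code.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def knn_indices(data: List[Point], i: int, k: int) -> List[int]:
    """Indices of the k nearest neighbours of data[i] (Euclidean distance, i excluded)."""
    ranked = sorted((math.dist(data[i], p), j) for j, p in enumerate(data) if j != i)
    return [j for _, j in ranked[:k]]


def balance_fulcrum(neighbours: List[Point], dim: int) -> float:
    """Lever balance point on one dimension, ASSUMING every neighbour has unit mass.

    Moments balance when sum_j m_j * (x_j - f) = 0, so f = sum_j m_j x_j / sum_j m_j;
    with unit masses this reduces to the mean coordinate.
    """
    return sum(p[dim] for p in neighbours) / len(neighbours)


def imbalance_score(data: List[Point], i: int, k: int) -> float:
    """HYPOTHETICAL stand-in for DHBlan: mean per-dimension |x_d - f_d| scaled by kNN range."""
    neighbours = [data[j] for j in knn_indices(data, i, k)]
    ndim = len(data[i])
    total = 0.0
    for d in range(ndim):
        f = balance_fulcrum(neighbours, d)
        spread = max(p[d] for p in neighbours) - min(p[d] for p in neighbours)
        total += abs(data[i][d] - f) / (spread if spread > 0 else 1.0)
    return total / ndim


def boundary_points(data: List[Point], k: int, top: int) -> List[int]:
    """Indices of the `top` most unbalanced points, returned in ascending order."""
    scored = sorted(((imbalance_score(data, i, k), i) for i in range(len(data))), reverse=True)
    return sorted(i for _, i in scored[:top])


if __name__ == "__main__":
    grid = [(float(x), float(y)) for x in range(5) for y in range(5)]
    print(boundary_points(grid, k=8, top=4))  # → [0, 4, 20, 24], the grid's corners
```

    On a regular 5×5 grid, interior points have perfectly balanced neighborhoods (score 0), while the corners, whose neighbors all lie on one side, score highest. A real implementation would use a spatial index for the kNN search instead of this O(n²) scan.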

    Clustering of star-forming galaxies detected in mid-infrared with the Spitzer wide-area survey

    We discuss the clustering properties of galaxies with signs of ongoing star formation detected by the Spitzer Space Telescope in the 24 μm band in the SWIRE Lockman Hole field. The sample of mid-IR-selected galaxies includes ~20,000 objects detected above a flux threshold of S_24μm = 310 μJy. We adopt optical/near-IR color selection criteria to split the sample into lower-redshift and higher-redshift galaxy populations. We measure the angular correlation function on scales of θ = 0.01-3.5 deg, from which, using the Limber inversion together with the redshift distribution established for similarly selected source populations in the GOODS fields (Rodighiero et al. 2010), we obtain comoving correlation lengths of r0 = 4.98 ± 0.28 h⁻¹ Mpc and r0 = 8.04 ± 0.69 h⁻¹ Mpc for the low-z (⟨z⟩ = 0.7) and high-z (⟨z⟩ = 1.7) subsamples, respectively. Comparing these measurements with the correlation functions of dark matter halos identified in the Bolshoi cosmological simulation (Klypin et al. 2011), we find that the high-redshift objects reside in progressively more massive halos, reaching M_tot > 3 × 10^12 h⁻¹ M☉, compared to M_tot > 7 × 10^11 h⁻¹ M☉ for the low-redshift population. Approximate estimates of the IR luminosities based on the catalogs of 24 μm sources in the GOODS fields show that our high-z subsample represents a population of "distant ULIRGs" with L_IR > 10^12 L☉, while the low-z subsample mainly consists of "LIRGs" with L_IR ~ 10^11 L☉. A comparison of the number density of the 24 μm-selected galaxies with that of dark matter halos above the derived minimum mass M_tot shows that only 20% of such halos may host star-forming galaxies. Comment: 15 pages, 12 figures
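
    The abstract does not say which estimator was used to measure the angular correlation function. As an illustration only, the sketch below implements the widely used Landy-Szalay estimator, w(θ) = (DD - 2DR + RR) / RR with normalized pair counts, in a single separation bin. It assumes a small field where angular separations can be approximated by flat-sky Euclidean distances, and it uses brute-force pair counting that is suitable only for toy catalogs.

```python
import math
import random
from itertools import combinations, product
from typing import List, Tuple

Point = Tuple[float, float]


def auto_pairs(cat: List[Point], lo: float, hi: float) -> int:
    """Unique pairs within one catalogue whose separation falls in [lo, hi)."""
    return sum(1 for a, b in combinations(cat, 2) if lo <= math.dist(a, b) < hi)


def cross_pairs(c1: List[Point], c2: List[Point], lo: float, hi: float) -> int:
    """Pairs (one point from each catalogue) whose separation falls in [lo, hi)."""
    return sum(1 for a, b in product(c1, c2) if lo <= math.dist(a, b) < hi)


def landy_szalay(data: List[Point], randoms: List[Point], lo: float, hi: float) -> float:
    """w(theta) in one bin: (DD - 2DR + RR) / RR using normalised pair counts.

    ASSUMPTION: flat-sky Euclidean separations, valid only for small fields.
    """
    nd, nr = len(data), len(randoms)
    dd = auto_pairs(data, lo, hi) / (nd * (nd - 1) / 2)
    rr = auto_pairs(randoms, lo, hi) / (nr * (nr - 1) / 2)
    dr = cross_pairs(data, randoms, lo, hi) / (nd * nr)
    if rr == 0:
        raise ValueError("no random-random pairs in this bin; use more randoms or a wider bin")
    return (dd - 2 * dr + rr) / rr


if __name__ == "__main__":
    rng = random.Random(42)
    randoms = [(rng.random(), rng.random()) for _ in range(300)]
    # Toy, tightly clumped "galaxy" sample: strongly clustered on small scales.
    clumped = [(0.5 + 0.01 * rng.random(), 0.5 + 0.01 * rng.random()) for _ in range(30)]
    print(landy_szalay(clumped, randoms, 0.0, 0.05))  # large positive value
```

    A clustered sample yields w(θ) well above zero on small scales, while an unclustered sample yields values near zero. The correlation length r0 then follows from fitting w(θ) across bins and applying the Limber inversion with the sample's redshift distribution, which this sketch does not attempt.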

    Galaxy alignments: An overview

    The alignments between galaxies, their underlying matter structures, and the cosmic web constitute vital ingredients for a comprehensive understanding of gravity, the nature of matter, and structure formation in the Universe. We provide an overview of the state of the art in the study of these alignment processes and their observational signatures, aimed at a non-specialist audience. The development of the field over the past one hundred years is briefly reviewed. We also discuss the impact of galaxy alignments on measurements of weak gravitational lensing and outline avenues for making theoretical and observational progress over the coming decade. Comment: 43 pages excl. references, 16 figures; minor changes to match version published in Space Science Reviews; part of a topical volume on galaxy alignments, with companion papers at arXiv:1504.05546 and arXiv:1504.0546

    Proceedings of the 2011 New York Workshop on Computer, Earth and Space Science

    The purpose of the New York Workshop on Computer, Earth and Space Sciences is to bring together the New York area's finest astronomers, statisticians, computer scientists, and space and earth scientists to explore potential synergies between their respective fields. The 2011 edition (CESS2011) was a great success, and we would like to thank all of the presenters and participants for attending. This year was also special because it included authors from the upcoming book "Advances in Machine Learning and Data Mining for Astronomy". Over two days, the latest techniques for analyzing the vast amounts of information now available for understanding our universe and our planet were presented. These proceedings attempt to provide a small window into the current state of research in this vast interdisciplinary field, and we would like to thank the speakers who took the time to contribute to this volume. Comment: Author lists modified. 82 pages. Workshop Proceedings from CESS 2011 in New York City, Goddard Institute for Space Studies

    A divide-and-conquer approach to geometric sampling for active learning

    Active learning (AL) repeatedly trains a classifier under a minimal labeling budget to improve the current classification model. The training process is usually guided by an uncertainty evaluation strategy. However, uncertainty evaluation suffers from performance degradation whenever the initial labeled set has insufficient labels. To eliminate the dependence on uncertainty-based sampling in AL entirely, this paper proposes a divide-and-conquer approach that recasts AL sampling as geometric sampling over clusters. By dividing the points of each cluster into cluster boundary points and core points, we theoretically analyze their margin distance and hypothesis relationship. Building on the advantages of cluster boundary points with respect to these two properties, we propose a Geometric Active Learning (GAL) algorithm based on a knight's tour. Experiments on two tasks, cluster boundary detection and AL classification, show that the proposed GAL method significantly outperforms state-of-the-art baselines. Comment: This paper has been withdrawn. The first author left the PhD program at AAI, University of Technology Sydney. The manuscript is no longer being updated.

    Clustering of the AKARI NEP deep field 24 μm selected galaxies

    Aims. We present a method for selecting 24 μm galaxies from the AKARI north ecliptic pole (NEP) deep field down to 150 μJy, together with measurements of their two-point correlation function. We aim to associate various 24 μm-selected galaxy populations with present-day galaxies and to investigate the impact of their environment on the direction of their subsequent evolution. Methods. We discuss the use of a Support Vector Machine (SVM) algorithm applied to infrared photometric data to perform star-galaxy separation, achieving an accuracy higher than 80%. Photometric redshift information, obtained with the CIGALE code, is used to explore the redshift dependence of the correlation length (r0) as well as the evolution of the linear bias, which relates the galaxy distribution to that of the underlying dark matter. We connect the investigated sources to their potential local descendants through a simplified model of clustering evolution without interactions. Results. We observe two different populations of star-forming galaxies, at z_med ∼ 0.25 and z_med ∼ 0.9. Measurements of total infrared luminosities (L_TIR) show that the sample at z_med ∼ 0.25 is composed mostly of local star-forming galaxies, while the sample at z_med ∼ 0.9 is composed of luminous infrared galaxies (LIRGs) with L_TIR ∼ 10^11.62 L☉. We find that dark halo mass is not necessarily correlated with L_TIR: for subsamples with L_TIR = 10^11.15 L☉ at z_med ∼ 0.7 we observe a higher clustering length (r0 = 6.21 ± 0.78 h⁻¹ Mpc) than for a subsample with mean L_TIR = 10^11.84 L☉ at z_med ∼ 1.1 (r0 = 5.86 ± 0.69 h⁻¹ Mpc). We find that galaxies at z_med ∼ 0.9 can be ancestors of present-day L∗ early-type galaxies, which exhibit a very high r0 ∼ 8 h⁻¹ Mpc.

    Integrated Multiparametric Radiomics and Informatics System for Characterizing Breast Tumor Characteristics with the OncotypeDX Gene Assay

    Optimal use of multiparametric magnetic resonance imaging (mpMRI) can identify key MRI parameters and provide unique tissue signatures that define breast cancer phenotypes. We have developed and implemented a new machine-learning informatics system, termed the Informatics Radiomics Integration System (IRIS), that integrates clinical variables derived from imaging and electronic health records (EHR) with multiparametric radiomics (mpRad) to identify the potential risk of local or systemic recurrence in breast cancer patients. We tested the model in 80 patients who had estrogen receptor-positive disease and underwent OncotypeDX gene testing, radiomic analysis, and breast mpMRI. The IRIS method was trained using the mpMRI, clinical, pathologic, and radiomic descriptors to predict the OncotypeDX risk score. The trained mpRad IRIS model had a sensitivity of 95% and a specificity of 83%, with an area under the curve (AUC) of 0.89, for classifying low-risk patients versus the intermediate- and high-risk groups. Lesion size was larger for the high-risk group (2.9 ± 1.7 mm) and smaller for both the low-risk (1.9 ± 1.3 mm) and intermediate-risk (1.7 ± 1.4 mm) groups. The lesion apparent diffusion coefficient (ADC) map values for the high- and intermediate-risk groups were significantly lower (p < 0.05) than for the low-risk group (1.14 vs. 1.49 × 10⁻³ mm²/s). These initial studies provide deeper insight into the clinical, pathological, quantitative imaging, and radiomic features, and provide the foundation for relating these features to the assessment of treatment response for improved personalized medicine.
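
    To make the reported metrics concrete, the sketch below computes sensitivity, specificity, and the ROC AUC from labels and classifier scores. The labels and scores are toy values, not data from the study, and which risk group is treated as the positive class (label 1) is a hypothetical choice. The AUC is computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, which equals the area under the ROC curve.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP). Label 1 = positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(random positive outscores random negative), counting ties as 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0]             # toy labels, not data from the study
    scores = [0.9, 0.8, 0.4, 0.3, 0.7]   # toy classifier scores
    y_pred = [1 if s >= 0.5 else 0 for s in scores]
    print(sensitivity_specificity(y_true, y_pred))  # ≈ (0.667, 0.5)
    print(auc(y_true, scores))                       # ≈ 0.833
```

    Sensitivity and specificity depend on the chosen decision threshold (0.5 here), whereas the AUC summarizes performance across all thresholds.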

    Past and present cosmic structure in the SDSS DR7 main sample

    We present a chrono-cosmography project that aims to infer the four-dimensional formation history of the observed large-scale structure from its origin to the present epoch. To do so, we perform a full-scale Bayesian analysis of the northern galactic cap of the Sloan Digital Sky Survey (SDSS) Data Release 7 main galaxy sample, relying on a fully probabilistic, physical model of the non-linearly evolved density field. Besides inferring initial conditions from observations, our methodology naturally and accurately reconstructs non-linear features at the present epoch, such as walls and filaments, corresponding to high-order correlation functions generated by late-time structure formation. Our inference framework self-consistently accounts for typical observational systematic and statistical uncertainties such as noise, survey geometry, and selection effects. We further account for luminosity-dependent galaxy biases and automatic noise calibration within a fully Bayesian approach. As a result, this analysis provides highly detailed and accurate reconstructions of the present density field on scales larger than ∼3 Mpc/h, constrained by SDSS observations. This approach also leads to the first quantitative inference of plausible formation histories of the dynamic large-scale structure underlying the observed galaxy distribution. The results described in this work constitute the first full Bayesian non-linear analysis of the cosmic large-scale structure with demonstrated uncertainty quantification. Some of these results will be made publicly available along with this work. The level of detail of the inferred results and the high degree of control over observational uncertainties pave the way towards high-precision chrono-cosmography, the simultaneous study of the dynamics and morphology of the inhomogeneous Universe. Comment: 27 pages, 9 figures