Multidimensional Balance-Based Cluster Boundary Detection for High-Dimensional Data
© 2018 IEEE. The balance of the neighborhood space around a central point is an important concept in cluster analysis and can be used to detect cluster boundary objects effectively. Existing neighborhood analysis methods focus on the distribution of data, i.e., they analyze the characteristics of the neighborhood space from a single perspective and cannot capture rich data characteristics. In this paper, we analyze the high-dimensional neighborhood space from multiple perspectives. By modeling each dimension of a data point's k-nearest-neighbor space (kNNs) as a lever, we apply the lever principle to compute the balance fulcrum of each dimension after proving its inevitability and uniqueness. Then, we model the distance between the projected coordinate of the data point and the balance fulcrum on each dimension and construct the DHBlan coefficient to measure the balance of the neighborhood space. Based on this theoretical model, we propose a simple yet effective cluster boundary detection algorithm called Lever. Experiments on both low- and high-dimensional data sets validate the effectiveness and efficiency of our proposed algorithm.
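As a rough illustration of the lever idea described in this abstract, the sketch below treats each dimension of a point's k nearest neighbors as an equally weighted lever, takes the balance fulcrum to be the mean neighbor coordinate on that dimension, and scores a point by how far it sits from the fulcrums. The equal weights, the normalization by neighborhood extent, and the names `knn`, `fulcrums`, and `imbalance_score` are assumptions made for illustration; the actual DHBlan coefficient is defined in the paper and is not reproduced here.

```python
import math

def knn(points, idx, k):
    """Return the k nearest neighbors of points[idx] (Euclidean, brute force)."""
    p = points[idx]
    others = [q for j, q in enumerate(points) if j != idx]
    others.sort(key=lambda q: math.dist(p, q))
    return others[:k]

def fulcrums(neighbors):
    """Per-dimension balance point of equally weighted neighbors (their mean)."""
    dims = len(neighbors[0])
    return [sum(q[d] for q in neighbors) / len(neighbors) for d in range(dims)]

def imbalance_score(points, idx, k):
    """Mean per-dimension offset between the point and the fulcrum,
    scaled by the neighborhood extent on that dimension (assumed normalization)."""
    p = points[idx]
    nbrs = knn(points, idx, k)
    f = fulcrums(nbrs)
    total = 0.0
    for d in range(len(p)):
        coords = [q[d] for q in nbrs] + [p[d]]
        extent = max(coords) - min(coords)
        if extent > 0:  # skip degenerate dimensions with no spread
            total += abs(p[d] - f[d]) / extent
    return total / len(p)
```

On a regular grid, an interior point with symmetric neighbors scores 0 under this toy measure, while a corner point, whose neighbors all lie to one side, scores higher. A boundary detector in this spirit would flag the points with the largest scores.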
Clustering of star-forming galaxies detected in mid-infrared with the Spitzer wide-area survey
We discuss the clustering properties of galaxies with signs of ongoing star
formation detected by the Spitzer Space Telescope in the 24 μm band in the SWIRE
Lockman Hole field. The sample of mid-IR-selected galaxies includes ~20,000
objects detected above a flux threshold of S(24 μm) = 310 μJy. We adopt
optical/near-IR color selection criteria to split the sample into the
lower-redshift and higher-redshift galaxy populations. We measure the angular
correlation function on scales of θ = 0.01-3.5 deg, from which, using the
Limber inversion along with the redshift distribution established for similarly
selected source populations in the GOODS fields (Rodighiero et al. 2010), we
obtain comoving correlation lengths of r0 = 4.98 ± 0.28 h^-1 Mpc and r0 =
8.04 ± 0.69 h^-1 Mpc for the low-z (z = 0.7) and high-z (z = 1.7) subsamples,
respectively. Comparing these measurements with the correlation functions of
dark matter halos identified in the Bolshoi cosmological simulation (Klypin et
al. 2011), we find that the high-redshift objects reside in progressively more
massive halos reaching Mtot > 3×10^12 h^-1 M⊙, compared to Mtot > 7×10^11 h^-1 M⊙ for
the low-redshift population. Approximate estimates of the IR luminosities based
on the catalogs of 24 μm sources in the GOODS fields show that our high-z
subsample represents a population of "distant ULIRGs" with LIR > 10^12 L⊙, while
the low-z subsample mainly consists of "LIRGs", LIR ~ 10^11 L⊙. The comparison
of number density of the 24 μm selected galaxies and of dark matter halos with
derived minimum mass Mtot shows that only 20% of such halos may host
star-forming galaxies. Comment: 15 pages, 12 figures
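For orientation, the correlation length r0 quoted in this abstract is conventionally defined through a power-law model of the spatial correlation function, which the Limber inversion ties to a power-law angular correlation function. The slope γ and the amplitude A_w below are generic symbols assumed here, not values taken from the paper:

```latex
% Conventional power-law forms (gamma and A_w are not given in the abstract)
\xi(r) = \left(\frac{r}{r_0}\right)^{-\gamma},
\qquad
w(\theta) = A_w\,\theta^{\,1-\gamma}
```

With this convention, r0 is the separation at which ξ(r) = 1, so a larger r0 means stronger clustering.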
Galaxy alignments: An overview
The alignments between galaxies, their underlying matter structures, and the
cosmic web constitute vital ingredients for a comprehensive understanding of
gravity, the nature of matter, and structure formation in the Universe. We
provide an overview on the state of the art in the study of these alignment
processes and their observational signatures, aimed at a non-specialist
audience. The development of the field over the past one hundred years is
briefly reviewed. We also discuss the impact of galaxy alignments on
measurements of weak gravitational lensing, and discuss avenues for making
theoretical and observational progress over the coming decade. Comment: 43 pages excl. references, 16 figures; minor changes to match version
published in Space Science Reviews; part of a topical volume on galaxy
alignments, with companion papers at arXiv:1504.05546 and arXiv:1504.0546
Proceedings of the 2011 New York Workshop on Computer, Earth and Space Science
The purpose of the New York Workshop on Computer, Earth and Space Sciences is
to bring together the New York area's finest Astronomers, Statisticians,
Computer Scientists, Space and Earth Scientists to explore potential synergies
between their respective fields. The 2011 edition (CESS2011) was a great
success, and we would like to thank all of the presenters and participants for
attending. This year was also special as it included authors from the upcoming
book titled "Advances in Machine Learning and Data Mining for Astronomy". Over
two days, the latest advanced techniques used to analyze the vast amounts of
information now available for the understanding of our universe and our planet
were presented. These proceedings attempt to provide a small window into what
the current state of research is in this vast interdisciplinary field and we'd
like to thank the speakers who spent the time to contribute to this volume. Comment: Author lists modified. 82 pages. Workshop Proceedings from CESS 2011
in New York City, Goddard Institute for Space Studies
A divide-and-conquer approach to geometric sampling for active learning
Active learning (AL) repeatedly trains the classifier with the minimum
labeling budget to improve the current classification model. The training
process is usually supervised by an uncertainty evaluation strategy. However,
the uncertainty evaluation always suffers from performance degeneration when
the initial labeled set has insufficient labels. To completely eliminate the
dependence on the uncertainty evaluation sampling in AL, this paper proposes a
divide-and-conquer idea that directly recasts AL sampling as geometric
sampling over the clusters. By dividing the points of the clusters
into cluster boundary and core points, we theoretically discuss their margin
distance and hypothesis relationship. With the advantages of cluster boundary
points in the above two properties, we propose a Geometric Active Learning
(GAL) algorithm by knight's tour. Experimental studies of the two reported
experimental tasks including cluster boundary detection and AL classification
show that the proposed GAL method significantly outperforms the
state-of-the-art baselines. Comment: This paper has been withdrawn. The first author quit the PhD study
at AAI, University of Technology Sydney. The manuscript stopped updating
Clustering of the AKARI NEP deep field 24 μm selected galaxies
Aims. We present a method of selection of 24 μm galaxies from the AKARI north ecliptic pole (NEP) deep field down to 150 μJy and measurements of their two-point correlation function. We aim to associate various 24 μm selected galaxy populations with present day galaxies and to investigate the impact of their environment on the direction of their subsequent evolution.
Methods. We discuss the use of a Support Vector Machine (SVM) algorithm applied to infrared photometric data to perform star-galaxy separation, in which we achieve an accuracy higher than 80%. The photometric redshift information, obtained through the CIGALE code, is used to explore the redshift dependence of the correlation function parameter (r0) as well as the linear bias evolution. The linear bias relates the galaxy distribution to that of the underlying dark matter. We connect the investigated sources to their potential local descendants through a simplified model of the clustering evolution without interactions.
Results. We observe two different populations of star-forming galaxies, at zmed ∼ 0.25 and zmed ∼ 0.9. Measurements of total infrared luminosities (LTIR) show that the sample at zmed ∼ 0.25 is composed mostly of local star-forming galaxies, while the sample at zmed ∼ 0.9 is composed of luminous infrared galaxies (LIRGs) with LTIR ∼ 10^11.62 L⊙. We find that dark halo mass is not necessarily correlated with LTIR: for a subsample with LTIR = 10^11.15 L⊙ at zmed ∼ 0.7 we observe a higher clustering length (r0 = 6.21 ± 0.78 h^-1 Mpc) than for a subsample with mean LTIR = 10^11.84 L⊙ at zmed ∼ 1.1 (r0 = 5.86 ± 0.69 h^-1 Mpc). We find that galaxies at zmed ∼ 0.9 can be ancestors of present-day L∗ early-type galaxies, which exhibit a very high r0 ∼ 8 h^-1 Mpc.
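The abstract does not spell out its "simplified model of the clustering evolution without interactions". One commonly used interaction-free form, shown here as an assumption rather than as the authors' exact model, is the galaxy-conserving model, in which galaxies simply follow the gravitational flow after formation. The symbols b_0 (present-day bias) and D(z) (linear growth factor, normalized so that D(0) = 1) are introduced here for illustration:

```latex
% Linear bias: ratio of galaxy to dark matter clustering
b^2(z) = \frac{\xi_{\mathrm{gal}}(r, z)}{\xi_{\mathrm{DM}}(r, z)}

% Galaxy-conserving evolution (assumed form)
b(z) = 1 + \frac{b_0 - 1}{D(z)}
```

Under this form the bias of a conserved population decays toward 1 as structure grows, which is what allows a measured b(z) at high redshift to be mapped to a predicted clustering strength for its local descendants.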
Integrated Multiparametric Radiomics and Informatics System for Characterizing Breast Tumor Characteristics with the OncotypeDX Gene Assay
Optimal use of multiparametric magnetic resonance imaging (mpMRI) can identify key MRI parameters and provide unique tissue signatures defining phenotypes of breast cancer. We have developed and implemented a new machine-learning informatics system, termed the Informatics Radiomics Integration System (IRIS), that integrates clinical variables derived from imaging and electronic medical health records (EHR) with multiparametric radiomics (mpRad) to identify potential risk of local or systemic recurrence in breast cancer patients. We tested the model in patients (n = 80) who had Estrogen Receptor positive disease and underwent OncotypeDX gene testing, radiomic analysis, and breast mpMRI. The IRIS method was trained using the mpMRI, clinical, pathologic, and radiomic descriptors for prediction of the OncotypeDX risk score. The trained mpRad IRIS model had a sensitivity of 95% and a specificity of 83%, with an area under the curve (AUC) of 0.89, for classifying low-risk patients from the intermediate- and high-risk groups. The lesion size was larger for the high-risk group (2.9 ± 1.7 mm) and smaller for both the low-risk (1.9 ± 1.3 mm) and intermediate-risk (1.7 ± 1.4 mm) groups. The lesion apparent diffusion coefficient (ADC) map values for the high- and intermediate-risk groups were significantly (p < 0.05) lower than for the low-risk group (1.14 vs. 1.49 × 10^-3 mm^2/s). These initial studies provide deeper insight into the clinical, pathological, quantitative imaging, and radiomic features, and provide the foundation to relate these features to the assessment of treatment response for improved personalized medicine.
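For readers less familiar with the classification metrics reported in this abstract, the sketch below shows how sensitivity and specificity are computed from binary labels and predictions. The function name and the toy labels are illustrative assumptions and are not data from the study.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity

# Toy example: 4 positives (3 detected) and 4 negatives (2 correctly rejected).
toy_true = [1, 1, 1, 1, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(toy_true, toy_pred)
```

Sensitivity is the fraction of true positives that the model recovers, and specificity is the fraction of true negatives that it correctly rejects.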
Past and present cosmic structure in the SDSS DR7 main sample
We present a chrono-cosmography project, aiming at the inference of the four
dimensional formation history of the observed large scale structure from its
origin to the present epoch. To do so, we perform a full-scale Bayesian
analysis of the northern galactic cap of the Sloan Digital Sky Survey (SDSS)
Data Release 7 main galaxy sample, relying on a fully probabilistic, physical
model of the non-linearly evolved density field. Besides inferring initial
conditions from observations, our methodology naturally and accurately
reconstructs non-linear features at the present epoch, such as walls and
filaments, corresponding to high-order correlation functions generated by
late-time structure formation. Our inference framework self-consistently
accounts for typical observational systematic and statistical uncertainties
such as noise, survey geometry and selection effects. We further account for
luminosity dependent galaxy biases and automatic noise calibration within a
fully Bayesian approach. As a result, this analysis provides highly-detailed
and accurate reconstructions of the present density field on scales larger than
Mpc, constrained by SDSS observations. This approach also leads to
the first quantitative inference of plausible formation histories of the
dynamic large scale structure underlying the observed galaxy distribution. The
results described in this work constitute the first full Bayesian non-linear
analysis of the cosmic large scale structure with the demonstrated capability
of uncertainty quantification. Some of these results will be made publicly
available along with this work. The level of detail of inferred results and the
high degree of control on observational uncertainties pave the path towards
high precision chrono-cosmography, the subject of simultaneously studying the
dynamics and the morphology of the inhomogeneous Universe. Comment: 27 pages, 9 figures