10,437 research outputs found

    The Infrared Database of Extragalactic Observables from Spitzer I: the redshift catalog

    This is the first of a series of papers on the Infrared Database of Extragalactic Observables from Spitzer (IDEOS). In this work we describe the identification of optical counterparts of the infrared sources detected in Spitzer Infrared Spectrograph (IRS) observations, and the acquisition and validation of redshifts. The IDEOS sample includes all the spectra from the Cornell Atlas of Spitzer/IRS Sources (CASSIS) of galaxies beyond the Local Group. Optical counterparts were identified by correlating the extraction coordinates with the NASA Extragalactic Database (NED). To confirm the optical association and validate NED redshifts, we measure redshifts with unprecedented accuracy on the IRS spectra (σ(dz/(1+z)) = 0.0011) using an improved version of the maximum combined pseudo-likelihood method (MCPL). We perform a multi-stage verification of redshifts that considers alternate NED redshifts, the MCPL redshift, and visual inspection of the IRS spectrum. The statistics are as follows: the IDEOS sample contains 3361 galaxies at redshift 0<z<6.42 (mean: 0.48, median: 0.14). We confirm the default NED redshift for 2429 sources and identify 124 with incorrect NED redshifts. We obtain IRS-based redshifts for 568 IDEOS sources without optical spectroscopic redshifts, including 228 with no previous redshift measurements. We provide the entire IDEOS redshift catalog in machine-readable formats. The catalog condenses our compilation and verification effort, and includes our final evaluation of the most likely redshift for each source, its origin, and reliability estimates. Comment: 11 pages, 6 figures, 1 table. Accepted for publication in MNRAS. Full redshift table in machine-readable format available at http://ideos.astro.cornell.edu/redshifts.htm
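
    A minimal sketch of the accuracy figure quoted above, assuming NumPy: it computes the scatter of the normalized error dz/(1+z) between two sets of redshifts for the same sources. The function name and sample values are hypothetical; this is not the MCPL method itself, only the statistic used to characterize it.

```python
import numpy as np

def redshift_scatter(z_measured, z_reference):
    """Scatter of the normalized redshift error dz/(1+z).

    z_measured and z_reference are redshifts for the same sources
    (e.g. IRS-based vs. optical); the return value corresponds to
    the sigma(dz/(1+z)) figure quoted in the abstract.
    """
    z_measured = np.asarray(z_measured, dtype=float)
    z_reference = np.asarray(z_reference, dtype=float)
    frac_err = (z_measured - z_reference) / (1.0 + z_reference)
    return float(frac_err.std())

# Hypothetical values, for illustration only.
print(redshift_scatter([0.142, 0.483, 1.201], [0.141, 0.482, 1.199]))
```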

    One-Class Classification: Taxonomy of Study and Review of Techniques

    One-class classification (OCC) algorithms aim to build classification models when the negative class is either absent, poorly sampled, or not well defined. This unique situation constrains the learning of efficient classifiers by defining the class boundary using only knowledge of the positive class. The OCC problem has been considered and applied under many research themes, such as outlier/novelty detection and concept learning. In this paper we present a unified view of the general problem of OCC through a taxonomy of OCC studies based on the availability of training data, the algorithms used, and the application domains. We further delve into each category of the proposed taxonomy and present a comprehensive literature review of OCC algorithms, techniques, and methodologies, with a focus on their significance, limitations, and applications. We conclude by discussing some open research problems in the field of OCC and presenting our vision for future research. Comment: 24 pages + 11 pages of references, 8 figures
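
    As a concrete instance of the setting described above (only the positive class available at training time), the sketch below fits scikit-learn's OneClassSVM, one standard OCC technique, to synthetic data and scores unseen points as inliers or outliers. The data, kernel choice, and nu value are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Training data: only the positive (normal) class is available,
# which is the defining constraint of one-class classification.
X_train = rng.normal(loc=0.0, scale=1.0, size=(500, 2))

# Test data: a mix of normal points and novelties far from the
# training distribution.
X_test = np.vstack([
    rng.normal(0.0, 1.0, size=(20, 2)),
    rng.normal(6.0, 0.5, size=(5, 2)),
])

# nu bounds the fraction of training points treated as boundary
# violations; it is a tuning knob, not a value prescribed by the paper.
clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(X_train)
labels = clf.predict(X_test)   # +1 = inlier, -1 = outlier/novelty
print(labels)
```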

    A Local Hubble Bubble from SNe Ia?

    We analyze the monopole in the peculiar velocities of 44 Type Ia supernovae (SNe Ia) to test for a local void. The sample extends from 20 to 300 Mpc/h, with distances, deduced from light-curve shapes, accurate to ~6%. Assuming Omega_m=1 and Omega_lambda=0, the most significant deviation we find from the Hubble law is an outward flow of (6.6+/-2.2)% inside a sphere of radius 70 Mpc/h, as would be produced by a void of ~20% underdensity surrounded by a dense shell. This shell roughly coincides with the local Great Walls. Monte Carlo analyses, using Gaussian errors or bootstrap resampling, show that the probability of this result arising by chance from a pure Hubble flow is ~2%. The monopole could be contaminated by higher moments of the velocity field, especially a quadrupole, which are not properly probed by the current limited sky coverage. The void would be less significant if Omega_m is low and Omega_lambda is high. It would be more significant if one outlier is removed from the sample, or if the size of the void is constrained a priori. This putative void is not in significant conflict with any of the standard cosmological scenarios. It suggests that the Hubble constant as determined within 70 Mpc/h could be overestimated by ~6% and that the local value of Omega may be underestimated by ~20%. While the present evidence for a local void is marginal in this data set, the analysis shows that the accumulation of SNe Ia distances will soon provide useful constraints on elusive and important aspects of regional cosmic dynamics. Comment: 21 pages, 3 figures. Slightly revised version. To appear in ApJ, 503, Aug. 20, 1998
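
    The sketch below, assuming NumPy, illustrates the kind of monopole-plus-bootstrap computation the abstract refers to: the mean fractional deviation from the Hubble law inside a chosen radius, with a bootstrap estimate of its uncertainty. The function and its defaults are hypothetical and do not reproduce the paper's full Monte Carlo analysis.

```python
import numpy as np

def hubble_monopole_bootstrap(distances, velocities, h0=100.0, r_max=70.0,
                              n_boot=10000, seed=0):
    """Mean fractional deviation from the Hubble law inside a sphere of
    radius r_max (Mpc/h), with bootstrap resampling over the supernovae.

    distances are in Mpc/h, velocities in km/s; h0 is in km/s per Mpc/h
    (i.e. 100 by construction of h-scaled distances). A positive
    monopole corresponds to an outward flow.
    """
    d = np.asarray(distances, dtype=float)
    v = np.asarray(velocities, dtype=float)
    inside = d < r_max
    frac = (v[inside] - h0 * d[inside]) / (h0 * d[inside])
    rng = np.random.default_rng(seed)
    boot = np.array([rng.choice(frac, size=frac.size, replace=True).mean()
                     for _ in range(n_boot)])
    return frac.mean(), boot.std()   # monopole and its bootstrap error

# Hypothetical usage with placeholder arrays d (Mpc/h) and v (km/s):
# monopole, err = hubble_monopole_bootstrap(d, v)
```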

    Wireless Data Acquisition for Edge Learning: Data-Importance Aware Retransmission

    By deploying machine-learning algorithms at the network edge, edge learning can leverage the enormous real-time data generated by billions of mobile devices to train AI models, which enable intelligent mobile applications. In this emerging research area, one key direction is to efficiently utilize radio resources for wireless data acquisition so as to minimize the latency of executing a learning task at an edge server. Along this direction, we consider the specific problem of deciding on retransmissions in each communication round so as to ensure both the reliability and the quantity of the training data, thereby accelerating model convergence. To solve the problem, a new retransmission protocol called data-importance aware automatic-repeat-request (importance ARQ) is proposed. Unlike classic ARQ, which focuses merely on reliability, importance ARQ selectively retransmits a data sample based on its uncertainty, which reflects how much the sample helps learning and can be measured using the model under training. Underpinning the proposed protocol is an elegant communication-learning relation derived between two corresponding metrics: the signal-to-noise ratio (SNR) and data uncertainty. This relation facilitates the design of a simple threshold-based policy for importance ARQ. The policy is first derived for the classic support vector machine (SVM) classifier, where the uncertainty of a data sample is measured by its distance to the decision boundary. The policy is then extended to the more complex model of convolutional neural networks (CNNs), where data uncertainty is measured by entropy. Extensive experiments have been conducted for both the SVM and CNN using real datasets with balanced and imbalanced distributions. Experimental results demonstrate that importance ARQ effectively copes with channel fading and noise in wireless data acquisition to achieve faster model convergence than conventional channel-aware ARQ. Comment: This is an updated version: 1) extension to general classifiers; 2) consideration of imbalanced classification in the experiments. Submitted to an IEEE journal for possible publication
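
    A toy sketch of the threshold idea: a sample is retransmitted while its received SNR falls short of an uncertainty-dependent target, with uncertainty measured by distance to the SVM decision boundary or by CNN output entropy, as described above. The specific threshold formula and constants below are illustrative placeholders, not the relation derived in the paper.

```python
import numpy as np

def svm_uncertainty(decision_value):
    """Uncertainty proxy for an SVM: a small |distance to the decision
    boundary| means high uncertainty (following the abstract)."""
    return 1.0 / (1e-9 + abs(decision_value))

def cnn_uncertainty(class_probs):
    """Uncertainty proxy for a CNN classifier: entropy of the predicted
    class distribution (following the abstract)."""
    p = np.clip(np.asarray(class_probs, dtype=float), 1e-12, 1.0)
    return float(-(p * np.log(p)).sum())

def importance_arq(sample_snr_db, uncertainty, snr_floor_db=5.0, weight=2.0):
    """Toy threshold policy: request a retransmission while the received
    SNR is below an uncertainty-dependent target.

    snr_floor_db and weight are illustrative tuning knobs, not the
    threshold derived in the paper.
    """
    target_snr_db = snr_floor_db + weight * uncertainty
    return sample_snr_db < target_snr_db   # True -> retransmit

# Example: an uncertain sample near the SVM boundary demands a cleaner
# (higher-SNR) copy before it is used for training.
print(importance_arq(sample_snr_db=8.0, uncertainty=svm_uncertainty(0.1)))
```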

    Outlier detection using distributionally robust optimization under the Wasserstein metric

    We present a Distributionally Robust Optimization (DRO) approach to outlier detection in a linear regression setting, where the closeness of probability distributions is measured using the Wasserstein metric. Training samples contaminated with outliers skew the regression plane computed by least squares and thus impede outlier detection. Classical approaches, such as robust regression, remedy this problem by downweighting the contribution of atypical data points. In contrast, our Wasserstein DRO approach hedges against a family of distributions that are close to the empirical distribution. We show that the resulting formulation encompasses a class of models that includes the regularized Least Absolute Deviation (LAD) as a special case. We provide new insights into the regularization term and give guidance on the selection of the regularization coefficient from the standpoint of a confidence region. We establish two types of performance guarantees for the solution to our formulation under mild conditions. One is related to its out-of-sample behavior, and the other concerns the discrepancy between the estimated and true regression planes. Extensive numerical results demonstrate the superiority of our approach to both robust regression and the regularized LAD in terms of estimation accuracy and outlier detection rates.
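
    For reference, a sketch of the regularized LAD special case mentioned above, written with the cvxpy modelling library (an assumed dependency): minimize the sum of absolute residuals plus a norm penalty on the coefficients, then flag points with unusually large residuals. The penalty choice, coefficient, and outlier rule are illustrative, not the paper's DRO-derived ones.

```python
import numpy as np
import cvxpy as cp

def regularized_lad(X, y, lam=0.1):
    """Regularized Least Absolute Deviation regression, the special case
    of the Wasserstein DRO formulation mentioned in the abstract.

    Minimizes sum_i |y_i - x_i' beta - b| + lam * ||beta||_2.
    The norm and the value of lam are illustrative choices.
    """
    n, p = X.shape
    beta = cp.Variable(p)
    b = cp.Variable()
    objective = cp.Minimize(cp.norm1(y - X @ beta - b) + lam * cp.norm2(beta))
    cp.Problem(objective).solve()
    return beta.value, b.value

def flag_outliers(X, y, beta, b, k=3.0):
    """Flag points whose absolute residual exceeds k times the median
    absolute residual (a simple, hedged detection rule)."""
    r = np.abs(y - X @ beta - b)
    return r > k * np.median(r)
```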

    Probabilistic Anomaly Detection in Natural Gas Time Series Data

    This paper introduces a probabilistic approach to anomaly detection, specifically in natural gas time series data. In the natural gas field there are various types of anomalies, each induced by a range of causes and sources. The causes of a set of anomalies are examined and categorized, and a Bayesian maximum likelihood classifier learns the temporal structures of known anomalies. Given previously unseen time series data, the system detects anomalies using a linear regression model with weather inputs, after which the anomalies are tested for false positives and classified using a Bayesian classifier. The method can also identify anomalies of unknown origin. Thus, the likelihood of a data point being anomalous is given for anomalies of both known and unknown origins. This probabilistic anomaly detection method is tested on a reported natural gas consumption data set.
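
    A simplified sketch of the detection stage described above, assuming NumPy and scikit-learn: regress consumption on weather inputs and score each point by its standardized residual, a stand-in for the paper's probabilistic (Bayesian) treatment of anomaly likelihood. Variable names and the 3-sigma rule in the usage note are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def anomaly_scores(consumption, weather_features):
    """Fit a linear regression of gas consumption on weather inputs and
    score each point by how extreme its residual is under a Gaussian
    residual model (a simplified stand-in for the Bayesian treatment).
    """
    X = np.asarray(weather_features, dtype=float)
    y = np.asarray(consumption, dtype=float)
    model = LinearRegression().fit(X, y)
    resid = y - model.predict(X)
    sigma = resid.std() + 1e-12
    # Larger absolute standardized residual -> more likely anomalous.
    return np.abs(resid) / sigma

# Hypothetical usage: daily consumption vs. [heating degree days, wind].
# scores = anomaly_scores(daily_gas, np.column_stack([hdd, wind]))
# anomalies = scores > 3.0
```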