199,543 research outputs found

    A Deep Embedding Model for Co-occurrence Learning

    Co-occurrence data is a common and important information source in many areas, such as word co-occurrence in sentences, friend co-occurrence in social networks, and product co-occurrence in commercial transaction data; it contains rich correlation and clustering information about the items. In this paper, we study co-occurrence data using a general energy-based probabilistic model, and we analyze three categories of energy-based model, namely the L_1, L_2 and L_k models, which are able to capture different levels of dependency in the co-occurrence data. We also discuss how several typical existing models are related to these three types of energy models, including the Fully Visible Boltzmann Machine (FVBM) (L_2), Matrix Factorization (L_2), Log-BiLinear (LBL) models (L_2), and the Restricted Boltzmann Machine (RBM) model (L_k). Then, we propose a Deep Embedding Model (DEM) (an L_k model) from the energy model in a principled manner. Furthermore, motivated by the observation that the partition function in the energy model is intractable and the fact that the major objective of modeling co-occurrence data is to predict using the conditional probability, we apply the maximum pseudo-likelihood method to learn DEM. As a consequence, the developed model and its learning method naturally avoid the above difficulties and can be easily used to compute the conditional probability in prediction. Interestingly, our method is equivalent to learning a special structured deep neural network using back-propagation and a special sampling strategy, which makes it scalable on large-scale datasets. Finally, in the experiments, we show that the DEM can achieve comparable or better results than state-of-the-art methods on datasets across several application domains.
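
The pseudo-likelihood idea mentioned in this abstract can be sketched on the simplest model it names, a pairwise (L_2, FVBM-style) energy model over binary items. This is a minimal illustration under stated assumptions, not the paper's DEM: the energy form, the function name `pseudo_log_likelihood`, and the toy parameters are all hypothetical. The point is that each conditional p(x_i | x_-i) is a sigmoid of a local field, so the intractable partition function never has to be computed.

```python
import numpy as np

def pseudo_log_likelihood(x, W, b):
    """Sum over i of log p(x_i | x_{-i}) for a binary pairwise energy model.

    Assumed energy: E(x) = -0.5 * x^T W x - b^T x,
    with W symmetric and zero on the diagonal (illustrative choice).
    """
    x = np.asarray(x, dtype=float)
    # Local field on each item from all the other items.
    field = W @ x + b
    # p(x_i = 1 | rest) = sigmoid(field_i), so log p(x_i | rest) equals
    # -log(1 + exp(-s_i * field_i)) with s_i = +1 if x_i = 1, else -1.
    signs = 2.0 * x - 1.0
    return float(-np.sum(np.logaddexp(0.0, -signs * field)))

# Toy parameters (hypothetical): small symmetric couplings, zero biases.
rng = np.random.default_rng(0)
n = 5
A = rng.normal(scale=0.1, size=(n, n))
W = (A + A.T) / 2.0
np.fill_diagonal(W, 0.0)
b = np.zeros(n)

x = np.array([1, 0, 1, 1, 0])
print(pseudo_log_likelihood(x, W, b))
```

Maximizing this quantity over `W` and `b` across observed baskets is the pseudo-likelihood objective; the same conditionals are what one would use at prediction time.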

    Adaptive Resonance Theory (ART) for social media analytics

    This chapter presents the ART-based clustering algorithms for social media analytics in detail. Sections 3.1 and 3.2 introduce Fuzzy ART and its clustering mechanisms, respectively, which provide a deep understanding of the base model that is used and extended to handle the social media clustering challenges. Important concepts such as the vigilance region (VR) and its properties are explained and proven. Subsequently, Sects. 3.3-3.7 illustrate five types of adaptive resonance theory (ART) variants, each of which addresses the challenges in one social media analytical scenario, including automated parameter adaptation, user preference incorporation, short text clustering, heterogeneous data co-clustering and online streaming data indexing. The content of this chapter draws on several prior studies, including Probabilistic ART [15
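
For readers unfamiliar with the base model named in this abstract, the following is a minimal sketch of the standard Fuzzy ART clustering loop (complement coding, choice function, vigilance test, weight update). It is not taken from the chapter itself: the function name `fuzzy_art`, the default parameter values, and the toy data are assumptions made for illustration, and none of the five variants described in Sects. 3.3-3.7 are implemented.

```python
import numpy as np

def fuzzy_art(data, rho=0.75, alpha=0.01, beta=1.0):
    """Cluster rows of `data` (features scaled to [0, 1]) with basic Fuzzy ART.

    rho: vigilance, alpha: choice parameter, beta: learning rate.
    Returns (labels, weights).
    """
    data = np.asarray(data, dtype=float)
    # Complement coding keeps |I| constant (equal to the number of features).
    inputs = np.hstack([data, 1.0 - data])
    weights = []  # one weight vector per cluster
    labels = []
    for I in inputs:
        # Rank existing clusters by the choice function T_j, best first.
        order = sorted(
            range(len(weights)),
            key=lambda j: -np.minimum(I, weights[j]).sum()
            / (alpha + weights[j].sum()),
        )
        winner = None
        for j in order:
            match = np.minimum(I, weights[j]).sum() / I.sum()
            if match >= rho:  # resonance: the vigilance criterion is met
                winner = j
                break
        if winner is None:
            # No cluster passes the vigilance test: create a new one.
            weights.append(I.copy())
            winner = len(weights) - 1
        else:
            # Shrink the winner's weights toward the fuzzy AND of I and w.
            w = weights[winner]
            weights[winner] = beta * np.minimum(I, w) + (1.0 - beta) * w
        labels.append(winner)
    return labels, np.array(weights)

# Toy data (hypothetical): two tight groups in the unit square.
points = [[0.10, 0.10], [0.12, 0.10], [0.90, 0.90], [0.88, 0.92]]
labels, weights = fuzzy_art(points, rho=0.75)
print(labels)
```

Raising `rho` makes the vigilance test stricter and yields more, smaller clusters; lowering it merges inputs more readily.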

    Clustering of Very Red Galaxies in the Las Campanas IR Survey

    We report results from the first 1000 square arc-minutes of the Las Campanas IR survey. We have imaged 1 square degree of high latitude sky in six distinct fields to a 5-sigma H-band depth of 20.5 (Vega). Optical imaging in the V, R, I, and z' bands allows us to select color subsets and photometric-redshift-defined shells. We show that the angular clustering of faint red galaxies (18 3) is an order of magnitude stronger than that of the complete H-selected field sample. We employ three approaches to estimate n(z) in order to invert w(theta) to derive r_0. We find that our n(z) is well described by a Gaussian with <z> = 1.2, sigma(z) = 0.15. From this we derive a value for r_0 of 7 (+2,-1) co-moving H^{-1} Mpc at <z> = 1.2. This is a factor of ~2 larger than the clustering length for Lyman break galaxies and is similar to the expectation for early type galaxies at this epoch. Comment: 5 pages, 2 figures, 1 table. To appear in proceedings of the ESO/ECF/STScI workshop "Deep Fields" held in Garching, Germany, 9-12 October 200

    Multi-Object Classification and Unsupervised Scene Understanding Using Deep Learning Features and Latent Tree Probabilistic Models

    Deep learning has shown state-of-the-art classification performance on datasets such as ImageNet, which contain a single object in each image. However, multi-object classification is far more challenging. We present a unified framework which leverages the strengths of multiple machine learning methods, viz. deep learning, probabilistic models and kernel methods, to obtain state-of-the-art performance on Microsoft COCO, which consists of non-iconic images. We incorporate contextual information in natural images through a conditional latent tree probabilistic model (CLTM), where the object co-occurrences are conditioned on the fc7 features extracted from a pre-trained ImageNet CNN. We learn the CLTM tree structure using conditional pairwise probabilities for object co-occurrences, estimated through kernel methods, and we learn its node and edge potentials by training a new 3-layer neural network, which takes fc7 features as input. Object classification is carried out via inference on the learnt conditional tree model, and we obtain significant gains in precision-recall and F-measures on MS-COCO, especially for difficult object categories. Moreover, the latent variables in the CLTM capture scene information: the images with top activations for a latent node have common themes, such as being a grassland or a food scene, and so on. In addition, we show that a simple k-means clustering of the inferred latent nodes alone significantly improves scene classification performance on the MIT-Indoor dataset, without the need for any retraining, and without using scene labels during training. Thus, we present a unified framework for multi-object classification and unsupervised scene understanding.

    Deep observations of CO line emission from star-forming galaxies in a cluster candidate at z=1.5

    We report results from a deep Jansky Very Large Array (JVLA) search for CO 1-0 line emission from galaxies in a candidate galaxy cluster at z~1.55 in the COSMOS field. We target 4 galaxies with optical spectroscopic redshifts in the range z=1.47-1.59. Two of these 4 galaxies, ID51613 and ID51813, are nominally detected in CO line emission at the 3-4 sigma level. We find CO luminosities of 2.4x10^10 K km/s pc^2 and 1.3x10^10 K km/s pc^2, respectively. Taking advantage of the clustering and the 2-GHz bandwidth of the JVLA, we perform a search for emission lines in the proximity of optical sources within the field of view of our observations. We limit our search to galaxies with K<23.5 (AB) and z_phot=1.2-1.8. We find 2 bright optical galaxies to be associated with significant emission line peaks (>4 sigma) in the data cube, which we identify with the CO line emission. To test the reliability of the line peaks found, we performed a parallel search for line peaks using a Bayesian inference method. Monte Carlo simulations show that such associations are statistically significant, with probabilities of chance association of 3.5% and 10.7% for ID 51207 and ID 51380, respectively. Modeling of their optical/IR SEDs indicates that the CO-detected galaxies and candidates have stellar masses and SFRs in the range (0.3-1.1)x10^11 M_sun and 60-160 M_sun/yr, with SFEs comparable to those found in other star-forming galaxies at similar redshifts. By comparing the space density of CO emitters derived from our observations with the space density derived from previous CO detections at z~1.5, and with semi-analytic predictions for the CO luminosity function, we suggest that the latter tend to underestimate the number of CO galaxies detected at high redshift. Finally, we discuss the benefits of future blind CO searches in clustered fields with upcoming submm/radio facilities. Comment: Accepted for publication in MNRAS. Abstract has been slightly shortened compared to original pdf version

    The VIRMOS deep imaging survey II: CFH12K BVRI optical data for the 0226-04 deep field

    (abridged) In this paper we describe in detail the reduction, preparation and reliability of the photometric catalogues which comprise the 1.2 deg^2 CFH12K-VIRMOS deep field. The survey reaches limiting magnitudes of B_AB~26.5, V_AB~26.2, R_AB~25.9 and I_AB~25.0, and contains 90,729 extended sources in the magnitude range 18.0<I_AB<24.0. We demonstrate that our catalogues are free from systematic biases and are complete and reliable down to these limits. We estimate that the upper limit on bin-to-bin systematic photometric errors for the I-limited sample is ~10% in this magnitude range. We estimate that 68% of the catalogue sources have absolute per co-ordinate astrometric uncertainties less than ~0.38" and ~0.32" (alpha,delta). Our internal (filter-to-filter) per co-ordinate astrometric uncertainties are 0.08" and 0.08" (alpha,delta). We quantify the completeness of our survey in the joint space defined by object total magnitude and peak surface brightness. Finally, we present numerous comparisons between our catalogues and published literature data: galaxy and star counts, galaxy and stellar colours, and the clustering of both point-like and extended populations. In all cases our measurements are in excellent agreement with literature data to I_AB<24.0. This combination of depth and areal coverage makes this multi-colour catalogue a solid foundation to select galaxies for follow-up spectroscopy with VIMOS on the ESO-VLT and a unique database to study the formation and evolution of the faint galaxy population to z~1 and beyond. Comment: 18 pages, 23 figures, accepted for publication in A&