Structure in the 3D Galaxy Distribution: I. Methods and Example Results
Three methods for detecting and characterizing structure in point data, such
as that generated by redshift surveys, are described: classification using
self-organizing maps, segmentation using Bayesian blocks, and density
estimation using adaptive kernels. The first two methods are new, and allow
detection and characterization of structures of arbitrary shape and at a wide
range of spatial scales. These methods should elucidate not only clusters, but
also the more distributed, wide-ranging filaments and sheets, and further allow
the possibility of detecting and characterizing an even broader class of
shapes. The methods are demonstrated and compared in application to three data
sets: a carefully selected volume-limited sample from the Sloan Digital Sky
Survey redshift data, a similarly selected sample from the Millennium
Simulation, and a set of points independently drawn from a uniform probability
distribution -- a so-called Poisson distribution. We demonstrate a few of the
many ways in which these methods elucidate large scale structure in the
distribution of galaxies in the nearby Universe.
Comment: Re-posted after referee corrections, along with a partially re-written
introduction. 80 pages, 31 figures, ApJ in Press. For full sized figures
please download from: http://astrophysics.arc.nasa.gov/~mway/lss1.pd
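The third method above, adaptive kernel density estimation, can be sketched for 3D point data as follows. This is a minimal illustration, not the paper's estimator: the pilot-density/geometric-mean bandwidth scheme (Silverman's), the exponent `alpha`, and all names are assumptions made for the example.

```python
import numpy as np

def adaptive_kde(points, query, alpha=0.5):
    """Adaptive-kernel density estimate for point data (e.g. 3D galaxy
    positions). A fixed-bandwidth pilot estimate sets a local bandwidth
    at each sample point, h_i = h0 * (pilot_i / g)**(-alpha), where g is
    the geometric mean of the pilot densities, so the kernel narrows in
    dense regions (clusters) and widens in sparse ones (voids)."""
    n, d = points.shape
    # Fixed pilot bandwidth from a Scott-style rule (an assumption here).
    h0 = n ** (-1.0 / (d + 4)) * points.std(axis=0).mean()

    def gauss(x, centers, h):
        # Gaussian kernel sum with one bandwidth per center.
        diff = x[:, None, :] - centers[None, :, :]       # (m, n, d)
        sq = (diff ** 2).sum(axis=2) / (2.0 * h ** 2)    # (m, n)
        norm = (np.sqrt(2.0 * np.pi) * h) ** d           # per-center norm
        return np.exp(-sq) / norm                        # (m, n)

    # Pilot density at the sample points themselves.
    pilot = gauss(points, points, np.full(n, h0)).mean(axis=1)
    g = np.exp(np.log(pilot).mean())                     # geometric mean
    h_local = h0 * (pilot / g) ** (-alpha)               # adaptive widths
    return gauss(query, points, h_local).mean(axis=1)
```

On data like the samples above, the estimate is high inside a tight clump and low at an isolated point, which is the behavior the structure-finding comparison relies on.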
The Loss Rank Principle for Model Selection
We introduce a new principle for model selection in regression and
classification. Many regression models are controlled by some smoothness or
flexibility or complexity parameter c, e.g. the number of neighbors to be
averaged over in k nearest neighbor (kNN) regression or the polynomial degree
in regression with polynomials. Let f_D^c be the (best) regressor of complexity
c on data D. A more flexible regressor can fit more data D' well than a more
rigid one. If something (here small loss) is easy to achieve it's typically
worth less. We define the loss rank of f_D^c as the number of other
(fictitious) data D' that are fitted better by f_D'^c than D is fitted by
f_D^c. We suggest selecting the model complexity c that has minimal loss rank
(LoRP). Unlike most penalized maximum likelihood variants (AIC,BIC,MDL), LoRP
only depends on the regression function and loss function. It works without a
stochastic noise model, and is directly applicable to any non-parametric
regressor, like kNN. In this paper we formalize, discuss, and motivate LoRP,
study it for specific regression problems, in particular linear ones, and
compare it to other model selection schemes.
Comment: 16 pages
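The selection rule described above can be sketched for kNN regression. The paper derives exact expressions (for linear regressors in particular); the version below is only a toy Monte Carlo approximation in which the fictitious data sets D' are random permutations of the observed targets, a simplifying assumption, and ties count toward the rank so that degenerate fits (e.g. k=1, whose training loss is zero on every D') receive maximal rank.

```python
import numpy as np

def knn_train_loss(x, y, k):
    """Empirical squared loss of kNN regression on its own training
    data (each point predicted from its k nearest neighbours, itself
    included; 1-D inputs for simplicity)."""
    d = np.abs(x[:, None] - x[None, :])        # pairwise distances
    idx = np.argsort(d, axis=1)[:, :k]         # k nearest neighbours of x_i
    return ((y - y[idx].mean(axis=1)) ** 2).mean()

def loss_rank(x, y, k, n_fictitious=200, rng=None):
    """Monte Carlo loss rank: fraction of fictitious data sets D'
    (here: y permuted at random) that the complexity-k regressor fits
    at least as well as the observed D."""
    rng = np.random.default_rng(rng)
    loss_d = knn_train_loss(x, y, k)
    better = sum(knn_train_loss(x, rng.permutation(y), k) <= loss_d
                 for _ in range(n_fictitious))
    return better / n_fictitious

def lorp_select_k(x, y, ks, **kw):
    """LoRP: choose the complexity (here k) with minimal loss rank."""
    return min(ks, key=lambda k: loss_rank(x, y, k, **kw))
```

Note how k=1 fits every D' perfectly (rank 1), and k=n predicts the global mean for every D' (again rank 1), so the rank is minimized at an intermediate k, without any noise model.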
Supervised cross-modal factor analysis for multiple modal data classification
In this paper we study the problem of learning from multi-modal data for the
purpose of document classification. In this problem, each document is composed
of two different modalities of data, i.e., an image and a text. Cross-modal
factor analysis (CFA) has been proposed to project the two modalities into a
shared data space, so that the classification of an image or a text can be
performed directly in this space. A disadvantage of CFA is that it ignores the
supervision information. In this paper, we improve CFA by incorporating the
supervision information to represent and classify both the image and text
modalities of documents. We project both image and text data into a shared
data space by factor analysis, and then train a class-label predictor in the
shared space to exploit the class-label information. The factor-analysis
parameters and the predictor parameters are learned jointly by optimizing a
single objective function. With this objective function, we minimize the
distance between the projections of the image and text of the same document,
and the classification error of the projections, measured by the hinge loss
function. The objective function is optimized by an alternating optimization
strategy in an iterative algorithm. Experiments on two different multi-modal
document data sets show the advantage of the proposed algorithm over other
CFA methods.
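The joint objective and alternating optimization described above can be sketched as follows. This is an illustrative toy, not the paper's algorithm: the subgradient updates, step sizes, and the choice to apply the hinge term to the image projection only are all assumptions made for brevity.

```python
import numpy as np

def supervised_cfa(X_img, X_txt, y, dim=2, lam=1.0, lr=0.02, iters=1000, seed=0):
    """Toy supervised cross-modal factor analysis: learn projections
    W_img, W_txt into a shared `dim`-D space and a linear predictor w by
    alternating (sub)gradient steps on
        ||W_img X_img - W_txt X_txt||_F^2
          + lam * sum_i hinge(y_i * w . (W_img x_i)),
    with labels y in {+1, -1}."""
    rng = np.random.default_rng(seed)
    n = X_img.shape[0]
    W_img = rng.normal(scale=0.1, size=(dim, X_img.shape[1]))
    W_txt = rng.normal(scale=0.1, size=(dim, X_txt.shape[1]))
    w = rng.normal(scale=0.1, size=dim)

    for _ in range(iters):
        # Step 1: update the projections with the predictor held fixed.
        P_img, P_txt = X_img @ W_img.T, X_txt @ W_txt.T     # (n, dim)
        diff = P_img - P_txt                                # matching residual
        active = ((y * (P_img @ w)) < 1).astype(float)      # hinge subgradient mask
        W_img -= lr * (2.0 / n * diff.T @ X_img
                       - lam / n * np.outer(w, (active * y) @ X_img))
        W_txt -= lr * (-2.0 / n * diff.T @ X_txt)
        # Step 2: update the predictor with the projections held fixed.
        P_img = X_img @ W_img.T
        active = ((y * (P_img @ w)) < 1).astype(float)
        w -= lr * (-lam / n * (active * y) @ P_img)

    return W_img, W_txt, w
```

On synthetic data where both modalities are noisy linear views of a shared latent containing the label, the learned image projection plus predictor separates the classes, which is the effect the supervision term is meant to add over unsupervised CFA.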