
    New Image Statistics for Detecting Disturbed Galaxy Morphologies at High Redshift

    Testing theories of hierarchical structure formation requires estimating the distribution of galaxy morphologies and its change with redshift. One aspect of this investigation involves identifying galaxies with disturbed morphologies (e.g., merging galaxies). This is often done by summarizing galaxy images using, e.g., the CAS and Gini-M20 statistics of Conselice (2003) and Lotz et al. (2004), respectively, and associating particular statistic values with disturbance. We introduce three statistics that enhance detection of disturbed morphologies at high redshift (z ~ 2): the multi-mode (M), intensity (I), and deviation (D) statistics. We show their effectiveness by training a machine-learning classifier, random forest, using 1,639 galaxies observed in the H band by the Hubble Space Telescope WFC3, galaxies that had been previously classified by eye by the CANDELS collaboration (Grogin et al. 2011, Koekemoer et al. 2011). We find that the MID statistics (and the A statistic of Conselice 2003) are the most useful for identifying disturbed morphologies. We also explore whether human annotators are useful for identifying disturbed morphologies. We demonstrate that they show limited ability to detect disturbance at high redshift, and that increasing their number beyond approximately 10 does not provably yield better classification performance. We propose a simulation-based model-fitting algorithm that mitigates these issues by bypassing annotation. Comment: 15 pages, 14 figures, accepted for publication in MNRA
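    As an illustration of the kind of image summary statistic mentioned above, the following minimal sketch computes a Gini coefficient of pixel fluxes in the form commonly attributed to Lotz et al. (2004). It is not the MID statistics introduced in the paper, whose definitions are not given in this abstract; the function name and the toy pixel lists are assumptions made for the example.

```python
def gini(pixel_values):
    """Gini coefficient of absolute pixel fluxes.

    G = sum_i (2i - n - 1) * |x_i| / (mean(|x|) * n * (n - 1)),
    with the |x_i| sorted in increasing order and i running from 1 to n.
    """
    x = sorted(abs(v) for v in pixel_values)
    n = len(x)
    if n < 2:
        raise ValueError("need at least two pixels")
    mean = sum(x) / n
    if mean == 0:
        raise ValueError("all pixel values are zero")
    total = sum((2 * i - n - 1) * xi for i, xi in enumerate(x, start=1))
    return total / (mean * n * (n - 1))


# Toy inputs (hypothetical): flux spread evenly vs. flux in a single pixel.
uniform = gini([1.0, 1.0, 1.0, 1.0])
concentrated = gini([0.0, 0.0, 0.0, 1.0])
```

    With this normalization, evenly spread flux gives a value of 0 and flux concentrated in a single pixel gives a value of 1.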

    A Search for sub-km KBOs with the Method of Serendipitous Stellar Occultations

    The results of a search for sub-km Kuiper Belt Objects (KBOs) with the method of serendipitous stellar occultations are reported. Photometric time series were obtained on the 1.8m telescope at the Dominion Astrophysical Observatory (DAO) in Victoria, BC, and were analyzed for the presence of occultation events. Observations were performed at 40 Hz and included a total of 5.0 star-hours for target stars in the ecliptic open cluster M35 (beta=0.9deg), and 2.1 star-hours for control stars in the off-ecliptic open cluster M34 (beta=25.7deg). To evaluate the recovery fraction of the analysis method, and thereby determine the limiting detectable size, artificial occultation events were added to simulated time series (1/f scintillation-like power-spectra), and to the real data. No viable candidate occultation events were detected. This limits the cumulative surface density of KBOs to 3.5e10 deg^{-2} (95% confidence) for KBOs brighter than m_R=35.3 (larger than ~860m in diameter, assuming a geometric albedo of 0.04 and a distance of 40 AU). An evaluation of TNO occultations reported in the literature suggests that they are unlikely to be genuine, and an overall 95%-confidence upper limit on the surface density of 2.8e9 deg^{-2} is obtained for KBOs brighter than m_R=35 (larger than ~1 km in diameter, assuming a geometric albedo of 0.04 and a distance of 40 AU) when all existing surveys are combined. Comment: Accepted for publication in A
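    A null result of this kind is usually turned into an upper limit with Poisson statistics. The sketch below shows that generic step only, assuming a simple Poisson counting model; the abstract does not state how the quoted limits were derived, and the effective sky coverage used here is a hypothetical placeholder, not a value from the paper.

```python
import math


def poisson_zero_detection_limit(confidence):
    """Upper limit on the expected number of events when none are observed.

    For a Poisson process, P(0 events | mu) = exp(-mu), so the limit at a
    given confidence level is mu_max = -ln(1 - confidence).
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    return -math.log(1.0 - confidence)


def surface_density_limit(confidence, effective_coverage_deg2):
    """Upper limit on surface density (deg^-2) for a given effective coverage."""
    if effective_coverage_deg2 <= 0:
        raise ValueError("effective coverage must be positive")
    return poisson_zero_detection_limit(confidence) / effective_coverage_deg2


limit_events = poisson_zero_detection_limit(0.95)  # about 3 events

# Hypothetical effective coverage in deg^2 (placeholder, NOT from the paper).
example_coverage_deg2 = 2.0e-10
example_density_limit = surface_density_limit(0.95, example_coverage_deg2)
```

    The effective coverage depends on the star-hours observed, the relative velocity, the size of the occultation shadow, and the recovery fraction, none of which are reproduced here.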

    Spatial and spatio-temporal point patterns on linear networks

    A thesis submitted in partial fulfillment of the requirements for the degree of Doctor in Information Management, specialization in Geographic Information Systems. The last decade witnessed an extraordinary increase in interest in the analysis of network-related data and trajectories. This pervasive interest is partly caused by a strongly expanded availability of such datasets. In the spatial statistics field, there are numerous real examples, such as the locations of traffic accidents and the geo-coded locations of street crimes in cities, in which the support of the underlying process must be restricted to a linear network to define a more realistic scenario. Examples of trajectories are the paths taken by moving objects such as taxis, human beings, animals, etc. Intensity estimation on a network of lines, such as a road network, seems to be a surprisingly complicated task. Several techniques published in the literature, in geography and computer science, have turned out to be erroneous. We propose several adaptive and non-adaptive intensity estimators, based on kernel smoothing and Voronoi tessellation. Theoretical properties such as bias, variance, asymptotics, bandwidth selection, variance estimation, relative risk estimation, and adaptive smoothing are discussed. Moreover, their statistical performance is studied through simulation studies and is compared with existing methods. Adding the temporal component, we also consider spatio-temporal point patterns with spatial locations restricted to a linear network. We present a nonparametric kernel-based intensity estimator and develop second-order characteristics of spatio-temporal point processes on linear networks, such as the K-function and the pair correlation function, to analyse the type of interaction between points. In terms of trajectories, we introduce the R package trajectories, which contains different classes and methods to handle, summarise and analyse trajectory data. Simulation and model fitting, intensity estimation, distance analysis, movement smoothing, Chi maps and second-order summary statistics are discussed. Moreover, we analyse different real datasets such as crime data from Chicago (US), anti-social behaviour in Castellón (Spain), traffic accidents in Medellín (Colombia), traffic accidents in Western Australia, motor vehicle traffic accidents in an area of Houston (US), locations of pine saplings in a Finnish forest, traffic accidents in Eastbourne (UK) and one week of taxi movements in Beijing (China).

    Pattern recognition and machine learning for magnetic resonance images with kernel methods

    The aim of this thesis is to apply a particular category of machine learning and pattern recognition algorithms, namely kernel methods, to both functional and anatomical magnetic resonance images (MRI). This work focuses specifically on supervised learning methods. Both methodological and practical aspects are described in this thesis. Kernel methods have a computational advantage for high-dimensional data, and are therefore ideal for imaging data. The procedures can be broadly divided into two components: the construction of the kernels and the actual kernel algorithms themselves. Pre-processed functional or anatomical images can be used to compute a linear kernel or a non-linear kernel. We introduce both kernel regression and kernel classification algorithms in two main categories: probabilistic methods and non-probabilistic methods. For practical applications, kernel classification methods were applied to decode the cognitive or sensory states of the subject from the fMRI signal and were also applied to discriminate patients with neurological diseases from healthy subjects using anatomical MRI. Kernel regression methods were used to predict the regressors in the design of fMRI experiments, and clinical ratings from the anatomical scans.
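    The kernel-construction step described above can be illustrated with a linear kernel, which is simply the matrix of inner products between flattened images. This is a minimal sketch, not the thesis's own code; the function name and the tiny four-voxel "scans" are assumptions made for the example.

```python
def linear_kernel(images):
    """Gram matrix K[i][j] = <x_i, x_j> for images flattened to vectors.

    `images` is a list of equal-length lists of voxel intensities,
    one list per scan.
    """
    n_voxels = len(images[0])
    if any(len(img) != n_voxels for img in images):
        raise ValueError("all images must have the same number of voxels")
    return [
        [sum(a * b for a, b in zip(xi, xj)) for xj in images]
        for xi in images
    ]


# Three hypothetical pre-processed scans with four voxels each.
scans = [
    [1.0, 0.0, 2.0, 1.0],
    [0.0, 1.0, 1.0, 0.0],
    [2.0, 1.0, 0.0, 1.0],
]
K = linear_kernel(scans)
```

    The resulting matrix has one row and one column per scan, however many voxels each scan contains, which is one way to see why kernel methods suit high-dimensional imaging data.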

    Searchlight-based multi-voxel pattern analysis of fMRI by cross-validated MANOVA

    Multi-voxel pattern analysis (MVPA) is a fruitful and increasingly popular complement to traditional univariate methods of analyzing neuroimaging data. We propose to replace the standard ‘decoding’ approach to searchlight-based MVPA, measuring the performance of a classifier by its accuracy, with a method based on the multivariate form of the general linear model. Following the well-established methodology of multivariate analysis of variance (MANOVA), we define a measure that directly characterizes the structure of multi-voxel data, the pattern distinctness D. Our measure is related to standard multivariate statistics, but we apply cross-validation to obtain an unbiased estimate of its population value, independent of the amount of data or its partitioning into ‘training’ and ‘test’ sets. The estimate can therefore serve not only as a test statistic, but also as an interpretable measure of multivariate effect size. The pattern distinctness generalizes the Mahalanobis distance to an arbitrary number of classes, and also to the case where there are no classes of trials because the design is described by parametric regressors. It is defined for arbitrary estimable contrasts, including main effects (pattern differences) and interactions (pattern changes). In this way, our approach makes the full analytical power of complex factorial designs known from univariate fMRI analyses available to MVPA studies. Moreover, we show how the results of a factorial analysis can be used to obtain a measure of pattern stability, the equivalent of ‘cross-decoding’.

    Robust and Semiparametric Statistical Modeling for Cancer Research.

    In the application of biostatistical methodology to cancer studies, there is a desire to use methods with fewer or less restrictive assumptions, which often lead to more easily generalizable conclusions. The first chapter deals with robust modeling of binary responses with the goal of improving classification at an arbitrary probability threshold dictated by the particular application. Specifically, for the linear logistic model, we solve a set of locally weighted score equations, using a kernel-like weight function centered at the threshold. This work has much in common with robust estimation, but differs from previous approaches in this area in its focus on prediction, specifically classification into high- and low-risk groups. Analysis of a melanoma data set is presented to illustrate the use of the method in practice. The second chapter addresses the difficulties inherent in investigating time to cancer onset when only time to diagnosis can be observed. To address this problem, we propose a joint model for the unobserved time to the latent and terminal events, with the two events linked by the baseline hazard. We propose an EM algorithm for estimation of the baseline hazard, which allows for closed-form Breslow-type estimators at each iteration, reducing computational time compared with maximizing the marginal likelihood directly. We demonstrate use of the method with analysis of a prostate cancer data set from SEER. In the third chapter, we apply methodology originally used in survival analysis to model semicontinuous data. Continuous outcome data with a proportion of observations equal to zero arises frequently in biomedical studies. We propose a semiparametric model based on a biological system with competing damage manifestation and resistance processes. This allows us to derive a partial likelihood based on the retro-hazard function, leading to a flexible procedure for modeling continuous data with a point mass at zero. We apply the method to a data set consisting of pulmonary capillary hemorrhage area in lab rats subjected to diagnostic ultrasound. PhD, Biostatistics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/113285/1/jdrice_1.pd