172,066 research outputs found

    The Data Big Bang and the Expanding Digital Universe: High-Dimensional, Complex and Massive Data Sets in an Inflationary Epoch

    Get PDF
    Recent and forthcoming advances in instrumentation, and giant new surveys, are creating astronomical data sets that are not amenable to the methods of analysis familiar to astronomers. Traditional methods are often inadequate not merely because of the size in bytes of the data sets, but also because of the complexity of modern data sets. Mathematical limitations of familiar algorithms and techniques in dealing with such data sets create a critical need for new paradigms for the representation, analysis and scientific visualization (as opposed to illustrative visualization) of heterogeneous, multiresolution data across application domains. Some of the problems presented by the new data sets have been addressed by other disciplines such as applied mathematics, statistics and machine learning and have been utilized by other sciences such as space-based geosciences. Unfortunately, valuable results pertaining to these problems are mostly to be found only in publications outside of astronomy. Here we offer brief overviews of a number of concepts, techniques and developments, some "old" and some new. These are generally unknown to most of the astronomical community, but are vital to the analysis and visualization of complex datasets and images. In order for astronomers to take advantage of the richness and complexity of the new era of data, and to be able to identify, adopt, and apply new solutions, the astronomical community needs a certain degree of awareness and understanding of the new concepts. One of the goals of this paper is to help bridge the gap between applied mathematics, artificial intelligence and computer science on the one side and astronomy on the other.Comment: 24 pages, 8 Figures, 1 Table. Accepted for publication: "Advances in Astronomy, special issue "Robotic Astronomy

    Nondeterministic data base for computerized visual perception

    Get PDF
    A description is given of the knowledge representation data base in the perception subsystem of the Mars robot vehicle prototype. Two types of information are stored. The first is generic information that represents general rules that are conformed to by structures in the expected environments. The second kind of information is a specific description of a structure, i.e., the properties and relations of objects in the specific case being analyzed. The generic knowledge is represented so that it can be applied to extract and infer the description of specific structures. The generic model of the rules is substantially a Bayesian representation of the statistics of the environment, which means it is geared to representation of nondeterministic rules relating properties of, and relations between, objects. The description of a specific structure is also nondeterministic in the sense that all properties and relations may take a range of values with an associated probability distribution

    Structural and Photometric Classification of Galaxies - I. Calibration Based on a Nearby Galaxy Sample

    Full text link
    In this paper we define an observationally robust, multi-parameter space for the classification of nearby and distant galaxies. The parameters include luminosity, color, and the image-structure parameters: size, image concentration, asymmetry, and surface brightness. Based on an initial calibration of this parameter space using the ``normal'' Hubble-types surveyed by Frei et al. (1996), we find that only a subset of the parameters provide useful classification boundaries for this sample. Interestingly, this subset does not include distance-dependent scale parameters, such as size or luminosity. The essential ingredient is the combination of a spectral index (e.g., color) with parameters of image structure and scale: concentration, asymmetry, and surface-brightness. We refer to the image structure parameters (concentration and asymmetry) as indices of ``form.'' We define a preliminary classification based on spectral index, form, and surface-brightness (a scale) that successfully separates normal galaxies into three classes. We intentionally identify these classes with the familiar labels of Early, Intermediate, and Late. This classification, or others based on the above four parameters can be used reliably to define comparable samples over a broad range in redshift. The size and luminosity distribution of such samples will not be biased by this selection process except through astrophysical correlations between spectral index, form, and surface-brightness.Comment: to appear in AJ (June, 2000); 34 pages including 4 tables and 12 figure

    A systematic comparison of supervised classifiers

    Get PDF
    Pattern recognition techniques have been employed in a myriad of industrial, medical, commercial and academic applications. To tackle such a diversity of data, many techniques have been devised. However, despite the long tradition of pattern recognition research, there is no technique that yields the best classification in all scenarios. Therefore, the consideration of as many as possible techniques presents itself as an fundamental practice in applications aiming at high accuracy. Typical works comparing methods either emphasize the performance of a given algorithm in validation tests or systematically compare various algorithms, assuming that the practical use of these methods is done by experts. In many occasions, however, researchers have to deal with their practical classification tasks without an in-depth knowledge about the underlying mechanisms behind parameters. Actually, the adequate choice of classifiers and parameters alike in such practical circumstances constitutes a long-standing problem and is the subject of the current paper. We carried out a study on the performance of nine well-known classifiers implemented by the Weka framework and compared the dependence of the accuracy with their configuration parameter configurations. The analysis of performance with default parameters revealed that the k-nearest neighbors method exceeds by a large margin the other methods when high dimensional datasets are considered. When other configuration of parameters were allowed, we found that it is possible to improve the quality of SVM in more than 20% even if parameters are set randomly. Taken together, the investigation conducted in this paper suggests that, apart from the SVM implementation, Weka's default configuration of parameters provides an performance close the one achieved with the optimal configuration
    corecore