
    FishMark: A Linked Data Application Benchmark

    Abstract. FishBase is an important species data collection produced by the FishBase Information and Research Group, Inc. (FIN), a not-for-profit NGO that aims to collect comprehensive information (from the taxonomic to the ecological) about all the world’s finned fish species. FishBase is exposed as a MySQL-backed website (supporting a range of canned, although complex, queries) and serves over 33 million hits per month. FishDelish is a transformation of FishBase into Linked Data, weighing in at 1.38 billion triples. We have ported a substantial number of FishBase SQL queries to FishDelish SPARQL queries, which form the basis of a new linked data application benchmark (using our derivative of the Berlin SPARQL Benchmark harness). We use this benchmarking framework to compare the performance of the native MySQL application, the Virtuoso RDF triple store, and the Quest OBDA system on a fishbase.org-like application.
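The abstract describes a Berlin-style harness that times ported queries against several backends. A minimal sketch of such a timing loop, assuming a placeholder `run_query` backend (a real harness would send SQL to MySQL or SPARQL to Virtuoso/Quest over their client protocols; the query text and names here are illustrative, not FishDelish vocabulary):

```python
import statistics
import time

def run_query(query: str) -> list:
    # Placeholder backend: a real harness would execute the SQL or SPARQL
    # query against MySQL, Virtuoso, or Quest and fetch the result rows.
    return [row for row in range(100)]

def benchmark(queries, runs=5):
    """Time each named query `runs` times and report summary statistics."""
    results = {}
    for name, q in queries.items():
        timings = []
        for _ in range(runs):
            t0 = time.perf_counter()
            run_query(q)
            timings.append(time.perf_counter() - t0)
        results[name] = {"min_s": min(timings),
                         "median_s": statistics.median(timings)}
    return results

report = benchmark({"species_by_family":
                    "SELECT SpecCode FROM species WHERE FamCode = 1"})
print(report)
```

Reporting both the minimum and the median per query mix is a common way to separate steady-state backend cost from cache warm-up effects.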

    Representing complex data using localized principal components with application to astronomical data

    Often the relation between the variables constituting a multivariate data space can be characterized by one or more of the terms ``nonlinear'', ``branched'', ``disconnected'', ``bended'', ``curved'', ``heterogeneous'' or, more generally, ``complex''. In these cases, simple principal component analysis (PCA) as a tool for dimension reduction can fail badly. Of the many alternative approaches proposed so far, local approximations of PCA are among the most promising. This paper gives a short review of localized versions of PCA, focusing on local principal curves and local partitioning algorithms. Furthermore, we discuss projections other than the local principal components. When performing local dimension reduction for regression or classification problems it is important to focus not only on the manifold structure of the covariates, but also on the response variable(s). Local principal components only achieve the former, whereas localized regression approaches concentrate on the latter. Local projection directions derived from the partial least squares (PLS) algorithm offer an interesting trade-off between these two objectives. We apply these methods to several real data sets. In particular, we consider simulated astrophysical data from the future Galactic survey mission Gaia.
    Comment: 25 pages. In "Principal Manifolds for Data Visualization and Dimension Reduction", A. Gorban, B. Kegl, D. Wunsch, and A. Zinovyev (eds), Lecture Notes in Computational Science and Engineering, Springer, 2007, pp. 180--204, http://www.springer.com/dal/home/generic/search/results?SGWID=1-40109-22-173750210-
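The core idea above, fitting principal components locally rather than globally, can be sketched on a toy "branched" data set. This is a minimal illustration assuming a crude two-way split of the data; the paper itself discusses principled partitioning algorithms and local principal curves:

```python
import numpy as np

rng = np.random.default_rng(0)
# A "bended" one-dimensional structure (a V-shape) that a single global
# principal component describes poorly:
t = rng.uniform(-1, 1, 200)
X = np.c_[t, np.abs(t)] + rng.normal(scale=0.02, size=(200, 2))

def first_pc(points):
    """First principal component direction of a point cloud."""
    centred = points - points.mean(axis=0)
    # Leading right singular vector = direction of maximum variance.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return vt[0]

# Localized PCA: partition the data (here, naively by sign of the first
# coordinate) and fit one principal direction per local region.
left, right = X[X[:, 0] < 0], X[X[:, 0] >= 0]
pc_left, pc_right = first_pc(left), first_pc(right)
print(pc_left, pc_right)
```

Each local component aligns with its branch of the V, whereas global PCA would average the two branches into a single unrepresentative direction.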

    Ultrametric embedding: application to data fingerprinting and to fast data clustering

    We begin with pervasive ultrametricity due to high dimensionality and/or spatial sparsity. Asking how the extent or degree of ultrametricity can be quantified leads us to discuss varied practical cases in which ultrametricity can be partially or locally present in data. We show how ultrametricity can be assessed in text or document collections and in time series signals. An important aspect here is that, to draw benefit from this perspective, the data may need to be recoded. Such data recoding can also be powerful in proximity searching, as we will show, where the data is embedded globally, and not locally, in an ultrametric space.
    Comment: 14 pages, 1 figure. New content and modified title compared to the 19 May 2006 version
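A space is ultrametric when every triangle is isosceles with a small base, i.e. d(x,z) ≤ max(d(x,y), d(y,z)) for all triples. One simple way to quantify the *degree* of ultrametricity, sketched below under assumed choices of tolerance and sampling (not the paper's exact coefficient), is the fraction of triangles whose two largest sides are nearly equal:

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
# High-dimensional point clouds tend toward ultrametricity:
X = rng.normal(size=(30, 50))

def ultrametricity_fraction(X, tol=0.1):
    """Fraction of triangles that are approximately ultrametric."""
    hits = trials = 0
    for i, j, k in itertools.combinations(range(len(X)), 3):
        d = sorted([np.linalg.norm(X[i] - X[j]),
                    np.linalg.norm(X[j] - X[k]),
                    np.linalg.norm(X[i] - X[k])])
        trials += 1
        # Ultrametric triangle: the two largest sides coincide; accept
        # near-equality within a relative tolerance.
        if (d[2] - d[1]) / d[2] < tol:
            hits += 1
    return hits / trials

frac = ultrametricity_fraction(X)
print(round(frac, 3))
```

A fraction near 1 indicates that the data already behaves almost ultrametrically, which is what makes the fast clustering and fingerprinting applications in the abstract possible.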

    CosmoDM and its application to Pan-STARRS data

    The Cosmology Data Management system (CosmoDM) is an automated and flexible data management system for the processing and calibration of data from optical photometric surveys. It is designed to run on supercomputers and to minimize disk I/O, enabling scaling to very high throughput during periods of reprocessing. It serves as an early prototype for one element of the ground-based processing required by the Euclid mission and will also be employed in the preparation of ground-based data needed in the eROSITA X-ray all-sky survey mission. CosmoDM consists of two main pipelines. The first is the single-epoch or detrending pipeline, which is used to carry out the photometric and astrometric calibration of raw exposures. The second is the co-addition pipeline, which combines the data from individual exposures into deeper coadd images and science-ready catalogs. A novel feature of CosmoDM is that it uses a modified stack of Astromatic software that can read and write tile-compressed images. Since 2011, CosmoDM has been used to process data from the DECam, CFHT MegaCam and Pan-STARRS cameras. In this paper we describe how Pan-STARRS data processed with CosmoDM has been used to optically confirm and measure photometric redshifts of Planck-based Sunyaev-Zeldovich effect selected cluster candidates.
    Comment: 11 pages, 4 figures. Proceedings of the Precision Astronomy with Fully Depleted CCDs Workshop (2014). Accepted for publication in JINST
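The two-pipeline architecture described above can be sketched as a simple staged flow: a single-epoch (detrending/calibration) stage whose calibrated outputs feed a co-addition stage. The stage bodies and file names below are illustrative placeholders, not CosmoDM code:

```python
from dataclasses import dataclass

@dataclass
class Exposure:
    name: str
    calibrated: bool = False

def detrend(raw: list) -> list:
    # Single-epoch pipeline: photometric and astrometric calibration of
    # each raw exposure (placeholder: simply mark it calibrated).
    return [Exposure(name, calibrated=True) for name in raw]

def coadd(exposures: list) -> str:
    # Co-addition pipeline: combine calibrated exposures into a deeper
    # coadd image plus science-ready catalogs (placeholder: name only).
    assert all(e.calibrated for e in exposures), "coadd needs calibrated input"
    return "coadd(" + "+".join(e.name for e in exposures) + ")"

stack = coadd(detrend(["exp1.fits", "exp2.fits"]))
print(stack)
```

Keeping the stages as separate pipelines mirrors the design goal in the abstract: single-epoch products can be reprocessed independently and at high throughput before any co-addition takes place.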

    DAMEWARE - Data Mining & Exploration Web Application Resource

    Astronomy is undergoing a methodological revolution triggered by an unprecedented wealth of complex and accurate data. DAMEWARE (DAta Mining & Exploration Web Application REsource) is a general-purpose, Web-based, Virtual Observatory compliant, distributed data mining framework specialized in the exploration of massive data sets with machine learning methods. It allows the scientific community to perform data mining and exploratory experiments on massive data sets using a simple web browser. DAMEWARE offers several tools that can be seen as working environments in which to choose data analysis functionalities such as clustering, classification, regression, feature extraction, etc., together with models and algorithms.
    Comment: User Manual of the DAMEWARE Web Application, 51 pages