
    FishMark: A Linked Data Application Benchmark

    Abstract. FishBase is an important species data collection produced by the FishBase Information and Research Group, Inc. (FIN), a not-for-profit NGO that aims to collect comprehensive information (from the taxonomic to the ecological) about all the world’s finned fish species. FishBase is exposed as a MySQL-backed website (supporting a range of canned, although complex, queries) and serves over 33 million hits per month. FishDelish is a transformation of FishBase into Linked Data, weighing in at 1.38 billion triples. We have ported a substantial number of FishBase SQL queries to FishDelish SPARQL queries, which form the basis of a new linked data application benchmark (using our derivative of the Berlin SPARQL Benchmark harness). We use this benchmarking framework to compare the performance of the native MySQL application, the Virtuoso RDF triple store, and the Quest OBDA system on a fishbase.org-like application.
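The abstract describes a Berlin-style harness that times ported queries against several backends. A minimal sketch of such a timing loop, assuming a placeholder `run_query` backend (a real harness would send SQL to MySQL or SPARQL to Virtuoso/Quest over their client protocols; the query text and names here are illustrative, not FishDelish vocabulary):

```python
import statistics
import time

def run_query(query: str) -> list:
    # Placeholder backend: a real harness would execute the SQL or SPARQL
    # query against MySQL, Virtuoso, or Quest and fetch the result rows.
    return [row for row in range(100)]

def benchmark(queries, runs=5):
    """Time each named query `runs` times and report summary statistics."""
    results = {}
    for name, q in queries.items():
        timings = []
        for _ in range(runs):
            t0 = time.perf_counter()
            run_query(q)
            timings.append(time.perf_counter() - t0)
        results[name] = {"min_s": min(timings),
                         "median_s": statistics.median(timings)}
    return results

report = benchmark({"species_by_family":
                    "SELECT SpecCode FROM species WHERE FamCode = 1"})
print(report)
```

Reporting both the minimum and the median per query mix is a common way to separate steady-state backend cost from cache warm-up effects.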

    Representing complex data using localized principal components with application to astronomical data

    Often the relation between the variables constituting a multivariate data space can be characterized by one or more of the terms ``nonlinear'', ``branched'', ``disconnected'', ``bended'', ``curved'', ``heterogeneous'' or, more generally, ``complex''. In these cases, simple principal component analysis (PCA) as a tool for dimension reduction can fail badly. Of the many alternative approaches proposed so far, local approximations of PCA are among the most promising. This paper gives a short review of localized versions of PCA, focusing on local principal curves and local partitioning algorithms. Furthermore, we discuss projections other than the local principal components. When performing local dimension reduction for regression or classification problems it is important to focus not only on the manifold structure of the covariates, but also on the response variable(s). Local principal components only achieve the former, whereas localized regression approaches concentrate on the latter. Local projection directions derived from the partial least squares (PLS) algorithm offer an interesting trade-off between these two objectives. We apply these methods to several real data sets. In particular, we consider simulated astrophysical data from the future Galactic survey mission Gaia.
    Comment: 25 pages. In "Principal Manifolds for Data Visualization and Dimension Reduction", A. Gorban, B. Kegl, D. Wunsch, and A. Zinovyev (eds), Lecture Notes in Computational Science and Engineering, Springer, 2007, pp. 180--204, http://www.springer.com/dal/home/generic/search/results?SGWID=1-40109-22-173750210-
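The core idea above, fitting principal components locally rather than globally, can be sketched on a toy "branched" data set. This is a minimal illustration assuming a crude two-way split of the data; the paper itself discusses principled partitioning algorithms and local principal curves:

```python
import numpy as np

rng = np.random.default_rng(0)
# A "bended" one-dimensional structure (a V-shape) that a single global
# principal component describes poorly:
t = rng.uniform(-1, 1, 200)
X = np.c_[t, np.abs(t)] + rng.normal(scale=0.02, size=(200, 2))

def first_pc(points):
    """First principal component direction of a point cloud."""
    centred = points - points.mean(axis=0)
    # Leading right singular vector = direction of maximum variance.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return vt[0]

# Localized PCA: partition the data (here, naively by sign of the first
# coordinate) and fit one principal direction per local region.
left, right = X[X[:, 0] < 0], X[X[:, 0] >= 0]
pc_left, pc_right = first_pc(left), first_pc(right)
print(pc_left, pc_right)
```

Each local component aligns with its branch of the V, whereas global PCA would average the two branches into a single unrepresentative direction.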

    Ultrametric embedding: application to data fingerprinting and to fast data clustering

    We begin with pervasive ultrametricity due to high dimensionality and/or spatial sparsity. Asking how the extent or degree of ultrametricity can be quantified leads us to discuss varied practical cases in which ultrametricity can be partially or locally present in data. We show how ultrametricity can be assessed in text or document collections and in time series signals. An important aspect here is that, to draw benefit from this perspective, the data may need to be recoded. Such data recoding can also be powerful in proximity searching, as we will show, where the data is embedded globally, and not locally, in an ultrametric space.
    Comment: 14 pages, 1 figure. New content and modified title compared to the 19 May 2006 version
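A space is ultrametric when every triangle is isosceles with a small base, i.e. d(x,z) ≤ max(d(x,y), d(y,z)) for all triples. One simple way to quantify the *degree* of ultrametricity, sketched below under assumed choices of tolerance and sampling (not the paper's exact coefficient), is the fraction of triangles whose two largest sides are nearly equal:

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
# High-dimensional point clouds tend toward ultrametricity:
X = rng.normal(size=(30, 50))

def ultrametricity_fraction(X, tol=0.1):
    """Fraction of triangles that are approximately ultrametric."""
    hits = trials = 0
    for i, j, k in itertools.combinations(range(len(X)), 3):
        d = sorted([np.linalg.norm(X[i] - X[j]),
                    np.linalg.norm(X[j] - X[k]),
                    np.linalg.norm(X[i] - X[k])])
        trials += 1
        # Ultrametric triangle: the two largest sides coincide; accept
        # near-equality within a relative tolerance.
        if (d[2] - d[1]) / d[2] < tol:
            hits += 1
    return hits / trials

frac = ultrametricity_fraction(X)
print(round(frac, 3))
```

A fraction near 1 indicates that the data already behaves almost ultrametrically, which is what makes the fast clustering and fingerprinting applications in the abstract possible.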

    CosmoDM and its application to Pan-STARRS data

    The Cosmology Data Management system (CosmoDM) is an automated and flexible data management system for the processing and calibration of data from optical photometric surveys. It is designed to run on supercomputers and to minimize disk I/O, enabling scaling to very high throughput during periods of reprocessing. It serves as an early prototype for one element of the ground-based processing required by the Euclid mission and will also be employed in the preparation of ground-based data needed in the eROSITA X-ray all-sky survey mission. CosmoDM consists of two main pipelines. The first is the single-epoch or detrending pipeline, which is used to carry out the photometric and astrometric calibration of raw exposures. The second is the co-addition pipeline, which combines the data from individual exposures into deeper coadd images and science-ready catalogs. A novel feature of CosmoDM is that it uses a modified stack of Astromatic software that can read and write tile-compressed images. Since 2011, CosmoDM has been used to process data from the DECam, CFHT MegaCam and Pan-STARRS cameras. In this paper we describe how Pan-STARRS data processed with CosmoDM has been used to optically confirm and measure photometric redshifts of Planck-based Sunyaev-Zeldovich effect selected cluster candidates.
    Comment: 11 pages, 4 figures. Proceedings of the Precision Astronomy with Fully Depleted CCDs Workshop (2014). Accepted for publication in JINST
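The two-pipeline architecture described above can be sketched as a simple staged flow: a single-epoch (detrending/calibration) stage whose calibrated outputs feed a co-addition stage. The stage bodies and file names below are illustrative placeholders, not CosmoDM code:

```python
from dataclasses import dataclass

@dataclass
class Exposure:
    name: str
    calibrated: bool = False

def detrend(raw: list) -> list:
    # Single-epoch pipeline: photometric and astrometric calibration of
    # each raw exposure (placeholder: simply mark it calibrated).
    return [Exposure(name, calibrated=True) for name in raw]

def coadd(exposures: list) -> str:
    # Co-addition pipeline: combine calibrated exposures into a deeper
    # coadd image plus science-ready catalogs (placeholder: name only).
    assert all(e.calibrated for e in exposures), "coadd needs calibrated input"
    return "coadd(" + "+".join(e.name for e in exposures) + ")"

stack = coadd(detrend(["exp1.fits", "exp2.fits"]))
print(stack)
```

Keeping the stages as separate pipelines mirrors the design goal in the abstract: single-epoch products can be reprocessed independently and at high throughput before any co-addition takes place.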

    DAMEWARE - Data Mining & Exploration Web Application Resource

    Astronomy is undergoing a methodological revolution triggered by an unprecedented wealth of complex and accurate data. DAMEWARE (DAta Mining & Exploration Web Application REsource) is a general-purpose, Web-based, Virtual Observatory compliant, distributed data mining framework specialized in the exploration of massive data sets with machine learning methods. It allows the scientific community to perform data mining and exploratory experiments on massive data sets using a simple web browser. DAMEWARE offers several tools that can be seen as working environments in which to choose data analysis functionalities such as clustering, classification, regression, feature extraction, etc., together with models and algorithms.
    Comment: User Manual of the DAMEWARE Web Application, 51 pages