FishMark: A Linked Data Application Benchmark
Abstract. FishBase is an important species data collection produced by the FishBase Information and Research Group Inc (FIN), a not-for-profit NGO with the aim of collecting comprehensive information (from the taxonomic to the ecological) about all the world's finned fish species. FishBase is exposed as a MySQL-backed website (supporting a range of canned, though complex, queries) and serves over 33 million hits per month. FishDelish is a transformation of FishBase into Linked Data weighing in at 1.38 billion triples. We have ported a substantial number of FishBase SQL queries to FishDelish SPARQL queries, which form the basis of a new Linked Data application benchmark (using our derivative of the Berlin SPARQL Benchmark harness). We use this benchmarking framework to compare the performance of the native MySQL application, the Virtuoso RDF triple store, and the Quest OBDA system on a fishbase.org-like application.
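The porting exercise above suggests a simple harness pattern: run the same canned query against each backend and compare latencies. A minimal sketch in Python, with placeholder workloads standing in for the real MySQL/Virtuoso/Quest calls (all names here are illustrative, not part of FishMark):

```python
import statistics
import time

def run_benchmark(query_fn, runs=5):
    """Time repeated executions of a query function; report summary stats."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        query_fn()  # in a real harness: execute the SQL or SPARQL query here
        latencies.append(time.perf_counter() - start)
    return {"mean": statistics.mean(latencies), "min": min(latencies)}

# Placeholder "backends"; each lambda would wrap a real client call.
backends = {
    "mysql": lambda: sum(range(1000)),
    "virtuoso": lambda: sum(range(2000)),
}

results = {name: run_benchmark(fn) for name, fn in backends.items()}
for name, stats in results.items():
    print(name, stats["mean"])
```

A real harness would additionally randomize query parameters and discard warm-up runs, as the Berlin SPARQL Benchmark does.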
Representing complex data using localized principal components with application to astronomical data
Often the relation between the variables constituting a multivariate data
space might be characterized by one or more of the terms: ``nonlinear'',
``branched'', ``disconnected'', ``bended'', ``curved'', ``heterogeneous'', or,
more generally, ``complex''. In these cases, simple principal component analysis
(PCA) as a tool for dimension reduction can fail badly. Of the many alternative
approaches proposed so far, local approximations of PCA are among the most
promising. This paper will give a short review of localized versions of PCA,
focusing on local principal curves and local partitioning algorithms.
Furthermore we discuss projections other than the local principal components.
When performing local dimension reduction for regression or classification
problems it is important to focus not only on the manifold structure of the
covariates, but also on the response variable(s). Local principal components
only achieve the former, whereas localized regression approaches concentrate on
the latter. Local projection directions derived from the partial least squares
(PLS) algorithm offer an interesting trade-off between these two objectives. We
apply these methods to several real data sets. In particular, we consider
simulated astrophysical data from the future Galactic survey mission Gaia.
Comment: 25 pages. In "Principal Manifolds for Data Visualization and Dimension Reduction", A. Gorban, B. Kegl, D. Wunsch, and A. Zinovyev (eds), Lecture Notes in Computational Science and Engineering, Springer, 2007, pp. 180--204, http://www.springer.com/dal/home/generic/search/results?SGWID=1-40109-22-173750210-
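The local-PCA idea the abstract reviews can be sketched compactly: partition the data, then fit a separate leading principal direction in each partition. A minimal illustration (a crude k-means partition plus per-cluster SVD; the chapter's local principal curves are considerably more refined):

```python
import numpy as np

def local_pca(X, centers, iters=10):
    """Return per-partition means and leading principal directions."""
    centers = centers.copy()
    for _ in range(iters):  # crude k-means partitioning step
        labels = np.argmin(
            ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        for k in range(len(centers)):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(0)
    components = []
    for k in range(len(centers)):
        Xk = X[labels == k] - centers[k]
        # leading right singular vector = first principal direction
        _, _, vt = np.linalg.svd(Xk, full_matrices=False)
        components.append(vt[0])
    return centers, np.array(components)

rng = np.random.default_rng(0)
# two line-like clouds: a "branched" structure that global PCA handles poorly
a = np.c_[rng.uniform(0, 1, 100), 0.01 * rng.normal(size=100)]
b = np.c_[0.01 * rng.normal(size=100), rng.uniform(0, 1, 100)]
X = np.vstack([a, b])
centers, comps = local_pca(X, np.array([[0.8, 0.0], [0.0, 0.8]]))
```

Each local component recovers the direction of its branch, which a single global PCA of the cross-shaped cloud cannot do.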
Ultrametric embedding: application to data fingerprinting and to fast data clustering
We begin with the pervasive ultrametricity that arises from high dimensionality and/or spatial sparsity. Asking how the extent or degree of ultrametricity can be quantified leads us to a discussion of varied practical cases in which ultrametricity can be partially or locally present in data. We show how ultrametricity can be
assessed in text or document collections, and in time series signals. An aspect
of importance here is that, to draw benefit from this perspective, the data may need to be recoded. Such data recoding can also be powerful in proximity
searching, as we will show, where the data is embedded globally and not locally
in an ultrametric space.
Comment: 14 pages, 1 figure. New content and modified title compared to the 19 May 2006 version.
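One simple way to quantify the "extent or degree of ultrametricity" mentioned above is to sample point triples and count how often the two largest pairwise distances nearly coincide, since an ultrametric forces every triangle to be isosceles with a small base. A sketch (the relative tolerance is an assumption here, not the paper's measure):

```python
import itertools
import math

def ultrametricity_degree(points, dist, tol=0.05):
    """Fraction of triples whose two largest sides differ by < tol (relative)."""
    hits = total = 0
    for a, b, c in itertools.combinations(points, 3):
        d = sorted([dist(a, b), dist(b, c), dist(a, c)])
        total += 1
        # ultrametric triangles are isosceles with small base:
        # the two largest sides are (nearly) equal
        if d[2] > 0 and (d[2] - d[1]) / d[2] < tol:
            hits += 1
    return hits / total if total else 0.0

euclid = lambda p, q: math.dist(p, q)
line = [(float(i), 0.0) for i in range(6)]  # collinear points: never near-isosceles
pairs = [(0.0, 0.0), (0.01, 0.0), (10.0, 0.0), (10.01, 0.0)]  # two tight, distant clusters
print(ultrametricity_degree(line, euclid))   # 0.0
print(ultrametricity_degree(pairs, euclid))  # 1.0
```

The clustered configuration scores 1.0 because every triple spans the two clusters, making its two long sides nearly equal, which is exactly the hierarchical structure an ultrametric encodes.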
CosmoDM and its application to Pan-STARRS data
The Cosmology Data Management system (CosmoDM) is an automated and flexible
data management system for the processing and calibration of data from optical
photometric surveys. It is designed to run on supercomputers and to minimize
disk I/O to enable scaling to very high throughput during periods of
reprocessing. It serves as an early prototype for one element of the
ground-based processing required by the Euclid mission and will also be
employed in the preparation of ground-based data needed in the eROSITA X-ray
all sky survey mission. CosmoDM consists of two main pipelines. The first is
the single-epoch or detrending pipeline, which is used to carry out the
photometric and astrometric calibration of raw exposures. The second is the coaddition pipeline, which combines the data from individual exposures into deeper coadd images and science-ready catalogs. A novel feature of CosmoDM is that it uses a modified stack of Astromatic software which can read and write
tile compressed images. Since 2011, CosmoDM has been used to process data from
the DECam, the CFHT MegaCam and the Pan-STARRS cameras. In this paper we shall
describe how processed Pan-STARRS data from CosmoDM has been used to optically
confirm and measure photometric redshifts of Planck-based Sunyaev-Zeldovich
effect-selected cluster candidates.
Comment: 11 pages, 4 figures. Proceedings of the Precision Astronomy with Fully Depleted CCDs Workshop (2014). Accepted for publication in JINST.
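The coaddition step described above can be illustrated with the standard inverse-variance weighting of registered exposures; this toy version ignores registration, masking, and everything else a production pipeline such as CosmoDM's does:

```python
import numpy as np

def coadd(images, variances):
    """Inverse-variance weighted mean of registered exposures."""
    images = np.asarray(images, dtype=float)
    weights = 1.0 / np.asarray(variances, dtype=float)
    stacked = (weights[:, None, None] * images).sum(0) / weights.sum()
    # variance of each coadd pixel is 1 / (sum of weights)
    coadd_var = 1.0 / weights.sum()
    return stacked, coadd_var

# two toy 4x4 "exposures" with per-exposure noise variances
exposures = [np.full((4, 4), 10.0), np.full((4, 4), 12.0)]
coadded, var = coadd(exposures, variances=[1.0, 2.0])
```

Weighting by inverse variance maximizes the signal-to-noise of the stack, which is why deeper coadds come out of combining many shallow exposures.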
DAMEWARE - Data Mining & Exploration Web Application Resource
Astronomy is undergoing a methodological revolution triggered by an unprecedented wealth of complex and accurate data. DAMEWARE (DAta Mining & Exploration Web Application REsource) is a general-purpose, Web-based, Virtual Observatory compliant, distributed data mining framework specialized in the exploration of massive data sets with machine learning methods. It allows the scientific community to perform data mining and exploratory experiments on massive data sets using a simple web browser. DAMEWARE offers several tools which can be seen as working environments in which to choose data analysis functionalities such as clustering, classification, regression, and feature extraction, together with models and algorithms.
Comment: User Manual of the DAMEWARE Web Application, 51 pages.
