212 research outputs found

    A Framework for XML-based Integration of Data, Visualization and Analysis in a Biomedical Domain

    Biomedical data are becoming increasingly complex and heterogeneous in nature. The data are stored in distributed information systems, using a variety of data models, and are processed by increasingly more complex tools that analyze and visualize them. We present in this paper our framework for integrating biomedical research data and tools into a unique Web front end. Our framework is applied to the University of Washington's Human Brain Project. Specifically, we present solutions to four integration tasks: definition of complex mappings from relational sources to XML, distributed XQuery processing, generation of heterogeneous output formats, and the integration of heterogeneous data visualization and analysis tools.
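
The first of the four tasks, mapping relational sources to XML, can be pictured with a small sketch: child-table rows are nested under their parent's element. The table layout (patients with child scans) and all element names below are invented for illustration; the paper's actual mappings over the Human Brain Project schema are far richer.

```python
import xml.etree.ElementTree as ET

# Invented relational rows standing in for two joined tables.
patients = [
    {"id": 1, "name": "P1"},
    {"id": 2, "name": "P2"},
]
scans = [  # child-table rows, keyed by patient_id
    {"patient_id": 1, "modality": "MRI"},
    {"patient_id": 1, "modality": "fMRI"},
    {"patient_id": 2, "modality": "CT"},
]

def rows_to_xml(patients, scans):
    """Nest each patient's scan rows under its <patient> element."""
    root = ET.Element("patients")
    for p in patients:
        pe = ET.SubElement(root, "patient", id=str(p["id"]))
        ET.SubElement(pe, "name").text = p["name"]
        for s in scans:
            if s["patient_id"] == p["id"]:
                ET.SubElement(pe, "scan", modality=s["modality"])
    return ET.tostring(root, encoding="unicode")

xml_doc = rows_to_xml(patients, scans)
```

In the framework itself such mappings are declared over the source schema rather than hand-coded, but the flattening-to-nesting step is the same.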

    Identification of Design Principles

    This report identifies those design principles for a (possibly new) query and transformation language for the Web supporting inference that are considered essential. Based upon these design principles, an initial strawman is selected. Scenarios for querying the Semantic Web illustrate the design principles and their reflection in the initial strawman, i.e., a first draft of the query language to be designed and implemented by the REWERSE working group I4.

    VisTrails: enabling interactive multiple-view visualizations

    Journal Article
    VisTrails is a new system that enables interactive multiple-view visualizations by simplifying the creation and maintenance of visualization pipelines, and by optimizing their execution. It provides a general infrastructure that can be combined with existing visualization systems and libraries. A key component of VisTrails is the visualization trail (vistrail), a formal specification of a pipeline. Unlike existing dataflow-based systems, in VisTrails there is a clear separation between the specification of a pipeline and its execution instances. This separation enables powerful scripting capabilities and provides a scalable mechanism for generating a large number of visualizations. VisTrails also leverages the vistrail specification to identify and avoid redundant operations. This optimization is especially useful while exploring multiple visualizations. When variations of the same pipeline need to be executed, substantial speedups can be obtained by caching the results of overlapping subsequences of the pipelines. In this paper, we describe the design and implementation of VisTrails, and show its effectiveness in different application scenarios.
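
The caching of overlapping subsequences can be illustrated with a toy sketch: memoise each stage on the pipeline prefix that produced its input, so two pipeline variants that share a prefix execute it only once. The stage names are invented; real VisTrails pipelines are formal vistrail specifications, not Python callables.

```python
# Cache keyed by the tuple of stage names executed so far (the prefix).
cache = {}
executions = 0

def run_stage(prefix_key, op, data):
    """Execute one pipeline stage, memoised on the pipeline prefix."""
    global executions
    key = prefix_key + (op.__name__,)
    if key not in cache:
        executions += 1
        cache[key] = op(data)
    return key, cache[key]

def run_pipeline(ops, data=None):
    key = ()
    for op in ops:
        key, data = run_stage(key, op, data)
    return data

def load(_):
    return [3, 1, 2]

def sort_data(d):
    return sorted(d)

def render(d):
    return ",".join(map(str, d))

def reverse(d):
    return d[::-1]

# Two pipeline variants share the load -> sort prefix, so it runs once:
# four stage executions in total instead of six.
out1 = run_pipeline([load, sort_data, render])
out2 = run_pipeline([load, sort_data, reverse])
```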

    A Comparative Analysis of ASCII and XML Logging Systems

    This research compares XML- and ASCII-based event logging systems in terms of their storage and processing efficiency. XML is an emerging technology, including in security applications, so it is evaluated here as a logging format, with attention to mitigating its verbosity. Each system is studied in four parts: source content, network transmission, database storage, and querying. The ASCII logging system consists of text files as the source, FTP as the transport, and a relational database for storage and querying. The XML system uses XML files both as plain text and in binary form with Efficient XML Interchange (EXI) encoding, FTP as the transport for both, and an XML database for storage and querying. Further comparisons of file size and transmission efficiency are made between plain XML and binary XML, and between binary XML and ASCII text. Plain XML is a poor choice for disk space and network transport time compared to ASCII; in binary form, however, it uses less disk space and fewer network resources. Because no XML database supports binary XML, the data is loaded without that optimization, and ASCII loads into its relational database faster than XML loads into its database. In querying, neither system dominates: one query runs faster on one system, and another query runs faster on the other. Therefore, XML and/or its binary form is a viable candidate for use as a comprehensive logging system.
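
The storage comparison can be sketched in a few lines. The log content below is synthetic, and zlib compression merely stands in for a true binary XML encoding such as EXI, which requires a dedicated codec; the point is only the relative sizes of the three representations.

```python
import zlib

# 60 consecutive events, once as ASCII log lines and once as XML elements.
ascii_batch = b"".join(
    b"2024-01-01T00:00:%02dZ host1 sshd FAIL user=root src=10.0.0.5\n" % i
    for i in range(60))
xml_batch = b"".join(
    b"<event time='2024-01-01T00:00:%02dZ' host='host1' app='sshd'"
    b" result='FAIL' user='root' src='10.0.0.5'/>" % i
    for i in range(60))

sizes = {
    "ascii": len(ascii_batch),
    "xml": len(xml_batch),
    "xml_binary": len(zlib.compress(xml_batch)),  # stand-in for EXI
}
# Plain XML is larger than ASCII; the binary form undercuts both, because
# the repeated tag and attribute names compress away.
```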

    Big Data Analytics for Earth Sciences: the EarthServer approach

    Big Data Analytics is an emerging field, as massive storage and computing capabilities have become available through advanced e-infrastructures. Earth and Environmental sciences are likely to benefit from Big Data Analytics techniques supporting the processing of the large number of Earth Observation datasets currently acquired and generated through observations and simulations. However, Earth Science data and applications have specific characteristics: the central role of geospatial information, wide heterogeneity of data models and formats, and complexity of processing. Big Earth Data Analytics therefore requires specifically tailored techniques and tools. The EarthServer Big Earth Data Analytics engine offers a solution for coverage-type datasets, built around a high-performance array database technology and the adoption and enhancement of standards for service interaction (OGC WCS and WCPS). The EarthServer solution, driven by requirements collected from scientific communities and international initiatives, provides a holistic approach that ranges from query languages and scalability to mobile access and visualization. The result is demonstrated and validated through the development of lighthouse applications in the Marine, Geology, Atmospheric, Planetary and Cryospheric science domains.
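
The kind of server-side processing a WCPS query pushes into the array database (subset a coverage along its axes, then aggregate) can be mimicked with plain arrays. The grid below is synthetic and the axis interpretation is an assumption for illustration; a real deployment evaluates this inside the array DBMS, not in client code.

```python
import numpy as np

# A synthetic "coverage": a time x lat x lon grid of values
# (12 time slices over a 4 x 5 spatial grid).
coverage = np.arange(12 * 4 * 5, dtype=float).reshape(12, 4, 5)

# Subset to a lat/lon box, then aggregate per time slice -- the shape
# of operation a WCPS request expresses declaratively over coverages.
window = coverage[:, 1:3, 2:4]
per_slice_mean = window.mean(axis=(1, 2))  # one value per time step
```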

    Extensible metadata repository for information systems

    Thesis submitted to the Faculdade de Ciências e Tecnologia of the Universidade Nova de Lisboa, in partial fulfillment of the requirements for the degree of Master in Computer Science
    Information Systems usually have a strong integration component, and some of them rely on integration solutions based on metadata (data that describes data). In that situation, there is a need to deal with metadata as if it were "normal" information. For that matter, a metadata repository that handles integrity, storage, and validity, and that eases the information-integration processes of the information system, is a wise choice. Several metadata repositories are available on the market, but none of them is prepared to deal with the needs of information systems, or generic enough to handle the multitude of information situations/domains and the necessary integration features. In the SESS project (a European Space Agency project), a generic metadata repository was developed based on XML technologies. This repository provided tools for information integration, validity, storage, sharing, and import, as well as system and data integration, but it required fixed syntactic rules stored in the content of the XML files. This causes severe problems when importing documents from external data sources that are unaware of these syntactic rules. This thesis develops a metadata repository that provides the same mechanisms of storage, integrity, and validity, but focuses on easy integration of metadata from any type of external source (in XML format) and on an environment that simplifies the reuse of existing metadata types to build new ones, all without modifying the documents it stores.
    The repository stores XML documents (known as Instances), each an instance of a Concept; a Concept defines an XML structure that validates its Instances. To support reuse, a special unit named Fragment allows defining an XML structure (which can itself be composed of other Fragments) that Concepts can reuse when defining their own structure. Elements of the repository (Instances, Concepts and Fragments) have an identifier based on (and compatible with) URIs, named the Metadata Repository Identifier (MRI). These identifiers, as well as management information (including relations), are managed by the repository without recourse to fixed syntactic rules, easing integration. A set of tests using documents from the SESS project and from the software house ITDS successfully validated the repository against the thesis objectives of easy integration and promotion of reuse.
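
The Concept/Fragment/Instance relationship might be sketched as follows. The classes, the required-tag validation, and the MRI strings are illustrative assumptions, not the thesis implementation, which validates Instances against full XML structures rather than flat tag sets.

```python
import xml.etree.ElementTree as ET

class Fragment:
    """A reusable piece of structure, here reduced to a set of tags."""
    def __init__(self, mri, tags):
        self.mri, self.tags = mri, set(tags)

class Concept:
    """Defines the structure its Instances must satisfy; composes Fragments."""
    def __init__(self, mri, tags, fragments=()):
        self.mri = mri
        self.tags = set(tags)
        for f in fragments:          # reuse: fold Fragment structure in
            self.tags |= f.tags

    def validates(self, instance_xml):
        root = ET.fromstring(instance_xml)
        present = {child.tag for child in root}
        return self.tags <= present  # all required tags are present

# MRI-style identifiers are invented for the example.
contact = Fragment("mri:fragment/contact", ["email"])
dataset = Concept("mri:concept/dataset", ["title", "source"], [contact])

ok = dataset.validates("<instance><title/><source/><email/></instance>")
bad = dataset.validates("<instance><title/></instance>")
```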

    Distributed top-k aggregation queries at large

    Top-k query processing is a fundamental building block for efficient ranking in a large number of applications. Efficiency is a central issue, especially for distributed settings, when the data is spread across different nodes in a network. This paper introduces novel optimization methods for top-k aggregation queries in such distributed environments. The optimizations can be applied to all algorithms that fall into the frameworks of the prior TPUT and KLEE methods. The optimizations address three degrees of freedom: 1) hierarchically grouping input lists into top-k operator trees and optimizing the tree structure, 2) computing data-adaptive scan depths for different input sources, and 3) data-adaptive sampling of a small subset of input sources in scenarios with hundreds or thousands of query-relevant network nodes. All optimizations are based on a statistical cost model that utilizes local synopses, e.g., in the form of histograms, efficiently computed convolutions, and estimators based on order statistics. The paper presents comprehensive experiments, with three different real-life datasets and using the ns-2 network simulator for a packet-level simulation of a large Internet-style network.
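
The TPUT baseline that these optimizations build on can be sketched for a sum aggregate over m nodes. The three phases below follow the published "three-phase uniform threshold" scheme; the data is synthetic, and a real deployment ships only the pruned lists across the network rather than holding them in one process.

```python
from collections import defaultdict

def tput_top_k(node_lists, k):
    """Simplified TPUT over m local (item -> score) maps, sum aggregate."""
    m = len(node_lists)
    # Phase 1: each node reports its local top-k; the k-th largest
    # partial sum tau1 is a lower bound on the true k-th total score.
    partial = defaultdict(float)
    for lst in node_lists:
        for item, score in sorted(lst.items(), key=lambda x: -x[1])[:k]:
            partial[item] += score
    tau1 = sorted(partial.values(), reverse=True)[k - 1]
    # Phase 2: fetch every item scoring at least tau1/m at some node.
    # An item below tau1/m at all m nodes sums to less than tau1, so it
    # cannot belong to the top-k and is pruned.
    candidates = {item for lst in node_lists
                  for item, score in lst.items() if score >= tau1 / m}
    # Phase 3: compute exact sums for the surviving candidates only.
    exact = {item: sum(lst.get(item, 0.0) for lst in node_lists)
             for item in candidates}
    return sorted(exact, key=exact.get, reverse=True)[:k]

# Three synthetic nodes; true sums: a=21, b=12, c=7, d=5.
nodes = [
    {"a": 9, "b": 7, "c": 1},
    {"a": 8, "c": 6, "d": 2},
    {"b": 5, "a": 4, "d": 3},
]
top2 = tput_top_k(nodes, 2)
```

The paper's optimizations generalize exactly these knobs: how deep each node scans, how the per-node lists are grouped into operator trees, and which nodes are consulted at all.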