Development of Use Cases, Part I
For determining requirements and constructs appropriate for a Web query language, or in fact any language, use cases are essential. The W3C has published two sets of use cases for XML and RDF query languages. In this article, solutions for these use cases are presented using Xcerpt, a novel Web and Semantic Web query language that combines access to standard Web data such as XML documents with access to Semantic Web metadata such as RDF resource descriptions, together with reasoning abilities and rules familiar from logic programming. To the best knowledge of the authors, this is the first in-depth study of how to solve use cases for accessing XML and RDF in a single language. Integrated access to data and metadata has been recognized by industry and academia as one of the key challenges in data processing for the next decade. This article is a contribution towards addressing this challenge by demonstrating, along practical and recognized use cases, the usefulness of reasoning abilities, rules, and semistructured query languages for accessing both data (XML) and metadata (RDF)
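The abstract gives no Xcerpt syntax, so the following is only a rough illustration of the kind of integrated data/metadata query such a language targets: a minimal Python sketch that joins an XML document with RDF metadata using the standard library and rdflib. All element names, predicates, and sample data are hypothetical, and Xcerpt itself would express such a join declaratively with rules rather than imperative code.

```python
# Minimal sketch of integrated access to XML data and RDF metadata.
# Element names, predicates, and sample data are hypothetical; this is
# not Xcerpt syntax, only an illustration of the data/metadata join.
import xml.etree.ElementTree as ET
from rdflib import Graph, Namespace

# XML "data": a small book catalogue.
books_xml = """
<catalogue>
  <book id="b1"><title>Semistructured Data</title></book>
  <book id="b2"><title>Logic Programming</title></book>
</catalogue>
"""

# RDF "metadata": ratings attached to the same book identifiers.
ratings_ttl = """
@prefix ex: <http://example.org/> .
ex:b1 ex:rating "5" .
ex:b2 ex:rating "3" .
"""

EX = Namespace("http://example.org/")

tree = ET.fromstring(books_xml)
graph = Graph().parse(data=ratings_ttl, format="turtle")

# Join data and metadata on the shared identifier.
for book in tree.findall("book"):
    book_id = book.get("id")
    title = book.findtext("title")
    rating = graph.value(EX[book_id], EX.rating)
    if rating is not None and int(rating) >= 4:
        print(f"{title}: rated {rating}")
```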
HUDDL for description and archive of hydrographic binary data
Many of the attempts to introduce a universal hydrographic binary data format have failed or have been only partially successful. In essence, this is because such formats either have to simplify the data to such an extent that they only support the lowest common subset of all the formats covered, or they attempt to be a superset of all formats and quickly become cumbersome. Neither choice works well in practice. This paper presents a different approach: a standardized description of (past, present, and future) data formats using the Hydrographic Universal Data Description Language (HUDDL), a descriptive language implemented using the Extensible Markup Language (XML). That is, XML is used to provide a structural and physical description of a data format, rather than the content of a particular file. Done correctly, this opens the possibility of automatically generating both multi-language data parsers and documentation for format specification based on their HUDDL descriptions, as well as providing easy version control of them. This solution also provides a powerful approach for archiving a structural description of data along with the data, so that binary data will be easy to access in the future. Intending to provide a relatively low-effort solution to index the wide range of existing formats, we suggest the creation of a catalogue of format descriptions, each of them capturing the logical and physical specifications for a given data format (with its subsequent upgrades). A C/C++ parser code generator is used as an example prototype of one of the possible advantages of the adoption of such a hydrographic data format catalogue
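HUDDL's actual schema is not reproduced in the abstract; purely as a sketch of the idea of generating parsers from a structural description, the following Python reads a hypothetical HUDDL-like XML record layout and emits a C struct declaration. The XML vocabulary, field names, and type mapping are invented for illustration.

```python
# Sketch: generate a C struct from a HUDDL-style format description.
# The XML vocabulary below is hypothetical; HUDDL's real schema differs.
import xml.etree.ElementTree as ET

huddl_like = """
<format name="PingRecord">
  <field name="timestamp"  type="double"/>
  <field name="beam_count" type="uint16"/>
  <field name="depth"      type="float" count="beam_count"/>
</format>
"""

C_TYPES = {"double": "double", "float": "float", "uint16": "uint16_t"}

def generate_struct(xml_text: str) -> str:
    fmt = ET.fromstring(xml_text)
    lines = [f"struct {fmt.get('name')} {{"]
    for field in fmt.findall("field"):
        ctype = C_TYPES[field.get("type")]
        name = field.get("name")
        if field.get("count"):
            # Variable-length field: emit a pointer; length comes from another field.
            lines.append(f"    {ctype} *{name};  /* length given by {field.get('count')} */")
        else:
            lines.append(f"    {ctype} {name};")
    lines.append("};")
    return "\n".join(lines)

print(generate_struct(huddl_like))
```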
Latent Dirichlet Allocation (LDA) for improving the topic modeling of the Official Bulletin of the Spanish State (BOE)
Since the Internet was born, most people have had free access to many sources of information. Every day many web pages are created and new content is uploaded and shared. Never in history have humans been both more informed and more uninformed, due to the huge amount of information that can be accessed. When we look for something in any search engine, the results are too many to read and filter one by one. Recommender Systems (RS) were created to help us discriminate and filter this information according to our preferences. This contribution analyses the RS of the official agency of publications in Spain (BOE), which is known as 'Mi BOE'. The way this RS works was analysed, and all the metadata of the published documents were analysed in order to determine the coverage of the system. The results of our analysis show that more than 89% of the documents cannot be recommended, because they are not well described at the documentary level: some of their key metadata fields are empty. So, this contribution proposes a method to label documents automatically based on Latent Dirichlet Allocation (LDA). The results are that, using this approach, the system could (from a theoretical point of view) recommend more than twice as many documents as it now does: 11% before vs. 23% after applying this approach
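The abstract does not give implementation details of the labeling pipeline; as a generic illustration of LDA-based automatic topic labeling of the kind described, here is a minimal scikit-learn sketch. The sample documents, number of topics, and parameter choices are placeholders, not the actual BOE pipeline.

```python
# Sketch: assign topic labels to documents with LDA (not the actual BOE pipeline).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

documents = [  # placeholder texts standing in for BOE documents
    "royal decree on fishing quotas in territorial waters",
    "ministry resolution on university scholarships and grants",
    "order regulating fishing licenses and maritime zones",
    "call for scholarship applications for postgraduate studies",
]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(documents)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)  # rows: documents, columns: topic weights

terms = vectorizer.get_feature_names_out()
for doc, weights in zip(documents, doc_topics):
    topic = weights.argmax()
    top_terms = [terms[i] for i in lda.components_[topic].argsort()[-3:]]
    print(f"{doc[:40]!r} -> topic {topic}, keywords {top_terms}")
```

Each document would then be labeled with the top terms of its dominant topic, filling the empty metadata fields that currently block recommendation.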
A Bootstrap Theory: the SEMAT Kernel Itself as Runnable Software
The SEMAT kernel is a thoroughly thought-out generic framework for Software Engineering system development in practice. But one should be able to test its characteristics by means of a no less generic theory matching the SEMAT kernel. This paper claims that such a matching theory is attainable and describes its main principles. The conceptual starting point is the robustness of the Kernel alphas to variations in the nature of the software system, viz. to software automation, distribution, and self-evolution. From these and from observed Kernel properties follows the proposed bootstrap principle: a software system theory should itself be runnable software. Thus, the kernel alphas can be viewed as a top-level ontology, indeed the Essence of Software Engineering. Among the interesting consequences of this bootstrap theory, observable system characteristics can now be formally tested. For instance, one can check the system completeness, viz. that software system modules fulfill each one of the system requirements.
Comment: 8 pages; 2 figures; preprint of paper accepted for the GTSE'2014 Workshop, within the ICSE'2014 Conference
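The paper's formal machinery is not shown in the abstract; as a toy illustration of the kind of completeness check it mentions (every requirement is fulfilled by some module), here is a hypothetical Python sketch. The requirement and module names are invented, and this is not the SEMAT formalism itself.

```python
# Toy sketch of a completeness check: every requirement must be covered
# by at least one module. Names are invented; this is not the SEMAT formalism.
requirements = {"store-data", "query-data", "report"}

modules = {
    "storage":   {"store-data"},
    "query-api": {"query-data"},
    # note: no module claims to fulfill "report"
}

covered = set().union(*modules.values())
missing = requirements - covered

if missing:
    print("System incomplete; unfulfilled requirements:", sorted(missing))
else:
    print("System complete: every requirement is fulfilled by some module")
```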
Condor services for the Global Grid: interoperability between Condor and OGSA
In order for existing grid middleware to remain viable it is important to investigate its potential for integration with emerging grid standards and architectural schemes. The Open Grid Services Architecture (OGSA), developed by the Globus Alliance and based on standard XML-based web services technology, was the first attempt to identify the architectural components required to migrate towards standardized global grid service delivery. This paper presents an investigation into the integration of Condor, a widely adopted and sophisticated high-throughput computing software package, and OGSA, with the aim of bringing Condor in line with advances in Grid computing and providing the Grid community with a mature suite of high-throughput computing job and resource management services. This report identifies mappings between elements of the OGSA and Condor infrastructures, notes potential areas of conflict, and defines a set of complementary architectural options by which individual Condor services can be exposed as OGSA Grid services, in order to achieve a seamless integration of Condor resources in a standardized grid environment
Access Interfaces for Open Archival Information Systems based on the OAI-PMH and the OpenURL Framework for Context-Sensitive Services
In recent years, a variety of digital repository and archival systems have been developed and adopted. All of these systems aim at hosting a variety of compound digital assets and at providing tools for storing, managing, and accessing those assets. This paper will focus on the definition of common and standardized access interfaces that could be deployed across such diverse digital repository and archival systems. The proposed interfaces are based on two formal specifications that have recently emerged from the Digital Library community: the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) and the NISO OpenURL Framework for Context-Sensitive Services (OpenURL Standard). As will be described, the former allows for the retrieval of batches of XML-based representations of digital assets, while the latter facilitates the retrieval of disseminations of a specific digital asset or of one or more of its constituents. The core properties of the proposed interfaces are explained in terms of the Reference Model for an Open Archival Information System (OAIS).
Comment: Accepted paper for PV 2005 "Ensuring Long-term Preservation and Adding Value to Scientific and Technical data" (http://www.ukoln.ac.uk/events/pv-2005/)
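The paper's specific interface bindings are not spelled out in the abstract; as a generic illustration of the harvesting side it builds on, here is a minimal Python sketch issuing a standard OAI-PMH ListRecords request and reading back Dublin Core records. The repository base URL is a placeholder; any OAI-PMH-compliant endpoint would work.

```python
# Sketch: harvest a batch of Dublin Core records via OAI-PMH ListRecords.
# The base URL is a placeholder; the verb and parameters are standard OAI-PMH.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

BASE_URL = "https://repository.example.org/oai"  # placeholder endpoint
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"

params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
url = BASE_URL + "?" + urllib.parse.urlencode(params)

with urllib.request.urlopen(url) as response:
    root = ET.fromstring(response.read())

# Print identifier and title of each harvested record.
for record in root.iter(OAI_NS + "record"):
    identifier = record.findtext(f".//{OAI_NS}identifier")
    title = record.findtext(f".//{DC_NS}title")
    print(identifier, "-", title)
```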
Automatic visualization and control of arbitrary numerical simulations
Authors' preprint version as submitted to ECCOMAS Congress 2016, Minisymposium 505 - Interactive Simulations in Computational Engineering.
Abstract: Visualization of numerical simulation data has become a cornerstone for many industries and research areas today. There exists a large amount of software support, which is usually tied to specific problem domains or simulation platforms. However, numerical simulations have commonalities in the building blocks of their descriptions (e.g., dimensionality, range constraints, sample frequency). Instead of encoding these descriptions and their meaning into software architectures, we propose to base their interpretation and evaluation on a data-centric model. This approach draws much inspiration from work of the IEEE Simulation Interoperability Standards Group as currently applied in distributed (military) training and simulation scenarios and seeks to extend those ideas. By using an extensible self-describing protocol format, simulation users as well as simulation-code providers would be able to express the meaning of their data even if no access to the underlying source code was available or if new and unforeseen use cases emerge. A protocol definition will allow simulation-domain experts to describe constraints that can be used for automatically creating appropriate visualizations of simulation data and control interfaces. Potentially, this will enable leveraging innovations on both the simulation and visualization sides of the problem continuum. We envision the design and development of algorithms and software tools for the automatic visualization of complex data from numerical simulations executed on a wide variety of platforms (e.g., remote HPC systems, local many-core or GPU-based systems). We also envisage using this automatically gathered information to control (or steer) the simulation while it is running, as well as providing the ability for fine-tuning representational aspects of the visualizations produced
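The self-describing protocol format itself is not specified in the abstract; as a toy sketch of the data-centric idea (a descriptor whose declared structure lets generic tooling pick a default visualization), here is a hypothetical Python example. The field names, descriptor attributes, and selection rule are invented for illustration.

```python
# Toy sketch: a self-describing simulation output whose metadata lets a
# generic tool choose a visualization. Field names and rules are invented.
from dataclasses import dataclass
from typing import List

@dataclass
class FieldDescriptor:
    name: str
    dimensionality: int       # e.g. 1 = time series, 2 = planar field
    value_range: tuple        # declared (min, max) range constraint
    sample_frequency_hz: float

def choose_visualization(desc: FieldDescriptor) -> str:
    # A generic tool could map declared structure to a default view.
    if desc.dimensionality == 1:
        return "line plot over time"
    if desc.dimensionality == 2:
        return "heat map"
    return "volume rendering"

fields: List[FieldDescriptor] = [
    FieldDescriptor("inlet_pressure", 1, (0.0, 2.5e5), 100.0),
    FieldDescriptor("temperature_slice", 2, (250.0, 400.0), 10.0),
]

for f in fields:
    print(f"{f.name}: {choose_visualization(f)} "
          f"(range {f.value_range}, {f.sample_frequency_hz} Hz)")
```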