5,432 research outputs found
Data integration through service-based mediation for web-enabled information systems
The Web and its underlying platform technologies have often been used to integrate existing software and information systems. Traditional techniques for data representation and transformations between documents are not sufficient to support a flexible and maintainable data integration solution that meets the requirements of modern complex Web-enabled software and information systems. The difficulty
arises from the high degree of complexity of data structures, for example in business and technology applications, and from the constant change of data and its
representation. In the Web context, where the Web platform is used to integrate different organisations or software systems, additionally the problem of heterogeneity
arises. We introduce a specific data integration solution for Web applications such as Web-enabled information systems. Our contribution is an integration technology
framework for Web-enabled information systems comprising, firstly, a data integration technique based on the declarative specification of transformation rules and the construction of connectors that handle the integration and, secondly, a mediator architecture based on information services and the constructed connectors to handle the integration process
Huddl: the Hydrographic Universal Data Description Language
Since many of the attempts to introduce a universal hydrographic data format have failed or have been only partially successful, a different approach is proposed. Our solution is the Hydrographic Universal Data Description Language (HUDDL), a descriptive XML-based language that permits the creation of a standardized description of (past, present, and future) data formats, and allows for applications like HUDDLER, a compiler that automatically creates drivers for data access and manipulation. HUDDL also represents a powerful solution for archiving data along with their structural description, as well as for cataloguing existing format specifications and their version control. HUDDL is intended to be an open, community-led initiative to simplify the issues involved in hydrographic data access
HUDDL for description and archive of hydrographic binary data
Many of the attempts to introduce a universal hydrographic binary data format have failed or have been only partially successful. In essence, this is because such formats either have to simplify the data to such an extent that they only support the lowest common subset of all the formats covered, or they attempt to be a superset of all formats and quickly become cumbersome. Neither choice works well in practice. This paper presents a different approach: a standardized description of (past, present, and future) data formats using the Hydrographic Universal Data Description Language (HUDDL), a descriptive language implemented using the Extensible Markup Language (XML). That is, XML is used to provide a structural and physical description of a data format, rather than the content of a particular file. Done correctly, this opens the possibility of automatically generating both multi-language data parsers and documentation for format specification based on their HUDDL descriptions, as well as providing easy version control of them. This solution also provides a powerful approach for archiving a structural description of data along with the data, so that binary data will be easy to access in the future. Intending to provide a relatively low-effort solution to index the wide range of existing formats, we suggest the creation of a catalogue of format descriptions, each of them capturing the logical and physical specifications for a given data format (with its subsequent upgrades). A C/C++ parser code generator is used as an example prototype of one of the possible advantages of the adoption of such a hydrographic data format catalogue
LCG MCDB -- a Knowledgebase of Monte Carlo Simulated Events
In this paper we report on LCG Monte Carlo Data Base (MCDB) and software
which has been developed to operate MCDB. The main purpose of the LCG MCDB
project is to provide a storage and documentation system for sophisticated
event samples simulated for the LHC collaborations by experts. In many cases,
the modern Monte Carlo simulation of physical processes requires expert
knowledge in Monte Carlo generators or significant amount of CPU time to
produce the events. MCDB is a knowledgebase mainly dedicated to accumulate
simulated events of this type. The main motivation behind LCG MCDB is to make
the sophisticated MC event samples available for various physical groups. All
the data from MCDB is accessible in several convenient ways. LCG MCDB is being
developed within the CERN LCG Application Area Simulation project
Storage Solutions for Big Data Systems: A Qualitative Study and Comparison
Big data systems development is full of challenges in view of the variety of
application areas and domains that this technology promises to serve.
Typically, fundamental design decisions involved in big data systems design
include choosing appropriate storage and computing infrastructures. In this age
of heterogeneous systems that integrate different technologies for optimized
solution to a specific real world problem, big data system are not an exception
to any such rule. As far as the storage aspect of any big data system is
concerned, the primary facet in this regard is a storage infrastructure and
NoSQL seems to be the right technology that fulfills its requirements. However,
every big data application has variable data characteristics and thus, the
corresponding data fits into a different data model. This paper presents
feature and use case analysis and comparison of the four main data models
namely document oriented, key value, graph and wide column. Moreover, a feature
analysis of 80 NoSQL solutions has been provided, elaborating on the criteria
and points that a developer must consider while making a possible choice.
Typically, big data storage needs to communicate with the execution engine and
other processing and visualization technologies to create a comprehensive
solution. This brings forth second facet of big data storage, big data file
formats, into picture. The second half of the research paper compares the
advantages, shortcomings and possible use cases of available big data file
formats for Hadoop, which is the foundation for most big data computing
technologies. Decentralized storage and blockchain are seen as the next
generation of big data storage and its challenges and future prospects have
also been discussed
Mediated data integration and transformation for web service-based software architectures
Service-oriented architecture using XML-based web services has been widely accepted by many organisations as the standard infrastructure to integrate heterogeneous and autonomous data sources. As a result, many Web service providers are built up on top of the data sources to share the data by supporting provided and required interfaces and methods of data access in a unified manner. In the context of data integration, problems arise when Web services are assembled to deliver an integrated view of data, adaptable to the specific needs of individual clients and providers. Traditional approaches of data integration and transformation are not suitable to automate the construction of connectors dedicated to connect selected Web services to render integrated and tailored views of data. We propose a declarative approach that addresses the oftenneglected data integration and adaptivity aspects of serviceoriented
architecture
BSML: A Binding Schema Markup Language for Data Interchange in Problem Solving Environments (PSEs)
We describe a binding schema markup language (BSML) for describing data
interchange between scientific codes. Such a facility is an important
constituent of scientific problem solving environments (PSEs). BSML is designed
to integrate with a PSE or application composition system that views model
specification and execution as a problem of managing semistructured data. The
data interchange problem is addressed by three techniques for processing
semistructured data: validation, binding, and conversion. We present BSML and
describe its application to a PSE for wireless communications system design
Automatic visualization and control of arbitrary numerical simulations
Authorsâ preprint version as submitted to ECCOMAS Congress 2016, Minisymposium 505 - Interactive Simulations in Computational Engineering. Abstract: Visualization of numerical simulation data has become a cornerstone for many industries and research areas today. There exists a large amount of software support, which is usually tied to specific problem domains or simulation platforms. However, numerical simulations have commonalities in the building blocks of their descriptions (e. g., dimensionality, range constraints, sample frequency). Instead of encoding these descriptions and their meaning into software architecures we propose to base their interpretation and evaluation on a data-centric model. This approach draws much inspiration from work of the IEEE Simulation Interoperability Standards Group as currently applied in distributed (military) training and simulation scenarios and seeks to extend those ideas. By using an extensible self-describing protocol format, simulation users as well as simulation-code providers would be able to express the meaning of their data even if no access to the underlying source code was available or if new and unforseen use cases emerge. A protocol definition will allow simulation-domain experts to describe constraints that can be used for automatically creating appropriate visualizations of simulation data and control interfaces. Potentially, this will enable leveraging innovations on both the simulation and visualization side of the problem continuum. We envision the design and development of algorithms and software tools for the automatic visualization of complex data from numerical simulations executed on a wide variety of platforms (e. g., remote HPC systems, local many-core or GPU-based systems). We also envisage using this automatically gathered information to control (or steer) the simulation while it is running, as well as providing the ability for fine-tuning representational aspects of the visualizations produced
- âŠ