Schema architecture and their relationships to transaction processing in distributed database systems
We discuss the different types of schema architectures which could be supported by distributed database systems, making a clear distinction between logical, physical, and federated distribution. We elaborate on the additional mapping information required in architectures based on logical distribution in order to support retrieval as well as update operations. We illustrate the problems in schema integration and data integration in multidatabase systems and discuss their impact on query processing. Finally, we discuss different issues relevant to the cooperation (or noncooperation) of local database systems in a heterogeneous multidatabase system and their relationship to the schema architecture and transaction processing.
AutoBayes: A System for Generating Data Analysis Programs from Statistical Models
Data analysis is an important scientific task which is required whenever information needs to be extracted from raw data. Statistical approaches to data analysis, which use methods from probability theory and numerical analysis, are well-founded but difficult to implement: the development of a statistical data analysis program for any given application is time-consuming and requires substantial knowledge and experience in several areas. In this paper, we describe AutoBayes, a program synthesis system for the generation of data analysis programs from statistical models. A statistical model specifies the properties for each problem variable (i.e., observation or parameter) and its dependencies in the form of a probability distribution. It is a fully declarative problem description, similar in spirit to a set of differential equations. From such a model, AutoBayes generates optimized and fully commented C/C++ code which can be linked dynamically into the Matlab and Octave environments. Code is produced by a schema-guided deductive synthesis process. A schema consists of a code template and applicability constraints which are checked against the model during synthesis using theorem proving technology. AutoBayes augments schema-guided synthesis by symbolic-algebraic computation and can thus derive closed-form solutions for many problems. It is well-suited for tasks like estimating best-fitting model parameters for the given data. Here, we describe AutoBayes's system architecture, in particular the schema-guided synthesis kernel. Its capabilities are illustrated by a number of advanced textbook examples and benchmarks
The Future of Human-Artificial Intelligence Nexus and its Environmental Costs
Environmental costs and energy constraints have become emerging issues for the future development of Machine Learning (ML) and Artificial Intelligence (AI). So far, the discussion on environmental impacts of ML/AI lacks a perspective reaching beyond quantitative measurements of the energy-related research costs. Building on the foundations laid down by Schwartz et al. (2019) in the GreenAI initiative, our argument considers two interlinked phenomena: the gratuitous generalisation capability and the future where ML/AI performs the majority of quantifiable inductive inferences. The gratuitous generalisation capability refers to a discrepancy between the cognitive demands of a task to be accomplished and the performance (accuracy) of the ML/AI model used. If the latter exceeds the former because the model was optimised to achieve the best possible accuracy, it becomes inefficient and its operation harmful to the environment. The future dominated by non-anthropic induction describes a use of ML/AI so all-pervasive that most inductive inferences become furnished by ML/AI generalisations. The paper argues that the present debate deserves an expansion connecting the environmental costs of research and ineffective ML/AI uses (the issue of gratuitous generalisation capability) with the (near) future marked by the all-pervasive Human-Artificial Intelligence Nexus.
XML Schema-based Minification for Communication of Security Information and Event Management (SIEM) Systems in Cloud Environments
XML-based communication governs most of today's systems communication, due to
its capability of representing complex structural and hierarchical data.
However, XML documents are considered bulky, and their structure can be
reduced to minimize bandwidth usage and transmission time and to maximize
performance. This contributes to more efficient resource usage.
In cloud environments, this affects the amount of money the consumer pays.
Several techniques are used to achieve this goal. This paper discusses these
techniques and proposes a new XML Schema-based Minification technique. The
proposed technique works on XML Structure reduction using minification. The
proposed technique provides a separation between the meaningful names and the
underlying minified names, which enhances software/code readability. This
technique is applied to Intrusion Detection Message Exchange Format (IDMEF)
messages, as part of Security Information and Event Management (SIEM) system
communication hosted on Microsoft Azure Cloud. Test results show message size
reduction ranging from 8.15% to 50.34% in the raw message, without using
time-consuming compression techniques. Adding GZip compression to the proposed
technique yields messages 66.1% smaller than the original XML
messages.
Comment: XML, JSON, Minification, XML Schema, Cloud, Log, Communication,
Compression, XMill, GZip, Code Generation, Code Readability, 9 pages, 12
figures, 5 tables, Journal Article
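The idea of keeping meaningful names separate from the minified names sent over the wire can be illustrated with a minimal Python sketch. This is not the paper's implementation: the mapping table, the function names, and the simplified, namespace-free IDMEF-like message are assumptions made for illustration. In a schema-based approach the mapping would presumably be derived from the XML Schema rather than written by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from meaningful element names to minified names.
# Assumed for illustration; not taken from the paper.
NAME_MAP = {"Alert": "a", "Analyzer": "b", "CreateTime": "c"}
REVERSE_MAP = {short: full for full, short in NAME_MAP.items()}


def rename_tags(xml_text, mapping):
    """Return xml_text with every element tag replaced according to mapping."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # Tags without an entry in the mapping are left unchanged.
        elem.tag = mapping.get(elem.tag, elem.tag)
    return ET.tostring(root, encoding="unicode")


def minify(xml_text):
    """Replace meaningful tag names with their short forms before sending."""
    return rename_tags(xml_text, NAME_MAP)


def restore(xml_text):
    """Recover the meaningful tag names on the receiving side."""
    return rename_tags(xml_text, REVERSE_MAP)


# Simplified, namespace-free message used only as sample input.
original = (
    "<Alert><Analyzer>sensor-1</Analyzer>"
    "<CreateTime>2020-01-01T00:00:00Z</CreateTime></Alert>"
)
minified = minify(original)
print(len(original), len(minified))
```

Application code keeps working with the readable names, while only the short names travel over the network. The size reduction depends on how much of each message is markup rather than content.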
Knowledge Nodes: the Building Blocks of a Distributed Approach to Knowledge Management
In this paper we criticise the objectivistic approach that underlies most current systems for Knowledge Management. We show that such an approach is incompatible with the very nature of what is to be managed (i.e., knowledge), and we argue that this may partially explain why most knowledge management systems are deserted by users. We propose a different approach - called distributed knowledge management - in which subjective and social (in a word, contextual) aspects of knowledge are seriously taken into account. Finally, we present a general technological architecture in which these ideas are implemented by introducing the concept of knowledge node.
From Questions to Effective Answers: On the Utility of Knowledge-Driven Querying Systems for Life Sciences Data
We compare two distinct approaches for querying data in the context of the
life sciences. The first approach utilizes conventional databases to store the
data and intuitive form-based interfaces to facilitate easy querying of the
data. These interfaces could be seen as implementing a set of "pre-canned"
queries commonly used by the life science researchers that we study. The second
approach is based on semantic Web technologies and is knowledge (model) driven.
It utilizes a large OWL ontology and the same datasets as before, but associated as
RDF instances of the ontology concepts. An intuitive interface is provided that
allows the formulation of RDF triples-based queries. Both these approaches are
being used in parallel by a team of cell biologists in their daily research
activities, with the objective of gradually replacing the conventional approach
with the knowledge-driven one. This provides us with a valuable opportunity to
compare and qualitatively evaluate the two approaches. We describe several
benefits of the knowledge-driven approach in comparison to the traditional way
of accessing data, and highlight a few limitations as well. We believe that our
analysis not only explicitly highlights the specific benefits and limitations
of semantic Web technologies in our context but also contributes toward
effective ways of translating a question in a researcher's mind into precise
computational queries with the intent of obtaining effective answers from the
data. While researchers often assume the benefits of semantic Web technologies,
we explicitly illustrate these in practice.
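To make the notion of an RDF triples-based query concrete, here is a minimal Python sketch of pattern matching over (subject, predicate, object) triples. It is not the system described in the abstract: the sample triples, the `ex:` and `protein:` prefixes, and the `match` function are hypothetical, and a real deployment would use an RDF store and a query language such as SPARQL.

```python
# Hypothetical triples (subject, predicate, object), invented for illustration.
TRIPLES = [
    ("protein:P1", "rdf:type", "ex:Protein"),
    ("protein:P1", "ex:locatedIn", "ex:Nucleus"),
    ("protein:P2", "rdf:type", "ex:Protein"),
    ("protein:P2", "ex:locatedIn", "ex:Membrane"),
]


def match(triples, pattern):
    """Return the triples matching pattern; None in the pattern is a wildcard."""
    return [
        triple
        for triple in triples
        if all(p is None or p == value for p, value in zip(pattern, triple))
    ]


# "Which subjects are located in the nucleus?" expressed as a triple pattern.
nuclear = [s for s, _, _ in match(TRIPLES, (None, "ex:locatedIn", "ex:Nucleus"))]
print(nuclear)
```

A form-based interface would fix such a pattern in advance as a "pre-canned" query, whereas a triple-based interface lets the researcher compose the pattern directly.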