55,455 research outputs found
A unified view of data-intensive flows in business intelligence systems : a survey
Data-intensive flows are central processes in today’s business intelligence (BI) systems, deploying different technologies to deliver data, from a multitude of data sources, in user-preferred and analysis-ready formats. To meet complex requirements of next generation BI systems, we often need an effective combination of the traditionally batched extract-transform-load (ETL) processes that populate a data warehouse (DW) from integrated data sources, and more real-time and operational data flows that integrate source data at runtime. Both academia and industry thus must have a clear understanding of the foundations of data-intensive flows and the challenges of moving towards next generation BI environments. In this paper we present a survey of today’s research on data-intensive flows and the related fundamental fields of database theory. The study is based on a proposed set of dimensions describing the important challenges of data-intensive flows in the next generation BI setting. As a result of this survey, we envision an architecture of a system for managing the lifecycle of data-intensive flows. The results further provide a comprehensive understanding of data-intensive flows, recognizing challenges that still are to be addressed, and how the current solutions can be applied for addressing these challenges.Peer ReviewedPostprint (author's final draft
Exploring Design Space For An Integrated Intelligent System
Understanding the trade-offs available in the design space of intelligent systems is a major unaddressed element in the study of Artificial Intelligence. In this paper we approach this problem in two ways. First, we discuss the development of our integrated robotic system in terms of its trajectory through design space. Second, we demonstrate the practical implications of architectural design decisions by using this system as an experimental platform for comparing behaviourally similar yet architecturally different systems. The results of this show that our system occupies a "sweet spot" in design space in terms of the cost of moving information between processing components
XML content warehousing: Improving sociological studies of mailing lists and web data
In this paper, we present the guidelines for an XML-based approach for the
sociological study of Web data such as the analysis of mailing lists or
databases available online. The use of an XML warehouse is a flexible solution
for storing and processing this kind of data. We propose an implemented
solution and show possible applications with our case study of profiles of
experts involved in W3C standard-setting activity. We illustrate the
sociological use of semi-structured databases by presenting our XML Schema for
mailing-list warehousing. An XML Schema allows many adjunctions or crossings of
data sources, without modifying existing data sets, while allowing possible
structural evolution. We also show that the existence of hidden data implies
increased complexity for traditional SQL users. XML content warehousing allows
altogether exhaustive warehousing and recursive queries through contents, with
far less dependence on the initial storage. We finally present the possibility
of exporting the data stored in the warehouse to commonly-used advanced
software devoted to sociological analysis
Shingle 2.0: generalising self-consistent and automated domain discretisation for multi-scale geophysical models
The approaches taken to describe and develop spatial discretisations of the
domains required for geophysical simulation models are commonly ad hoc, model
or application specific and under-documented. This is particularly acute for
simulation models that are flexible in their use of multi-scale, anisotropic,
fully unstructured meshes where a relatively large number of heterogeneous
parameters are required to constrain their full description. As a consequence,
it can be difficult to reproduce simulations, ensure a provenance in model data
handling and initialisation, and a challenge to conduct model intercomparisons
rigorously. This paper takes a novel approach to spatial discretisation,
considering it much like a numerical simulation model problem of its own. It
introduces a generalised, extensible, self-documenting approach to carefully
describe, and necessarily fully, the constraints over the heterogeneous
parameter space that determine how a domain is spatially discretised. This
additionally provides a method to accurately record these constraints, using
high-level natural language based abstractions, that enables full accounts of
provenance, sharing and distribution. Together with this description, a
generalised consistent approach to unstructured mesh generation for geophysical
models is developed, that is automated, robust and repeatable, quick-to-draft,
rigorously verified and consistent to the source data throughout. This
interprets the description above to execute a self-consistent spatial
discretisation process, which is automatically validated to expected discrete
characteristics and metrics.Comment: 18 pages, 10 figures, 1 table. Submitted for publication and under
revie
Knowledge Base Population using Semantic Label Propagation
A crucial aspect of a knowledge base population system that extracts new
facts from text corpora, is the generation of training data for its relation
extractors. In this paper, we present a method that maximizes the effectiveness
of newly trained relation extractors at a minimal annotation cost. Manual
labeling can be significantly reduced by Distant Supervision, which is a method
to construct training data automatically by aligning a large text corpus with
an existing knowledge base of known facts. For example, all sentences
mentioning both 'Barack Obama' and 'US' may serve as positive training
instances for the relation born_in(subject,object). However, distant
supervision typically results in a highly noisy training set: many training
sentences do not really express the intended relation. We propose to combine
distant supervision with minimal manual supervision in a technique called
feature labeling, to eliminate noise from the large and noisy initial training
set, resulting in a significant increase of precision. We further improve on
this approach by introducing the Semantic Label Propagation method, which uses
the similarity between low-dimensional representations of candidate training
instances, to extend the training set in order to increase recall while
maintaining high precision. Our proposed strategy for generating training data
is studied and evaluated on an established test collection designed for
knowledge base population tasks. The experimental results show that the
Semantic Label Propagation strategy leads to substantial performance gains when
compared to existing approaches, while requiring an almost negligible manual
annotation effort.Comment: Submitted to Knowledge Based Systems, special issue on Knowledge
Bases for Natural Language Processin
Understanding Visualization: A formal approach using category theory and semiotics
This article combines the vocabulary of semiotics and category theory to provide a formal analysis of visualization. It shows how familiar processes of visualization fit the semiotic frameworks of both Saussure and Peirce, and extends these structures using the tools of category theory to provide a general framework for understanding visualization in practice, including: relationships between systems, data collected from those systems, renderings of those data in the form of representations, the reading of those representations to create visualizations, and the use of those visualizations to create knowledge and understanding of the system under inspection. The resulting framework is validated by demonstrating how familiar information visualization concepts (such as literalness, sensitivity, redundancy, ambiguity, generalizability, and chart junk) arise naturally from it and can be defined formally and precisely. This article generalizes previous work on the formal characterization of visualization by, inter alia, Ziemkiewicz and Kosara and allows us to formally distinguish properties of the visualization process that previous work does not
The effect of background knowledge on young children's comprehension of explicit and implicit information
Bibliography: leaves 15-16Supported in part by the National Institute of Educatio
Recommended from our members
Learning from AI : new trends in database technology
Recently some researchers in the areas of database data modelling and knowledge representations in artificial intelligence have recognized that they share many common goals. In this survey paper we show the relationship between database and artificial intelligence research. We show that there has been a tendency for data models to incorporate more modelling techniques developed for knowledge representations in artificial intelligence as the desire to incorporate more application oriented semantics, user friendliness, and flexibility has increased. Increasing the semantics of the representation is the key to capturing the "reality" of the database environment, increasing user friendliness, and facilitating the support of multiple, possibly conflicting, user views of the information contained in a database
- …