A unified view of data-intensive flows in business intelligence systems : a survey
Data-intensive flows are central processes in today’s business intelligence (BI) systems, deploying different technologies to deliver data, from a multitude of data sources, in user-preferred and analysis-ready formats. To meet complex requirements of next generation BI systems, we often need an effective combination of the traditionally batched extract-transform-load (ETL) processes that populate a data warehouse (DW) from integrated data sources, and more real-time and operational data flows that integrate source data at runtime. Both academia and industry thus must have a clear understanding of the foundations of data-intensive flows and the challenges of moving towards next generation BI environments. In this paper we present a survey of today’s research on data-intensive flows and the related fundamental fields of database theory. The study is based on a proposed set of dimensions describing the important challenges of data-intensive flows in the next generation BI setting. As a result of this survey, we envision an architecture of a system for managing the lifecycle of data-intensive flows. The results further provide a comprehensive understanding of data-intensive flows, recognizing challenges that still are to be addressed, and how the current solutions can be applied for addressing these challenges. Peer Reviewed. Postprint (author's final draft)
SIFTER search: a web server for accurate phylogeny-based protein function prediction.
We are awash in proteins discovered through high-throughput sequencing projects. As only a minuscule fraction of these have been experimentally characterized, computational methods are widely used for automated annotation. Here, we introduce a user-friendly web interface for accurate protein function prediction using the SIFTER algorithm. SIFTER is a state-of-the-art sequence-based gene molecular function prediction algorithm that uses a statistical model of function evolution to incorporate annotations throughout the phylogenetic tree. Due to the resources needed by the SIFTER algorithm, running SIFTER locally is not trivial for most users, especially for large-scale problems. The SIFTER web server thus provides access to precomputed predictions on 16 863 537 proteins from 232 403 species. Users can explore SIFTER predictions with queries for proteins, species, functions, and homologs of sequences not in the precomputed prediction set. The SIFTER web server is accessible at http://sifter.berkeley.edu/ and the source code can be downloaded
Metarel: an Ontology to support the inferencing of Semantic Web relations within Biomedical Ontologies
While OWL, the Web Ontology Language, is often regarded as the preferred language for Knowledge Representation in the world of the Semantic Web, the potential of direct representation in RDF, the Resource Description Framework, is underestimated. Here we show how ontologies adequately represented in RDF could be semantically enriched with SPARUL. To deal with the semantics of relations we created Metarel, a meta-ontology for relations. The utility of the approach is demonstrated by an application on Gene Ontology Annotation (GOA) RDF graphs in the RDF Knowledge Base BioGateway. We show that Metarel can facilitate inferencing in BioGateway, which allows for queries that are otherwise not possible. Metarel is available on http://www.metarel.org
Storage Solutions for Big Data Systems: A Qualitative Study and Comparison
Big data systems development is full of challenges in view of the variety of
application areas and domains that this technology promises to serve.
Typically, fundamental design decisions involved in big data systems design
include choosing appropriate storage and computing infrastructures. In this age
of heterogeneous systems that integrate different technologies for an optimized
solution to a specific real-world problem, big data systems are no exception.
As far as the storage aspect of any big data system is
concerned, the primary facet in this regard is a storage infrastructure and
NoSQL seems to be the right technology that fulfills its requirements. However,
every big data application has variable data characteristics and thus, the
corresponding data fits into a different data model. This paper presents
feature and use case analysis and comparison of the four main data models
namely document oriented, key value, graph and wide column. Moreover, a feature
analysis of 80 NoSQL solutions has been provided, elaborating on the criteria
and points that a developer must consider while making a possible choice.
Typically, big data storage needs to communicate with the execution engine and
other processing and visualization technologies to create a comprehensive
solution. This brings the second facet of big data storage, big data file
formats, into the picture. The second half of the research paper compares the
advantages, shortcomings and possible use cases of available big data file
formats for Hadoop, which is the foundation for most big data computing
technologies. Decentralized storage and blockchain are seen as the next
generation of big data storage, and their challenges and future prospects are
also discussed.
A Survey on IT-Techniques for a Dynamic Emergency Management in Large Infrastructures
This deliverable is a survey on the IT techniques that are relevant to the three use cases of the project EMILI. It describes the state-of-the-art in four complementary IT areas: data cleansing, supervisory control and data acquisition, wireless sensor networks and complex event processing. Even though the deliverable’s authors have tried to avoid overly technical language and to explain every concept referred to, the deliverable might seem rather technical to readers who are not yet familiar with the techniques it describes.
Estimating Fire Weather Indices via Semantic Reasoning over Wireless Sensor Network Data Streams
Wildfires are frequent, devastating events in Australia that regularly cause
significant loss of life and widespread property damage. Fire weather indices
are a widely-adopted method for measuring fire danger and they play a
significant role in issuing bushfire warnings and in anticipating demand for
bushfire management resources. Existing systems that calculate fire weather
indices are limited due to low spatial and temporal resolution. Localized
wireless sensor networks, on the other hand, gather continuous sensor data
measuring variables such as air temperature, relative humidity, rainfall and
wind speed at high resolutions. However, using wireless sensor networks to
estimate fire weather indices is a challenge due to data quality issues, lack
of standard data formats and lack of agreement on thresholds and methods for
calculating fire weather indices. Within the scope of this paper, we propose a
standardized approach to calculating Fire Weather Indices (a.k.a. fire danger
ratings) and overcome a number of the challenges by applying Semantic Web
Technologies to the processing of data streams from a wireless sensor network
deployed in the Springbrook region of South East Queensland. This paper
describes the underlying ontologies, the semantic reasoning and the Semantic
Fire Weather Index (SFWI) system that we have developed to enable domain
experts to specify and adapt rules for calculating Fire Weather Indices. We
also describe the Web-based mapping interface that we have developed, which
enables users to improve their understanding of how fire weather indices vary
over time within a particular region. Finally, we discuss our evaluation results
that indicate that the proposed system outperforms state-of-the-art techniques
in terms of accuracy, precision and query performance. Comment: 20 pages, 12 figures
Graph Summarization
The continuous and rapid growth of highly interconnected datasets, which are
both voluminous and complex, calls for the development of adequate processing
and analytical techniques. One method for condensing and simplifying such
datasets is graph summarization. It denotes a series of application-specific
algorithms designed to transform graphs into more compact representations while
preserving structural patterns, query answers, or specific property
distributions. As this problem is common to several areas studying graph
topologies, different approaches, such as clustering, compression, sampling, or
influence detection, have been proposed, primarily based on statistical and
optimization methods. Our chapter pinpoints the main graph summarization
methods, with particular attention to the most recent approaches and novel
research trends on this topic not yet covered by previous surveys. Comment: To appear in the Encyclopedia of Big Data Technologies