35,992 research outputs found
A unified view of data-intensive flows in business intelligence systems : a survey
Data-intensive flows are central processes in today’s business intelligence (BI) systems, deploying different technologies to deliver data, from a multitude of data sources, in user-preferred and analysis-ready formats. To meet complex requirements of next generation BI systems, we often need an effective combination of the traditionally batched extract-transform-load (ETL) processes that populate a data warehouse (DW) from integrated data sources, and more real-time and operational data flows that integrate source data at runtime. Both academia and industry thus must have a clear understanding of the foundations of data-intensive flows and the challenges of moving towards next generation BI environments. In this paper we present a survey of today’s research on data-intensive flows and the related fundamental fields of database theory. The study is based on a proposed set of dimensions describing the important challenges of data-intensive flows in the next generation BI setting. As a result of this survey, we envision an architecture of a system for managing the lifecycle of data-intensive flows. The results further provide a comprehensive understanding of data-intensive flows, recognizing challenges that still are to be addressed, and how the current solutions can be applied for addressing these challenges. Peer Reviewed. Postprint (author's final draft)
Managing data through the lens of an ontology
Ontology-based data management aims at managing data through the lens of an ontology, that is, a conceptual representation of the domain of interest in the underlying information system. This new paradigm provides several interesting features, many of which have already been proved effective in managing complex information systems. This article introduces the notion of ontology-based data management, illustrating the main ideas underlying the paradigm, and pointing out the importance of knowledge representation and automated reasoning for addressing the technical challenges it introduces.
An embodied conversational agent for intelligent web interaction on pandemic crisis communication
In times of crisis, an effective communication mechanism is paramount in providing accurate and timely information to the community. In this paper we study the use of an intelligent embodied conversational agent (ECA) as the front end interface with the public for a Crisis Communication Network Portal (CCNet). The proposed system, CCNet, is an integration of the intelligent conversation agent, AINI, and an Automated Knowledge Extraction Agent (AKEA). AKEA retrieves first-hand information from relevant sources such as government departments and news channels. In this paper, we compare the interaction of AINI against two popular search engines, two question answering systems and two conversational systems.
Ontology-based Classification and Analysis of non-emergency Smart-city Events
Several challenges are faced by citizens of urban centers while dealing with
day-to-day events, and the absence of a centralised reporting mechanism makes
event-reporting and redressal a daunting task. With the push on information
technology to adapt to the needs of smart-cities and integrate urban civic
services, the use of Open311 architecture presents an interesting solution. In
this paper, we present a novel approach that uses an existing Open311 ontology
to classify and report non-emergency city-events, as well as to guide the
citizen to the points of redressal. The use of linked open data and the
semantic model serves to provide contextual meaning and make vast amounts of
content hyper-connected and easily-searchable. Such a one-size-fits-all model
also ensures reusability and effective visualisation and analysis of data
across several cities. By integrating urban services across various civic
bodies, the proposed approach provides a single endpoint to the citizen, which
is imperative for smooth functioning of smart cities.
PlanetOnto: from news publishing to integrated knowledge management support
Given a scenario in which members of an academic community collaboratively construct and share an archive of news items, several knowledge management challenges arise. The authors' integrated suite of tools, called PlanetOnto, supports a speedy but high quality publishing process, allows ontology-driven document formalization and augments standard browsing and search facilities with deductive knowledge retrieval.
An infrastructure for building semantic web portals
In this paper, we present our KMi semantic web portal infrastructure, which supports two important tasks of semantic web portals, namely metadata extraction and data querying. Central to our infrastructure are three components: i) an automated metadata extraction tool, ASDI, which supports the extraction of high quality metadata from heterogeneous sources, ii) an ontology-driven question answering tool, AquaLog, which makes use of the domain specific ontology and the semantic metadata extracted by ASDI to answer questions in natural language format, and iii) a semantic search engine, which enhances traditional text-based searching by making use of the underlying ontologies and the extracted metadata. A semantic web portal application has been built, which illustrates the usage of this infrastructure.
Improving average ranking precision in user searches for biomedical research datasets
Availability of research datasets is a keystone for health and life science
study reproducibility and scientific progress. Due to the heterogeneity and
complexity of these data, a main challenge to be overcome by research data
management systems is to provide users with the best answers for their search
queries. In the context of the 2016 bioCADDIE Dataset Retrieval Challenge, we
investigate a novel ranking pipeline to improve the search of datasets used in
biomedical experiments. Our system comprises a query expansion model based on
word embeddings, a similarity measure algorithm that takes into consideration
the relevance of the query terms, and a dataset categorisation method that
boosts the rank of datasets matching query constraints. The system was
evaluated using a corpus with 800k datasets and 21 annotated user queries. Our
system provides competitive results when compared to the other challenge
participants. In the official run, it achieved the highest infAP among the
participants, being +22.3% higher than the median infAP of the participants'
best submissions. Overall, it is ranked in the top 2 if an aggregated metric using
the best official measures per participant is considered. The query expansion
method showed positive impact on the system's performance increasing our
baseline up to +5.0% and +3.4% for the infAP and infNDCG metrics, respectively.
Our similarity measure algorithm seems to be robust, in particular compared to
the Divergence From Randomness framework, having smaller performance variations
under different training conditions. Finally, the result categorization did not
have significant impact on the system's performance. We believe that our
solution could be used to enhance biomedical dataset management systems. In
particular, the use of data driven query expansion methods could be an
alternative to the complexity of biomedical terminologies.