5,809 research outputs found
Storing RDF as a Graph
RDF is the first W3C standard for enriching information resources of the Web with detailed meta data. The semantics of RDF data is defined using a RDF schema. The most expressive language for querying RDF is RQL, which enables querying of semantics. In order to support RQL, a RDF storage system has to map the RDF graph model onto its storage structure. Several storage systems for RDF data have been developed, which store the RDF data as triples in a relational database. To evaluate an RQL query on those triple structures, the graph model has to be rebuilt from the triples.
In this paper, we presented a new approach to store RDF data as a graph in a object-oriented database. Our approach avoids the costly rebuilding of the graph and efficiently queries the storage structure directly. The advantages of our approach have been shown by performance test on our prototype implementation OO-Store
Modeling views in the layered view model for XML using UML
In data engineering, view formalisms are used to provide flexibility to users and user applications by allowing them to extract and elaborate data from the stored data sources. Conversely, since the introduction of Extensible Markup Language (XML), it is fast emerging as the dominant standard for storing, describing, and interchanging data among various web and heterogeneous data sources. In combination with XML Schema, XML provides rich facilities for defining and constraining user-defined data semantics and properties, a feature that is unique to XML. In this context, it is interesting to investigate traditional database features, such as view models and view design techniques for XML. However, traditional view formalisms are strongly coupled to the data language and its syntax, thus it proves to be a difficult task to support views in the case of semi-structured data models. Therefore, in this paper we propose a Layered View Model (LVM) for XML with conceptual and schemata extensions. Here our work is three-fold; first we propose an approach to separate the implementation and conceptual aspects of the views that provides a clear separation of concerns, thus, allowing analysis and design of views to be separated from their implementation. Secondly, we define representations to express and construct these views at the conceptual level. Thirdly, we define a view transformation methodology for XML views in the LVM, which carries out automated transformation to a view schema and a view query expression in an appropriate query language. Also, to validate and apply the LVM concepts, methods and transformations developed, we propose a view-driven application development framework with the flexibility to develop web and database applications for XML, at varying levels of abstraction
Hybrid XML Data Model Architecture for Efficient Document Management
XML has been known as a document standard in representation and exchange of data on the Internet, and is also used as a standard language for the search and reuse of scattered documents on the Internet. The issues related to XML are how to model data on effective and efficient management of semi-structured data and how to actually store the modeled data when implementing a XML contents management system. Previous researches on XML have limitations in (1) reproduction of XML documents from the stored data, (2) retrieval of XML sub-graph from search, (3) supporting only top-down search, not full-search, and (4) dependency of data structure on XML documents. The purpose of this paper is to present a hybrid XML data model architecture for the storage and search of XML document data. By representing both data and structure views of XML documents, this new XML data model technique overcomes the limitations of previous researches on data model for XML documents as well as the existing database systems such as relational and object-oriented data model
A semantic and agent-based approach to support information retrieval, interoperability and multi-lateral viewpoints for heterogeneous environmental databases
PhDData stored in individual autonomous databases often needs to be combined and
interrelated. For example, in the Inland Water (IW) environment monitoring domain,
the spatial and temporal variation of measurements of different water quality indicators
stored in different databases are of interest. Data from multiple data sources is more
complex to combine when there is a lack of metadata in a computation forin and when
the syntax and semantics of the stored data models are heterogeneous. The main types
of information retrieval (IR) requirements are query transparency and data
harmonisation for data interoperability and support for multiple user views. A
combined Semantic Web based and Agent based distributed system framework has
been developed to support the above IR requirements. It has been implemented using
the Jena ontology and JADE agent toolkits. The semantic part supports the
interoperability of autonomous data sources by merging their intensional data, using a
Global-As-View or GAV approach, into a global semantic model, represented in
DAML+OIL and in OWL. This is used to mediate between different local database
views. The agent part provides the semantic services to import, align and parse
semantic metadata instances, to support data mediation and to reason about data
mappings during alignment. The framework has applied to support information
retrieval, interoperability and multi-lateral viewpoints for four European environmental
agency databases.
An extended GAV approach has been developed and applied to handle queries that can
be reformulated over multiple user views of the stored data. This allows users to
retrieve data in a conceptualisation that is better suited to them rather than to have to
understand the entire detailed global view conceptualisation. User viewpoints are
derived from the global ontology or existing viewpoints of it. This has the advantage
that it reduces the number of potential conceptualisations and their associated
mappings to be more computationally manageable. Whereas an ad hoc framework
based upon conventional distributed programming language and a rule framework
could be used to support user views and adaptation to user views, a more formal
framework has the benefit in that it can support reasoning about the consistency,
equivalence, containment and conflict resolution when traversing data models. A
preliminary formulation of the formal model has been undertaken and is based upon
extending a Datalog type algebra with hierarchical, attribute and instance value
operators. These operators can be applied to support compositional mapping and
consistency checking of data views. The multiple viewpoint system was implemented
as a Java-based application consisting of two sub-systems, one for viewpoint
adaptation and management, the other for query processing and query result
adjustment
Collaborative recommendations with content-based filters for cultural activities via a scalable event distribution platform
Nowadays, most people have limited leisure time and the offer of (cultural) activities to spend this time is enormous. Consequently, picking the most appropriate events becomes increasingly difficult for end-users. This complexity of choice reinforces the necessity of filtering systems that assist users in finding and selecting relevant events. Whereas traditional filtering tools enable e.g. the use of keyword-based or filtered searches, innovative recommender systems draw on user ratings, preferences, and metadata describing the events. Existing collaborative recommendation techniques, developed for suggesting web-shop products or audio-visual content, have difficulties with sparse rating data and can not cope at all with event-specific restrictions like availability, time, and location. Moreover, aggregating, enriching, and distributing these events are additional requisites for an optimal communication channel. In this paper, we propose a highly-scalable event recommendation platform which considers event-specific characteristics. Personal suggestions are generated by an advanced collaborative filtering algorithm, which is more robust on sparse data by extending user profiles with presumable future consumptions. The events, which are described using an RDF/OWL representation of the EventsML-G2 standard, are categorized and enriched via smart indexing and open linked data sets. This metadata model enables additional content-based filters, which consider event-specific characteristics, on the recommendation list. The integration of these different functionalities is realized by a scalable and extendable bus architecture. Finally, focus group conversations were organized with external experts, cultural mediators, and potential end-users to evaluate the event distribution platform and investigate the possible added value of recommendations for cultural participation
Reasoning & Querying – State of the Art
Various query languages for Web and Semantic Web data, both for practical use and as an area of research in the scientific community, have emerged in recent years. At the same time, the broad adoption of the internet where keyword search is used in many applications, e.g. search engines, has familiarized casual users with using keyword queries to retrieve information on the internet. Unlike this easy-to-use querying, traditional query languages require knowledge of the language itself as well as of the data to be queried. Keyword-based query languages for XML and RDF bridge the gap between the two, aiming at enabling simple querying of semi-structured data, which is relevant e.g. in the context of the emerging Semantic Web. This article presents an overview of the field of keyword querying for XML and RDF
Integration of Heterogeneous Digital Libraries with Semi-automatic Mapping and Browsing: From Formalization to Specification to Visualization
In this paper, we formalize the digital library (DL) integration problem and propose an overall approach based on the 5S framework. We apply 5S to domain-specific (archaeological) DLs, illustrating our solutions for key problems in DL integration. We use ETANA-DL as a case study to describe the process of semi-automatically generating a union catalog and a unified browsing service in an archaeological DL. A visual schema mapping tool is developed for union catalog creation. A pilot user study aids tool evaluation. Our approach is further validated through application of a general browsing component to two integrated DLs
Techniques for improving efficiency and scalability for the integration of information retrieval and databases
PhDThis thesis is on the topic of integration of Information Retrieval (IR) and Databases (DB), with
particular focuses on improving efficiency and scalability of integrated IR and DB technology
(IR+DB). The main purpose of this study is to develop efficient and scalable techniques for
supporting integrated IR and DB technology, which is a popular approach today for handling
complex queries over text and structured data.
Our specific interest in this thesis is how to efficiently handle queries over large-scale text
and structured data. The work is based on a technology that integrates probability theory and
relational algebra, where retrievals for text and data are to be expressed in probabilistic logical
programs such as probabilistic relational algebra or probabilistic Datalog. To support efficient
processing of probabilistic logical programs, we proposed three optimization techniques
that focus on aspects covered logical and physical layers, which include: scoring-driven query
optimization using scoring expression, query processing with top-k incorporated pipeline, and
indexing with relational inverted index.
Specifically, scoring expressions are proposed for expressing the scoring or probabilistic semantics
of implied scoring functions of PRA expressions, so that efficient query execution plan
can be generated by rule-based scoring-driven optimizer. Secondly, to balance efficiency and
effectiveness so that to improve query response time, we studied methods for incorporating topk
algorithms into pipelined query execution engine for IR+DB systems. Thirdly, the proposed
relational inverted index integrates IR-style inverted index and DB-style tuple-based index, which
can be used to support efficient probability estimation and aggregation as well as conventional
relational operations.
Experiments were carried out to investigate the performances of proposed techniques. Experimental
results showed that the efficiency and scalability of an IR+DB prototype have been
improved, while the system can handle queries efficiently on considerable large data sets for a
number of IR tasks
Storage Solutions for Big Data Systems: A Qualitative Study and Comparison
Big data systems development is full of challenges in view of the variety of
application areas and domains that this technology promises to serve.
Typically, fundamental design decisions involved in big data systems design
include choosing appropriate storage and computing infrastructures. In this age
of heterogeneous systems that integrate different technologies for optimized
solution to a specific real world problem, big data system are not an exception
to any such rule. As far as the storage aspect of any big data system is
concerned, the primary facet in this regard is a storage infrastructure and
NoSQL seems to be the right technology that fulfills its requirements. However,
every big data application has variable data characteristics and thus, the
corresponding data fits into a different data model. This paper presents
feature and use case analysis and comparison of the four main data models
namely document oriented, key value, graph and wide column. Moreover, a feature
analysis of 80 NoSQL solutions has been provided, elaborating on the criteria
and points that a developer must consider while making a possible choice.
Typically, big data storage needs to communicate with the execution engine and
other processing and visualization technologies to create a comprehensive
solution. This brings forth second facet of big data storage, big data file
formats, into picture. The second half of the research paper compares the
advantages, shortcomings and possible use cases of available big data file
formats for Hadoop, which is the foundation for most big data computing
technologies. Decentralized storage and blockchain are seen as the next
generation of big data storage and its challenges and future prospects have
also been discussed
- …