79 research outputs found

    Survey over Existing Query and Transformation Languages

    Get PDF
    A widely acknowledged obstacle for realizing the vision of the Semantic Web is the inability of many current Semantic Web approaches to cope with data available in such diverging representation formalisms as XML, RDF, or Topic Maps. A common query language is the first step to allow transparent access to data in any of these formats. To further the understanding of the requirements and approaches proposed for query languages in the conventional as well as the Semantic Web, this report surveys a large number of query languages for accessing XML, RDF, or Topic Maps. This is the first systematic survey to consider query languages from all these areas. From the detailed survey of these query languages, a common classification scheme is derived that is useful for understanding and differentiating languages within and among all three areas

    Personalizing Interactions with Information Systems

    Get PDF
    Personalization constitutes the mechanisms and technologies necessary to customize information access to the end-user. It can be defined as the automatic adjustment of information content, structure, and presentation tailored to the individual. In this chapter, we study personalization from the viewpoint of personalizing interaction. The survey covers mechanisms for information-finding on the web, advanced information retrieval systems, dialog-based applications, and mobile access paradigms. Specific emphasis is placed on studying how users interact with an information system and how the system can encourage and foster interaction. This helps bring out the role of the personalization system as a facilitator which reconciles the user’s mental model with the underlying information system’s organization. Three tiers of personalization systems are presented, paying careful attention to interaction considerations. These tiers show how progressive levels of sophistication in interaction can be achieved. The chapter also surveys systems support technologies and niche application domains

    A Survey on Mapping Semi-Structured Data and Graph Data to Relational Data

    Get PDF
    The data produced by various services should be stored and managed in an appropriate format for gaining valuable knowledge conveniently. This leads to the emergence of various data models, including relational, semi-structured, and graph models, and so on. Considering the fact that the mature relational databases established on relational data models are still predominant in today's market, it has fueled interest in storing and processing semi-structured data and graph data in relational databases so that mature and powerful relational databases' capabilities can all be applied to these various data. In this survey, we review existing methods on mapping semi-structured data and graph data into relational tables, analyze their major features, and give a detailed classification of those methods. We also summarize the merits and demerits of each method, introduce open research challenges, and present future research directions. With this comprehensive investigation of existing methods and open problems, we hope this survey can motivate new mapping approaches through drawing lessons from eachmodel's mapping strategies, aswell as a newresearch topic - mapping multi-model data into relational tables.Peer reviewe

    Accessing and using complex multimedia documents in a digital library

    Get PDF
    Dans le cadre de trois projets européens, notre équipe a mis au point un modèle de données et un langage de requête pour bibliothèques numériques supportant l'identification, la structuration, les métadonnées, la réutilisation, et la découverte des ressources numériques. Le modèle proposé est inspiré par le Web et il est formalisé comme une théorie du premier ordre, dont certains modèles correspondent à la notion de bibliothèque numérique. En outre, une traduction complète du modèle en RDF et du langage de requêtes en SPARQL a également été proposée pour démontrer son adéquation à des applications pratiques. Le choix de RDF est dû au fait qu il est un langage de représentation généralement accepté dans le cadre des bibliothèques numériques et du Web sémantique. L objectif de cette thèse était double: concevoir et mettre en œuvre une forme simplifiée de système de gestion de bibliothèques numériques, d une part, et contribuer à l enrichissement du modèle, d autre part. Pour atteindre cet objectif nous avons développé un prototype d un système de bibliothèque numérique utilisant un stockage RDF pour faciliter la gestion interne des métadonnées. Le prototype permet aux utilisateurs de gérer et d interroger les métadonnées des ressources numériques ou non-numériques dans le système en utilisant des URIs pour identifier les ressources, un ensemble de prédicats pour la description de ressources, et des requêtes conjonctives simples pour la découverte de connaissances dans le système. Le prototype est mis en œuvre en utilisant les technologies Java et l environnement de Google Web Toolkit dont l'architecture du système se compose d'une couche de stockage, d une couche de métier logique, d une couche de service, et d une interface utilisateur. Pendant la thèse, le prototype a été construit, testé et débogué localement, puis déployé sur Google App Engine. Dans l avenir, il peut être étendu pour devenir un système complet de gestion de bibliothèques numériques. Par ailleurs, la thèse présente également notre contribution à la génération de contenu par réutilisation de ressources. Il s agit d un travail théorique dont le but est d enrichir le modèle en lui ajoutant un service important, à savoir la possibilité de création de nouvelles ressources à partir de celles stockées dans le système. L incorporation de ce service dans le système sera effectuée ultérieurement.In the context of three European projects, our research team has developed a data model and query language for digital libraries supporting identification, structuring, metadata, and discovery and reuse of digital resources. The model is inspired by the Web and it is formalized as a first-order theory, certain models of which correspond to the notion of digital library. In addition, a full translation of the model to RDF and of the query language to SPARQL has been proposed to demonstrate the feasibility of the model and its suitability for practical applications. The choice of RDF is due to the fact that it is a generally accepted representation language in the context of digital libraries and the Semantic Web. One of the major aims of the thesis was to design and actually implement a simplified form of a digital library management system based on the theoretical model. To obtain this, we have developed a prototype based on RDF and SPARQL, which uses a RDF store to facilitate internal management of metadata. The prototype allows users to manage and query metadata of digital or non-digital resources in the system, using URIs as resource identifiers, a set of predicates to model descriptions of resources, and simple conjunctive queries to discover knowledge in the system. The prototype is implemented by using Java technologies and the Google Web Toolkit framework whose system architecture consists of a storage layer, a business logic layer, a service layer and a user interface. During the thesis work, the prototype was built, tested, and debugged locally and then deployed on Google App Engine. In the future, it will be expanded to become a full fledged digital library management system. Moreover, the thesis also presents our contribution to content generation by reuse. This is mostly theoretical work whose purpose is to enrich the model and query language by providing an important community service. The incorporation of this service in the implemented system is left to future work.PARIS11-SCD-Bib. électronique (914719901) / SudocSudocFranceF

    Finding hidden semantics of text tables

    Get PDF
    Combining data from different sources for further automatic processing is often hindered by differences in the underlying semantics and representation. Therefore when linking information presented in documents in tabular form with data held in databases, it is important to determine as much information about the table and its content. Important information about the table data is often given in the text surrounding the table in that document. The table's creators cannot clarify all the semantics in the table itself therefore they use the table context or the text around it to give further information. These semantics are very useful when integrating and using this data, but are often difficult to detect automatically. We propose a solution to part of this problem based on a domain ontology. The input to our system is a document that contains tabular data and the system aims to find semantics in the document that are related to the tabular data. The output of our system is a set of detected semantics linked to the corresponding table. The system uses elements of semantic detection, semantic representation, and data integration. Semantic detection uses a domain ontology, in which we store concepts of that domain. This allows us to analyse the content of the document (text) and detect context information about the tables present in a document containing tabular data. Our approach consists of two components: (1) extract, from the domain ontology, concepts, synonyms, and relations that correspond to the table data. (2) Build a tree for the paragraphs and use this tree to detect the hidden semantics by searching for words matching the extracted concepts. Semantic representation techniques then allow representation of the detected semantics of the table data. Our system represents the detected semantics, as either 'semantic units' or 'enhanced metadata'. Semantic units are a flexible set of meta-attributes that describe the meaning of the data item along with the detected semantics. In addition, each semantic unit has a concept label associated with it that specifies the relationship between the unit and the real world aspects it describes. In the enhanced metadata, table metadata is enhanced with the semantics and representation context found in the text. Integrating data in our proposed system takes place in two steps. First, the semantic units are converted to a common context, reflecting the application. This is achieved by using appropriate conversion functions. Secondly, the semantically identical semantic units, will be identified and integrated into a common representation. This latter is the subject of future work. Thus the research has shown that semantics about a table are in the text and how it is possible to locate and use these semantics by transforming them into an appropriate form to enhance the basic table metadata

    Extensible metadata management framework for personal data lake

    Get PDF
    Common Internet users today are inundated with a deluge of diverse data being generated and siloed in a variety of digital services, applications, and a growing body of personal computing devices as we enter the era of the Internet of Things. Alongside potential privacy compromises, users are facing increasing difficulties in managing their data and are losing control over it. There appears to be a de facto agreement in business and scientific fields that there is critical new value and interesting insight that can be attained by users from analysing their own data, if only it can be freed from its silos and combined with other data in meaningful ways. This thesis takes the point of view that users should have an easy-to-use modern personal data management solution that enables them to centralise and efficiently manage their data by themselves, under their full control, for their best interests, with minimum time and efforts. In that direction, we describe the basic architecture of a management solution that is designed based on solid theoretical foundations and state of the art big data technologies. This solution (called Personal Data Lake - PDL) collects the data of a user from a plurality of heterogeneous personal data sources and stores it into a highly-scalable schema-less storage repository. To simplify the user-experience of PDL, we propose a novel extensible metadata management framework (MMF) that: (i) annotates heterogeneous data with rich lineage and semantic metadata, (ii) exploits the garnered metadata for automating data management workflows in PDL – with extensive focus on data integration, and (iii) facilitates the use and reuse of the stored data for various purposes by querying it on the metadata level either directly by the user or through third party personal analytics services. We first show how the proposed MMF is positioned in PDL architecture, and then describe its principal components. Specifically, we introduce a simple yet effective lineage manager for tracking the provenance of personal data in PDL. We then introduce an ontology-based data integration component called SemLinker which comprises two new algorithms; the first concerns generating graph-based representations to express the native schemas of (semi) structured personal data, and the second algorithm metamodels the extracted representations to a common extensible ontology. SemLinker outputs are utilised by MMF to generate user-tailored unified views that are optimised for querying heterogeneous personal data through low-level SPARQL or high-level SQL-like queries. Next, we introduce an unsupervised automatic keyphrase extraction algorithm called SemCluster that specialises in extracting thematically important keyphrases from unstructured data, and associating each keyphrase with ontological information drawn from an extensible WordNet-based ontology. SemCluster outputs serve as semantic metadata and are utilised by MMF to annotate unstructured contents in PDL, thus enabling various management functionalities such as relationship discovery and semantic search. Finally, we describe how MMF can be utilised to perform holistic integration of personal data and jointly querying it in native representations

    Semantic Web methods for knowledge management [online]

    Get PDF

    Dynamic data retrieval on the world wide web

    Get PDF
    Methods for accessing data on the Web have been the focus of active research over the past few years. In this thesis we propose a method for representing Web sites as data sources. We designed a Data Extractor data retrieval solution that allows us to define queries to Web sites and process resulting data sets. Data Extractor is being integrated into the MSemODB heterogeneous database management system. With its help database queries can be distributed over both local and Web data sources within MSemODB framework. Data Extractor treats Web sites as data sources, controlling query execution and data retrieval. It works as an intermediary between the applications and the sites. Data Extractor utilizes a two-fold custom wrapper approach for information retrieval. Wrappers for the majority of sites are easily built using a powerful and expressive scripting language, while complex cases are processed using Java-based wrappers that utilize specially designed library of data retrieval, parsing and Web access routines. In addition to wrapper development we thoroughly investigate issues associated with Web site selection, analysis and processing. Data Extractor is designed to act as a data retrieval server, as well as an embedded data retrieval solution. We also use it to create mobile agents that are shipped over the Internet to the client\u27s computer to perform data retrieval on behalf of the user. This approach allows Data Extractor to distribute and scale well. This study confirms feasibility of building custom wrappers for Web sites. This approach provides accuracy of data retrieval, and power and flexibility in handling of complex cases
    • …
    corecore