184 research outputs found

    Understanding personal data as a space - learning from dataspaces to create linked personal data

    In this paper we argue that the space of personal data is a dataspace as defined by Franklin et al. We define a personal dataspace as the space of all personal data belonging to a user, and we describe the logical components of the dataspace. We describe a Personal Dataspace Support Platform (PDSP) as a set of services that provide a unified view over the user’s data and enable new and more complex workflows over it. We show the differences between a DSSP and a PDSP, and how the latter can be realized using Web protocols and Linked APIs.

    BioCloud Search EnGene: Surfing Biological Data on the Cloud

    The massive production and spread of biomedical data around the web introduces new challenges in identifying computational approaches that provide quality search and browsing of web resources. This paper presents BioCloud Search EnGene (BSE), a cloud application that facilitates searching and integration of the many layers of biological information offered by public large-scale genomic repositories. Grounded in the concept of dataspace, BSE is built on top of a cloud platform that severely curtails issues associated with scalability and performance. Like popular online gene portals, BSE adopts a gene-centric approach: researchers can find their information of interest by means of a simple “Google-like” query interface that accepts standard gene identifiers as keywords. We present the BSE architecture and functionality and discuss how our strategies contribute to successfully tackling big data problems in querying gene-based web resources. BSE is publicly available at: http://biocloud-unica.appspot.com/

    No users no dataspaces! Query-driven dataspace orchestration

    Data analysis in rich spaces of heterogeneous data sources is an increasingly common activity. Examples include querying the web of linked data and personal information management. Such analytics on dataspaces is often iterative and dynamic, in an open-ended interaction between discovery and data orchestration. The current state of the art in integration and orchestration in dataspaces is primarily geared towards close-ended analysis, targeting the discovery of stable data mappings or one-time, pay-as-you-go ad hoc data mappings. The perspective here is dataspace-centric. In this paper, we propose a shift to a user-centric perspective on dataspace orchestration. We outline basic conceptual and technical challenges in supporting data analytics which is open-ended and always evolving, as users respond to new discoveries and connections.

    Compact Indexes Based on Core Content in Personal Dataspace Management System

    A Personal DataSpace Management System is a platform to manage personal data with heterogeneous data types, in which keyword query is a primary query form for users who know little about the structure of the dataspace. Unlike exploratory queries in web search, a user in a personal dataspace usually has a specific search target and wants to find some known items in mind. To improve result quality in terms of query relevance in a personal dataspace, we propose the concept of a compact index in this paper. We refer to the most important and representative semantics from documents as core content, and build a compact index on it. We propose an algorithm for selecting core content from a document based on semantic analysis, which can process English and Chinese documents uniformly. Furthermore, a software platform named Versatile is introduced for flexible personal data management, in which core content is extracted for building compact indexes and generating query-biased snippets efficiently and accurately. Finally, extensive experiments have been conducted to show the effectiveness and feasibility of compact indexes in a personal dataspace management system.

    Information Retrieval and Query Ranking of Unstructured Data in Dataspace using Vector Space Model

    A vast amount of data is available on the web in the form of web pages, in the cloud, or in the repositories of organizations. Companies, enterprises, and other organizations store all of this data digitally. It may be text, streamed data, images, Facebook data, Twitter data, videos, or other documents available on the Internet, relating to areas such as manufacturing, engineering, and medicine, and is collectively called a dataspace. The data available over the Internet may be structured, unstructured, or without any format. Storage mechanisms differ between organizations, but searching and retrieval should be easy from the user's point of view: users should be able to find relevant and accurate information efficiently, so a proper model, search engine, or interface is needed for finding information. Retrieving information from the Internet and from large databases is difficult and time-consuming, especially if the information is unstructured. Several algorithms and techniques have been developed in data mining and information retrieval, yet retrieving data from large databases continues to be problematic. In this paper, the Vector Space Model (VSM) technique of information retrieval is used. With VSM, documents and queries are represented as vectors whose dimensions are the terms used to build the index over the unstructured data. VSM is widely used for retrieving documents and data because of its simplicity and its efficiency on large datasets. VSM is based on term weighting of document vectors in three steps: (1) the documents are indexed so that relevant data can be retrieved; (2) the indexed terms are weighted so that the appropriate documents can be retrieved for the end user; and (3) similarity measures are used to rank the documents by relevance to the end user's query. The cosine measure is often used. We found that retrieving data or information based on similarity measures is easier and yields a better and more efficient technique or model for information retrieval.
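
The three VSM steps described in this abstract (indexing, term weighting, cosine ranking) can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenizer, the smoothed TF-IDF formula, the function names, and the sample documents are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_index(docs):
    """Step 1: index documents as per-document term counts plus document frequencies."""
    term_counts = [Counter(tokenize(d)) for d in docs]
    doc_freq = Counter()
    for counts in term_counts:
        doc_freq.update(counts.keys())  # each document counts once per term
    return term_counts, doc_freq


def tfidf(counts, doc_freq, n_docs):
    """Step 2: weight each term by tf * idf, using a smoothed idf (an assumed variant)."""
    return {
        term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
        for term, tf in counts.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    """Step 3: score every document against the query and sort by descending score."""
    term_counts, doc_freq = build_index(docs)
    n_docs = len(docs)
    doc_vecs = [tfidf(c, doc_freq, n_docs) for c in term_counts]
    q_vec = tfidf(Counter(tokenize(query)), doc_freq, n_docs)
    scores = [(cosine(q_vec, dv), i) for i, dv in enumerate(doc_vecs)]
    return sorted(scores, key=lambda s: (-s[0], s[1]))


# Hypothetical sample documents, not taken from the paper.
docs = [
    "gene expression data in the cloud",
    "personal data management with keyword query",
    "linked data on the web",
]
ranked = rank("keyword query", docs)
for score, doc_id in ranked:
    print(f"{score:.3f}  {docs[doc_id]}")
```

Only the second document shares terms with the query, so it is the only one with a non-zero score. A real system would add stemming, stop-word removal, and an inverted index rather than scoring every document.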

    Linked Data - the story so far

    The term “Linked Data” refers to a set of best practices for publishing and connecting structured data on the Web. These best practices have been adopted by an increasing number of data providers over the last three years, leading to the creation of a global data space containing billions of assertions: the Web of Data. In this article, the authors present the concept and technical principles of Linked Data, and situate these within the broader context of related technological developments. They describe progress to date in publishing Linked Data on the Web, review applications that have been developed to exploit the Web of Data, and map out a research agenda for the Linked Data community as it moves forward.

    Scalable dataspace construction

    The conference aimed at supporting and stimulating active productive research set to strengthen the technical foundations of engineers and scientists in the continent, through developing strong technical foundations and skills, leading to new small to medium enterprises within the African sub-continent. It also sought to encourage the emergence of functionally skilled technocrats within the continent. This paper proposes the design and implementation of scalable dataspaces based on efficient data structures. Dataspaces are often likely to exhibit a multidimensional structure due to the unpredictable neighbour relationship between participants coupled with the continuous exponential growth of data. Layered range trees are incorporated into the proposed solution as multidimensional binary trees which are used to perform d-dimensional orthogonal range indexing and searching. Furthermore, the solution is readily extensible to multiple dimensions, raising the possibility of volume searches and even extension to attribute space. We begin with a study of the important literature and dataspace designs. A scalable design and implementation is further presented. Finally, we conduct an experimental evaluation to illustrate the better performance of the proposed techniques. The design of a scalable dataspace is important in order to bridge the gap resulting from the lack of coexistence of data entities in the spatial domain as a key milestone towards pay-as-you-go systems integration. Strathmore University; Institute of Electrical and Electronics Engineers (IEEE)
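
Orthogonal range searching, as mentioned in this abstract, can be sketched for the two-dimensional case. The code below is a plain range tree, not the layered variant the paper names: a layered range tree adds fractional cascading to avoid a repeated binary search at each level. The class and function names and the sample points are hypothetical.

```python
import bisect


class Node:
    """Node of a static 2-D range tree over points sorted by x."""

    def __init__(self, pts):
        # pts: non-empty list of (x, y) tuples, already sorted by x
        self.xmin, self.xmax = pts[0][0], pts[-1][0]
        # Associated structure: the same points sorted by y
        self.by_y = sorted(pts, key=lambda p: p[1])
        self.ys = [p[1] for p in self.by_y]
        self.left = self.right = None
        if len(pts) > 1:
            mid = len(pts) // 2
            self.left, self.right = Node(pts[:mid]), Node(pts[mid:])


def build(points):
    """Build the tree; returns None for an empty point set."""
    return Node(sorted(points)) if points else None


def query(node, x1, x2, y1, y2):
    """Report all points with x1 <= x <= x2 and y1 <= y <= y2."""
    if node is None or node.xmax < x1 or node.xmin > x2:
        return []  # x-range of this subtree misses the query entirely
    if x1 <= node.xmin and node.xmax <= x2:
        # Subtree lies fully inside the x-range: binary search on y only
        lo = bisect.bisect_left(node.ys, y1)
        hi = bisect.bisect_right(node.ys, y2)
        return node.by_y[lo:hi]
    # Partial overlap: such a node always has two children
    return query(node.left, x1, x2, y1, y2) + query(node.right, x1, x2, y1, y2)


# Hypothetical sample points.
points = [(1, 5), (2, 3), (4, 8), (6, 1), (7, 7), (9, 4)]
tree = build(points)
hits = sorted(query(tree, 2, 7, 2, 8))
print(hits)
```

The same idea extends to d dimensions by nesting one associated structure per extra coordinate, which is the extensibility the abstract refers to.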

    Dataspace Support Platform for e-Science

    This work intends to provide a data management solution based on the concepts of dataspaces for the large-scale and long-term management of scientific data. Our approach is to semantically enrich the existing relationship among primary and derived data items, and to preserve both relationships and data together within a dataspace to be reused by owners and others. To enable reuse, data must be well preserved. Preservation of scientific data can best be established if the full life cycle of data is addressed. This is challenged by the e-Science life cycle ontology, whose major goal is to trace the semantics about procedures in scientific experiments. We present a theoretical dataspace model for e-Science applications, its implementation within a dataspace support platform and an experimental evaluation on top of two real-world application domains.