10 research outputs found

    Optimized Seamless Integration of Biomolecular Data

    Get PDF
    Today, scientific data is inevitably digitized, stored in a wide variety of heterogeneous formats, and is accessible over the Internet. Scientists need to access an integrated view of multiple remote or local heterogeneous data sources. They then integrate the results of complex queries and apply further analysis and visualization to support the task of scientific discovery. Building such a digital library for scientific discovery requires accessing and manipulating data extracted from flat files or databases, documents retrieved from the Web, as well as data that is locally materialized in warehouses or is generated by software. We consider several tasks to provide optimized and seamless integration of biomolecular data. Challenges to be addressed include capturing and representing source capabilities; developing a methodology to acquire and represent semantic knowledge and metadata about source contents, overlap in source contents, and access costs; and decision support to select sources and capabilities using cost based and semantic knowledge, and generating low cost query evaluation plans. (Also referenced as UMIACS-TR-2001-51

    A semantic data federation engine : design, implementation & applications in educational information management

    Get PDF
    Thesis (S.M. in Technology and Policy)--Massachusetts Institute of Technology, Engineering Systems Division, Technology and Policy Program; and, (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011.Cataloged from PDF version of thesis.Includes bibliographical references (p. 87-90).With the advent of the World Wide Web, the amount of digital information in the world has increased exponentially. The ability to organize this deluge of data, retrieve it, and combine it with other data would bring numerous benefits to organizations that rely on the analysis of this data for their operations. The Semantic Web encompasses various technologies that support better information organization and access. This thesis proposes a data federation engine that facilitates integration of data across distributed Semantic Web data sources while maintaining appropriate access policies. After discussing existing literature in the field, the design and implementation of the system including its capabilities and limitations are thoroughly described. Moreover, a possible application of the system at the Massachusetts Department of Education is explored in detail, including an investigation of the technical and nontechnical challenges associated with its adoption at a government agency. By using the federation engine, users would be able to exploit the expressivity of the Semantic Web by querying for disparate data at a single location without having to know how it is distributed or where it is stored. Among this research's contributions to the fledgling Semantic Web are: an integrated system for executing SPARQL queries; and, an optimizer that faciliates efficient querying by exploiting statistical information about the data sources.by Mathew Sam Cherian.S.M.S.M.in Technology and Polic

    WISM'07 : 4th international workshop on web information systems modeling

    Get PDF

    WISM'07 : 4th international workshop on web information systems modeling

    Get PDF

    Intelligent integration of information : design and implementation of a web wrapper

    Get PDF
    Thesis (S.B. and M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1999.Includes bibliographical references (leaves 51-52).by Kenneth C. Lau.S.B.and M.Eng

    Federated Query Processing over Heterogeneous Data Sources in a Semantic Data Lake

    Get PDF
    Data provides the basis for emerging scientific and interdisciplinary data-centric applications with the potential of improving the quality of life for citizens. Big Data plays an important role in promoting both manufacturing and scientific development through industrial digitization and emerging interdisciplinary research. Open data initiatives have encouraged the publication of Big Data by exploiting the decentralized nature of the Web, allowing for the availability of heterogeneous data generated and maintained by autonomous data providers. Consequently, the growing volume of data consumed by different applications raise the need for effective data integration approaches able to process a large volume of data that is represented in different format, schema and model, which may also include sensitive data, e.g., financial transactions, medical procedures, or personal data. Data Lakes are composed of heterogeneous data sources in their original format, that reduce the overhead of materialized data integration. Query processing over Data Lakes require the semantic description of data collected from heterogeneous data sources. A Data Lake with such semantic annotations is referred to as a Semantic Data Lake. Transforming Big Data into actionable knowledge demands novel and scalable techniques for enabling not only Big Data ingestion and curation to the Semantic Data Lake, but also for efficient large-scale semantic data integration, exploration, and discovery. Federated query processing techniques utilize source descriptions to find relevant data sources and find efficient execution plan that minimize the total execution time and maximize the completeness of answers. Existing federated query processing engines employ a coarse-grained description model where the semantics encoded in data sources are ignored. Such descriptions may lead to the erroneous selection of data sources for a query and unnecessary retrieval of data, affecting thus the performance of query processing engine. In this thesis, we address the problem of federated query processing against heterogeneous data sources in a Semantic Data Lake. First, we tackle the challenge of knowledge representation and propose a novel source description model, RDF Molecule Templates, that describe knowledge available in a Semantic Data Lake. RDF Molecule Templates (RDF-MTs) describes data sources in terms of an abstract description of entities belonging to the same semantic concept. Then, we propose a technique for data source selection and query decomposition, the MULDER approach, and query planning and optimization techniques, Ontario, that exploit the characteristics of heterogeneous data sources described using RDF-MTs and provide a uniform access to heterogeneous data sources. We then address the challenge of enforcing privacy and access control requirements imposed by data providers. We introduce a privacy-aware federated query technique, BOUNCER, able to enforce privacy and access control regulations during query processing over data sources in a Semantic Data Lake. In particular, BOUNCER exploits RDF-MTs based source descriptions in order to express privacy and access control policies as well as their automatic enforcement during source selection, query decomposition, and planning. Furthermore, BOUNCER implements query decomposition and optimization techniques able to identify query plans over data sources that not only contain the relevant entities to answer a query, but also are regulated by policies that allow for accessing these relevant entities. Finally, we tackle the problem of interest based update propagation and co-evolution of data sources. We present a novel approach for interest-based RDF update propagation that consistently maintains a full or partial replication of large datasets and deal with co-evolution

    Technological roadmap on AI planning and scheduling

    Get PDF
    At the beginning of the new century, Information Technologies had become basic and indispensable constituents of the production and preparation processes for all kinds of goods and services and with that are largely influencing both the working and private life of nearly every citizen. This development will continue and even further grow with the continually increasing use of the Internet in production, business, science, education, and everyday societal and private undertaking. Recent years have shown, however, that a dramatic enhancement of software capabilities is required, when aiming to continuously provide advanced and competitive products and services in all these fast developing sectors. It includes the development of intelligent systems – systems that are more autonomous, flexible, and robust than today’s conventional software. Intelligent Planning and Scheduling is a key enabling technology for intelligent systems. It has been developed and matured over the last three decades and has successfully been employed for a variety of applications in commerce, industry, education, medicine, public transport, defense, and government. This document reviews the state-of-the-art in key application and technical areas of Intelligent Planning and Scheduling. It identifies the most important research, development, and technology transfer efforts required in the coming 3 to 10 years and shows the way forward to meet these challenges in the short-, medium- and longer-term future. The roadmap has been developed under the regime of PLANET – the European Network of Excellence in AI Planning. This network, established by the European Commission in 1998, is the co-ordinating framework for research, development, and technology transfer in the field of Intelligent Planning and Scheduling in Europe. A large number of people have contributed to this document including the members of PLANET non- European international experts, and a number of independent expert peer reviewers. All of them are acknowledged in a separate section of this document. Intelligent Planning and Scheduling is a far-reaching technology. Accepting the challenges and progressing along the directions pointed out in this roadmap will enable a new generation of intelligent application systems in a wide variety of industrial, commercial, public, and private sectors
    corecore