12,057 research outputs found

    Knowledge Rich Natural Language Queries over Structured Biological Databases

    Full text link
    Increasingly, keyword, natural language and NoSQL queries are being used for information retrieval from traditional as well as non-traditional databases such as web, document, image, GIS, legal, and health databases. While their popularity are undeniable for obvious reasons, their engineering is far from simple. In most part, semantics and intent preserving mapping of a well understood natural language query expressed over a structured database schema to a structured query language is still a difficult task, and research to tame the complexity is intense. In this paper, we propose a multi-level knowledge-based middleware to facilitate such mappings that separate the conceptual level from the physical level. We augment these multi-level abstractions with a concept reasoner and a query strategy engine to dynamically link arbitrary natural language querying to well defined structured queries. We demonstrate the feasibility of our approach by presenting a Datalog based prototype system, called BioSmart, that can compute responses to arbitrary natural language queries over arbitrary databases once a syntactic classification of the natural language query is made

    Semantic Storage: Overview and Assessment

    No full text
    The Semantic Web has a great deal of momentum behind it. The promise of a ‘better web’, where information is given well defined meaning and computers are better able to work with it has captured the imagination of a significant number of people, particularly in academia. Language standards such as RDF and OWL have appeared with remarkable speed, and development continues apace. To back up this development, there is a requirement for ‘semantic databases’, where this data can be conveniently stored, operated upon, and retrieved. These already exist in the form of triple stores, but do not yet fulfil all the requirements that may be made of them, particularly in the area of performing inference using OWL. This paper analyses the current stores along with forthcoming technology, and finds that it is unlikely that a combination of speed, scalability, and complex inferencing will be practical in the immediate future. It concludes by suggesting alternative development routes

    vSPARQL: A View Definition Language for the Semantic Web

    Get PDF
    Translational medicine applications would like to leverage the biological and biomedical ontologies, vocabularies, and data sets available on the semantic web. We present a general solution for RDF information set reuse inspired by database views. Our view definition language, vSPARQL, allows applications to specify the exact content that they are interested in and how that content should be restructured or modified. Applications can access relevant content by querying against these view definitions. We evaluate the expressivity of our approach by defining views for practical use cases and comparing our view definition language to existing query languages

    Designing Traceability into Big Data Systems

    Full text link
    Providing an appropriate level of accessibility and traceability to data or process elements (so-called Items) in large volumes of data, often Cloud-resident, is an essential requirement in the Big Data era. Enterprise-wide data systems need to be designed from the outset to support usage of such Items across the spectrum of business use rather than from any specific application view. The design philosophy advocated in this paper is to drive the design process using a so-called description-driven approach which enriches models with meta-data and description and focuses the design process on Item re-use, thereby promoting traceability. Details are given of the description-driven design of big data systems at CERN, in health informatics and in business process management. Evidence is presented that the approach leads to design simplicity and consequent ease of management thanks to loose typing and the adoption of a unified approach to Item management and usage.Comment: 10 pages; 6 figures in Proceedings of the 5th Annual International Conference on ICT: Big Data, Cloud and Security (ICT-BDCS 2015), Singapore July 2015. arXiv admin note: text overlap with arXiv:1402.5764, arXiv:1402.575

    The Family of MapReduce and Large Scale Data Processing Systems

    Full text link
    In the last two decades, the continuous increase of computational power has produced an overwhelming flow of data which has called for a paradigm shift in the computing architecture and large scale data processing mechanisms. MapReduce is a simple and powerful programming model that enables easy development of scalable parallel applications to process vast amounts of data on large clusters of commodity machines. It isolates the application from the details of running a distributed program such as issues on data distribution, scheduling and fault tolerance. However, the original implementation of the MapReduce framework had some limitations that have been tackled by many research efforts in several followup works after its introduction. This article provides a comprehensive survey for a family of approaches and mechanisms of large scale data processing mechanisms that have been implemented based on the original idea of the MapReduce framework and are currently gaining a lot of momentum in both research and industrial communities. We also cover a set of introduced systems that have been implemented to provide declarative programming interfaces on top of the MapReduce framework. In addition, we review several large scale data processing systems that resemble some of the ideas of the MapReduce framework for different purposes and application scenarios. Finally, we discuss some of the future research directions for implementing the next generation of MapReduce-like solutions.Comment: arXiv admin note: text overlap with arXiv:1105.4252 by other author
    • 

    corecore