
    Reasoning & Querying – State of the Art

    Various query languages for Web and Semantic Web data, both for practical use and as an area of research in the scientific community, have emerged in recent years. At the same time, the broad adoption of the internet, where keyword search underlies many applications such as search engines, has familiarized casual users with keyword queries as a way to retrieve information. Unlike this easy-to-use style of querying, traditional query languages require knowledge of both the language itself and the data to be queried. Keyword-based query languages for XML and RDF bridge the gap between the two, aiming to enable simple querying of semi-structured data, which is relevant, for example, in the context of the emerging Semantic Web. This article presents an overview of the field of keyword querying for XML and RDF.
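
    To make the contrast concrete, here is a minimal sketch, not taken from the article, of keyword-based retrieval over a small RDF-style triple store: the user supplies only keywords and the matcher returns any triple that mentions all of them, whereas a structured query would require knowing predicate names and graph shape in advance. The triples and the helper below are illustrative assumptions.

```python
# Minimal illustration of keyword querying over RDF-style triples.
# The data and the helper below are illustrative, not from the article.

triples = [
    ("ex:Berlin", "ex:capitalOf", "ex:Germany"),
    ("ex:Berlin", "rdfs:label", "Berlin"),
    ("ex:Germany", "ex:population", "83000000"),
]

def keyword_query(keywords, store):
    """Return triples in which every keyword occurs in some component."""
    hits = []
    for s, p, o in store:
        text = " ".join((s, p, o)).lower()
        if all(kw.lower() in text for kw in keywords):
            hits.append((s, p, o))
    return hits

# A casual user types keywords; no schema knowledge is required.
print(keyword_query(["berlin", "capital"], triples))
# A structured language (e.g. SPARQL) would instead need the exact
# predicate: SELECT ?c WHERE { ?c ex:capitalOf ex:Germany }
```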

    Database Queries that Explain their Work

    Provenance for database queries or scientific workflows is often motivated as providing explanation (increasing understanding of the underlying data sources and processes used to compute the query) and reproducibility (the capability to recompute the results on different inputs, possibly specialized to a part of the output). Many provenance systems claim to provide such capabilities; however, most lack formal definitions or guarantees of these properties, while others provide formal guarantees only for relatively limited classes of changes. Building on recent work on provenance traces and slicing for functional programming languages, we introduce a detailed tracing model of provenance for the multiset-valued Nested Relational Calculus, define trace slicing algorithms that extract the subtraces needed to explain or recompute specific parts of the output, and define query slicing and differencing techniques that support explanation. We state and prove correctness properties for these techniques and present a proof-of-concept implementation in Haskell. Comment: PPDP 201
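
    The following is a much simpler sketch of the underlying idea of data provenance, not the paper's trace-slicing model for the Nested Relational Calculus: each input row carries an identifier, and query operators propagate the set of identifiers that contributed to each output value, so a result can be traced back to its sources. All names and data are assumptions for illustration.

```python
# Toy where-provenance: selection and projection propagate, for each
# output row, the set of input-row identifiers it was derived from.
# Data and operator names are illustrative assumptions.

rows = [
    {"id": "r1", "name": "alice", "dept": "cs"},
    {"id": "r2", "name": "bob",   "dept": "math"},
    {"id": "r3", "name": "carol", "dept": "cs"},
]

def select(pred, table):
    # A selected row's provenance is just its own identifier.
    return [(row, {row["id"]}) for row in table if pred(row)]

def project(fields, annotated):
    # Projection keeps the provenance annotation unchanged.
    return [({f: row[f] for f in fields}, prov) for row, prov in annotated]

result = project(["name"], select(lambda r: r["dept"] == "cs", rows))
for value, prov in result:
    print(value, "derived from", prov)
# {'name': 'alice'} derived from {'r1'}
# {'name': 'carol'} derived from {'r3'}
```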

    Study of various data mining techniques

    The advent of computing technology has significantly influenced our lives, and two major areas of this influence are Business Data Processing and Scientific Computing. During the initial years of developing computer techniques for business, computer professionals were concerned with designing files to store data so that information could be retrieved efficiently. There were restrictions on the storage size available for data and on the speed of accessing it. Needless to say, the activity was restricted to very few, highly qualified professionals. Then came the era in which the task was simplified by the DBMS [1]. Responsibility for intricate tasks, such as the declarative aspects of the program, was passed on to the database administrator, and users could pose their queries in simpler notations such as query languages.

    Federated Query Processing

    Big data plays a relevant role in promoting both manufacturing and scientific development through industrial digitization and emerging interdisciplinary research. Semantic web technologies have also experienced great progress, and scientific communities and practitioners have contributed to the problem of big data management with ontological models, controlled vocabularies, linked datasets, data models, query languages, and tools for transforming big data into knowledge from which decisions can be made. Despite the significant impact of big data and semantic web technologies, we are entering a new era in which domains like genomics are projected to grow very rapidly over the next decade. In this new era, integrating big data demands novel and scalable tools that enable not only big data ingestion and curation but also efficient large-scale exploration and discovery. Federated query processing techniques provide a solution that scales up to large volumes of data distributed across multiple data sources. These techniques rely on source descriptions to identify the data sources relevant to a query, and to find efficient execution plans that minimize the total execution time of a query and maximize the completeness of the answers. This chapter summarizes the main characteristics of a federated query engine, reviews the current state of the field, and outlines the problems that remain open and represent grand challenges for the area.
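
    As a minimal sketch of the source-selection step that a federated engine performs, assume each source description simply lists the predicates that source can answer; the engine then routes each triple pattern of a query only to the sources whose descriptions cover it. The endpoints, predicates, and query below are invented for illustration and are not from the chapter.

```python
# Sketch of source selection in a federated query engine: route each
# triple pattern to the sources whose descriptions cover its predicate.
# Source descriptions and the query are illustrative assumptions.

source_descriptions = {
    "endpoint_A": {"ex:gene", "ex:encodes"},
    "endpoint_B": {"ex:encodes", "ex:interactsWith"},
}

query_patterns = [
    ("?g", "ex:gene", "?name"),
    ("?g", "ex:encodes", "?protein"),
]

def select_sources(patterns, descriptions):
    routing = {}
    for pattern in patterns:
        predicate = pattern[1]
        routing[pattern] = [
            src for src, preds in descriptions.items() if predicate in preds
        ]
    return routing

for pattern, sources in select_sources(query_patterns, source_descriptions).items():
    print(pattern, "->", sources)
# A real engine would then build an execution plan that joins the
# sub-results while minimizing total execution time and maximizing
# answer completeness.
```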

    The crustal dynamics intelligent user interface anthology

    The National Space Science Data Center (NSSDC) has initiated an Intelligent Data Management (IDM) research effort which has, as one of its components, the development of an Intelligent User Interface (IUI). The intent of the IUI is to provide a friendly and intelligent user interface service based on expert systems and natural language processing technologies. The purpose of such a service is to support the large number of potential scientific and engineering users who need space- and land-related research and technical data but have little or no experience with query languages or understanding of the information content or architecture of the databases of interest. This document presents the design concepts, development approach, and performance evaluation of a prototype IUI system for the Crustal Dynamics Project Database, which was developed using a microcomputer-based expert system tool (M.1), the natural language query processor THEMIS, and the graphics software system GSS. The IUI design is based on a multiple-view representation of a database from both the user and database perspectives, with intelligent processes to translate between the views.
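
    As a rough illustration of the multiple-view idea, an intelligent interface can keep a dictionary from user-oriented vocabulary to database fields and rewrite a casually phrased request into a structured query. The vocabulary mapping, table, and column names below are invented for this sketch and are not drawn from the Crustal Dynamics system.

```python
# Illustrative translation from a user-view request to a database-view
# query; the vocabulary mapping and schema names are assumptions.

user_to_db = {
    "station": "SITE_ID",
    "baseline length": "BASELINE_LEN_M",
    "observation date": "OBS_DATE",
}

def translate(user_fields, table):
    columns = ", ".join(user_to_db[f] for f in user_fields)
    return f"SELECT {columns} FROM {table}"

print(translate(["station", "baseline length"], "CRUSTAL_OBS"))
# -> SELECT SITE_ID, BASELINE_LEN_M FROM CRUSTAL_OBS
```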

    The design and implementation of a meaning driven data query language

    We present the design and implementation of a Meaning Driven Data Query Language (MDDQL), which aims at constructing queries through system-made suggestions of natural language based query terms, covering both scientific application domain terms and operators/operations. A query construction blackboard is used, where query language terms are suggested to the user in the user's preferred natural language and in a name-centered way, together with their connotations. This helps in understanding the meaning of the terms and/or operators or operations to be included in the query. Furthermore, the construction of the query becomes an incremental refinement of the query under construction through semantic constraints, where only those domain language terms and/or operators/operations are suggested that result in meaningful combinations of query terms with respect to the scientific application domain semantics. Therefore, semantically meaningless queries can be prevented during query construction. Such a semantics-aware mechanism is not available in conventional database query languages such as SQL, where one is allowed to execute a query calculating, for example, the average of numerical data values even though they represent codes of categorical values. Moreover, the end-user needs no familiarity with the semantics of complex database schemes, with the interpretation of the symbols (names of classes/tables/attributes, value codes) underlying the storage model, or with the syntax of a database-specific query language. The constructed query can be submitted to the MDDQL query interpretation and transformation engine, where the corresponding SQL query is generated and delegated to a DBMS (e.g., Oracle, MS Access, SQL Server). Generation of SQL statements addressing NF2 data models, such as those provided by the object-relational Oracle DBMS, is also enabled. The query result is presented in a table-based form in which all storage model symbols are interpreted, and it can be exported for use with statistical software packages (e.g., SPSS).
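
    The abstract's own example, averaging numeric codes that actually encode categories, can be made concrete with a small type-aware query builder. The schema annotations and class names below are assumptions for illustration and are not MDDQL's actual data structures.

```python
# Sketch of a semantics-aware query builder that rejects meaningless
# aggregations at construction time; the schema annotations are assumed.

schema = {
    "age":        {"kind": "numeric"},
    "blood_type": {"kind": "categorical"},  # stored as integer codes 1..4
}

class MeaninglessQueryError(Exception):
    pass

def build_avg_query(column, table):
    if schema[column]["kind"] != "numeric":
        raise MeaninglessQueryError(
            f"AVG({column}) is not meaningful: its values are categorical codes"
        )
    return f"SELECT AVG({column}) FROM {table}"

print(build_avg_query("age", "patients"))      # accepted
try:
    build_avg_query("blood_type", "patients")  # rejected before reaching SQL
except MeaninglessQueryError as err:
    print(err)
```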

    Data Vaults: a Database Welcome to Scientific File Repositories

    Efficient management and exploration of high-volume scientific file repositories have become pivotal for advancement in science. We propose to demonstrate the Data Vault, an extension of the database system architecture that transparently opens scientific file repositories for efficient in-database processing and exploration. The Data Vault facilitates science data analysis using high-level declarative languages, such as traditional SQL and the novel array-oriented SciQL. Data of interest are loaded from the attached repository in a just-in-time manner, without the need for up-front data ingestion. The demo is built around concrete implementations of the Data Vault for two scientific use cases: seismic time series and Earth observation images. The seismic Data Vault uses queries submitted by the audience to illustrate the internals of the Data Vault by revealing the mechanisms of dynamic query plan generation and on-demand external data ingestion. The image Data Vault shows an application view from the perspective of data mining researchers.
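
    A highly simplified sketch of the just-in-time idea follows: data stay in the external repository and are ingested only when a query first touches them. The repository contents, cache, and loader are assumptions made for illustration and are unrelated to the Data Vault's actual internals.

```python
# Sketch of just-in-time loading from a file repository: nothing is
# ingested up front; a file is parsed only when a query first needs it.
# Repository contents and the loader are illustrative assumptions.

repository = {
    "seis_2011_01.txt": "1.2 0.9 1.4",
    "seis_2011_02.txt": "2.1 1.8 2.0",
}

cache = {}  # in-database representation, filled on demand

def load(filename):
    if filename not in cache:
        print(f"ingesting {filename} on demand")
        cache[filename] = [float(x) for x in repository[filename].split()]
    return cache[filename]

def query_max(filename):
    return max(load(filename))

print(query_max("seis_2011_01.txt"))  # first query triggers ingestion
print(query_max("seis_2011_01.txt"))  # served from already-loaded data
```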

    Object-relational spatio-temporal databases

    We present an object-relational model for uniform handling of dimensional data. Spatial, temporal, spatio-temporal, and ordinary data are special cases of dimensional data. This uniformity is achieved through the concept of dimension alignment, which automatically allows lower-dimensional data and queries to be used in a higher-dimensional context. Unlike ordinary data, dimensional objects are interwoven. We introduce object identity (oid) fragments to circumvent data redundancy at the logical level. Computed types are placed appropriately in a type hierarchy to allow maximal use of existing methods. A query language for spatio-temporal data is presented for associative navigation, and a framework for algebraic optimization of the query language is suggested. A pattern matching language is designed for complex querying of spatio-temporal data, seamlessly extending the associative navigation in our query language. The pattern matching language recognizes special features of time and space, providing an appropriate level of abstraction for application development compared to traditional languages. This reduces the need for embedding the query language in a lower-level language such as C++. The pattern matching language is also dimensionally extensible. It allows querying of data with multiple granularities and of continuous data, and it provides hooks for direct querying of scientific data (observations). Our model is dimensionally extensible and is also an extension of a relational model for dimensional data. Moreover, dimensionality and the addition of oids are mutually orthogonal concepts. Thus, starting from classical ordinary data, one may migrate to higher forms of relational or object-relational data in any sequence, without having to recode application software. Our model does not deal with complex objects, which is left as a future extension.
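
    As a rough sketch of what dimension alignment could look like in practice, ordinary (non-temporal) facts can be promoted to the temporal dimension as valid-for-all-time tuples so that they combine uniformly with temporal data. The relations and interval representation below are invented for illustration and are not the dissertation's actual model.

```python
# Sketch of dimension alignment: ordinary tuples are promoted to the
# temporal dimension so a temporal join treats both inputs uniformly.
# Data and interval representation are illustrative assumptions.

ALWAYS = (float("-inf"), float("inf"))

departments = [{"dept": "geology", "building": "B2"}]                      # ordinary data
assignments = [{"dept": "geology", "head": "kim", "valid": (2001, 2005)}]  # temporal data

def align(ordinary_rows):
    """Promote ordinary tuples to the temporal dimension."""
    return [dict(row, valid=ALWAYS) for row in ordinary_rows]

def overlap(a, b):
    return max(a[0], b[0]) < min(a[1], b[1])

# The temporal join now works uniformly on aligned and temporal inputs.
joined = [
    dict(d, **a) for d in align(departments) for a in assignments
    if d["dept"] == a["dept"] and overlap(d["valid"], a["valid"])
]
print(joined)
# [{'dept': 'geology', 'building': 'B2', 'valid': (2001, 2005), 'head': 'kim'}]
```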

    Generic functional requirements for a NASA general-purpose data base management system

    Generic functional requirements for a general-purpose, multi-mission data base management system (DBMS) for application to remotely sensed scientific data bases are detailed. The motivation for utilizing DBMS technology in this environment is explained. The major requirements include: (1) a DBMS for scientific observational data; (2) a multi-mission capability; (3) user-friendliness; (4) extensive and integrated information about the data; (5) robust languages for defining data structures and formats; (6) scientific data types and structures; (7) flexible physical access mechanisms; (8) ways of representing spatial relationships; (9) a high-level, nonprocedural, interactive query and data manipulation language; (10) data base maintenance utilities; (11) high-rate input/output and large-volume data storage; and (12) adaptability to a distributed data base and/or data base machine configuration. Detailed functions are specified in a top-down hierarchic fashion. Implementation, performance, and support requirements are also given.

    Multiple Retrieval Models and Regression Models for Prior Art Search

    This paper presents the system PATATRAS (PATent and Article Tracking, Retrieval and AnalysiS), built for the IP track of CLEF 2009. Our approach has three main characteristics: (1) the use of multiple retrieval models (KL, Okapi) and term index definitions (lemma, phrase, concept) for the three languages considered in the track (English, French, German), producing ten different sets of ranked results; (2) the merging of the different results based on multiple regression models, using an additional validation set created from the patent collection; and (3) the exploitation of patent metadata and of the citation structures for creating restricted initial working sets of patents and for producing a final re-ranking regression model. As we exploit the specific metadata of the patent documents and the citation relations only when creating the initial working sets and during the final post-ranking step, our architecture remains generic and easy to extend.
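
    As a minimal sketch of merging multiple ranked result sets with a learned model, assume each retrieval run scores documents and that a regression fitted on a validation set supplies the combination weights; the runs, scores, and weights below are invented for illustration and are not PATATRAS's actual values.

```python
# Sketch of merging several retrieval runs with learned linear weights:
# each run scores documents; the weights stand in for what a regression
# model fitted on validation data might produce. All values are invented.

runs = {
    "okapi_lemma": {"patent_A": 0.82, "patent_B": 0.40},
    "kl_phrase":   {"patent_A": 0.55, "patent_B": 0.61},
    "kl_concept":  {"patent_B": 0.73},
}

weights = {"okapi_lemma": 0.5, "kl_phrase": 0.3, "kl_concept": 0.2}

def merge(runs, weights):
    combined = {}
    for run, scores in runs.items():
        for doc, score in scores.items():
            combined[doc] = combined.get(doc, 0.0) + weights[run] * score
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)

print(merge(runs, weights))  # final ranking over the merged candidate set
```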